My R & Data Science Story

21 June 2017

As a doctoral student in Applied Social & Community Psychology, my training, and consequently my future career prospects, have been quite varied in the range and scope of substantive, methodological, and analytic topics and skills covered. There is a single constant across these sectors of my training, however: learning to code and realizing myself as an emerging data scientist, more than pretty much anything else. I actually started learning to code several years ago as a college freshman wanting to major in graphic design, after having been introduced to the coded graphic-design world (i.e., XML & graphic vectorization) and web design (i.e., HTML(3) & CSS(3)). This initial toe-dip into the programming ocean was short-lived, however, as I soon learned that a major in graphic design, at least at the institution I was attending, (a) required an ability to draw (both with an actual pencil/pen on paper and with a computer mouse directed at a screen), at which I was then and will forever be only minimally adept, and (b) primarily consisted of learning to create graphics using graphical user interfaces. I wanted to create those interfaces and their content (using code) for others, not me, to use. I also found that world somewhat deprived of the analytic depth I sought in a career. I eventually migrated toward a dual major in English (Advanced Rhetoric & Composition) and Psychology for my Bachelor's degree, which led me to studying violence-related topics, which led me to pursuing a career in violence prevention.

Then came 9:00am on the Friday of my exceedingly long first week as a graduate student. That morning was the first of many hair-pulling Friday mornings to come, as Fridays would be the designated weekly ‘lab day’ for the first of three advanced statistics and research methods courses everyone is required to pass in order to move forward in the PhD program. The stakes alone were enough to stress me and my fellow cohort members. However, the faculty member teaching the first two courses in the required quantitative methods series decided to teach them using R as the applied statistical software of choice, rather than the noxious but traditional and unfortunately ubiquitous SPSS. Within about a week, I was hooked and felt weirdly at home, in that R, for me, represented some place to which I ultimately belonged (despite it being a language, not a place); yet I simultaneously felt like an awkward new student and like a bull in a china shop. Nonetheless, while most of my cohort members spent two academic quarters scratching their heads and hating every minute of time spent coding in R, I was excited to wake up every day and learn something new in R. Since then, I’ve built an increasingly advanced R programming skillset, which includes learning additional, more fundamental programming languages such as C, C++, and Perl, as well as regular expressions.


In sum, the route of my data science journey thus far began with me as a novice R user, following my favorite “R coach” J. Steele’s instructions on what to type into the R console without really thinking too much about the functionality of the code itself, focusing almost entirely on the output and its statistical implications. From there, I traversed into becoming a novice R learner by building a keener sense of understanding how the output got to the console from the code; then a somewhat intermediate R learner after continuing to build my R skillset by studying the R help documentation for detailed information about pretty much every function I entered into R’s console, and then investigating and dissecting the actual source code of any function that ever gave a warning or error message; and finally an intermediate-to-advanced R programmer.
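That source-dissecting habit is easy to reproduce in base R. As a quick sketch of what I mean (using mean() purely as an example function, not one that ever misbehaved):

```r
# Typing a function's name without parentheses prints its source code
mean.default

# formals() shows the arguments and their defaults; body() shows just the code
formals(mean.default)
body(mean.default)

# For S3 generics like mean(), list the available methods first,
# then look one up (even when it lives inside a package namespace):
methods(mean)
getAnywhere("mean.default")
```

From there, it is a short step to reading a whole package's source on CRAN or GitHub.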

At that juncture, I found myself not only correcting errors in my own source code, but also debugging and/or optimizing the efficiency and usability of existing functions from packages published on CRAN or in others’ GitHub repositories. I soon began writing my own R functions as I became increasingly comfortable and adept with the R language. The vast majority of the functions I wrote started out as little wrappers tailoring the default argument values of existing functions to fit my personal defaults (e.g., rendering HTML data tables using the {DT} package with my favorite format and consistently used argument values). These little wrappers became increasingly complex until I reached a point where I was in fact writing entirely new functions to do entirely different tasks than the originals, which I found myself using only as templates to get started or as references for particular R programming approaches/conventions that I needed for my new functions.
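A minimal sketch of that kind of wrapper (the function name and the particular defaults here are my own illustration, not anything from the {DT} package itself), assuming {DT} is installed:

```r
# A personal wrapper around DT::datatable() that bakes in preferred defaults.
# Because DT:: is resolved at call time, {DT} only needs to be installed
# when the function is actually used, not when it is defined.
my_datatable <- function(data, caption = NULL, page_length = 10, ...) {
  DT::datatable(
    data,
    caption  = caption,
    rownames = FALSE,                                   # hide row names
    filter   = "top",                                   # column filters on top
    options  = list(pageLength = page_length,
                    autoWidth  = TRUE),
    ...                                                 # still allow overrides
  )
}

# Usage: my_datatable(mtcars, caption = "Motor Trend cars")
```

The `...` pass-through is what lets a wrapper like this grow: any argument of the underlying function can still be overridden per call, until eventually the wrapper stops being a wrapper at all.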


I am now in the process of developing an R package containing (hopefully useful) tools for mixed-methods research and data analysis.

Although it is very much still in the fledgling stages of package development (e.g., I still have not decided what to call the package, hence the use of my own alias, ‘Riley’, for the current package name), the source code is available on GitHub.

Other aspects of the package include tools for presenting data and analysis findings (and every step in between). These tools take advantage of the many, and growing, capabilities of R Markdown and knitr, and integrate styles inspired by Edward Tufte and the {tufte} R package.


Design by Rachel ("Riley") Smith-Hunter based on the Tufte CSS theme by Dave Liepmann & Edward Tufte

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.