Functional programming and unit testing for data munging with R
Bruno Rodrigues
2017-12-28
Chapter 1 Why this book?
1.1 Important notice
This book is still being written: some chapters are not finished yet, and there might be (there are) some typos. Don’t hesitate to write to me if you notice something weird.
You can purchase a digital copy of this book on Leanpub. The version on Leanpub will not always be up to date; I only update it when I make very big changes (new chapters, etc.). Once this book is finished, both versions will be the same.
This book shows how functional programming and unit testing can be useful for the task of data munging. It is not an in-depth guide to functional programming, nor to unit testing with R. If you want an in-depth understanding of the concepts presented in this book, I can’t recommend Wickham (2014a), Wickham (2015) and Wickham and Grolemund (2016) enough. Here, I will only briefly present functional programming, unit testing and building your own R packages: just enough to get you (hopefully) interested and going.
This book is not an introduction to R either. I will assume that you have intermediate knowledge of R.
1.2 Motivation
Functional programming has very nice features that make working on data sets much more pleasant. You often have to repeat the same instructions over and over again for different data sets that look very similar (for example, with the same or similar column names). Of course, it is possible to loop over these data sets and repeat a set of instructions that modify them. However, we will see why a functional programming approach is to be preferred.
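To make the contrast concrete, here is a minimal sketch in base R. The data sets, the `income` column and the `mean_income()` helper are all made up for illustration; the point is only that the functional version wraps the repeated instructions in one small function and applies it to every data set.

```r
# Three hypothetical data sets that share the same column names.
datasets <- list(
  a = data.frame(id = 1:3, income = c(10, 20, 30)),
  b = data.frame(id = 1:2, income = c(5, 15)),
  c = data.frame(id = 1:4, income = c(1, 2, 3, 4))
)

# Loop version: the result is built up by modifying an object
# that lives outside the loop.
means_loop <- numeric(length(datasets))
for (i in seq_along(datasets)) {
  means_loop[i] <- mean(datasets[[i]]$income)
}

# Functional version: one small function (hypothetical name),
# applied to every data set in the list.
mean_income <- function(df) mean(df$income)
means_fun <- vapply(datasets, mean_income, numeric(1))
```

Both versions compute the same numbers, but `mean_income()` can be read, reused and tested on its own, which the body of the loop cannot.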
Unit testing then allows you to make sure that the functions you want to apply to your data sets actually do what you want them to do. Knowing and applying these two concepts together will hopefully make you a better data analyst. Then we will learn to develop our own packages; not with the goal of publishing them on CRAN, but with the goal of making programming more streamlined.
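As a rough idea of what checking a function looks like, here is a small sketch that uses only base R's `stopifnot()`. The `rescale01()` function is a hypothetical example, not something from this book, and in practice a dedicated testing package would usually take the place of the bare `stopifnot()` calls.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Each check stops with an error if the function misbehaves.
# Known input, known output:
stopifnot(isTRUE(all.equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))))
# The result always spans exactly 0 to 1:
stopifnot(isTRUE(all.equal(range(rescale01(c(3, 7, 11))), c(0, 1))))
# Missing values stay missing instead of breaking the computation:
stopifnot(is.na(rescale01(c(1, NA, 3))[2]))
```

If all three checks pass silently, the function behaves as intended on these inputs; if someone later changes `rescale01()` and breaks it, running the checks again will say so immediately.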
1.3 Who am I?
I use R daily at my current job, and discovered R some years ago while I was at the University of Strasbourg. I’m not an R developer, and don’t have a CS background. Most, if not all, of what I know about R is self-taught. I hope, however, that you will find this book useful. You can follow me on Twitter or check my blog.
1.4 Thanks
I’d like to thank Ross Ihaka and Robert Gentleman for developing the R programming language. Many thanks to Hadley Wickham for all the wonderful packages he developed that make R much more pleasant to use. Thanks to Yihui Xie for bookdown, without which this book would not exist (at least not in this very nice format).
Thanks to Hans-Martin von Gaudecker for introducing me to unit testing and writing elegant code. The PEP 8 style guidelines will forever remain etched in my brain.
Finally, I have to thank my wife for putting up with my endless rants against people who neither use functional programming nor test their code (or worse, use proprietary software!).
1.5 License
This book is licensed under the GNU Free Documentation License, version 1.3. A copy of the license is available on the repo, or you can read it online.
References
Wickham, Hadley. 2014a. Advanced R. CRC Press.
Wickham, Hadley. 2015. R Packages. 1st ed. O’Reilly. http://r-pkgs.had.co.nz/.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science. 1st ed. O’Reilly. http://r4ds.had.co.nz/.