Best Practices for Code Review: R Edition

A. What is Code Review?

Code reviews are traditionally done in the context of a software development team that is building out a new product or feature. The goal is to ensure that anything added to the common code base is free of bugs, follows established coding conventions, and is reasonably optimized. Code review is a practice that I first experienced after transitioning from working as a statistical analyst to working as a data scientist. One of the most important lessons I’ve learned over the past few years is that code reviews are critical for data science teams to ensure that good code and accurate analysis are being shipped. In this post, I will review the practices that I’ve found most useful in my work leading code reviews. The post is specific to the R language, since that is the primary language my team uses for analysis.

B. Why Conduct Code Reviews?

The primary stated benefit of code review in industry is improving software quality. Having small groups of colleagues regularly review each other’s classes, functions, closures, and so forth helps ensure that the team writes elegant code, which in turn benefits the overall process or software being constructed. For data scientists and advanced analytics professionals, the rationale for conducting code reviews is similar: we want to write efficient code that contains sound logic and produces the appropriate output.

There are two other benefits to conducting code review that are worth mentioning.

  1. Consistent Design

Code review can help enforce a consistent coding style that makes the source code readable by everyone on the team. If all members of the data science team follow a single coding style, different parts of the project can be passed from one team member to another with greater fluidity. Emphasizing a single coding style during the code review process ensures consistent design and contributes to the maintainability and longevity of the code (a tooling sketch for this appears after these two points).

  2. Knowledge Sharing and Mentorship

Code reviews also allow colleagues to learn from one another, and in particular allow junior folks to learn from more experienced team members. When all team members review others’ code, employees at every experience level deepen their understanding of the code base. Employees can also share new technologies and techniques with each other during the review process.
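
On the consistent-design point above, one optional way for a team to make a shared style easy to check before and during review is to lean on the styler and lintr packages. This is only a sketch of tooling I am assuming could be used; it is not a required part of the process described in this post.

    # Optional style tooling (assumes the 'styler' and 'lintr' packages are
    # installed and that the team has agreed on the tidyverse style guide).
    # install.packages(c("styler", "lintr"))   # one-time setup

    # Reformat a file in place according to the agreed style
    styler::style_file("dataset_builder.R")

    # Flag remaining issues (long lines, inconsistent naming, etc.)
    lintr::lint("dataset_builder.R")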

C. What Code Should be Reviewed?

As data scientists, we often write processes using R, Python, or another language in which certain inputs are taken, a series of analyses is executed, and the desired results are generated. This type of process should generally be automated and scheduled to run at particular times.

Consider the following R project. Let’s say that only one person is working on this, but they are part of a team of three data scientists.

The directory with R code contains the following files.

  1. basic_eda.R
    This file will just be a place where the employee takes the data set and does some exploratory data analysis. The goal is just to better understand the data through data visualization, simple regression models, and so forth. This file is really meant to be run once or twice, and won’t be part of the eventual pipeline.
  2. dataset_builder.R
    The dataset builder will eventually be part of a pipeline in which a SQL query pulls the raw data and constructs the processed input data. This file will use user-defined functions to carry out these actions, and the goal is for this part of the process to be as abstracted as possible.
  3. execute_analysis.R
    This is the main execution file that runs the full analysis. It sources in the dataset builder and modeling functions, and carries out the desired process. This file will also need to contain the parameters that determine filtering criteria and otherwise dictate how the analysis is run (a sketch of how these files fit together appears after this list).
  4. helper_functions.R and modeling_functions.R
    The helper and modeling functions files contain user-defined functions that are used in other parts of the analysis. These functions should be fairly abstract and reusable. The basic idea is that many tasks can be abstracted into a function or piece of code that can be reused regardless of the specific task.
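
To make this structure concrete, here is a minimal sketch of how the pieces might fit together. The function names, parameters, and logic below are hypothetical, invented purely for illustration; the real project would differ.

    ## helper_functions.R (hypothetical contents)
    # A generic, reusable helper: cap extreme values in a numeric vector
    cap_outliers <- function(x, lower = 0.01, upper = 0.99) {
      bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
      pmin(pmax(x, bounds[1]), bounds[2])
    }

    ## execute_analysis.R (hypothetical contents)
    # Parameters that control how the analysis is run
    start_date   <- as.Date("2020-01-01")
    min_n_orders <- 5

    # Source the reusable pieces of the pipeline
    source("helper_functions.R")
    source("dataset_builder.R")     # assumed to define build_dataset()
    source("modeling_functions.R")  # assumed to define fit_model()

    # Run the pipeline end to end
    input_data <- build_dataset(start_date = start_date, min_n_orders = min_n_orders)
    results    <- fit_model(input_data)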

So given the files that are available in this example project, what files should be evaluated during a code review process?

In general, we would never want to review four or five different files at a time. Instead, code review should be targeted, with the focus on the code that most needs careful attention. The goal should be to review the files or lines of code that contain complex logic and would benefit from a second set of eyes.

Given that guidance, the single file from the above example that we should never consider for code review is basic_eda.R. It’s just a simple file with procedural code and will only be run once or twice. The files that should receive attention during code review are dataset_builder.R and execute_analysis.R. These are the files with the bulk of the complex logic and so it would help to see if any issues are present in that code.

D. How Frequently Should Code Review be Performed?

I lead the code review process on the data science team at my current employer. If I were the manager, I’d push for the team to hold a two-hour code review every week on Thursday or Friday, during which every member of the team would have their critical code reviewed. Currently, we don’t do that, and code reviews occur on an as-needed basis. This “works” to some extent, but the frequency of code reviews will ultimately be dictated by how much time the team spends on complex processes.

E. How to Conduct a Code Review?

During each session, here are the instructions that I set forth to guide the code review.

  1. Every member of the team will focus on reviewing code produced by the other members. So each person on the data science team will have to review code from two others.
  2. A copy of each R file that needs review should be made and shared with the other two members of the team. Ideally, this file should contain fewer than 500 lines of code.
  3. The reviewer should work from the copy shared by the original author.
  4. The reviewer should provide any suggestions or recommendations as comments written in upper case. Any suggestion about specific code should reference the relevant function, line number, or section (see the example after this list).
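
As an illustration of item 4, review comments in a shared copy of a file might look like the following. The code being reviewed and the comments are invented for the example.

    # REVIEW (lines 45-52): THIS LOOP COULD BE REPLACED WITH A VECTORIZED
    # ifelse() CALL, WHICH WOULD BE FASTER AND EASIER TO READ.
    for (i in seq_len(nrow(df))) {
      df$flag[i] <- if (df$amount[i] > 100) 1 else 0
    }

    # REVIEW (build_dataset): THE HARD-CODED DATE FILTER SHOULD BE MOVED TO A
    # PARAMETER IN execute_analysis.R SO IT IS SET IN ONE PLACE.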

F. What Factors Should be Considered During a Code Review?

When the reviewer is looking at an R file for code review, here are the specific factors that they should evaluate.

  1. Does this code accomplish the author’s purpose?
  2. Are there any obvious logic errors in the code?
  3. Looking at the requirements, are all cases fully implemented?
  4. Does the code conform to existing style guidelines?
  5. Are there any areas where code could be improved? (made shorter, faster, etc.)
  6. Is this the best way to achieve the desired result?
  7. Does the code handle all edge cases?
  8. Do you see potential for useful abstractions?
  9. Are the unit tests appropriate? (see the example after this list)
  10. Is there adequate documentation and comments?
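
For item 9, a lightweight way to judge the unit tests during review is to check that each reusable function has tests along these lines. The example below uses the testthat package and the hypothetical cap_outliers() helper from the earlier sketch; it is an assumed setup, not part of the project described above.

    # tests/test_helper_functions.R (hypothetical)
    library(testthat)
    source("helper_functions.R")

    test_that("cap_outliers leaves typical values largely unchanged", {
      x <- c(1, 2, 3, 4, 5)
      expect_equal(median(cap_outliers(x)), 3)
    })

    test_that("cap_outliers preserves missing values", {
      x <- c(1, NA, 100)
      expect_equal(sum(is.na(cap_outliers(x))), 1)
    })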

In any case where the reviewer suggests a change, I recommend that they provide a legitimate reason for it.

Furthermore, I provide the following guidance.

  1. Think like an adversary, but be nice about it. Try to “catch” authors taking shortcuts or missing cases by coming up with problematic configurations or input data that break their code (see the example after this list).
  2. Compliment and reinforce good practices: one of the most important parts of the code review is to reward developers for growth and effort.
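
As an example of the adversarial mindset, a reviewer might probe a function with inputs the author may not have considered. Again, cap_outliers() is the hypothetical helper from the earlier sketch, and the commented questions are the point rather than any particular output.

    # What happens with an empty vector, an all-NA vector, or the wrong type?
    cap_outliers(numeric(0))              # does it return numeric(0) or fail?
    cap_outliers(c(NA_real_, NA_real_))   # quantile() may misbehave with no observed values
    cap_outliers(c("a", "b"))             # should fail loudly rather than silently coerce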

These are some of the best practices that I’ve found from leading code review sessions on a small data science team. There is no single right way to set up a code review process and it will likely be dictated by the size of the team and type of work.

For any businesses interested in hiring a data scientist with over eight years of work experience, be it for freelance, part time, or full time opportunities, please contact me at mathewanalytics@gmail.com

Comments on “Best Practices for Code Review: R Edition”

  1. The standard for code review in industry and large open-source projects is to review every piece of code before it’s merged. The key to this strategy is ensuring that every line of code has been viewed by at the very least the author and one trusted team member. Sampling-based code review that’s biased to where the group thinks errors may be has the drunks-under-the-lamppost problem of not covering a large chunk of code.

    Typically, APIs are designed top-down from a client’s perspective (the client being a human or another piece of code that calls the API), then coded bottom up. Each piece is reviewed and unit tested before being merged. The key to this strategy is being able to develop high-level modules with the confidence that the low-level pieces work. It may sound like it’s going to take longer to unit test as you go, but the net is a huge time savings with the upside of having more reliable code.

    It’s also critical to keep the three key components of software development in sync: documenting (i.e., design), testing, and coding. In larger projects, features of any size always start with a functional spec outlining how the feature works from the client point of view; that spec is usually written the way the eventual documentation will be written, because that’s what says what the code does. Even with just doc, the key is to make sure the API being delivered is both easy to document and easy to test. For example, large functions with intertwined, dependent arguments, which you often find in REPL languages like R, Python, and Julia, produce what programmers like to call a “bad smell”, precisely because such functions are hard to document and test.

    Take the rgamma function in R, for example. It takes three parameter arguments: shape, rate, and scale. Experienced statisticians will know that scale and rate are inverses, yet this isn’t mentioned in the doc anywhere other than implicitly through the values of the default arguments. What happens if you supply both scale and rate? The doc doesn’t say, so I just tried it. It does not return an error, as one might expect from languages that try to keep their arguments coherent, but rather uses the rate and ignores the scale (order doesn’t matter). At the point someone proposed the rgamma function’s API, someone else should’ve piped up and said, “Whoa, hang on there a second, cowpoke; this function’s going to be a mess to test and document because of the redundant arguments.” With shape not getting a default and rate and scale being inverses, the tests need to cover behavior for all 8 possible input patterns. The doc should really say what happens when both scale and rate are specified. Instead, it just says “Invalid arguments will result in return value ‘NaN’, with a warning.” That implies that inconsistent rate and scale arguments (rate = 10, scale = 10) aren’t considered invalid arguments.

    I should also say that my comments above are intended for API design, such as an R package one might want to distribute or a piece of infrastructure a lab or company wants to support. I wouldn’t recommend this style of functional design and doc and testing for exploratory research code, because it’s much harder to design up front and isn’t intended to be portable or distributed beyond a tiny group of collaborators. I’m not saying don’t test such code, I’m just saying the best practices there would be different than for designing APIs for public consumption. For example, no need to test Windows and Linux and Mac if you only ever target one platform, no reason to test all the boundary conditions if they’re never going to be used, and so on. It absolutely still helps to design top down and write modular reusable components bottom up. It’s just usually not apparent what these modules will be until after many iterations.
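
For readers who want to try the rgamma experiment described in the comment above, a minimal check is sketched below. No output is asserted, since the behavior may differ across R versions.

    # Does rgamma() ignore one of the conflicting arguments, warn, or error?
    set.seed(1)
    with_rate_only <- rgamma(3, shape = 2, rate = 10)
    set.seed(1)
    with_both <- try(rgamma(3, shape = 2, rate = 10, scale = 10))  # conflicting rate and scale
    identical(with_rate_only, with_both)  # TRUE would mean scale was silently ignored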
