Logging in R: Why, When, and How

Anyone who writes R code should get familiar with logging. While it may not be necessary for exploratory scripts or basic coding projects, logging can be critical in development work because it helps you track what your code is doing. In this post, I will provide some reasons to maintain a log file and show examples using the logger package. Note that R has a number of logging packages, including futile.logger, loggit, and log4r, but I have found that logger best suits my needs.

A. Why

Logging should give you information about what your code is doing. A log file is a file that records events that occur during a process; it helps you track the process and discover if anything has gone wrong. When logging my code, I think about characteristics of the data, code, or calculations that I’d like to know about. As a result, my log files document things like the number of rows in a data frame, the number of unique identifiers, NAs generated after a calculation, and so forth.
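For instance, a few calls to log_info and log_warn around a calculation will record exactly these kinds of characteristics. The following is a minimal sketch; the sales data frame and the revenue_per_visit calculation are made up purely for illustration.

library(logger)

# Made-up example data; the columns are placeholders for illustration
sales = data.frame(
  customer_id = c(1, 2, 2, 3, NA),
  revenue     = c(100, 250, NA, 75, 30),
  visits      = c(4, 0, 3, 1, 2)
)

log_info("The data frame includes {nrow(sales)} rows")
log_info("The data frame includes {length(unique(sales$customer_id))} unique customers")

# A calculation that can silently produce NA or infinite values
sales$revenue_per_visit = sales$revenue / sales$visits

log_warn("The calculation generated {sum(is.na(sales$revenue_per_visit))} NA values")
log_warn("The calculation generated {sum(is.infinite(sales$revenue_per_visit))} infinite values")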

B. When

My logging practices are largely dictated by the type of project I’m working on.

  1. Small Projects

With smaller scripting or data analysis projects, my goal with logging is simply to see what is going on with the code and data as the script proceeds. For small projects, I do not save the logging records to a file; I simply print them to the console and monitor them there. This often helps with identifying when errors occur or when certain code generates inaccurate output.

Consider the following example, where I have a small process that imports data, generates a couple of new columns, and saves the new data table to file. Let’s say this R script ends up being 300 to 500 lines.

It may contain the following elements at the start of the script. In logger, the log_info function records a message at a particular moment in the script. In the following code, I’ve added logs for when the script was initialized and prior to importing the csv file, and I log some important characteristics of the raw dataset.

#################################################################################################
### PRELIMINARIES

options(scipen = 999)

library(logger)

log_threshold(TRACE)

log_info("Script initialized....")

#################################################################################################
### IMPORT PACKAGES

# use_these_packages was not defined in the original snippet;
# data.table (fread) and lubridate (year/month) are needed below
use_these_packages <- c("data.table", "lubridate")

# Install any packages that are missing, then load them all
new_packages <- use_these_packages[!(use_these_packages %in% installed.packages()[,"Package"])]

if(length(new_packages)) install.packages(new_packages)

sapply(use_these_packages, require, character.only = TRUE)

#################################################################################################
### SET DATE PARAMETERS 

curr_year = year(Sys.Date())

months_active = month(Sys.Date())-1

#################################################################################################
### IMPORT RAW DATA 

log_info("Loading data")

# data_path should point to the raw csv file
mydat = fread(data_path)

log_info("The raw dataset includes {nrow(mydat)} rows")
log_info("The raw dataset includes data on {length(unique(mydat$StaffId))} staff members")
log_info("The raw dataset includes data from {min(as.Date(mydat$WorkDay))} to {max(as.Date(mydat$WorkDay))}")

Once again, my logging strategy for smaller projects is just to print the results to the console. Unless it’s a more complex small project, I rarely save the log messages to a file.

  2. Medium to Large Projects

With medium and large projects, my goal with logging is to be more thorough and identify what happened at sensitive points in the script. This is frequently important when I’m trying to understand where my script broke or whether the results of some calculation are accurate. For these projects, I’ll often save the log files to a subdirectory with a timestamp from when the script ran, as sketched below.
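The following is a sketch of that setup, placed near the top of such a script; the logs directory name and the file naming pattern are just conventions I’m assuming here.

library(logger)

# Create a logs subdirectory if it doesn't already exist
log_dir = "logs"
if(!dir.exists(log_dir)) dir.create(log_dir)

# Name the log file with a timestamp from when the script ran
log_file = file.path(log_dir, paste0("run_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".log"))

log_appender(appender_file(log_file))
log_threshold(TRACE)

log_info("Script initialized....")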

Let’s say I have a loop in the script that goes through a dictionary and filters the data in order to perform some analysis. In such cases, my logging will often take the following form.

#################################################################################################
### LOOP THROUGH LOYALTY DICT AND WEEK DICT

log_info("Start looping through each loyalty member and date")

#each_loyalty = 2
for(each_loyalty in 1:nrow(loyalty_dict)){ 
  
  curr_row_dict = loyalty_dict[each_loyalty,]
  #curr_row_dict
  
  log_info("Executing analysis for {curr_row_dict$row_id}")
  
  filter_criteria_1 = paste0(colnames(curr_row_dict)[1], " == ", curr_row_dict[, get(colnames(curr_row_dict)[1])] )
  filter_criteria_2 = paste0(colnames(curr_row_dict)[2], " == '", curr_row_dict[, get(colnames(curr_row_dict)[2])], "' " )
  filter_criteria = paste0(filter_criteria_1, " & ", filter_criteria_2)
  
  log_info("Executing analysis for {filter_criteria}")
  
  dat_loyalty2 = dat_loyalty[eval(parse(text=filter_criteria)),] 
  #dat_loyalty2
  
  log_info("Filtering resulted in the following number of rows: {nrow(dat_loyalty)}")

In the above code, I record each iteration of the loop, which filter criteria are being used, the size of the filtered data table, and so forth. There have been many times when such logging has allowed me to identify and fix errors in the code.

C. How

Most of the logging I do is done with the logger package. It offers an array of functionality, but log_info is the main function to be aware of. You will also need log_appender and appender_file when you want to export your logs to a file.
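Beyond log_info, logger provides one function per severity level (log_trace, log_debug, log_info, log_warn, log_error, log_fatal), and log_threshold controls which of those messages actually get recorded. A quick sketch:

library(logger)

log_threshold(INFO)    # messages below INFO are suppressed
log_debug("This message is dropped")
log_info("This message is recorded")
log_warn("So is this one")

log_threshold(TRACE)   # now every level gets through
log_debug("This message is recorded as well")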

Here are a couple of really simple examples of logging and saving the results to a .log file.

log_appender(appender_file("test_file.log"))
log_info('where is this message going?')
log_info('new line')
log_appender()



log_appender(appender_file("test_file_2.log"))
for (i in 1:25) log_info(i)
log_appender()
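And if you want to keep watching the messages in the console while they are also written to disk, logger’s appender_tee does both at once:

log_appender(appender_tee("test_file_3.log"))
log_info('this message goes to the console and to test_file_3.log')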

So there you have a simple overview of why, when, and how to do logging in R. I strongly recommend that aspiring data scientists get comfortable with this critical practice.

For any businesses interested in hiring a data scientist with over eight years of work experience, be it for freelance, part time, or full time opportunities, please contact me at mathewanalytics@gmail.com
