R Programming Notes – Part 2

In an older post, I discussed a number of functions that are useful for programming in R. I wanted to expand on that topic by covering other functions, packages, and tools that are useful. Over the past year, I have been working as an R programmer, and these are some of the things I have learned that have become fundamental in my work.

IS TRUE and IS FALSE

isTRUE is a function that checks whether its argument is a single, non-missing TRUE, which makes it very useful for checking whether a condition or variable has been set to true. Let's say that we are writing a script that will run a generalized linear model when the parameter run_mod is set to true. The conditional portion of the script can be written as either if(isTRUE(run_mod)) or if(run_mod). I am partial to isTRUE, and this is largely a matter of personal preference, but the two forms are not identical: if(run_mod) throws an error when run_mod is NA or NULL, whereas isTRUE(run_mod) simply returns FALSE. Users should also be aware of the isFALSE function, which has been part of base R since version 3.5.0 and is also provided by the BBmisc package.


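run_mod <- TRUE

# Fit the model only when the flag is exactly TRUE
if (isTRUE(run_mod)) {
    tryCatch(
        GLM_Model(full_data=full.df, train_data=train.df,
                  test_data=test.df),
        error = function(e) {
            # Report the failure instead of halting the script
            print("error occurred")
            print(e)
        })
}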
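
To see where the two forms of the condition diverge, here is a small sketch. The helpers bare_if() and guarded_if() are hypothetical names made up for this illustration; they are not part of the script above.

```r
# bare_if() and guarded_if() are hypothetical helpers for illustration only.

# A bare if (x), with errors caught so we can see when it fails.
bare_if <- function(x) {
    tryCatch(
        if (x) "ran" else "skipped",
        error = function(e) "error"
    )
}

# The same branch guarded by isTRUE(), which never errors.
guarded_if <- function(x) {
    if (isTRUE(x)) "ran" else "skipped"
}

# Inputs a flag might plausibly take, including NA and NULL.
inputs <- list(TRUE, FALSE, NA, NULL)

bare_results <- vapply(inputs, bare_if, character(1))
guarded_results <- vapply(inputs, guarded_if, character(1))

print(bare_results)     # "ran" "skipped" "error" "error"
print(guarded_results)  # "ran" "skipped" "skipped" "skipped"
```

With isTRUE, a missing or unset flag quietly skips the model run; with a bare if, the script stops with an error. Which of those is preferable depends on whether a missing flag should be treated as a bug.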

INVISIBLE

The invisible function returns an object without printing it, which is useful in a number of circumstances. For example, it’s useful when you have helper functions that are called within other functions to do calculations. In those cases, it’s often not desirable to print the results. I generally use invisible when I’m checking function arguments. For example, consider a function that takes two arguments, where you need to check whether the input is a vector of the expected type and length.


if(!check_input(response, type='character', length=1)) {
    stop('something is wrong')
}

The check_input function is something I created, and a few of its lines contain invisible. The idea is for check_input to return true or false based on the inputs so that the calling code can stop the execution when needed.


# Excerpt from the body of check_input():
# reject NULL or empty input, otherwise return TRUE without printing it
if(is.null(response) || length(response) == 0) {
    return(FALSE)
} else {
    return(invisible(TRUE))
}
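
The full check_input function is not shown in this post, so the following is only a sketch of what such a validator might look like. The signature mirrors the call above, but the body (in particular the use of typeof for the type check) is an assumption.

```r
# Hypothetical reconstruction of a check_input()-style validator.
# The body is an assumption; only the call signature comes from the post.
check_input <- function(x, type = "character", length = 1) {
    # Reject NULL or empty input outright.
    if (is.null(x) || base::length(x) == 0) {
        return(FALSE)
    }
    # Reject input with the wrong storage type or the wrong length.
    if (!identical(typeof(x), type) || base::length(x) != length) {
        return(FALSE)
    }
    # Valid input: return TRUE, but suppress auto-printing at the console.
    invisible(TRUE)
}

response <- "yes"

# Called on its own line, a successful check prints nothing.
check_input(response, type = "character", length = 1)

# The value is still available when assigned or used in a condition.
ok <- check_input(response, type = "character", length = 1)
bad <- check_input(c("a", "b"), type = "character", length = 1)
```

Because invisible only affects printing, the result still works inside if(!check_input(...)) exactly as a visible TRUE would.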

DEBUG

When I’m putting together new classes or have multiple functions that interact with one another, I ensure that the code includes a comprehensive debugging process. This means that I’m checking my code at various stages so that I can identify when issues arise. Suppose I’m putting together a function that will go through a number of columns in a data frame, summarize those variables, and save the results as a nested list. To put the code together without issues, I ensure that the function takes a debug argument that turns on extra checks when it’s set to true. In the code below, it will print out values at different stages of the code. Furthermore, the final block of the code will check the resulting data structure.


DSummary_Variable <- function(data_obj, var, debug=TRUE){
    # ......
}

if(debug) message('|==========>>>  Processing of the variable. \n')

if(debug){
    # missing() only works on function arguments, so use exists() for a local object
    if(exists("var_summary", inherits=FALSE)){
        message('|==========>>>  var_summary has been created and has a length of ', length(var_summary), ' and the nested list has a length of ', length(var_summary[['var_properties']]), ' \n')
    } else {
        stop("var_summary is missing. Please investigate")
    }
}
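
The body of DSummary_Variable is not shown here, so as a self-contained illustration of the same pattern, here is a minimal sketch. The function summarise_variable and the fields it computes are hypothetical stand-ins, not the original implementation.

```r
# Hypothetical stand-in for a DSummary_Variable()-style function.
# The name and the summary fields are assumptions made for illustration.
summarise_variable <- function(data_obj, var, debug = FALSE) {
    if (debug) message("summarise_variable: processing ", var)

    values <- data_obj[[var]]

    # Store the results as a nested list, as described in the text.
    var_summary <- list(
        var_name = var,
        var_properties = list(
            n = length(values),
            n_missing = sum(is.na(values)),
            mean = mean(values, na.rm = TRUE)
        )
    )

    # Final check on the structure that was just built.
    if (debug) {
        message("summarise_variable: var_summary has length ",
                length(var_summary), "; var_properties has length ",
                length(var_summary[["var_properties"]]))
    }

    var_summary
}

df <- data.frame(x = c(1, 2, NA, 4))
res <- summarise_variable(df, "x", debug = TRUE)
```

Defaulting debug to FALSE here keeps the function quiet in normal use; the messages go to stderr via message(), so they do not pollute printed output or captured results.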

If you have multiple functions that interact with one another, it’s a good idea to preface each printed message with the name of the function that produced it.


add_func <- function(a, b) a + b

mult_func <- function(a, b) a * b

main_func <- function(data, cola, colb, debug=TRUE){

    # Prefix each message with the function about to be called
    if(debug){
        message("mult_func: checking inputs to be used")
    }

    mult_func(data[,cola], data[,colb])

    if(debug){
        message("add_func: checking inputs to be used")
    }

    add_func(data[,cola], data[,colb])
}


Stay tuned for part three, where I’ll talk about the testthat and assertive packages.

2 thoughts on “R Programming Notes – Part 2”

  1. > The conditional portion of the script can be written as either if(isTRUE(run_mod)) or if(run_mod).
    > I am partial to isTRUE, but this is entirely a matter of personal preference.
    Be careful when using isTRUE(). In R versions before 3.5.0 it’s just a wrapper to identical(TRUE, x), which is nice because it always returns a single and non-NA logical. It can be surprising because identical() is really strict: in those versions, isTRUE(c(a = TRUE)) returns FALSE because the input has names. Not nitpicking, just want to spare you any future headaches.
    Also, you might like the debug() function. It tags a function for debugging, so whenever the tagged function is called, you “step into” its environment and can run its contents line-by-line or examine the environment. That can be a time-intensive process, though, so a good method might be using your debug-logging to locate the problem code and then debug() to investigate in depth.
