Data.Table by Example – Part 3

For this final post, I will cover some advanced topics and discuss how to use data tables within user-defined functions. Once again, let’s use the Chicago crime data.

dat = fread("rows.csv")
names(dat) <- gsub(" ", "_", names(dat))
dat[, c("value1", "value2", "value3") := sample(1:50, nrow(dat), replace=TRUE)]
dat[1:3]

Let’s start by subsetting the data. The following code takes the first 50,000 rows of the dat dataset, selects four columns, creates three new columns derived from the date, and then removes the original Date column. The output is saved to a new variable, and you can view the first few rows of the new data table using brackets or the head function. Note that the date parsing below relies on the lubridate package.

ddat = dat[1:50000, .(Date, value1, value2, value3)][,
               c("year", "month", "day") :=
                        .(year(mdy_hms(Date)),
                          month(mdy_hms(Date)),
                          day(mdy_hms(Date)))][,-c("Date")]

ddat[1:3]  # same as head(ddat, 3)

We can now perform intermediate calculations inside braces and suppress their output; only the value of the final expression is returned.

unique(ddat$month)
ddat[, { avg_val1 = mean(value1)
         new_val1 = mean(abs(value2-avg_val1))
         new_val2 = new_val1^2 }, by=month][order(month)]

In this simple case, we have taken the mean of value1 for each month, subtracted it from value2, averaged the absolute differences, and then squared that result. The output shows only the final calculation from the braces, which is the squared value. Note that I also ordered the results by month by chaining another set of brackets.

That’s all very nice and convenient, but what if we want to create user-defined functions that automate these tasks for future work? Let’s start with a function for extracting each component of the date column.

ddat = dat[1:50000, .(Date, value1, value2, value3)]

add_engineered_dates <- function(mydt, date_col="Date"){

   new_dt = copy(mydt)

   new_dt[, paste0(date_col, "_year") := year(mdy_hms(get(date_col)))]
   new_dt[, paste0(date_col, "_month") := month(mdy_hms(get(date_col)))]
   new_dt[, paste0(date_col, "_day") := day(mdy_hms(get(date_col)))] 

   new_dt[, c(date_col) := NULL]

   new_dt[]
}

We’ve created a new function that takes two arguments: the data table and the name of the column containing the date values. The first step is to copy the data table so that we are not directly making changes to the original data. Because the goal is to add three new columns that extract the year, month, and day from the date column, we use the lubridate package to parse the date column and pull out the desired values. Each new column is named so that it clearly combines the original date column name with the component it contains. The next step removes the original date column by setting it to NULL, and the final line, new_dt[], returns the result so it prints when called. Notice that the get function is used to turn the character string holding the column name into the actual column inside the function.
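
If the role of get() is unclear, here is a quick illustration using the Date column of our sample data; get() turns the character string holding a column name into the column itself:

date_col = "Date"
ddat[1:3, get(date_col)]   # returns the same values as ddat[1:3, Date]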

result = add_engineered_dates(ddat)
result

Let’s now apply the earlier brace-based calculation, this time grouped by the year and month columns that the function created.

result[, { avg_val1 = mean(value1)
           new_val1 = mean(abs(value2-avg_val1))
           new_val2 = new_val1^2 }, by=.(Date_year,Date_month)][
                  order(Date_year,Date_month)][1:12]

Perfect!

I’m hoping that these three posts have convinced beginners to the R language that the data.table package is beautiful and worth checking out. The syntax may be scary at first, but it offers an array of powerful tools that everyone should become familiar with. So roll your sleeves up and have fun.

If you have any comments or would like me to cover other specific topics, feel free to comment below. You can also contact me at mathewanalytics@gmail.com or reach me through LinkedIn.


Data.Table by Example – Part 2

In part one, I provided an initial walk-through of some nice features that are available within the data.table package. In particular, we saw how to filter data and get a count of rows by date.

dat = fread("rows.csv")
names(dat) <- gsub(" ", "_", names(dat))
dat[1:3]

Let us now add a few columns to our dataset on reported crimes in the city of Chicago. There are many ways to do this, but they involve the use of the := operator. Since data.table updates values by reference, we do not need to save the result to another variable. This is a very desirable feature.

dat[, c("value1", "value2", "value3") := sample(1:50, nrow(dat), replace=TRUE)]

dat[, `:=`(value1 = sample(1:50, nrow(dat), replace=TRUE),
           value2 = sample(1:50, nrow(dat), replace=TRUE),
           value3 = sample(1:50, nrow(dat), replace=TRUE))]

You can also just use the traditional base R solution for adding new columns as data.tables are also data frames.

dat$value1 <- sample(1:50, nrow(dat), replace=TRUE)
dat$value2 <- sample(1:50, nrow(dat), replace=TRUE)
dat$value3 <- sample(1:50, nrow(dat), replace=TRUE)

In any case, we now have three new columns with randomly selected values between 1 and 50. We can now summarize these values and see how they differ across the primary crime type and other categorical variables.

dat[, .(mean = mean(value1, na.rm = TRUE),
        median = median(value1, na.rm = TRUE),
        min = min(value1, na.rm = TRUE),
        max = max(value1, na.rm = TRUE))]

dat[Primary_Type=="PROSTITUTION",
              .(mean = mean(value1, na.rm = TRUE),
                median = median(value1, na.rm = TRUE),
                min = min(value1, na.rm = TRUE),
                max = max(value1, na.rm = TRUE))]

The above code gives us the mean, median, min, and max of the value1 column, first for all rows and then for rows where the primary type is prostitution. To get these summaries across multiple columns, we can use lapply, .SD, and .SDcols. Notice that in the second line below, we pass Primary_Type to the by operator so that we get the mean for each of the crime type categories.

dat[, lapply(.SD, mean, na.rm=TRUE), .SDcols=c("value1", "value2", "value3")]

dat[, lapply(.SD, mean, na.rm=TRUE), by=Primary_Type, .SDcols=c("value1", "value2", "value3")][1:4]

But wait, that only gives us the mean of each column. If you want to apply multiple functions to those columns, you need to write a helper function that can then be used within lapply.

my.summary <- function(x){
                   c(mean = mean(x),
                     median = median(x),
                     min = min(x),
                     max = max(x))
}

dat[, lapply(.SD, my.summary), .SDcols=c("value1", "value2", "value3")]

dat[, lapply(.SD, my.summary), by=.(Primary_Type), .SDcols=c("value1", "value2", "value3")]

Perfect! As you can see, the syntax is concise and is very easy to work with.

In the next part of this series, I’ll cover a few advanced features like get() and {}.

If you have any questions or comments, feel free to comment below. You can also contact me at mathewanalytics@gmail.com or reach me through LinkedIn.

 

Data.Table by Example – Part 1

For many years, I actively avoided the data.table package and preferred to utilize the tools available in either base R or dplyr for data aggregation and exploration. However, over the past year, I have come to realize that this was a mistake. Data tables are incredible and provide R users with a syntactically concise and efficient data structure for working with small, medium, or large datasets. While the package is well documented, I wanted to put together a series of posts that could be useful for those who want an introduction to the data.table package in a more task-oriented format.

For this series of posts, I will be working with data that comes from the Chicago Police Department’s Citizen Law Enforcement Analysis and Reporting system. This dataset contains information on reported incidents of crime that occurred in the city of Chicago from 2001 to present. You can use the wget command in the terminal to download it as a csv file.

$ wget --no-check-certificate --progress=dot -O rows.csv "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD"

The file will be saved in your working directory as “rows.csv”. We will import the data into R with the fread function and look at the first few rows and the structure of the data.

dat = fread("rows.csv")

dat[1:3]
str(dat)

Notice that each of the string variables in the data set was imported as a character and not a factor. With base R functions like read.csv, we would have to set the stringsAsFactors argument to FALSE to get the same behavior.
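
For comparison, a base R import that keeps strings as characters might look like this (just a sketch; fread is also considerably faster on a file of this size):

dat_base <- read.csv("rows.csv", stringsAsFactors = FALSE)

Next, we replace the spaces in the column names with underscores so the columns are easier to reference.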

names(dat) <- gsub(" ", "_", names(dat))

Let’s say that we want to see the frequency distribution of several of these variables. This can be done by using .N in conjunction with the by operator.

dat[, .N, by=.(Arrest)]

In the code below, you can also see how to chain operations together. We start by finding the count of each value in the variable, order the counts in descending order, and then select only those that occurred at least 200,000 times.

dat[, .N, by=.(Primary_Type)][order(-N)][N>=200000]

dat[, .N, by=.(Location_Description)][order(-N)][N>=200000]

Let’s say that we want to get a count of prostitution incidents by month. To get the desired results, we will need to truncate the date values to the month, filter rows in which the primary type is “PROSTITUTION”, and then count the rows for each month.

dat[, date2 := paste(substr(as.Date(dat$Date, format="%m/%d/%Y"),1,7), "01", sep="-")][
         Primary_Type=="PROSTITUTION", .N, by=.(date2)][, date2 := as.Date(date2)][order(date2)][]

If you want to plot the results as a line graph, just add another chain which executes the visualization, or use the magrittr %>% operator.

dat[, date2 := paste(substr(as.Date(dat$Date, format="%m/%d/%Y"),1,7), "01", sep="-")][
             Primary_Type=="PROSTITUTION", .N, by=.(date2)][, date2 := as.Date(date2)][order(date2)][,
          plot(N, type="l")]

dat[, date2 := paste(substr(as.Date(dat$Date, format="%m/%d/%Y"),1,7), "01", sep="-")][
            Primary_Type=="PROSTITUTION", .N, by=.(date2)][, date2 := as.Date(date2)][order(date2)] %>%
          ggplot(., aes(date2, N)) + geom_line(group=1)

I’ve obviously skipped over a lot, and some of the code presented here is more verbose than needed. Even so, beginners to R will hopefully find this useful, and it may pique your interest in the data.table package. Future posts will cover more of the goodies available in data.table such as get(), set(), {}, and so forth.

If you have any questions or comments, feel free to comment below. You can also contact me at mathewanalytics@gmail.com or reach me through LinkedIn.

R Programming Notes – Part 2

In an older post, I discussed a number of functions that are useful for programming in R. I wanted to expand on that topic by covering other functions, packages, and tools. Over the past year, I have been working as an R programmer, and these are some of the lessons that have become fundamental in my work.

IS TRUE and IS FALSE

isTRUE is a function that can be very useful for checking whether a condition or variable has been set to true. Let’s say that we are writing a script that runs a generalized linear regression when the parameter run_mod is set to true. The conditional portion of the script can be written as either if(isTRUE(run_mod)) or if(run_mod). I am partial to isTRUE, but this is entirely a matter of personal preference. Users should also be aware of the isFALSE function, which is part of the BBmisc package.


run_mod = TRUE 

if(isTRUE(run_mod)){
	tryCatch(
         GLM_Model(full_data=full.df, train_data=train.df, 
                      test_data=test.df), 
    error = function(e) {
         print("error occured")
         print(e)
    })
}

if(BBmisc::isFALSE(check_one) & BBmisc::isFALSE(check_two)){
	data_output.tmp$score = 0.8
}

INVISIBLE

The invisible function can be used to return an object that is not printed out, which is useful in a number of circumstances. For example, it’s handy when you have helper functions that will be utilized within other functions to do calculations; in those cases, it’s often not desirable to print those results. I generally use invisible when I’m checking function arguments. For example, consider a function that takes two arguments, where you need to check whether the input meets certain requirements before proceeding.


if(!check_input(response, type='character', length=1)) {
    stop('something is wrong')
}

The check_input function is something I created; it has a few lines which use invisible. The idea is for check_input to return true or false based on the inputs so that execution stops when needed.


if(is.null(response) || length(response) == 0) {
    return(FALSE)
} else {
    return(invisible(TRUE))
}
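
For context, here is a minimal sketch of what a check_input helper along these lines might look like. This is my reconstruction rather than the original function; the type and length arguments simply mirror the call shown earlier.

check_input <- function(x, type = "character", length = 1) {
    # Fail fast on missing or empty input
    if (is.null(x) || base::length(x) == 0) {
        return(FALSE)
    }
    # Verify that the input has the expected class and length
    ok <- inherits(x, type) && base::length(x) == length
    # Return invisibly so a passing check prints nothing
    invisible(ok)
}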

DEBUG

When I’m putting together new classes or have multiple functions that interact with one another, I ensure that the code includes a comprehensive debugging process. This means that I’m checking my code at various stages so that I can identify when issues arise. Consider that I’m putting together a function that will go through a number of columns in a data frame, summarize those variables, and save the results as a nested list. To put the code together without issues, I make sure the function takes a debug argument and runs extra checks when it is set to true. In the code below, it will print out values at different stages, and the final block will check the resulting data structure.


DSummary_Variable <- function(data_obj, var, debug=TRUE){
             ......
}

if(debug) message('|==========>>>  Processing of the variable. \n')  

if(debug){ 
  if(!missing(var_summary)){ 
      message('|==========>>>  var_summary has been created and 
              has a length of ', length(var_summary), ' and the 
              nested list has a length of ', 
              length(var_summary[['var_properties']]), ' \n')  
  } else {
      stop("var_summary is missing. Please investigate")
  }
}

If you have multiple functions that interact with one another, it’s a good idea to preface the printed message with the name of the function.


add_func <- function(a,b) a + b

mult_func <- function(a,b) a * b

main_func <- function(data, cola, colb, debug=TRUE){

	if(debug){
		message("mult_func: checking inputs to be used")
	}

	mult_func(data[, cola], data[, colb])

	if(debug){
		message("add_func: checking inputs to be used")
	}

	add_func(data[, cola], data[, colb])
}

Stay tuned for part three, where I’ll talk about the testthat and assertive packages.

R Programming Notes

I’ve been on a note-taking binge recently. This post covers a variety of topics related to programming in R. The contents were gathered from many sources and structured so that they serve as a useful reference guide for a number of handy R functions.

DO.CALL

The do.call function executes a function call on a list of arguments.

do.call("R_Function", "List_of_Arguments")

This is equivilant to telling R which arguments the function should operate on.

R_Function( "List_of_Arguments" ){
  ...
}
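
For instance, the following two calls produce the same output:

paste("chicago", "crime", sep = "_")
do.call("paste", list("chicago", "crime", sep = "_"))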

Consider the following four vectors. We can use do.call to find the total sum across all of them or to bind them together row by row.

x1 <- c(1,2,5)
x2 <- c(1,3,6)
x3 <- c(1,4,7)
x4 <- c(1,5,8)
 
do.call(sum, list(x1,x2,x3,x4))  # sum all list elements
do.call(rbind, list(x1,x2,x3,x4))  # rbind the list elements

Let’s consider a scenario where we have a small data frame and want to run a linear model on different combinations of attributes. One solution to this problem is to create the formula and data as a list within a function, and then utilize do.call to run the model on the desired attribute.

dat <- data.frame(x1 = rnorm(100, m=50), x2 = rnorm(100, m=50), x3 = rnorm(100, m=50), y = rnorm(100))
 
new_mod <- function(form){
  lstt = list(formula=as.formula(paste("y ~ ", form, sep="")), data=as.name("dat"))
  summary(do.call("lm", lstt))
}
 
new_mod("x1")
new_mod("x2")
new_mod("x3")

EVAL

The eval function evaluates an expression, an object that represents an action that can be performed in R. An expression is different from its evaluation, which refers to actually executing that action. In the following example, we assign a value to a variable and perform an operation using that variable; both statements are evaluated immediately.

x <- 4
y <- x * 10
y

The expression and quote functions take an expression as an argument and return it without evaluating it.

ee = expression(y~x1)
ee
 
z <- quote(y <- x * 10)
z
 
is.expression(ee)
is.call(z)
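
To see the difference in action, evaluating the quoted call with eval actually carries out the assignment:

x <- 4
eval(z)   # runs y <- x * 10
y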

In the above example, z is a call object. Call objects can also be created directly with the call function. For example, the following code pieces together a model call and evaluates it.

mycall <- call("lm", quote(y ~ x1), data=quote(dat))
mod <- eval(mycall)
summary(mod)

SUBSTITUTE

Another common procedure is to replace certain variables within an expression. This can be achieved with the substitute function. The code below replaces x1 and x2 in the expression with the names from the list.

replace <- list(x1 = as.name("alphabet"), x2 = as.name("zoology"))
substitute(expression(x1 + x2 + log(x1) + x3), replace)

SETNAMES

There are a number of commands for working with column and row names in R. It’s generally suggested that the setNames function be used when modifying the names of a data structure within a function.

names_new <- c("one","two","three","four")
new_dat <- expression(data.frame(x1 = rnorm(100, m=50), x2 = rnorm(100, m=50), x3 = rnorm(100, m=50), y = rnorm(100)))
head(setNames(eval(new_dat), names_new))
 
my_lst <- list(lname = as.name("xls"))
setNames(my_lst, "x1")

When working with data within a function, it will often be useful to write code that creates the names and data structure, and then evaluates the pieces together.
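
For example, here is a small sketch along those lines (my own illustration rather than code from the original notes) that evaluates the stored expression and attaches the supplied names inside a single helper:

build_df <- function(col_names, df_expr){
  # Evaluate the stored expression and attach the supplied names in one step
  setNames(eval(df_expr), col_names)
}

head(build_df(names_new, new_dat))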

MISSING

Use the missing function to check whether an argument was supplied by the user.

func <- function(x){
    if(missing(x)){
        x = 5
    }
    y = x*x; y
}
 
func(2)
func()

STOP

The stop function is used to halt a function, and is usually used when a particular condition is met.

guess <- function(n){
  if( n >= 6 ){
      stop("WRONG!")
}}
 
guess(5)
guess(10)

WARNING

Use the warning function to issue a warning message when a condition is met. It does not halt the function.

guess <- function(n){
  if( n >= 6 ){
      warning("BE CAREFUL!")
}}
 
guess(5)
guess(10)

ELLIPSIS (…)

When a function is called, R matches each argument it recognizes to the function’s formal arguments. When an ellipsis is added as a function argument, it allows any other arguments to be passed through to the function. This technique is frequently used when plotting and should be used when the function is designed to take any number of named or unnamed arguments.

plotter <- function(x, y, ...){
  plot(x, y, ...)
}

To make fuller use of the ellipsis, it’s suggested that you capture the dots and turn them into a list, since some of the arguments in the dots may be intended for several different functions; a small sketch of this is shown below.
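
For instance, a routing wrapper might look something like this (my own sketch; the choice of which arguments go to plot versus points is illustrative, not from the original post):

plotter2 <- function(x, y, ...){
  dots <- list(...)
  # Send point-specific arguments to points(); everything else goes to plot()
  point_args <- dots[names(dots) %in% c("pch", "cex")]
  plot_args  <- dots[!names(dots) %in% c("pch", "cex")]
  do.call(plot, c(list(x = x, y = y), plot_args))
  do.call(points, c(list(x = x, y = y), point_args))
}

plotter2(1:10, rnorm(10), col = "blue", pch = 19)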

 

Working With SEM Keywords in R

The following post combines two previous posts from an older blog of mine that is no longer available. They are from several years ago and relate to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign? Two, how can I develop an effective system for examining keywords based on different characteristics?

Generating PPC Keywords in R

Paid search marketing refers to the process of driving traffic to a website by purchasing ads on search engines. Advertisers bid on certain keywords that users might search for, and that determines when and where their ads appear. For example, an individual who owns an auto dealership would want to bid on keywords relating to automobiles that people would plausibly search for on a search engine. In both Google and Bing, advertisers are able to specify which keywords they would like to bid on and at what amount. If the user decides to bid on just a small number of keywords, they can type that information in and specify a bid. However, what if you want to bid on a significant number of keywords? Instead of typing each and every keyword into the Google or Bing dashboard, you could programmatically generate the keywords in R.

Let’s say that I run an online retail establishment that sells men’s and women’s streetwear and I want to drive more traffic to my online store by placing ads on both Google and Bing. I want to bid on a number of keywords related to fashion and have created several ‘root’ words that will comprise the majority of these keywords. To generate my desired keywords, I have written a function which pastes together combinations of the root words.

root1 = c("fashion", "streetwear")
root2 = c("karmaloop", "crooks and castles", "swag")
root3 = c("urban clothing", "fitted hats", "snapbacks")
root4 = c("best", "authentic", "low cost") 

myfunc <- function(){
      lst <- list(root1=c(root1), root2=c(root2), root3=c(root3),
            root4=c(root4))
      myone <- function(x, y){
            m1 <- do.call(paste, expand.grid(lst[[x]], lst[[y]]))
            mydf <- data.frame(keyword=c(m1))
      }
      mydf <- rbind(myone("root4","root1"), myone("root2","root1"))
      }

mydat <- myfunc()
mydat

write.table(mydat, "adppc.txt", quote=FALSE, row.names=FALSE)

This isn’t the prettiest code in the world, but it gets the job done. In fact, similar results could have been achieved using the following code, which is more concise.

root5 = c("%s fashion")
root6 = c("%s streetwear")
adcam1 = sprintf(root5, root2)
adcam2 = sprintf(root6, root2)
df = data.frame(keywords=c(adcam1, adcam2))

write.table(df, "adppc.txt", quote=FALSE, row.names=FALSE)

If you have any suggestions for improving my R code, please mention it in the comment section below.

Creating Tags For PPC Keywords

When performing search engine marketing, it is usually beneficial to construct a system for making sense of keywords and their performance. While one could construct Bayesian belief networks to model the process of consumers clicking on ads, I have found that using ‘tags’ to categorize keywords is just as useful for conducting post-hoc analysis on the effectiveness of marketing campaigns. By ‘tags,’ I mean identifiers which categorize keywords according to their characteristics. For example, in the following data frame, we have six keywords, our average bids, numbers of clicks and conversions, and tags for state, model, car, auto, save, and cheap. What we want to do now is set the flag for each tag to 1 if and only if that tag is mentioned in the keyword.

# CREATE SOME DATA
df = data.frame(keyword=c("best car insurance",
                          "honda auto insurance",
                          "florida car insurance",
                          "cheap insurance online",
                          "free insurance quotes",
                          "iowa drivers save money"),
                average_bid=c(3.12, 2.55, 2.38, 5.99, 4.75, 4.59),
                clicks=c(15, 20, 30, 50, 10, 25),
                conversions=c(5, 2, 10, 15, 3, 5),
                state=0, model=0, car=0, auto=0, save=0, cheap=0)
df

# FUNCTION WHICH SETS EACH TAG TO 1 IF THE SPECIFIED TAG IS PRESENT IN THE KEYWORD
main <- function(df) {
  state <- c("michigan", "missouri", "florida", "iowa", "kansas")
  model <- c("honda", "toyota", "ford", "acura", "audi")
  car <- c("car")
  auto <- c("auto")
  save <- c("save")
  cheap <- c("cheap")
  for (i in 1:nrow(df)) {
    Words = strsplit(as.character(df[i, 'keyword']), " ")[[1]]
    if(any(Words %in% state)) df[i, 'state'] <- 1
    if(any(Words %in% model)) df[i, 'model'] <- 1 
    if(any(Words %in% car)) df[i, 'car'] <- 1
    if(any(Words %in% auto)) df[i, 'auto'] <- 1     
    if(any(Words %in% save)) df[i, 'save'] <- 1
    if(any(Words %in% cheap)) df[i, 'cheap'] <- 1
  }
  return(df)
}

one = main(df)

subset(one, state==TRUE | model==TRUE | auto==TRUE)

# AN ALTERNATE METHOD USING THE STRINGR PACKAGE

df

library(stringr)

# CREATE EACH TAG
state <- c("michigan", "missouri", "florida", "iowa", "kansas")
model <- c("honda", "toyota", "ford", "acura", "audi")
car <- c("car")
auto <- c("auto")
save <- c("save")
cheap <- c("cheap")

state_match <- str_c(state, collapse = "|")
model_match <- str_c(model, collapse = "|")
car_match <- str_c(car, collapse = "|")
auto_match <- str_c(auto, collapse = "|")
save_match <- str_c(save, collapse = "|")
cheap_match <- str_c(cheap, collapse = "|")

#FUNCTION TO SET TAG IF PRESENT IN THE KEYWORD
main <- function(df) {
  df$state <- str_detect(df$keyword, state_match)
  df$model <- str_detect(df$keyword, model_match)
  df$car <- str_detect(df$keyword, car_match)
  df$auto <- str_detect(df$keyword, auto_match)
  df$save <- str_detect(df$keyword, save_match)
  df$cheap <- str_detect(df$keyword, cheap_match)
  df
}

two = main(df)

subset(two, state==TRUE | model==TRUE | auto==TRUE)

By now, some of you are probably wondering why we don’t just select the keywords directly from the original data frame based on the desired characteristic. Well, that works too, although I’ve found that the marketing professionals I’ve worked with have preferred the ‘tagging’ method.

## Alternate approach - SELECT DIRECTLY

df

main <- function(df) {
  model <- c("honda", "toyota", "ford", "acura", "audi")
  # Keep every row whose keyword mentions one of the model words
  keep <- sapply(strsplit(as.character(df$keyword), " "),
                 function(Words) any(Words %in% model))
  df[keep, 1:4]
}

three = main(df)

So there you have it, a method of ‘tagging’ strings according to a certain set of specified characteristics. The benefit of using ‘tags’ is that it provides you with a systematic way to document how the presence of certain words or phrases impacts performance.
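
As a quick illustration of that kind of post-hoc analysis (my own sketch using the tagged data frame created above), you can compare average clicks and conversions for keywords with and without a given tag:

# Average clicks and conversions for keywords with vs. without the model tag
aggregate(cbind(clicks, conversions) ~ model, data = one, FUN = mean)

# The same comparison for the cheap tag
aggregate(cbind(clicks, conversions) ~ cheap, data = one, FUN = mean)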

 

 

Using csvkit to Summarize Data: A Quick Example

As data analysts, we’re frequently presented with comma-separated value files and tasked with reporting insights. While it’s tempting to import that data directly into R or Python in order to perform data munging and exploratory data analysis, there are also a number of utilities for examining, fixing, slicing, transforming, and summarizing data through the command line. In particular, csvkit is a suite of Python-based utilities for working with CSV files from the terminal. For this post, we will grab data using wget, subset rows containing a particular value, and summarize the data in different ways. The goal is to take data on criminal activity, group it by a particular offense type, and develop counts to understand the frequency distribution.

Let’s start by installing csvkit. Go to your command line and type in the following command.

$ pip install csvkit

One: Set the working directory.

$ cd /home/abraham/Blog/Chicago_Analysis

Two: Use the wget command to grab the data and save it as a csv file entitled rows.csv.

$ wget --no-check-certificate --progress=dot -O rows.csv "https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD"

This dataset contains information on reported incidents of crime that occurred in the city of Chicago from 2001 to present. The data comes from the Chicago Police Department’s Citizen Law Enforcement Analysis and Reporting system.

Three: Let’s check to see which files are now in the working directory and how many rows that file contains. We will also use the csvcut command to identify the names of each column within that file.

$ ls
$ wc -l rows.csv
$ csvcut -n rows.csv

Four: Using csvsql, let’s find what unique values are in the sixth column of the file, Primary Type. Since we’re interested in incidents of prostitution, those observations will be subset using the csvgrep command and transferred into a csv file entitled rows_pros.csv.

$ csvsql --query "SELECT [Primary Type], COUNT(*) FROM rows GROUP BY [Primary Type]" rows.csv | csvlook
$ csvgrep -c 6 -m PROSTITUTION rows.csv > rows_pros.csv

Five: Use csvlook and head to have a look at the first few rows of the new csv file. The ‘Primary Type’ column should now only contain incidents of crime that involve prostitution.

$ wc -l rows_pros.csv
$ csvlook rows_pros.csv | head

Six: We’ve now got the data we need. So let’s do a quick count of each description that is associated with the prostitution offense. This is done using the csvsql and csvlook command line tools.

$ csvsql --query "SELECT [Primary Type], Description, COUNT(*) FROM rows_pros GROUP BY Description" rows_pros.csv | csvlook


This has been a quick example of how the various csvkit utilities can be used to take a large csv file, extract specific observations, and generate summary statistics by executing a SQL query on that data. While this same analysis could have been performed in R or Python in a more efficient manner, it’s important for analysts to remember that the command line offers a variety of important utilities that can simplify their daily job responsibilities.