atmathew

Data.Table by Example – Part 3

February 20, 2024 | May 7, 2025 | 2 Comments

For this final post, I will cover some advanced topics and discuss how to use data tables within user-defined functions. Once again, let’s use the Chicago crime data. Let’s start by subsetting the data. The following code takes the first 50,000 rows within the dat dataset, selects four columns, creates three new columns pertaining …

Data.Table by Example – Part 2

January 15, 2024 | May 7, 2025 | 5 Comments

In part one, I provided an initial walkthrough of some nice features that are available within the data.table package. In particular, we saw how to filter data and get a count of rows by date. Let us now add a few columns to our dataset on reported crimes in the city of Chicago. …

Data.Table by Example – Part 1

January 10, 2024 | May 7, 2025 | 6 Comments

For many years, I actively avoided the data.table package and preferred to use the tools available in either base R or dplyr for data aggregation and exploration. However, over the past year, I have come to realize that this was a mistake. Data tables are incredible and provide R users with a syntactically concise and …

Examining the Tweeting Patterns of Prominent Crossfit Gyms

May 20, 2021 | June 5, 2022

A. Introduction The growth of Crossfit has been one of the biggest developments in the fitness industry over the past decade. Promoted both as a physical exercise philosophy and as a competitive fitness sport, Crossfit is a high-intensity fitness program incorporating elements from several sports and exercise protocols such as high-intensity interval training, Olympic weightlifting, …

Semiparametric Regression in R

May 5, 2018 | March 24, 2020 | 4 Comments

A. INTRODUCTION When building statistical models, the goal is to define a compact and parsimonious mathematical representation of some data-generating process. Many of these techniques require that one make assumptions about the data or how the analysis is specified. For example, Autoregressive Integrated Moving Average (ARIMA) models require that the time series is …

Working With SEM Keywords in R

August 15, 2017 | August 6, 2018 | 2 Comments

The following post was republished from two previous posts on an older blog of mine that is no longer available. These are from several years ago and relate to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign? Two, how …

Using csvkit to Summarize Data: A Quick Example

May 10, 2017 | June 27, 2017 | 1 Comment

As data analysts, we’re frequently presented with comma-separated value files and tasked with reporting insights. While it’s tempting to import that data directly into R or Python in order to perform data munging and exploratory data analysis, there are also a number of utilities to examine, fix, slice, transform, and summarize data through the command …

Examining Website Pathing Data Using Markov Chains

April 10, 2017 | May 23, 2017 | 2 Comments

A Markov model can be used to examine a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Let’s define a stochastic process that takes on a finite number of possible values which are nonnegative integers. Each state, , represents its value …

Statistics Refresher

March 3, 2017 | May 23, 2017 | 2 Comments

Let’s face it, a good statistics refresher is always worthwhile. There are times we all forget basic concepts and calculations. Therefore, I put together a document that could act as a statistics refresher and thought that I’d share it with the world. This is part one of a two-part document that is still being completed. This refresher …

Introduction to the RMS Package

February 5, 2017 | March 11, 2017 | 3 Comments

The rms package offers a variety of tools to build and evaluate regression models in R. Originally named ‘Design’, the package accompanies the book “Regression Modeling Strategies” by Frank Harrell, which is essential reading for anyone who works in the ‘data science’ space. Over the past year or so, I have transitioned my personal modeling …