statistics – Mathew Analytics

Powerlytics: Impact of Age, Gender, and Body Weight on Total Weight Lifted in Powerlifting Meets

July 1, 2021 | June 5, 2022 | 2 Comments

A. Background The Open Powerlifting initiative attempts to create an accurate and open archive of all powerlifting meet data throughout the world. As someone who recently started competing again after a six-year break from powerlifting, I often mess around with the Open Powerlifting data as it’s of personal interest. Most of the analysis that … Continue reading Powerlytics: Impact of Age, Gender, and Body Weight on Total Weight Lifted in Powerlifting Meets

Semiparametric Regression in R

May 5, 2018 | March 24, 2020 | 4 Comments

A. INTRODUCTION When building statistical models, the goal is to define a compact and parsimonious mathematical representation of some data-generating process. Many of these techniques require that one make assumptions about the data or how the analysis is specified. For example, Autoregressive Integrated Moving Average (ARIMA) models require that the time series is … Continue reading Semiparametric Regression in R

Using csvkit to Summarize Data: A Quick Example

May 10, 2017 | June 27, 2017 | 1 Comment

As data analysts, we’re frequently presented with comma-separated value files and tasked with reporting insights. While it’s tempting to import that data directly into R or Python in order to perform data munging and exploratory data analysis, there are also a number of utilities to examine, fix, slice, transform, and summarize data through the command … Continue reading Using csvkit to Summarize Data: A Quick Example

Examining Website Pathing Data Using Markov Chains

April 10, 2017 | May 23, 2017 | 2 Comments

A Markov model can be used to examine a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Let’s define a stochastic process that takes on a finite number of possible values, which are nonnegative integers. Each state represents its value … Continue reading Examining Website Pathing Data Using Markov Chains

Statistics Refresher

March 3, 2017 | May 23, 2017 | 2 Comments

Let’s face it: a good statistics refresher is always worthwhile. There are times we all forget basic concepts and calculations. Therefore, I put together a document that could act as a statistics refresher and thought that I’d share it with the world. This is part one of a two-part document that is still being completed. This refresher … Continue reading Statistics Refresher

Batch Forecasting in R

December 29, 2016 | March 11, 2017 | 1 Comment

Given a data frame with multiple columns that contain time series data, let’s say that we are interested in executing an automatic forecasting algorithm on a number of columns. Furthermore, we want to train the model on a particular number of observations and assess how well it forecasts future values. Based upon those testing procedures, … Continue reading Batch Forecasting in R

Statistical Reading Rainbow

October 17, 2016 | January 15, 2017 | No Comments

For those of us who received statistical training outside of statistics departments, it often emphasized procedures over principles. This meant that we learned about various statistical techniques and how to perform analysis in a particular statistical software package, but glossed over the mechanisms and mathematical statistics underlying these practices. While that training methodology (hereafter referred to … Continue reading Statistical Reading Rainbow

Weekly R-Tips: Visualizing Predictions

September 4, 2016 | March 11, 2017 | 7 Comments

Let’s say that we estimated a linear regression model on time series data with lagged predictors. The goal is to estimate sales as a function of inventory, search volume, and media spend from two months ago. After using the lm function to perform linear regression, we predict sales using values from two months ago. If … Continue reading Weekly R-Tips: Visualizing Predictions

Applied Statistical Theory: Quantile Regression

June 13, 2016 | March 11, 2017 | 6 Comments

This is part two of the ‘applied statistical theory’ series that will cover the bare essentials of various statistical techniques. As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” Standard linear … Continue reading Applied Statistical Theory: Quantile Regression

Applied Statistical Theory: Belief Networks

May 21, 2016 | March 11, 2017 | 1 Comment

Applied statistical theory is a new series that will cover the basic methodology and framework behind various statistical procedures. As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” At the same … Continue reading Applied Statistical Theory: Belief Networks