Examining the Tweeting Patterns of Prominent Crossfit Gyms

A. Introduction

The growth of Crossfit has been one of the biggest developments in the fitness industry over the past decade. Promoted as both a physical exercise philosophy and a competitive fitness sport, Crossfit is a high-intensity fitness program incorporating elements from several sports and exercise protocols such as high-intensity interval training, Olympic weightlifting, plyometrics, powerlifting, gymnastics, strongman, and so forth. Now with over 10,000 Crossfit affiliated gyms (boxes) throughout the United States, the market has certainly become more saturated and gyms must adopt more distinctive marketing strategies to attract new members. In this post, I will investigate how some prominent Crossfit boxes are utilizing Twitter to engage with consumers. While Twitter is a great platform for news and entertainment, it is usually not the place for customer acquisition given the lack of targeted messaging. Furthermore, unlike platforms like Instagram, Twitter is simply not an image/video-centric tool where followers can view accomplishments from their favorite fitness heroes, witness people


R Programming Notes – Part 2

In an older post, I discussed a number of functions that are useful for programming in R. I wanted to expand on that topic by covering other useful functions, packages, and tools. Over the past year, I have been working as an R programmer and these are some of the new learnings that have become fundamental in my work.

IS TRUE and IS FALSE

isTRUE is a function that can be very useful for checking whether a condition or variable has been set to true. Let’s say that we are writing a script that will run a generalized linear regression when the parameter run_mod is set to true. The conditional portion of the script can be written as either if(isTRUE(run_mod)) or if(run_mod). I am partial to isTRUE, which returns FALSE instead of raising an error when run_mod is NA, NULL, or not a single logical value; when run_mod is guaranteed to be a single TRUE or FALSE, the choice is a matter of personal preference. Users should also be aware of the isFALSE function, which is part of the BBmisc package and, since R 3.5.0, base R.

INVISIBLE

The invisible

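As a minimal sketch of how if(isTRUE(run_mod)) and if(run_mod) differ, consider the snippet below. The flag_na variable and the result names are hypothetical and only serve the illustration.

```r
# Hypothetical flag controlling whether the model is fitted
run_mod <- TRUE

# For a single TRUE/FALSE value both forms behave the same
if (isTRUE(run_mod)) message("fitting model")
if (run_mod) message("fitting model")

# They differ when the flag is missing or malformed
flag_na <- NA
res_is_true <- isTRUE(flag_na)   # FALSE, no error is raised

# A plain if() on NA raises an error, which is caught here
res_plain <- tryCatch(
  if (flag_na) "ran" else "skipped",
  error = function(e) "error"
)

res_null <- isTRUE(NULL)             # FALSE
res_vec  <- isTRUE(c(TRUE, TRUE))    # FALSE, not a single value
```

The practical upshot is that isTRUE is the safer choice when the flag comes from user input or a configuration file that may be incomplete.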

R Programming Notes – Part 1

I’ve been on a note taking binge recently. This post covers a variety of topics related to programming in R. The contents were gathered from many sources and structured to provide a useful reference guide to a number of R functions.

DO.CALL

The do.call function executes a function call on a list of arguments.

    do.call("R_Function", "List_of_Arguments")

This is equivalent to telling R which arguments the function should operate on.

    R_Function( "List_of_Arguments" ){ … }

Consider the following list with four elements. We can use do.call to find the total sum across all list elements or to bind the elements row-wise into a matrix that can be converted to a data frame.

    x1 <- c(1,2,5)
    x2 <- c(1,3,6)
    x3 <- c(1,4,7)
    x4 <- c(1,5,8)

    do.call(sum, list(x1,x2,x3,x4)) # sum all list elements
    do.call(rbind, list(x1,x2,x3,x4)) # rbind the list elements

Let’s consider a scenario where we have a small data frame and want to run a general linear model on different

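To make the do.call idea concrete, here is a small sketch showing that each do.call expression yields the same values as calling the function directly on the list elements. The variable names args, total, stacked, and joined are my own; the paste example is an added illustration of how named list elements become named arguments.

```r
x1 <- c(1, 2, 5)
x2 <- c(1, 3, 6)
x3 <- c(1, 4, 7)
x4 <- c(1, 5, 8)
args <- list(x1, x2, x3, x4)

# Same value as sum(x1, x2, x3, x4)
total <- do.call(sum, args)

# Same values as rbind(x1, x2, x3, x4); the result is a 4 x 3 matrix
stacked <- do.call(rbind, args)

# Convert to a data frame if that is the structure you need
stacked_df <- as.data.frame(stacked)

# Named list elements are passed as named arguments,
# so this matches paste("a", "b", sep = "-")
joined <- do.call(paste, list("a", "b", sep = "-"))
```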

Turning Data Into Awesome With sqldf and pandasql

Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql packages provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts.

R and sqldf

    sqldf("SELECT COUNT(*) FROM df2 WHERE state = 'CA'")

      COUNT(*)
    1        4

    sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df2.state = 'TX'")

      firstname lastname  var1 state
    1     David    Spade -2.09    TX
    2       Joe  Montana  1.16    TX

    sqldf("SELECT df2.state, COUNT(df1.var1) FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0 GROUP BY df2.state")

       state COUNT(df1.var1)
    1     AZ               1
    2     CA               1
    3     GA               1
    4     IL               1
    5     NC               1
    6     NY               1
    7     OK               1
    8     SC               1
    9     TX               1
    10    VT               1

Python and pandasql

    import pandasql as ps

    q1

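The excerpt does not show how df1 and df2 were built, so the sketch below uses small stand-in data frames to make two of the queries reproducible. Apart from the two Texas rows that appear in the output above, every value here is made up, and the column alias n is my own addition.

```r
library(sqldf)

# Hypothetical stand-ins for df1 and df2; only the two TX rows
# are taken from the output shown in the post
df2 <- data.frame(
  id        = 1:4,
  firstname = c("David", "Joe", "Ann", "Lee"),
  lastname  = c("Spade", "Montana", "Smith", "Jones"),
  state     = c("TX", "TX", "CA", "CA"),
  stringsAsFactors = FALSE
)
df1 <- data.frame(personid = 1:4, var1 = c(-2.09, 1.16, 0.50, -0.30))

# Count the rows for one state
n_ca <- sqldf("SELECT COUNT(*) AS n FROM df2 WHERE state = 'CA'")

# Join the two data frames and keep the Texas rows
tx <- sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state
             FROM df1 INNER JOIN df2 ON df1.personid = df2.id
             WHERE df2.state = 'TX'")
```

sqldf looks up the data frames named in the query in the calling environment, which is why no explicit connection or table registration is needed.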

Semiparametric Regression in R

A. INTRODUCTION

When building statistical models, the goal is to define a compact and parsimonious mathematical representation of some data generating process. Many of these techniques require that one make assumptions about the data or how the analysis is specified. For example, Auto Regressive Integrated Moving Average (ARIMA) models require that the time series is weakly stationary or can be made so. Furthermore, ARIMA assumes that the data has no deterministic time trends, the variance of the error term is constant, and so forth. Assumptions are generally a good thing, but there are definitely situations in which one wants to free themselves from such “constraints.” In the context of evaluating relationships between one or more target variables and a set of explanatory variables, semiparametric regression is one such technique that provides the user with some flexibility in modeling complex data without maintaining stringent assumptions. With semiparametric regression, the goal is to develop a properly specified model that integrates the simplicity


Packages for Getting Started with Time Series Analysis in R

A. Motivation

During the recent RStudio Conference, an attendee asked the panel about the lack of support provided by the tidyverse in relation to time series data. As someone who has spent the majority of their career on time series problems, this was somewhat surprising because R already has a great suite of tools for visualizing, manipulating, and modeling time series data. I can understand the desire for a ‘tidyverse approved’ tool for time series analysis, but it seemed like perhaps the issue was a lack of familiarity with the available tools. Therefore, I wanted to put together a list of the packages and tools that I use most frequently in my work. For those unfamiliar with time series analysis, this could be a good place to start investigating R’s current capabilities.

B. Background

Time series data refers to a sequence of measurements that are made over time at regular or irregular intervals with each observation being a single dimension. An


Data.Table by Example – Part 3

For this final post, I will cover some advanced topics and discuss how to use data tables within user generated functions. Once again, let’s use the Chicago crime data. Let’s start by subsetting the data. The following code takes the first 50000 rows within the dat dataset, selects four columns, creates three new columns pertaining to the data, and then removes the original date column. The output was saved to a new variable and the user can see the first few rows of the new data table using brackets or the head function. We can now do some intermediate calculations and suppress their output by using braces. In this simple sample case, we have taken the mean of value1 for each month, subtracted it from value2, and then squared that result. The output shows only the final calculation in the braces, which is the result from squaring. Note that I also ordered the results by month with the chaining process. That’s all very

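The code for the braces step is not included in this excerpt, so the following is only a sketch of the pattern described, not the original code. It assumes a small made-up table with columns named month, value1, and value2.

```r
library(data.table)

set.seed(1)
# Hypothetical data; the column names are assumptions
dt <- data.table(
  month  = rep(3:1, each = 4),
  value1 = sample(1:50, 12, replace = TRUE),
  value2 = sample(1:50, 12, replace = TRUE)
)

res <- dt[, {
  m <- mean(value1)     # intermediate result, not returned
  d <- value2 - m       # intermediate result, not returned
  list(sq_diff = d^2)   # only the last expression is returned
}, by = month][order(month)]  # chained call sorts the result by month
```

Only the final expression inside the braces ends up in the result, which is how the intermediate steps stay out of the output.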

Data.Table by Example – Part 2

In part one, I provided an initial walk through of some nice features that are available within the data.table package. In particular, we saw how to filter data and get a count of rows by the date. Let us now add a few columns to our dataset on reported crimes in the city of Chicago. There are many ways to do this but they involve the use of the := operator. Since data.table updates values by reference, we do not need to save the results as another variable. This is a very desirable feature. You can also just use the traditional base R solution for adding new columns as data.tables are also data frames. In any case, we now have three new columns with randomly selected values between 1 and 50. We can now look to summarize these values and see how they differ across the primary arrest type and other categorical variables. The above code allows us to

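Since the code is cut from this excerpt, here is a minimal sketch of adding columns by reference with := and then summarizing by a categorical column. The miniature table and the column names primary_type, value1, value2, and value3 are assumptions for illustration, not the actual Chicago data.

```r
library(data.table)

set.seed(1)
# Hypothetical miniature of the crime data
dat <- data.table(primary_type = rep(c("THEFT", "BATTERY"), each = 5))

# := adds columns by reference, so no reassignment is needed
dat[, `:=`(
  value1 = sample(1:50, .N, replace = TRUE),
  value2 = sample(1:50, .N, replace = TRUE)
)]

# The base R approach also works because a data.table is a data.frame
dat$value3 <- sample(1:50, nrow(dat), replace = TRUE)

# Summarize the new values by a categorical column
smry <- dat[, .(mean_value1 = mean(value1), n = .N), by = primary_type]
```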

Data.Table by Example – Part 1

For many years, I actively avoided the data.table package and preferred to utilize the tools available in either base R or dplyr for data aggregation and exploration. However, over the past year, I have come to realize that this was a mistake. Data tables are incredible and provide R users with a syntactically concise and efficient data structure for working with small, medium, or large datasets. While the package is well documented, I wanted to put together a series of posts that could be useful for those who want to get introduced to the data.table package in a more task oriented format. For this series of posts, I will be working with data that comes from the Chicago Police Department’s Citizen Law Enforcement Analysis and Reporting system. This dataset contains information on reported incidents of crime that occurred in the city of Chicago from 2001 to present. You can use the wget command in the terminal to export it as


Working With SEM Keywords in R

The following post was republished from two previous posts that were on an older blog of mine that is no longer available. These are from several years ago, and relate to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign? Two, how can I develop an effective system for examining keywords based on different characteristics?

Generating PPC Keywords in R

Paid search marketing refers to the process of driving traffic to a website by purchasing ads on search engines. Advertisers bid on certain keywords that users might search for, and that determines when and where their ads appear. For example, an individual who owns an auto dealership would want to bid on keywords relating to automobiles that a reasonable person would search for on a search engine. In both Google and Bing, advertisers are able to specify which keywords they would like to bid for and at what amount.
