Both R and Python have libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql libraries provide a simple and elegant interface for carrying out these tasks in an intuitive framework that's widely used by analysts.

R and sqldf

sqldf("SELECT COUNT(*) FROM df2 WHERE state = 'CA'")

  COUNT(*)
1        4

sqldf("SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df2.state = 'TX'")

  firstname lastname  var1 state
1     David    Spade -2.09    TX
2       Joe  Montana  1.16    TX

sqldf("SELECT df2.state, COUNT(df1.var1) FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0 GROUP BY df2.state")

   state COUNT(df1.var1)
1     AZ               1
2     CA               1
3     GA               1
4     IL               1
5     NC               1
6     NY               1
7     OK               1
8     SC               1
9     TX               1
10    VT               1

Python and pandasql

import pandasql as ps
q1
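For readers who want to try this style of query in Python, here is a minimal sketch of the same idea. It does not use pandasql itself. Instead it loads the data frames into an in-memory SQLite database with pandas and the standard-library sqlite3 module, which is a plain stand-in for the approach. The helper name run_sql, the aliases n and n_positive, and all of the sample values are assumptions made up for illustration; only the table and column names (df1, df2, personid, id, var1, state) come from the queries in the excerpt.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in tables; the values are invented for illustration.
df1 = pd.DataFrame({"personid": [1, 2, 3, 4], "var1": [0.5, -1.2, 2.3, 0.1]})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "firstname": ["Ann", "Bob", "Cy", "Dee"],
    "state": ["CA", "TX", "CA", "TX"],
})


def run_sql(query, **frames):
    """Copy each data frame into an in-memory SQLite table, then run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, frame in frames.items():
            frame.to_sql(name, conn, index=False)
        return pd.read_sql_query(query, conn)
    finally:
        conn.close()


# Count the rows for one state.
ca_count = run_sql("SELECT COUNT(*) AS n FROM df2 WHERE state = 'CA'", df2=df2)

# Join the two tables and count positive var1 values per state.
positive_by_state = run_sql(
    """
    SELECT df2.state, COUNT(df1.var1) AS n_positive
    FROM df1
    INNER JOIN df2 ON df1.personid = df2.id
    WHERE df1.var1 > 0
    GROUP BY df2.state
    ORDER BY df2.state
    """,
    df1=df1,
    df2=df2,
)
```

Each call builds a fresh database, which keeps the sketch simple but would be wasteful for large frames.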
Anyone who has regularly worked with Google Trends data has had to deal with the slightly tedious task of grabbing keyword-level data and reformatting the spreadsheet provided by Google. After looking for a seamless way to pull the data, I came upon the PyTrends library on GitHub and put together some quick user-defined functions to handle pulling daily and weekly trends data.
There are plenty of instances where analysts are regularly forwarded xls spreadsheets and tasked with summarizing the data. In many cases, these scenarios can be automated through fairly simple Python scripts. In the following code, I take an Excel spreadsheet with two sheets, summarize each sheet using a pivot table, and add those results to sheets in a new spreadsheet.
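The workflow described above could look roughly like the following sketch, which uses pandas. This is not the post's actual code: the function names, the column names region and sales, the choice of a sum as the aggregate, and the sample rows are all assumptions for illustration. Reading and writing Excel files also requires an Excel engine (such as openpyxl) to be installed alongside pandas.

```python
import pandas as pd


def summarize_sheet(frame, index_col, value_col):
    """Summarize one sheet with a pivot table (sum of value_col per index_col)."""
    return frame.pivot_table(index=index_col, values=value_col, aggfunc="sum")


def summarize_workbook(in_path, out_path, index_col, value_col):
    """Pivot every sheet of in_path and write each result to a sheet in out_path."""
    # sheet_name=None returns a dict mapping sheet names to data frames.
    sheets = pd.read_excel(in_path, sheet_name=None)
    with pd.ExcelWriter(out_path) as writer:
        for name, frame in sheets.items():
            summary = summarize_sheet(frame, index_col, value_col)
            # Excel limits sheet names to 31 characters.
            summary.to_excel(writer, sheet_name=f"{name}_summary"[:31])


# Small in-memory demonstration of the pivot step, using made-up rows.
sample = pd.DataFrame({
    "region": ["East", "West", "East"],
    "sales": [10, 7, 5],
})
result = summarize_sheet(sample, "region", "sales")
```

With real files, a call such as summarize_workbook("input.xlsx", "summary.xlsx", "region", "sales") would process both sheets in one pass; the file names here are placeholders.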
Using R Functions in Python
Using the csv module
Using the pandas module
Not too long ago, I was on the job market looking for work as an applied statistician or data scientist within the online marketing industry. One thing I've come to expect with almost every company is some sort of homework assignment or challenge, where a spreadsheet is presented along with some guidelines on the type of analysis they would like. Sometimes it's very open-ended, and at other times specific tasks and questions are put forth. Initially, I saw these assignments as something fun where I could showcase my skill set. However, since last month, I've come to see them as a nuisance which can't possibly be a good indicator of whether someone is 'worth hiring' or not. I get it: companies often get inundated with resumes, and they need effective processes to sift through them. And I get the value of getting some document which outlines how an applicant thought about a problem and generated