Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts. R and sqldf sqldf(“SELECT COUNT(*) FROM df2 WHERE state = ‘CA'”) COUNT(*) 1 4 sqldf(“SELECT df2.firstname, df2.lastname, df1.var1, df2.state FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df2.state = ‘TX'”) firstname lastname var1 state 1 David Spade -2.09 TX 2 Joe Montana 1.16 TX sqldf(“SELECT df2.state, COUNT(df1.var1) FROM df1 INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0 GROUP BY df2.state”) state COUNT(df1.var1) 1 AZ 1 2 CA 1 3 GA 1 4 IL 1 5 NC 1 6 NY 1 7 OK 1 8 SC 1 9 TX 1 10 VT 1 Python and pandasql import pandasql as ps q1
I’ve been putting together a basic SQL cheat sheet that could be used as a reference guide. Here are a series of common procedures that should be of use for anyone who uses SQL to extract data. No explanations are provided as they should largely be known to the end user.