Turning Data Into Awesome With sqldf and pandasql

Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts.

Screenshot from 2015-04-29 11:23:02
Screenshot from 2015-04-29 11:23:44

R and sqldf

sqldf

("SELECT COUNT(*) FROM df2 WHERE state = 'CA'")
 
  COUNT(*)
1        4
 

sqldf

("SELECT 
            df2.firstname, 
            df2.lastname, 
            df1.var1, 
            df2.state 
      FROM df1
      INNER JOIN df2 ON df1.personid = df2.id 
      WHERE df2.state = 'TX'")
 
  firstname lastname  var1 

state

1     David    Spade -2.09    TX
2       Joe  Montana  1.16    TX
 

sqldf

("SELECT 
            df2.state, 
            COUNT(df1.var1) 
      FROM df1
      INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
      GROUP BY df2.state")
 
   

state

 COUNT(df1.var1)
1     AZ               1
2     

CA

               1
3     GA               1
4     IL               1
5     NC               1
6     NY               1
7     OK               1
8     SC               1
9     TX               1
10    VT               1

Python and pandasql

import pandasql 

as

 ps
 
 
q1 = """SELECT COUNT(*) FROM df2 WHERE state = 'CA'"""
 

print

 ps.sqldf(q1, locals())
 
   COUNT(*)
0         4
 
q2 = """
    SELECT 
        df2.firstname, 
        df2.lastname, 
        df1.var1, 
        df2.state 
    FROM df1 INNER JOIN df2 ON df1.personid = df2.id 
    WHERE df2.state = "TX";
    """
 

print

 ps.sqldf(q2, locals())
 
  firstname lastname  var1 

state

0     David    Spade -2.09    TX
1       Joe  Montana  1.16    TX
 
q3 = """SELECT 
            df2.state, 
            COUNT(df1.var1) 
      FROM df1
      INNER JOIN df2 ON df1.personid = df2.id WHERE df1.var1 > 0
      GROUP BY df2.state"""
 

print

 ps.sqldf(q3, locals())
 
  

state

  COUNT(df1.var1)
0    AZ                1
1    

CA

                1
2    GA                1
3    IL                1
4    NC                1
5    NY                1
6    OK                1
7    SC                1
8    TX                1
9    VT                1

1 thought on “Turning Data Into Awesome With sqldf and pandasql”

  1. Pingback: Distilled News | Data Analytics & R

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top