Executing plain SQL over Pandas dataframes

Flávio Teixeira
3 min readJan 20, 2020

Sometimes when using Pandas it's difficult to manipulate data the way we would do in a relational database. Some simple operations like joins or composite filters are not very intuitive or too painful to execute with Pandas library.

Join example using Pandas

There is a package called pandasql. It solves almost all our problems by allowing us to execute plain SQL over Pandas dataframes.

It's a very simple tool to use, just load your dataframe and query it with the library tools just like it was a table.

It uses the SQLite syntax, which is very complete and handles a lot of functions you would need in a normal relational database situation.

You can even use temporary tables with it!

Since SQL is a language people are more used to, and could be called "universal", you can expand the limits of your work with pandas by now using this simple tip.

Obs.: All my examples were made using the Jupyter Notebooks, and I did not care about receiving the return of the sqldf function to another variable for presentation purposes, but when working with pandasql you can be free to continue working with pandas dataframes since the return of the function is a transformed dataframe.

The notebook used in this example can be found here.

Feel free to access my other repositories, I post a lot of snippets and personal projects that could help you!

Follow me here to receive weekly tips and contributions, mainly about data engineering.

Thanks for reading :)

--

--

Flávio Teixeira

Data engineer, gamer and addicted to technology. Currently working at Riot Games