Programmatically querying AWS Athena with the JDBC driver
First of all, I want to apologize for not posting here for a long time. I recently moved from Social Miner to iFood, and I've been working hard to learn everything from them as quickly and as well as I can.
Before moving to iFood, I had to take a practical technical test. Among a lot of data engineering requirements, one was to build a REST API that serves some metrics. I want to show you the solution I applied: a Spring Boot REST API that consumes data directly from the data lake through AWS Athena.
Note: For security and privacy purposes, I replaced the sample data used in the test with newly generated data.
For the Athena connection I used this Java library.
The required credentials are all set in the .properties file and loaded dynamically within the getConnection method, as seen below.
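A minimal sketch of what such a getConnection method could look like, not the exact code from the project. It assumes the Simba-based Athena JDBC driver (2.x) is on the classpath, whose URL has the form jdbc:awsathena://AwsRegion=...; and which reads the User, Password and S3OutputLocation connection properties. The class name and the athena.* property keys are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class AthenaConnectionFactory {

    // Hypothetical keys: the original .properties file is not shown in this post.
    static final String REGION_KEY = "athena.region";
    static final String ACCESS_KEY = "athena.accessKey";
    static final String SECRET_KEY = "athena.secretKey";
    static final String OUTPUT_KEY = "athena.s3OutputLocation";

    // Loads a .properties file from the classpath.
    static Properties load(String resourceName) throws IOException {
        Properties config = new Properties();
        try (InputStream in = AthenaConnectionFactory.class
                .getClassLoader().getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resourceName);
            }
            config.load(in);
        }
        return config;
    }

    // Assumed URL format of the Simba-based Athena JDBC driver (2.x).
    static String buildUrl(Properties config) {
        return "jdbc:awsathena://AwsRegion=" + require(config, REGION_KEY) + ";";
    }

    // Maps the application settings to the driver's connection properties.
    static Properties buildDriverProperties(Properties config) {
        Properties driverProps = new Properties();
        driverProps.setProperty("User", require(config, ACCESS_KEY));
        driverProps.setProperty("Password", require(config, SECRET_KEY));
        driverProps.setProperty("S3OutputLocation", require(config, OUTPUT_KEY));
        return driverProps;
    }

    // Opens the connection; this needs the Athena driver jar at runtime.
    static Connection getConnection(Properties config) throws SQLException {
        return DriverManager.getConnection(buildUrl(config), buildDriverProperties(config));
    }

    // Fails fast when a required setting is missing or empty.
    private static String require(Properties config, String key) {
        String value = config.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + key);
        }
        return value.trim();
    }
}
```

With this in place, something like AthenaConnectionFactory.getConnection(AthenaConnectionFactory.load("application.properties")) would return a regular java.sql.Connection. Keeping the URL and property building in separate methods makes the configuration easy to check without touching AWS.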
The connection is called and used just like in any other JDBC library.
It’s a very simple Spring Boot project consisting of the controller classes, where the API routes are set; the service classes, where the connections are made and the queries are pre-built; the error and response handlers; and the data objects.
I really liked the performance of the queries: they respond much as if they were hitting a relational database. To put the data to use, I built two metrics, both joining the two sample tables, and each metric has its own endpoint.
PS: I chose not to build list endpoints, since this API was intended to expose only metric queries.
The first endpoint returns the number of orders by state on a given day.
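To give an idea of how a service could build and run this metric, here is a rough sketch. The table and column names (orders, restaurants, restaurant_id, order_date, state) are assumptions, since the real schema is not shown here, and order_date is assumed to be a DATE column. The day is taken as a LocalDate, so only a well-formed date can ever reach the SQL text.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

final class OrdersByStateQuery {

    // Builds the pre-constructed query; the schema below is hypothetical.
    static String sql(LocalDate day) {
        return "SELECT r.state AS state, COUNT(*) AS total_orders "
             + "FROM orders o "
             + "JOIN restaurants r ON o.restaurant_id = r.restaurant_id "
             + "WHERE o.order_date = DATE '" + day + "' "
             + "GROUP BY r.state "
             + "ORDER BY total_orders DESC";
    }

    // Runs the query like any other JDBC call and keeps the result order.
    static Map<String, Long> run(Connection connection, LocalDate day) throws SQLException {
        Map<String, Long> ordersByState = new LinkedHashMap<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql(day))) {
            while (rows.next()) {
                ordersByState.put(rows.getString("state"), rows.getLong("total_orders"));
            }
        }
        return ordersByState;
    }
}
```

A controller would then only need to parse the day from the request, for example with LocalDate.parse, and hand it to run together with an open connection.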
The second one returns the top 10 restaurants for a given customer ID.
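A similar sketch for the restaurant ranking, under the same caveats: the orders and restaurants tables and their columns are hypothetical, the customer ID is assumed to be numeric, and "top" is assumed to mean the restaurants the customer ordered from most often.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class TopRestaurantsQuery {

    // Taking a long keeps arbitrary text out of the SQL; the schema is hypothetical.
    static String sql(long customerId) {
        return "SELECT r.name AS restaurant, COUNT(*) AS total_orders "
             + "FROM orders o "
             + "JOIN restaurants r ON o.restaurant_id = r.restaurant_id "
             + "WHERE o.customer_id = " + customerId + " "
             + "GROUP BY r.name "
             + "ORDER BY total_orders DESC "
             + "LIMIT 10";
    }

    // Returns the restaurant names, most ordered first.
    static List<String> run(Connection connection, long customerId) throws SQLException {
        List<String> restaurants = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql(customerId))) {
            while (rows.next()) {
                restaurants.add(rows.getString("restaurant"));
            }
        }
        return restaurants;
    }
}
```

Because the ranking and the limit live in the query itself, Athena does the heavy lifting and the API only maps at most ten rows.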
I have a lot more to show about this project, but since it's more about code than anything else, I prefer to open up its source so anyone can browse it freely.
The source code can be found here.
Feel free to check out my other repositories. I post a lot of snippets and personal projects that could help you!
Thanks for reading :)