Sending AWS Athena query results directly to users emails

Flávio Teixeira
3 min readFeb 3, 2020

Hi guys! Today I’ll make a different kind of post, it will not be a tip as usual but something I built.

I participated in this project which was a layer over Athena that build the query for the user using drag-and-drop features. The application would have many users and there was this need to send the query results directly to the current user that did the query, so I came to this solution:

Simple as that, the key point of this solution is the triggered AWS Lambda that does all the work. It receives the result object creation event and from this event, sends the link through the email using the SES service. But how does it know which user made the query execution? That is the trick in this structure.

The Athena API from Amazon SDK, independent of language, contains a function called Start Query Execution which has, among others, a parameter of query destination, marked in bold in the snippet bellow:

{    
"ClientRequestToken": "string",
"QueryExecutionContext": {
"Database": "string"
},
"QueryString": "string",
"ResultConfiguration": {
"EncryptionConfiguration": {
"EncryptionOption": "string",
"KmsKey": "string"
},
"OutputLocation": "string"
},
"WorkGroup": "string"
}

This can be used in a way the application can force the output of the query. I used this feature to set the first directory of the output to be the user email, so when the AWS Lambda function consumes the query result, it gathers the email from the start of its file path.

Example of query execution:

output = "s3://query-results-bucket/user@provider.com/queries"
athena_client.start_query_execution(
QueryString="SELECT * FROM USERS",
QueryExecutionContext={'Database': "datalake"},
ResultConfiguration={'OutputLocation': output}
)

The AWS Lambda function is ready to identify this email and do all the process until send the message to the user. This is the complete flux done by the function:

  1. One or more events are consumed
  2. The file path is identified
  3. The user email is extracted from the file path
  4. The file is set public at the bucket
  5. The email is sent with the file URL for the designed user through SES.

There are some things to consider...

This solution lacks of security because it needs the bucket to be public accessible to use the simple http download links, if your application and/or data lake has sensitive or personal data this will not work, but for study and developing purposes I preferred to go the easy way. There are two simple ways to fix these issues:

  • The first one is to close the bucket for public access and change the Lambda Function logic so it uses the presigned URL. This removes the feature of downloading the results in the future if needed, since this kind of URL has an expiration time.
  • The other one is to alter the bucket policies to filter the access by IP.

I really liked this solution because it can be used in a large number of situations and it opens a fan of possibilities of features and products to be made. You can build your own sand-boxes over Athena, you can use it in monitoring systems, etc.

The script for the lambda function used in this example can be found here.

Feel free to access my other repositories, I post a lot of snippets and personal projects that could help you!

Follow me here to receive weekly tips and contributions, mainly about data engineering.

Thanks for reading :)

--

--

Flávio Teixeira

Data engineer, gamer and addicted to technology. Currently working at Riot Games