If you’re using Amazon Athena, you may have seen one of these errors:
- Query exhausted resources at this scale factor
- EXCEEDED_MEMORY_LIMIT: Query exceeded local memory limit
- GENERIC_INTERNAL_ERROR: io.airlift.bytecode.CompilationException
- Encountered too many errors talking to a worker node. The node may have crashed or be under too much load.
It may mean you’ve started to hit the limit with Athena and need to move.
Why does it happen?
There are several reasons. One reason is that Athena is a shared resource. At any moment, any number of other companies could be using it. Beyond having limited resources, Amazon needs to make sure no one customer hogs the shared resources. Imagine what one accidental query against a massive data set could do. So they limit how much data, query power and concurrent queries you can run.
What are these limits?
- 20 concurrent queries - by default, Athena limits each account to 20 concurrent queries.
- Number of S3 requests - S3 limits you to 5500 requests per second, which Athena can hit during queries. This is an easy limit to overcome: just reduce the number of files.
- Timeouts - Athena times out after 30 minutes. You don’t get charged for the query time if it happens. But you don’t get your query results either.
Another big reason is that Athena is not designed for large data sets and queries.
- Number of rows - This limit is not clear. For one customer it was 5 billion rows. But if your table has too many rows, queries can fail.
- Number of columns - it’s also not clear when you hit this limit either. But when you do and run out of memory, you often get “GENERIC_INTERNAL_ERROR: io.airlift.bytecode.CompilationException”.
- Table size - Rows, columns and overall size all have to do with the limitation of having to load data into the RAM of a single node. If you have large data sets, such as a wide fact table approaching billions of rows, you will probably have an issue.
- Queries - Amazon will recommend to avoid full scans (e.g. *), multiple scans in the same query, or using multiple tables or views in a query. They also recommend avoiding “expensive” operations like JOIN, GROUP BY, ORDER BY, or UNION when possible, especially when working with large tables.
- Query output size - query results are written by a single Athena node, and the results rely on RAM. Even writing the results to a new table can be limited by the available RAM on a single table. In short, if you have large result sets, you are in trouble.
Even if you figure out tricks to get around Athena being a shared resource, such as not starting tasks right on the hour, you will still hit fundamental limitations with Athena’s design, many of which center around several Athena operations being limited to a single node.
In short, Athena is not the best choice for supporting frequent, large-scale data analytics needs. If you are already getting these errors, it means you need to consider moving.
If you are thinking about Athena, first analyze your data sets. If you have billion row fact tables, Athena will probably not be the best choice. Also, if you need to do ad hoc, those involve doing JOIN and GROUP BY operations with fast performance. That may eliminate Athena.
If you have gotten to a point where you need faster, more predictable query performance, you need to move to a data warehouse. If you’ve already accepted Athena, then you probably will be choosing a cloud data warehouse or Presto. If you want some guidance on making the choice between various data warehouses such as Firebolt, Snowflake, or Redshift; or other federated query engines like Presto you can read:
- The data warehouse comparison guide
- Choosing between the best federated query engine and a data warehouse
- Choosing the right federated query engine - Athena vs. Redshift Spectrum vs. Presto