AWS Athena Error: Query exhausted resources at this scale factor

If you’re using Amazon Athena, you may have seen one of these errors:

  • Query exhausted resources at this scale factor
  • INTERNAL_ERROR_QUERY_ENGINE
  • EXCEEDED_MEMORY_LIMIT: Query exceeded local memory limit
  • GENERIC_INTERNAL_ERROR: io.airlift.bytecode.CompilationException
  • Encountered too many errors talking to a worker node. The node may have crashed or be under too much load.

It may mean you’ve started to hit Athena’s limits and need to consider moving to another solution.

Why does it happen? 

There are several reasons. One is that Athena is a shared, multi-tenant service: at any moment, any number of other customers could be using the same underlying resources. Amazon needs to make sure no single customer hogs those shared resources. Imagine what one accidental query against a massive data set could do to everyone else. So AWS limits how much data you can scan, how much query power you get, and how many queries you can run concurrently.

What are these limits?

  • 20 concurrent queries - By default, Athena limits each account to 20 concurrent queries. This quota can be raised on request, but it is still a quota.
  • Number of S3 requests - S3 limits you to 5,500 GET/HEAD requests per second per prefix, which Athena can hit when a table is made up of many small files. This limit is relatively easy to overcome: reduce the number of files, for example by compacting small files into fewer, larger ones.
  • Timeouts - Athena times out after 30 minutes by default. You don’t get charged for a query that times out, but you don’t get your query results either.
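
One common way to cope with the concurrency quota is to retry throttled queries with exponential backoff. The sketch below is a minimal, library-agnostic illustration: `submit` stands in for whatever call actually submits the query (for example, a boto3 `start_query_execution` call), and the throttle-detection predicate is an assumption you would replace with a check for your client’s actual throttling error.

```python
import random
import time

def run_with_backoff(submit, max_attempts=5, base_delay=1.0,
                     is_throttle=lambda exc: True):
    """Retry `submit` with exponential backoff plus jitter.

    `submit` is any zero-argument callable that raises when Athena
    rejects the query (e.g. because the concurrent-query quota is
    exhausted). `is_throttle` decides whether an exception is worth
    retrying; by default every exception is treated as throttling,
    which is an illustrative assumption, not production logic.
    """
    for attempt in range(max_attempts):
        try:
            return submit()
        except Exception as exc:
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            # Sleep base * 2^attempt seconds, plus jitter so retries
            # from many clients do not arrive in lockstep.
            time.sleep(base_delay * (2 ** attempt)
                       + random.uniform(0, base_delay))
```

With jitter, many clients that were throttled at the same moment spread their retries out instead of hammering the service again simultaneously.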

Another big reason is that Athena is not designed for large data sets and queries.

  • Number of rows - There is no clearly documented limit. For one customer it was 5 billion rows. But past some point, tables with too many rows cause queries to fail.
  • Number of columns - It’s also not clear when you hit this limit. But when you do, and queries run out of memory, you often get “GENERIC_INTERNAL_ERROR: io.airlift.bytecode.CompilationException”.
  • Table size - Rows, columns, and overall size all come back to the same limitation: data has to fit into the RAM of a single node. If you have large data sets, such as a wide fact table approaching billions of rows, you will probably run into this.
  • Queries - Amazon recommends avoiding full scans (e.g. SELECT *), multiple scans in the same query, and queries that reference many tables or views. They also recommend avoiding “expensive” operations like JOIN, GROUP BY, ORDER BY, or UNION when possible, especially on large tables.
  • Query output size - Query results are written by a single Athena node, and writing them relies on that node’s RAM. Even writing the results to a new table can be limited by the available RAM on a single node. In short, if you have large result sets, you are in trouble.
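
The query guidance above mostly boils down to scanning less data: project only the columns you need and prune partitions in the WHERE clause, instead of a full SELECT *. The sketch below builds such a query as a string; the table and column names (`events`, `event_date`, `user_id`, `event_type`) are hypothetical examples, not anything from the original post.

```python
def projected_query(table, columns, partition_filter):
    """Build a narrow, partition-pruned SELECT instead of SELECT *."""
    return (
        f"SELECT {', '.join(columns)} "
        f"FROM {table} "
        f"WHERE {partition_filter}"
    )

# The pattern to avoid: scans every column of every row.
full_scan = "SELECT * FROM events"

# The pattern to prefer: scans only two columns from one day's
# partition (assuming the table is partitioned by event_date).
pruned = projected_query(
    "events",
    ["user_id", "event_type"],
    "event_date = DATE '2021-06-01'",
)
```

On a partitioned table, the pruned form lets Athena skip every file outside the matching partition, which reduces both the data scanned (and billed) and the memory pressure on worker nodes.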

Even if you figure out tricks to work around Athena being a shared resource, such as not starting scheduled tasks right on the hour, you will still hit fundamental limitations in Athena’s design, many of which come down to key operations running on a single node.
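
The “don’t start right on the hour” trick can be sketched as adding random jitter to each scheduled run, so that hourly jobs across many schedules don’t all hit Athena at the same instant. A minimal illustration, with the 5-minute window as an arbitrary assumed value:

```python
import random

def jittered_start_offset(max_jitter_seconds=300):
    """Return a random offset (in seconds) to add to a scheduled
    start time, spreading jobs across a window instead of having
    them all fire exactly on the hour."""
    return random.uniform(0, max_jitter_seconds)
```

In practice you would add this offset to each job’s cron-style start time, or configure an equivalent jitter option in your scheduler if it has one.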

In short, Athena is not the best choice for supporting frequent, large-scale data analytics needs. If you are already getting these errors, it means you need to consider moving. 

If you are thinking about Athena, first analyze your data sets. If you have billion-row fact tables, Athena will probably not be the best choice. Likewise, if you need ad hoc analytics, which typically involve JOIN and GROUP BY operations with fast performance, that may eliminate Athena as well.

If you have gotten to the point where you need faster, more predictable query performance, you need to move to a data warehouse. If you’ve already adopted Athena, you will probably be choosing between a cloud data warehouse and Presto. If you want some guidance on choosing between data warehouses such as Firebolt, Snowflake, or Redshift, or federated query engines like Presto, you can read:

Read all the posts

Discover how Firebolt achieves extreme performance at a fraction of the cost