Engines are the compute infrastructure for Firebolt’s Cloud Data Warehouse. Customers use engines to ingest data from AWS S3 into Firebolt and, once the data is ingested, to run queries on it.
An engine has three key dimensions: 1/ Type - the type of nodes used in the engine, 2/ Clusters - the number of clusters, where a cluster is a collection of nodes of the same type, and 3/ Nodes - the number of nodes in each cluster. An engine comprises one or more clusters. For a given engine, every cluster has the same type and the same number of nodes.
There are four node types available in Firebolt - Small ("S"), Medium ("M"), Large ("L") and X-Large ("XL"). Each node type provides a certain amount of CPU, RAM and SSD. These resources scale linearly with the node type - for example, an "M" type node provides twice as much CPU, RAM and SSD as an "S" type node.
You can use anywhere from 1 to 128 nodes per cluster in a given engine.
You can use up to 10 clusters per engine.
All operations in Firebolt can be performed via SQL or the UI. To create an engine, use the CREATE ENGINE command, specifying a name for the engine, the number of clusters the engine will use, the number of nodes in each cluster and the type of the nodes used in the engine. After the engine is successfully created, users get an endpoint they can use to submit their queries. For example, you can create an engine named MyEngine with two clusters, each with two nodes of type "M", as below:
CREATE ENGINE IF NOT EXISTS MyEngine WITH
TYPE = "M" NODES = 2 CLUSTERS = 2;
For more details on how to create an engine, see the Guides section in the documentation.
In Firebolt, you can scale an engine across multiple dimensions. All scaling operations in Firebolt are dynamic, meaning you do not need to stop your engines to scale them.
Scale Up/Down - You can vertically scale an engine by using a different node type that best fits the needs of your workload.
Scale Out/In - You can horizontally scale an engine by modifying the number of nodes per cluster in the engine. Horizontal scaling helps when your workload benefits from distributing queries across multiple nodes.
Concurrency Scaling - Firebolt allows you to add or remove clusters in an engine. Use concurrency scaling when your workload must handle a sudden spike in the number of users or queries. Note that you can scale along more than one dimension simultaneously. For example, the command below changes both the node type to "L" and the number of clusters to two.
ALTER ENGINE MyEngine SET TYPE = "L" CLUSTERS = 2;
All scaling operations can be performed via SQL using the ALTER ENGINE statement or via the UI. For more information on how to perform scaling operations in Firebolt, see the Guides section in the documentation.
No. While you can add as many nodes as you want in a single scaling operation, Firebolt offers granular scaling, allowing you to incrementally add nodes to your engine, one node at a time. For example, if you have an engine, MyEngine, that currently has a single cluster with two nodes of type "M", you can add one more node with the SQL command below:
ALTER ENGINE MyEngine SET NODES = 3;
After the above command is successfully executed, MyEngine will have a single cluster with three nodes of type "M". For more information on granular scaling, see here.
No. You can modify your engine configuration dynamically, meaning you don’t have to stop your running engines. Hence your applications will not incur any downtime.
Your queries will continue to run uninterrupted during a scaling operation. When you perform horizontal or vertical scaling operations on your engine, Firebolt adds additional compute resources per your new configuration. While new queries will be directed to the new resources, the old compute resources will finish executing any queries currently running, after which they will be removed from the engine.
Yes. By default, creating an engine creates the underlying engine clusters and starts the engine, leaving it in a running state where it is ready to serve queries. However, you have the option to defer the creation of the underlying clusters by setting START_IMMEDIATELY to FALSE when calling CREATE ENGINE, as shown below. You can then start the engine at a later point, when you are ready to run queries on it. Note that you cannot modify this property after an engine has been created.
CREATE ENGINE IF NOT EXISTS MyEngine WITH
TYPE = "S" NODES = 2 CLUSTERS = 1 START_IMMEDIATELY = FALSE;
Starting and stopping engines in Firebolt can be done via SQL or the UI. Use the following command to start an engine named MyEngine:
START ENGINE MyEngine;
To stop the same engine, you can use:
STOP ENGINE MyEngine;
You can use the AUTO_STOP feature available in Firebolt engines to ensure that your engines are automatically stopped after a certain amount of idle time. Engines in a stopped state are not charged and hence do not incur any costs. As with other engine operations, this can be done via SQL or the UI. For example, while creating an engine, you can specify the idle time using AUTO_STOP, as below:
CREATE ENGINE IF NOT EXISTS MyEngine WITH
TYPE = "S" NODES = 2 CLUSTERS = 1 AUTO_STOP = 15;
The above command ensures that MyEngine is automatically stopped once it has been idle for 15 continuous minutes. Alternatively, you can set AUTO_STOP after an engine has been created:
ALTER ENGINE MyEngine SET AUTO_STOP = 15;
If the engine has the AUTO_START option set to True, an engine in a stopped state will be automatically started when it receives a query. By default, this option is set to True. If this option is set to False, you must explicitly start the engine using the START ENGINE command.
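For example, assuming AUTO_START can be modified with ALTER ENGINE in the same way as AUTO_STOP (a sketch; verify the exact property name against the documentation), you could disable automatic start with:
ALTER ENGINE MyEngine SET AUTO_START = FALSE;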
No. In Firebolt, engines and databases are fully decoupled from each other. A given engine can be used with multiple databases and, conversely, multiple engines can be used with a given database.
No. While there is no theoretical limit on the number of databases you can use with a given engine, note that the configuration of your engine will determine the performance of your applications. Based on the performance demands of your applications and the needs of your business, you may want to create the appropriate number of engines.
Firebolt provides three observability views that give insight into the performance of your engine. 1/ engine_running_queries - This view provides information about currently running queries, including whether a query is running or in the queue; for queries that are currently running, it also shows how long they have been running. 2/ engine_query_history - This view provides historical information about past queries; for each query, this includes its execution time, the amount of CPU and memory consumed, and the amount of time it spent in the queue, among other details. 3/ engine_metrics_history - This view provides information about the utilization of CPU, RAM and storage for each of the engine clusters. You can use these views to understand whether your engine resources are being utilized optimally, whether your query performance meets your needs, and what percentage of queries are waiting in the queue and for how long. Based on these insights, you can resize your engine accordingly.
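For example, the following query is a minimal sketch of how you might inspect recent cluster utilization before deciding to resize; it uses the event_time, cpu_used and memory_used columns referenced later in this FAQ:
SELECT event_time, cpu_used, memory_used
FROM information_schema.engine_metrics_history
WHERE event_time > CURRENT_DATE - INTERVAL '1' day
ORDER BY event_time;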
You can use the ALTER ENGINE command to dynamically change your engine configuration to meet the evolving needs of your workload(s). To change the amount of CPU, memory or disk available on your nodes, change the node type used in your engine. Based on your desired query performance, you can horizontally scale your engine by modifying the number of nodes per cluster. If you need to improve the throughput of your workloads, serving a higher number of concurrent requests, you can increase the number of compute clusters in your engine.
No. Firebolt provides seamless online upgrades, which are automatic and incur zero downtime for your workloads. Firebolt takes care of transparently upgrading your engines, so users don’t need to worry about planning and managing maintenance windows.
Firebolt provides Role-based Access Control (RBAC) to help customers control which users can perform what operations on a given engine. For example, you can provide users with only the ability to use or operate existing engines but not allow them to create new engines. In addition, you can also prevent users from starting or stopping engines, allowing them to only run queries on engines that are already running. These fine-grained controls help ensure that customers do not end up with runaway costs resulting from multiple users in an organization creating and running new engines.
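These controls are expressed through SQL. As an illustrative sketch only (the privilege name USAGE and the role name are assumptions here; consult the RBAC documentation for the exact grammar), granting a role the ability to run queries on an existing engine, without allowing it to create, start or stop engines, might look like:
CREATE ROLE IF NOT EXISTS analyst_role;  -- hypothetical role name
GRANT USAGE ON ENGINE MyEngine TO analyst_role;  -- allows running queries on MyEngine only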
Firebolt provides a metric called Firebolt Units (FBU) to track the consumption of engines. Each node type provides a certain number of FBUs, as shown below:
For a given engine with a certain configuration (Type, Number of nodes and Number of clusters), you can calculate the FBU of the engine as below:
FBU-per-Hour for a given engine = (FBU of the node type x Nodes per cluster x Clusters)
FBUs are consumed by engines only when they are in a RUNNING state, and can be calculated as below:
FBU Consumed = (FBU per Hour / 3600) x (Duration for which engine was running in seconds)
For example, if you have an engine with the following configuration that was running for 30 minutes: TYPE = "S", NODES = 2, CLUSTERS = 1
This engine will have 16 FBUs available per hour (8 x 2 x 1). However, since the engine was running for 30 minutes, you will be charged only for 8 FBUs.
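To make the arithmetic concrete, the same calculation can be expressed directly in SQL (illustrative only; the 8 FBU-per-hour figure for an "S" node is taken from the example above):
SELECT (8 * 2 * 1) / 3600.0 * 1800 AS fbu_consumed;  -- 8 FBU/hour per "S" node x 2 nodes x 1 cluster, running 1800 seconds = 8 FBU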
For more information on engine consumption, including examples, visit the documentation section here.
You can use the following information schema view to get data on the FBUs consumed by engines in a given account.
information_schema.engine_metering_history
The above view will provide hourly details on the FBU consumption for each of the engines in a given account. Note that for any engines that are currently running, this view will provide real-time information on the FBUs consumed by the engine, up to the latest second.
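For example, a query along the following lines summarizes consumption per engine. This is a sketch only; the column names engine_name and consumed_fbu are assumptions and should be checked against the view's actual schema:
SELECT engine_name, SUM(consumed_fbu) AS total_fbu
FROM information_schema.engine_metering_history
GROUP BY engine_name;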
Firebolt is engineered to execute thousands of queries concurrently without compromising speed. It offers unparalleled cost efficiency with industry-leading price-to-performance ratios, and it scales seamlessly to handle hundreds of terabytes of data with minimal performance impact.
Firebolt uses advanced query processing techniques such as granular range-level data pruning with sparse indexes, incrementally updated aggregating indexes, vectorized multi-threaded execution, and tiered caching, including sub-plan result caching. These techniques both minimize the data being scanned and reduce CPU time by reusing precomputed results, enabling query processing times in the tens of milliseconds on hundreds of terabytes of data.
Data pruning in Firebolt uses sparse indexes to minimize the amount of data scanned during queries. This reduces I/O and enables response times in the tens of milliseconds, making your queries highly performant.
Firebolt scales to manage hundreds of terabytes of data without performance bottlenecks. Its distributed architecture allows it to leverage all available network bandwidth and execute queries at scale with efficient cross-node data transfer using streaming data shuffle.
Firebolt's aggregating index pre-calculates and stores aggregate function results for improved query performance, similar to a materialized view that works with Firebolt's F3 storage format. Firebolt selects the best aggregating indexes to optimize queries at runtime, avoiding full table scans. These indexes are automatically updated with new or modified data to remain consistent with the underlying table data. In multi-node engines, Firebolt shards aggregating indexes across nodes, similar to the sharding of the underlying tables.
Firebolt's engine uses vectorized execution, which processes batches of thousands of rows at a time, leveraging modern CPUs for maximum efficiency*. Combined with multi-threading, this approach allows queries to scale across all CPU cores, optimizing performance.
* Boncz, Peter A., Marcin Zukowski, and Niels Nes. "MonetDB/X100: Hyper-Pipelining Query Execution." CIDR. Vol. 5. 2005.
* Idreos, Stratos, Fabian Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. "MonetDB: Two decades of research in column-oriented database architectures." IEEE Data Engineering Bulletin 35.1 (2012).
Sub-plan result caching allows Firebolt to reuse intermediate query artifacts, such as hash tables computed during previous requests, when serving new requests, significantly reducing query processing times. It includes built-in automatic cache eviction for efficient memory utilization while maintaining real-time, fully transactional results.
The shuffle operation is the key ingredient to executing queries at scale in distributed systems like Firebolt. Firebolt leverages close to all available network bandwidth and streams intermediate results from one execution stage to the next whenever possible. By overlapping the execution of different stages, Firebolt reduces overall query latency.
Firebolt first distributes data across all nodes using shuffle and ensures that total cluster memory is used. If this is still not enough, Firebolt selectively spills portions of the data into local SSDs to manage intermediate data structures beyond main memory. This approach allows Firebolt to run queries with working set sizes that exceed main memory, allowing the system to scale even with limited hardware resources.
Firebolt engines can scale up and scale out. For the most demanding high QPS workloads, Firebolt also supports concurrency scaling by allowing additional clusters to be added to the existing engine. Firebolt supports up to 10 clusters within a single engine, enabling concurrency scaling to manage heavy workloads. New clusters can be added on-demand to handle spikes in concurrent queries, ensuring optimal performance under any load.
Everything in Firebolt is done through SQL. Firebolt’s SQL dialect is compliant with Postgres’s SQL dialect and supports running SQL queries directly on structured and semi-structured data without compromising speed. Firebolt also has multiple extensions in its SQL dialect to better serve modern data applications.
Firebolt has full support for the array data type, compliant with ANSI SQL/Postgres, including correlated subqueries, lateral joins and UNNEST. Firebolt also has a rich collection of SQL functions, including array lambda functions, that make working with arrays simpler and more efficient.
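As a brief sketch of the lambda-style array functions (the function name and exact syntax should be verified against the SQL reference), you can apply an expression to every element of an array:
SELECT TRANSFORM(x -> x * 2, [ 1, 2, 3 ]) AS doubled;  -- returns [2, 4, 6]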
Firebolt offers observability views through information_schema, allowing you to access real-time engine metrics. These insights help you correctly size your engines for optimal performance and cost efficiency. Read more here: https://docs.firebolt.io/general-reference/information-schema/views.html
Yes, Firebolt is designed for ease of use, leveraging SQL simplicity and PostgreSQL compliance. It allows data professionals to manage, process and query data effortlessly using familiar SQL commands.
For an in-depth understanding of Firebolt's capabilities, explore the Query Life Cycle Whitepaper, Firebolt Docs, and Pricing information available on our website.
Firebolt provides the following options for importing data:
Parquet and CSV file formats are supported at the current time. For other formats such as Avro, JSON or ORC, please use the external table option.
Firebolt provides the ability to filter on file-level information such as name, modified time, and size using metadata fields: $source_file_timestamp, $source_file_name, $source_file_size. For more information, please refer to the “Load data using SQL” guide.
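For example, assuming a hypothetical external table named my_external_table over your S3 location (the table and column names here are illustrative), you could load only recently modified files:
INSERT INTO my_table
SELECT c1, c2
FROM my_external_table
WHERE $source_file_timestamp > CURRENT_DATE - INTERVAL '1' day;  -- only files modified in the last day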
Streaming ingestion is on the roadmap for Firebolt. To address near real-time ingestion scenarios, Firebolt recommends using micro batching. Various tools such as Kinesis Firehose can be used to persist data to S3 in Parquet or Avro format.
In general, use a Firebolt engine with multiple nodes (a scale-out configuration). To size an engine appropriately, follow the steps below:
-- 1/ Start with a small engine and run your ingestion workload
CREATE ENGINE ingest_engine TYPE=S NODES=1;
-- 2/ Monitor engine utilization over the ingestion period
SELECT event_time, cpu_used, memory_used
FROM information_schema.engine_metrics_history
WHERE event_time > CURRENT_DATE - INTERVAL '1' day;
-- 3/ Scale out if the engine is resource-constrained
ALTER ENGINE ingest_engine SET NODES=4;
Yes. Firebolt supports updates and deletes with transactional consistency. Additionally, Firebolt delivers updates and deletes with sub-second latency. For additional information on updates and deletes, please refer to our data management lifecycle blog. You can also refer to our DML benchmark for more information.
The "COPY TO" SQL command is available to export data to S3.
CSV, TSV, JSON and Parquet formats are supported.
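A minimal sketch of an export is shown below; the option name and the output path are illustrative, so check the COPY TO reference for the exact syntax:
COPY (SELECT * FROM my_table) TO 's3://my-bucket/exports/' TYPE = PARQUET;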
Yes, Firebolt supports the ingestion and manipulation of semi-structured data types, such as JSON. JSON data can be ingested either directly into text columns or parsed into individual columns. This flexibility allows data to be kept as schema-on-read or flattened into individual columns, depending on the nature of the input data and the desired query performance. Additionally, array data types combined with lambda expressions can be used to process repeated data present in JSON.
Firebolt optimizes data ingestion performance through:
Firebolt uses transactional semantics and ACID (Atomicity, Consistency, Isolation, Durability) guarantees for data ingestion operations. Each DML execution is treated as a separate transaction. Firebolt ensures that ongoing reads or other operations do not see data being ingested until the transaction is completed and committed, maintaining data consistency and integrity. There are no partial inserts or copies to clean up.
Firebolt provides the ability to transform data during ingestion. You can use standard SQL functions to change data types, perform arithmetic operations, or use string functions to manipulate text data. When ingesting data from external sources, transformations can be applied directly within the INSERT INTO SELECT statement.
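For instance, assuming a hypothetical external table raw_events and a target table events (names and columns are illustrative), a transformation applied during ingestion could look like:
INSERT INTO events
SELECT
  CAST(event_id AS BIGINT),           -- change the data type
  UPPER(country_code),                -- manipulate text data
  amount_cents / 100.0 AS amount_usd  -- arithmetic operation
FROM raw_events;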
Firebolt is fully ACID compliant and treats every operation as a transaction. For example, a COPY FROM operation using schema inference will not present a partial view of the table while the copy is running. The entire table is presented only when the entire COPY operation is successful. In this case, the entire COPY FROM operation is a transaction.
Multistage distributed execution in Firebolt allows complex ELT queries to utilize all resources of a cluster. A stage can be split across different nodes of the cluster, allowing every node to work on a part of the data independently. This approach optimizes resource utilization and speeds up data transformation by parallelizing data extraction, loading, and transformation steps.
Firebolt first distributes data across all nodes using shuffle and makes sure to use total cluster memory. If this is still not enough, Firebolt uses data spilling techniques that leverage local SSDs to manage intermediate data structures beyond main memory. This approach allows Firebolt to run queries with working set sizes that exceed main memory, allowing the system to scale even with limited hardware resources.
Yes, ELT processes in Firebolt can be automated using several tools and approaches:
With Firebolt, you can run your ELT jobs on a separate engine that is isolated from the Firebolt engine supporting customer-facing dashboards. With this isolation, customer-facing workloads are not negatively impacted.
To reduce costs, the isolated ELT engine can be run with auto_stop and auto_start configured to eliminate idle time and can be right-sized to meet the needs of ELT jobs. Firebolt engines can be dynamically scaled to meet the needs of individual workloads. For more information, please refer to the elasticity whitepaper.
Firebolt enables choice of engine shapes to address ingestion and analytics workload through multidimensional elasticity defined by the following attributes:
The choice of engine shape for ingestion and analytics workloads comes down to the nature of the workload and the resource requirements (CPU, RAM, etc.) of the workloads themselves.
Yes, Firebolt supports read-write from multiple engines at the same time. Firebolt leverages three-way decoupling of compute, storage and metadata. As a result, compute is stateless and any engine can read-write from any database. Additionally, the metadata layer enables ACID compliance and provides for strong consistency across engines.
Yes. All Firebolt engines support read-write from any database. You can query and ingest data from the same engine.
Yes, you can ingest and query data simultaneously on the same Firebolt table. Firebolt engines support concurrent read and write operations on the same table while maintaining strong consistency. Any committed changes made to the data, such as inserts or updates, are immediately visible to all queries across all engines without requiring additional data or metadata synchronization.
Yes. Engines can be configured to automatically stop after a certain period of idle time, ensuring idle engines are turned off. Additionally, engines can be configured to start automatically when a new query is sent to the engine endpoint. Together, these two features help control compute consumption.
Firebolt achieves workload isolation by decoupling storage from compute resources (engines). Each engine operates independently, allowing multiple workloads to run simultaneously without affecting each other’s performance. This isolation ensures that workloads can scale independently based on their unique requirements without impacting other workloads accessing the same data. Firebolt also offers strong consistency, ensuring data changes are immediately visible across all engines.
Workload isolation enables multiple benefits:
Firebolt provides capabilities that are optimized for different workload profiles. For example, batch ingestion might benefit from parallel processing and scale-out, while customer-facing workloads that rely on aggregations and lookups can benefit from Firebolt’s indexing capabilities. Similarly, Firebolt’s multi-stage distributed query engine is equipped to address long-running queries that handle complex transformations. Additionally, Firebolt engines can be granularly scaled to address workload demands.
You can dynamically scale Firebolt engines in three ways:
Firebolt provides observability metrics that help monitor engine performance. Key metrics include:
These metrics, accessible through the engine_metrics_history view, guide decisions on whether to scale up, scale out, or adjust engine configurations based on current and historical engine utilization.
Access to engines and workload resources in Firebolt is managed using role-based access control (RBAC). Administrators can define roles and assign them specific permissions, such as the ability to start/stop engines or modify engine configurations. This ensures that only authorized users can manage critical workloads, enhancing security and operational control.
Yes, more details in our End User License Agreement (EULA) and Data Processing Addendum (DPA).
Yes, our SOC 2 Type-2 + HIPAA report is available subject to a Non-Disclosure Agreement (NDA).
Firebolt is certified for ISO 27001 and ISO 27018. Certification reports are available upon request.
Yes. As a business associate under HIPAA, we support business associate agreements (BAAs) to ensure healthcare data protection. Our SOC 2 Type-2 + HIPAA report is available subject to a Non-Disclosure Agreement (NDA).
A separate BAA with Firebolt is required since our service includes proprietary technology and other sub-processors not covered under the standard AWS HIPAA Eligible Services.
Firebolt is not PCI-DSS compliant and does not permit credit card data storage on its platform.
While Firebolt adheres to NIST SP 800-53, NIST 800-171, and NIST CSF guidelines, we are not currently FedRAMP compliant.
Firebolt processes customer data in compliance with both GDPR and CCPA regulations. We securely collect, store, and manage data according to the highest standards, ensuring that all GDPR and CCPA requirements are met.
For Data Subject Access Requests (DSARs) or any privacy-related inquiries, please reach out to us at privacy@firebolt.io
Yes, our policies, including Disaster Recovery (DR) and Business Continuity Plans (BCP), are tested regularly to ensure effectiveness.
Yes, customer access is managed via Auth0, while organizational access is controlled using Okta. All accesses are logged and monitored, and alerts are in place for any unauthorized configuration changes across our systems.
We use tools like SCA, SAST for code analysis, along with practices such as Fuzzing, scanning for pipeline weaknesses (like the use of unverified external sources), and secret scans as part of our secure software development lifecycle.
We use AWS Shield, WAF, and other logical layers to protect against DDoS. Additionally, we leverage auto-scaling to maintain availability during attacks by dynamically adjusting resources like EC2 instances, ELBs, and other global services capacity (though some scenarios may require manual intervention).
Firebolt employs a comprehensive security strategy that includes network security policies, encryption practices, tenant isolation, and governance controls. We are committed to safeguarding your data through state-of-the-art security systems, policies, and practices.
Yes, both IP allow-listing and deny-listing are supported. More details on our Network Policy page.
Yes, this feature is supported. More details on our Identity Management page.
Yes, MFA is supported. More details on our MFA page.
Yes, RBAC is supported. More details on our RBAC page.
Yes, customers can choose the region in which they run the service to meet data sovereignty requirements. More on our Available Region page.
Yes, we support encryption for data at rest and in motion. More details are available on our Security blog.
While the Firebolt database itself inherently reduces the risk of SQL injection by minimizing the use of certain vulnerable constructs, customers are still encouraged to implement additional controls at their application level, such as:
Besides our runtime binary hardening, Firebolt leverages a runtime protection tool that provides deep visibility and protection at the process level.
Customers own their data and can delete it via commands like DROP DATABASE. In addition, upon contract termination, all customer data is deleted within 30 days.
Researchers can report vulnerabilities by contacting security@firebolt.io.
Customer data is stored in S3 buckets with high availability and durability. Our recovery objectives are:
Yes, our insurance includes:
Yes, we support encryption for data at rest and in motion. More on our technical best practices can be found in our Security blog: “Building Customer Trust: A CISO's Perspective on Security and Privacy at Firebolt”