The company had a homegrown analytics solution that restricted its offering and market growth. While it was fast enough for specific queries and data sets, it was limited in both the types of analytics and the size of data it could support for each customer, and it was hard to change.
Onboarding was challenging and time-consuming because each new customer required a lot of customization to tailor data and analytics.
Adding new analytics took a very long time because it required a lot of custom work to load the data and to build and optimize each query.
Scalability was also an issue, which limited both the data sizes and the types of analytics.
The company started evaluating different solutions to support segment analytics for its customers, and considered many alternatives as part of the evaluation.
Elasticsearch: They already use Elastic for keyword searches, but it is very expensive and hard to maintain for analytics. Not only do you need to be an Elastic expert; loading data is also complex and expensive. Elasticsearch would have cost 6-20x as much just for the initial segment analytics use case.
BigQuery: They also evaluated BigQuery at a small scale, but quickly realized it wasn’t the best choice. Firebolt demonstrated 2-3x faster ingestion, and 3-70x faster query performance. Because the company is an AWS shop, they couldn’t justify adopting Google Cloud and BigQuery just for one use case.
They started using Firebolt for segment analytics. Their customers use segment analytics to help understand customer behavior, preferences, loyalty, and cross-selling opportunities.
They have over 200TB of customer data (compressed), spanning more than three years, in Firebolt. This has been increasing as they add data that they could not support in their previous solution. They now write SQL and run it on Firebolt to Extract, Load, and Transform (ELT) data from their data lake into Firebolt. All the segment analytics are accessed via REST APIs that invoke the SQL from their Web-based analytics UI.
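A hypothetical sketch of what that SQL-based ELT might look like. The table names, columns, and S3 path are illustrative, not the company's actual schema, and the exact external-table syntax varies by Firebolt version:

```sql
-- Illustrative: an external table exposing raw files in the data lake.
CREATE EXTERNAL TABLE IF NOT EXISTS ex_events (
    customer_id  BIGINT,
    event_time   TIMESTAMP,
    page         TEXT
)
URL = 's3://example-data-lake/events/'   -- hypothetical bucket
OBJECT_PATTERN = '*.parquet'
TYPE = (PARQUET);

-- Load and transform in one SQL step: the "T" in ELT happens
-- inside the INSERT ... SELECT, after the data lands in Firebolt.
INSERT INTO events
SELECT customer_id,
       event_time,
       LOWER(page) AS page
FROM ex_events;
```

Because the whole pipeline is expressed as SQL, the same statements can be invoked programmatically, which is what lets the REST API layer drive everything from the Web-based analytics UI.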
With Firebolt, they are now much more agile. It used to take 6-9 months to roll out new analytics features to customers; the existing system required custom development to create the new schema and queries, and to optimize for performance. When they first adopted Firebolt, the time it took to get started, become experts, and deploy their first project into production was close to 3 months. But now that they are experts, new features take a few days.
Customer onboarding is now much faster as well because they were able to automate the personalization for each customer that used to take weeks of custom development. What made this possible was Firebolt’s native support for JSON. Part of the first project involved automating the personalization using two JSON columns for pages and timestamps and Lambda-style array functions in SQL. The SQL performs complex calculations that personalize queries for each customer.
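To make the JSON-plus-lambda pattern concrete, here is a minimal sketch. The table, column names, and the page pattern are invented for illustration; Firebolt's Lambda-style array functions, such as ARRAY_COUNT with a lambda predicate, operate directly on array columns extracted from JSON:

```sql
-- Illustrative: per-customer personalization over two JSON-derived
-- array columns (pages visited and their timestamps).
SELECT customer_id,
       -- Count the page views matching a customer-specific pattern,
       -- evaluated element-by-element with a lambda.
       ARRAY_COUNT(p -> p LIKE '%/checkout%', pages) AS checkout_views
FROM customer_events
WHERE customer_id = 12345;   -- hypothetical customer
```

The personalization comes from generating the lambda predicates per customer, so onboarding a new customer becomes a matter of parameterizing SQL rather than writing custom code.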
The greater agility and flexibility come from being able to do everything in SQL - from ELT to all the queries and personalization - and REST APIs. This has made it much easier to use 3rd party infrastructure and add all new kinds of analytics, including ad hoc analytics.
Before Firebolt, it could take up to 3 months just to develop new ETL jobs. A data engineer would use AWS Lambda to process the data, perform complex calculations, and load it into the custom system.
With Firebolt, they now write SQL that uses lambda array functions and deploy the SQL for ELT onto a Firebolt engine, which is a Firebolt compute cluster. Any number of engines can be deployed to support different workloads like ELT or analytics, or to scale out horizontally to support more users or queries. Adding a new data set can now take just hours.
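Separating workloads onto dedicated engines might look something like the following sketch. The engine names and sizing parameters are illustrative, and the exact engine DDL differs across Firebolt versions:

```sql
-- Illustrative: one engine sized for ELT, another for customer-facing
-- analytics, so heavy loads never compete with interactive queries.
CREATE ENGINE elt_engine        WITH TYPE = 'S' NODES = 2;
CREATE ENGINE analytics_engine  WITH TYPE = 'M' NODES = 4;
```

Because storage is decoupled from compute, both engines read the same tables; adding or resizing an engine changes capacity without moving any data.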
The same data engineer who used to write the AWS Lambda code was able to immediately start using lambda array functions in SQL, and loves it.
With Firebolt, every query they run completes in 2 seconds or less, with many sub-second query times. A big part of what helped was indexing. Most of their query performance comes from a combination of Firebolt's faster data access times, query optimization engine, and sparse indexing. Sparse indexes are composite primary indexes that include each table's most frequently used columns. For one large table with a lot of computed aggregates, they added an aggregating index, which you declare once in SQL and Firebolt automatically maintains. The index immediately doubled performance with minimal additional computing.
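The two index types described above are both declared in SQL. A hypothetical sketch, with table and column names invented for illustration:

```sql
-- Illustrative: a composite primary (sparse) index on the most
-- frequently filtered columns, declared with the table.
CREATE TABLE events (
    customer_id BIGINT,
    event_date  DATE,
    page        TEXT,
    views       BIGINT
) PRIMARY INDEX customer_id, event_date;

-- Illustrative: an aggregating index precomputing common rollups.
-- Declared once; Firebolt maintains it automatically as data loads.
CREATE AGGREGATING INDEX events_agg ON events (
    customer_id,
    event_date,
    SUM(views),
    COUNT(views)
);
```

Queries that group by the indexed columns and request the precomputed aggregates can then be answered from the index instead of scanning the raw rows, which is what produced the immediate performance doubling described above.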
The company’s previous custom system could not handle all of the data their customers wanted. With Firebolt, they can now load any data their customers need and easily analyze the entire 3 years of data if needed.
One of the biggest benefits came from Firebolt's decoupled storage and compute architecture. Their customers could be assigned to different Firebolt engines. As customers perform queries, Firebolt only fetches and caches the data ranges they need, not entire partitions of data or the entire 200TB+ of data. Because this is a much smaller data set, it not only requires much less compute; the queries also run much faster than they would against the entire data set. This architecture was ideal for the way they needed to scale and support their customers.
Firebolt was 5x lower in infrastructure costs compared to their existing system, which didn't provide as much functionality or flexibility or hold as much data. It was also as much as 20x less expensive than the alternatives they evaluated. Beyond the infrastructure savings, there were lower engineering and administration costs, and greater revenues from a much better service.
With segment analytics deployed, the company sees more opportunities to improve their analytics, such as keyword analytics. There are also opportunities upstream: the data prep team sees opportunities to move some Spark jobs to Firebolt for greater performance and agility.