• 200M new rows/day
• 90 days of data retained
• 18B records maintained
"We improved performance by using Firebolt, and our reporting became nearly real-time - below one-hour latency in reports."
- Nayan Mevada, CTO at Infy.TV
Infy.TV helps publishers drive user engagement through intelligent content integration and monetization of streaming services.
The Challenge: Ingesting and Analyzing Over 200M Records Daily
As a rapidly growing startup in the ad tech space, Infy.TV had been hard-pressed to build out a scalable architecture to handle frequent data ingestion and low-latency queries for their bespoke, customer-facing financial metrics dashboard.
Infy.TV currently generates over 200 million records per day. Efficiently ingesting and analyzing data at this volume maxed out the capabilities of a scaled-out Postgres instance on Amazon RDS (Relational Database Service). Data could be ingested from Infy.TV’s in-memory Aerospike document store into an aggregated form in RDS only once a day. The RDS cluster architecture simply couldn’t scale for high-performance analytical queries, and the cost of the entire setup was becoming prohibitive.
The Solution: Hardware-Efficient Architecture for Sub-Second Performance
Infy.TV turned to Firebolt for its high-performance, hardware-efficient and cloud-native scalable architecture for analytics workloads.
The company established a micro-batch data ingestion pipeline into Firebolt, increasing ingestion frequency from one load per day to hourly loads. They use Amazon Kinesis Data Firehose to deliver data to S3, with hourly ingestion pipelines defined, scheduled and executed from S3 into Firebolt.
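As a rough illustration of the hourly micro-batch idea, the Python sketch below works out which S3 key prefix an hourly job might load. It assumes Firehose is left on its default UTC `YYYY/MM/dd/HH/` object prefix layout; the `events/` base prefix and the function name are hypothetical, and the case study does not describe how Infy.TV's pipelines actually select files.

```python
from datetime import datetime, timedelta, timezone


def last_complete_hour_prefix(now: datetime, base: str = "events/") -> str:
    """Return the S3 key prefix for the most recent fully elapsed UTC hour.

    `now` is expected to be a timezone-aware datetime.
    `base` is a hypothetical bucket prefix, not taken from the case study.
    """
    # Truncate to the start of the current hour in UTC.
    hour_start = now.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    # The current hour is still being written, so step back one hour.
    previous_hour = hour_start - timedelta(hours=1)
    # Assumed layout: Firehose's default YYYY/MM/dd/HH/ prefix.
    return base + previous_hour.strftime("%Y/%m/%d/%H/")


# A job running at 00:15 UTC on 1 May loads the 23:00 hour of 30 April.
print(last_complete_hour_prefix(datetime(2023, 5, 1, 0, 15, tzinfo=timezone.utc)))
# → events/2023/04/30/23/
```

Loading only the previous, fully written hour keeps each batch small and avoids reading objects that Firehose may still be delivering.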
In Firebolt, Infy.TV initially set up a data retention policy of 90 days, capturing, storing and analyzing 18 billion records on a rolling basis. With some modifications to the target data model, as well as some minor tuning of Firebolt’s primary index on the primary fact table, Infy.TV established sub-second query performance on this massive dataset with little system configuration, on a very modestly sized engine. Users are now able to access deep insights into their content performance metrics via both customer visualizations and Looker dashboards.
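The 18 billion figure follows directly from the daily volume and the retention window. The short Python sketch below reproduces that arithmetic and shows one way to compute the oldest day kept in a rolling window. It assumes the window includes the current day; the function names are illustrative, and the case study does not say how the retention policy is actually enforced in Firebolt.

```python
from datetime import date, timedelta

ROWS_PER_DAY = 200_000_000  # "over 200 million records per day"
RETENTION_DAYS = 90         # retention policy stated in the case study


def rows_retained(rows_per_day: int = ROWS_PER_DAY,
                  retention_days: int = RETENTION_DAYS) -> int:
    """Approximate number of rows held at steady state."""
    return rows_per_day * retention_days


def retention_cutoff(today: date, retention_days: int = RETENTION_DAYS) -> date:
    """Oldest day still kept, assuming the window includes `today`."""
    return today - timedelta(days=retention_days - 1)


print(f"{rows_retained():,}")              # → 18,000,000,000
print(retention_cutoff(date(2023, 5, 1)))  # → 2023-02-01
```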
How it Works
Infy.TV’s data model and query requirements introduced some novel challenges, which have been solved with the following approach:
Sample ETL Logic
Sample Query Logic