Reverse ETL

Reverse ETL - Definition

What Is an ETL Tool?

ETL stands for Extract, Transform, and Load. It’s a data integration pattern born in the 1970s that defines a standard pipeline for moving information from different sources into a single target. In modern applications, the final destination is usually a cloud-based data warehouse or data lake. ETL processes aim to create a single source of truth (SSOT). Cloud ETL processes follow the same philosophy but without relying directly on physical storage: everything occurs online, typically in a cloud-based data warehousing solution, removing the need to invest in hardware and its maintenance.
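The extract, transform, and load stages can be pictured as three small functions feeding a single target. The sketch below is a toy illustration of the pattern, not a real connector; all names and the in-memory "warehouse" list are invented for the example.

```python
def extract():
    """Pull raw rows from a source system (hard-coded here for illustration)."""
    return [{"name": " ada ", "spend": "10"}, {"name": "bob", "spend": "20"}]

def transform(rows):
    """Normalize the raw rows: clean up names, cast spend to an integer."""
    return [{"name": r["name"].strip().title(), "spend": int(r["spend"])} for r in rows]

def load(rows, warehouse):
    """Append the cleaned rows to the single target store."""
    warehouse.extend(rows)

warehouse = []  # stands in for the cloud data warehouse (the SSOT)
load(transform(extract()), warehouse)
```

In a real pipeline, `extract` would query an operational database or API and `load` would write to a warehouse such as BigQuery or Snowflake, but the three-stage shape stays the same.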

ETL tools can significantly simplify these procedures, saving valuable resources. The characteristics to look for in a good ETL tool include a wide selection of available connectors, portability, ease of use, and complete cloud compatibility. In addition, it should work natively in single-cloud, multi-cloud, or hybrid environments.

What Is Reverse ETL?

Reverse ETL is a relatively new data pipeline pattern that inverts the traditional flow. Instead of copying data into the data warehouse, Reverse ETL copies data out of the warehouse into external systems, thus decentralizing knowledge.

The main reason for moving data out of the single source of truth is to feed operational systems (such as Hubspot, Marketo, Salesforce, Netsuite, SAP, Workday, Gainsight, and Zendesk), which grants users powerful capabilities and several advantages. Those systems could be linked directly to the upstream data sources that continuously feed the data warehouse, but it is much simpler and more consistent to connect them to the SSOT.

What Reverse ETL does in practice is make data “operational” or “actionable.” The point of the pipeline is to drive operations by delivering data to the exact place where it will be most helpful. Analytics executed on this kind of data is called operational analytics.

Until recently, this pipeline was implemented on demand by teams that wrote their own API connectors, mapping fields from the data warehouse to operational systems. Writing these connectors is difficult, however: synchronizing two different data sources presents a series of common challenges (complex field mapping, batching, retry handling, and rate-limit configuration). That’s why reverse ETL solutions have entered the market. With ready-made connectors to numerous systems and (usually) a visual interface, these solutions let users sync continuously or define what triggers the syncing between the two systems.
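The common challenges listed above can be made concrete with a minimal sketch of a hand-rolled connector: it maps warehouse columns to the destination’s field names, sends rows in batches, retries transient failures with backoff, and paces requests. Everything here (the field names, batch size, and `send_batch` callback) is hypothetical, standing in for a real destination API.

```python
import time

# Hypothetical mapping from warehouse column names to the destination
# system's field names (illustrative only).
FIELD_MAPPING = {"customer_id": "ExternalId", "full_name": "Name", "mrr": "MonthlyRevenue"}

BATCH_SIZE = 2
MAX_RETRIES = 3

def map_fields(row):
    """Rename warehouse columns to the destination's schema."""
    return {dest: row[src] for src, dest in FIELD_MAPPING.items()}

def sync(rows, send_batch, rate_limit_delay=0.0):
    """Push warehouse rows to an external system in batches,
    retrying each batch on transient connection failures."""
    mapped = [map_fields(r) for r in rows]
    sent = 0
    for i in range(0, len(mapped), BATCH_SIZE):
        batch = mapped[i:i + BATCH_SIZE]
        for attempt in range(MAX_RETRIES):
            try:
                send_batch(batch)  # in reality, an HTTP call to the destination API
                sent += len(batch)
                break
            except ConnectionError:
                if attempt == MAX_RETRIES - 1:
                    raise
                time.sleep(rate_limit_delay * (2 ** attempt))  # exponential backoff
        time.sleep(rate_limit_delay)  # crude pacing to respect rate limits
    return sent
```

Even this toy version shows why teams prefer off-the-shelf connectors: each concern (mapping, batching, retries, pacing) must be reimplemented per destination.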

There are many use cases for reverse ETL, as it enables Sales, Marketing, Data, and Product teams to access near real-time data within the products they already use. In addition, it empowers data automation at a pre-defined frequency. Let’s look at some of the countless examples:

  • Salesforce: updating the customer overview as soon as possible for efficient resolution of customer cases
  • Zendesk: synchronizing internal customer data to prioritize tickets from VIP users
  • Hubspot: enabling customer relationship specialized analysis and easier access for non-technical users
  • Workday: obtaining up-to-date financial data to identify new opportunities and shift to meet market demand
  • Gainsight: coordinating customer engagements across different channels and teams
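The Zendesk example above can be sketched in a few lines: once reverse ETL has pushed the warehouse’s customer segmentation into the support tool, ticket priority can be set from it. The `vip_customers` set, ticket dictionaries, and field names are all invented for this illustration.

```python
# Customer IDs flagged as VIP in the warehouse and synced out by reverse ETL.
vip_customers = {"c1", "c3"}

def prioritize(tickets):
    """Raise priority on tickets opened by VIP customers; the rest stay normal."""
    for t in tickets:
        t["priority"] = "high" if t["customer_id"] in vip_customers else "normal"
    return tickets
```

The point of the pattern is that the support team never queries the warehouse directly; the enriched attribute already lives in the tool they use.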

Advantages of Reverse ETL

Reverse ETL can significantly streamline data sharing. Besides enabling operational analytics, Reverse ETL offers a series of significant collateral technical advantages:

  • Data Isolation: decentralizing data from the data warehouse or data lake automatically grants a significant level of isolation. External systems no longer need access to the whole knowledge base but only a tiny part of it, which can speed up processes.
  • Performance: while it’s certainly possible to analyze the data directly in the data warehouse with business intelligence tools and queries, moving critical data as close as possible to the users of external systems yields faster response times.
  • Shared load: by decentralizing read operations, the data warehouse is no longer responsible for answering a series of queries that now hit only the external systems, thus lightening the considerable load that usually falls upon it.
  • Data Transformation: while moving data from the data warehouse to external systems, it’s possible to enrich and modify data to fit the specific use case. For example, once fields in downstream apps are relabeled in more intuitive ways that align with the terminology used by the app’s users, searching for and finding data becomes significantly easier.
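The transformation advantage can be sketched as an in-flight enrichment step: while a warehouse row travels to a downstream app, derived fields are computed and columns are relabeled with the terminology the app’s users expect. The column names and the lifetime-value formula here are hypothetical, chosen only to illustrate the idea.

```python
def enrich(row):
    """Turn a raw warehouse row into a record labeled for the downstream app,
    adding a derived lifetime-value field along the way."""
    months = max(row["tenure_days"] // 30, 1)  # at least one month of tenure
    return {
        "Customer Name": row["full_name"].title(),
        "Monthly Revenue (USD)": row["mrr"],
        "Lifetime Value (USD)": row["mrr"] * months,  # simplistic LTV for the sketch
    }
```

Because the renaming and derivation happen inside the pipeline, every downstream app receives data already phrased in its own vocabulary.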

Challenges of Using Reverse ETL

Like all data technologies, reverse ETL can also be challenging to use, especially if it’s not the most efficient option for your organization.

  • Multiple sources of truth: even with the best possible synchronization, duplicating the same information across different sources can scatter it and make knowledge unreliable and redundant.
  • Increased Costs: with Reverse ETL, a significant part of the load that would otherwise weigh on the data warehouse shifts toward the external systems. Moreover, both hand-made reverse ETL implementations and out-of-the-box solutions come with a non-negligible maintenance cost.
  • Not feasible for real-time scenarios: typically, a reverse ETL solution comes with some data propagation delay; the ingestion of data from upstream systems into the warehouse is not reflected in real time in the external systems. In real-time scenarios, operational procedures should instead be linked directly to the upstream systems through an event-driven architecture.
  • Lack of control and consistency: existing reverse ETL tools lack management and compliance controls over data replication and transformation. Companies using reverse ETL tools must therefore invest a great deal of quality-assurance effort to achieve data consistency.
  • Error propagation: if a data ingestion error occurs in the data warehouse, the problem is automatically reflected in the external systems, so you need to confirm that the data is accurate. In addition, not all data warehouse tables are refreshed at the same frequency (daily, monthly, or yearly), so it is critical to analyze each dimension and fact table before using it in the reverse ETL process. Sometimes the reverse ETL approach may require changing the loading strategies, a complex task that must be done with caution, ensuring the change doesn’t affect existing BI reports.