Datafold, Hightouch Team up for Data Synchronizations

2 Aug 2022 6:42am

Datafold and Hightouch recently unveiled an integration of their two solutions that provides detailed visibility into how changes to tables affect operational data. The partnership means data engineers can now do at-a-glance impact analysis and root cause analysis for real-time data synchronizations.

Hightouch, a data activation company, specializes in synchronizing data between backend sources (such as cloud warehouses) and operational ones, like CRM and email tools. Real-time applications of these capabilities are commonly referred to as data activation, which automates ensuing action, like generating email campaigns for sales and marketing teams.

Datafold, a data reliability company, makes this process more trustworthy by applying its data diffing approaches to dbt models that frequently transform data for Hightouch deployments. By combining rapid comparisons between source and destination tables with detailed data provenance, Datafold minimizes the potential for breaking tables when they’re changed.

Conversely, it also swiftly illustrates what happened when there are unexpected results, which is imperative for low-latency data synchronizations.

“When you’re just doing it for internal dashboards, breaking a chart is annoying to business users,” acknowledged Matt David, Datafold director of growth. “But when your data ends up sending emails to the wrong people, you really can’t have that.”

Data Activation

David’s quote alludes to the increasing importance of what’s today called data activation. It also implies how valuable data from cloud warehouses has quickly become for operations, particularly when it’s synchronized between these settings. The days in which data warehouses focused exclusively on historic insights are over, perhaps forever, thanks to the potential of cloud warehouses. Now, these backend data sources are informing low-latency operational activity, like generating advertisements in real time on e-commerce sites based on customer profiles and customer history.

The potential revenues generated from conversions, as well as the customer-facing nature of these real-time deployments, have amplified the need for reliable results. “This is a very automated thing where if you get that wrong, all of a sudden you’re putting money into ads that don’t work,” David said. “The new trend of data quality atop the modern data stack is very new stuff, but that’s how this space is evolving.”

Conventional Approaches

dbt is widely adopted for transforming data for these and other real-time Hightouch data synchronizations because it enables engineers to use SQL for these modern use cases. The danger arises when changes are made to tables and organizations lack equally modern means to assess their impact. David observed that information in tables changes routinely, for reasons as common as new data sources, new product names, different tools and more.

More manual methods often include spot checking, “which is feasible when you’ve got a tiny team or not that complex a pipeline,” David acknowledged. “But once you’ve got hundreds of tables and several Hightouch things going through, individually spot checking just doesn’t make a lot of sense.” More automated methods involve unit tests, which may be more scalable, but aren’t always reliable. Subtle changes to tables, or ones with unknown consequences, make it easy for undesirable results to “sneak in under the radar,” David remarked.
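To make that gap concrete, here is a minimal Python sketch of how a table change can pass typical unit-style checks while a row-level comparison still catches it. It is an illustration only, not Datafold's implementation or API: the function names (`passes_basic_tests`, `diff_rows`), the `user_id` key and the sample rows are all hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def passes_basic_tests(rows: List[Row], key: str) -> bool:
    """Unit-style checks: the key column is never null and is unique."""
    keys = [row[key] for row in rows]
    return all(k is not None for k in keys) and len(keys) == len(set(keys))


def diff_rows(
    before: List[Row], after: List[Row], key: str
) -> Tuple[list, list, Dict[object, Dict[str, tuple]]]:
    """Key-based row diff: added keys, removed keys, and per-column changes."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for k in old.keys() & new.keys():
        columns = old[k].keys() | new[k].keys()
        delta = {
            col: (old[k].get(col), new[k].get(col))
            for col in columns
            if old[k].get(col) != new[k].get(col)
        }
        if delta:
            changed[k] = delta
    return added, removed, changed


# Hypothetical audience table before and after a model change.
before = [
    {"user_id": 1, "email": "a@example.com", "segment": "trial"},
    {"user_id": 2, "email": "b@example.com", "segment": "paid"},
]
after = [
    {"user_id": 1, "email": "a@example.com", "segment": "paid"},  # subtle change
    {"user_id": 2, "email": "b@example.com", "segment": "paid"},
]

tests_ok = passes_basic_tests(after, "user_id")
added, removed, changed = diff_rows(before, after, "user_id")
```

In this sketch the uniqueness and not-null checks still pass on the changed table, yet the diff shows that user 1 moved from the "trial" segment to "paid". That is the kind of shift that could silently change who receives a synced campaign.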

Contemporary Approaches

Datafold’s data lineage and data diffing capabilities are more effective and less time-consuming, which is beneficial for Hightouch’s real-time synchronizations. The former involves a graph of an organization’s data provenance that’s updated as frequently as the organization likes. Engineers can consult it to quickly see which tables are most directly affected by a given change. More importantly, perhaps, this information is the foundation for the solution’s diffing techniques.

According to David, organizations can “leverage that graph to figure out everything that needs to be diffed, and we put that in an easy-to-read report in your pull request. So, you can get your audit right there and click into it for more in-depth steps.” The bidirectional nature of this approach is another of its strengths. Users can look downstream to see the implications of a change before it is made. They can also look upstream to see exactly where a problem originated when one arises.
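The two directions of analysis can be sketched as a simple graph traversal. The Python below is a simplified illustration under stated assumptions, not how Datafold builds or queries its lineage graph: the table names, the `LINEAGE` mapping and the helper functions are hypothetical, and the graph is assumed to be acyclic.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def reachable(edges: Graph, start: str) -> Set[str]:
    """Breadth-first search: every node reachable from start, excluding start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(edges: Graph) -> Graph:
    """Flip every edge so a traversal walks upstream instead of downstream."""
    flipped: Graph = {}
    for src, dsts in edges.items():
        for dst in dsts:
            flipped.setdefault(dst, []).append(src)
    return flipped


# Hypothetical lineage: each table maps to the tables or syncs that read from it.
LINEAGE: Graph = {
    "raw_orders": ["stg_orders"],
    "raw_users": ["stg_users"],
    "stg_orders": ["customer_ltv"],
    "stg_users": ["customer_ltv", "email_audience"],
    "customer_ltv": ["ad_audience_sync"],
    "email_audience": ["email_campaign_sync"],
}

# Impact analysis: what could a change to stg_users affect, and so need diffing?
downstream = reachable(LINEAGE, "stg_users")

# Root cause analysis: which tables feed a sync that produced unexpected results?
upstream = reachable(reverse(LINEAGE), "ad_audience_sync")
```

Here a change to `stg_users` touches both hypothetical syncs, so both of their input tables would be candidates for diffing, while a bad `ad_audience_sync` result narrows the search to the five tables that feed it.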

The New Reality

The reality is that the new stack of modern cloud warehouses, real-time synchronizations, and operational systems requires data quality techniques that match its speed and scale. The integration between Datafold and Hightouch goes a long way toward providing these advantages. As such, it has the potential to increase the value derived from data in the cloud era, in which data supports not only decision-making but also timely action whose window closes quickly if missed.

The New Stack is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: Real.
