
The Engineer’s Guide to Enriching Streams and Dimensions

A look at the power of joining streaming and batch data sources to power user-facing analytics.
Jul 26th, 2023 6:54am

Data — it’s the new oil, the new gold, the new …  you get the idea. There’s a whole lot of it, and it’s valuable. But unless you’re into data hoarding, the point isn’t just to collect it; you want to use it. You want to massage it, query it, build with it and glean insights from it in real time, right? Well, brace yourself, because you’re in for a treat.

Streaming Data and Data Warehouses: What and Why

Picture a river fed by tributaries — an unending stream of water (data) from varied sources (such as websites, sensors, applications and more). This river flows swiftly into a vast reservoir (a data warehouse like Snowflake).

Streaming platforms like Kafka are the rivers, moving data in real time toward its final destination. Data warehouses are the reservoirs: they store the data, regulating and releasing it for use when needed. Much like the power of water can be harnessed via a hydroelectric dam, so too can the power of data in motion be harnessed using real-time analytics.

For human flourishing, rivers are dammed to generate power and fill reservoirs with drinking water. For developer flourishing, applications enrich streaming data with dimensional data in warehouses to build powerful, massively differentiated features.

This post explains the power of joining streaming and batch data sources to power user-facing analytics. I’ll cover why you want to do this and share three specific use cases that benefit from it.

The Power of Combined Data Warehouses and Data Streams

What happens when you combine the real-time capabilities of streaming data with the robust storage and querying capabilities of a data warehouse? Pure magic, that’s what.

The combination is a process we data junkies like to call “enrichment.” When you can capture event streams in real time and join them with dimension tables in a data warehouse, you can accomplish some pretty powerful use cases. In fact, data-source enrichment is a foundational element of a solid real-time data strategy.

But how can you, as a developer, make this dream team a reality? By using a real-time data platform, that’s how.

Real-time data platforms (like Tinybird) allow you to easily connect many data sources, handle huge amounts of data at top speeds and still manage to keep things simple when building real-time applications. With Tinybird, for example, you get the ease of SQL with the power of a highly optimized query engine at your disposal.

Use Cases Combining a Data Warehouse with Data Streams

Sure, all this talk about streams and warehouses is cool, but what does it look like in the wild? Time to dive into some specific examples of real-time data processing.

Use Case 1: Fraud Detection in Real Time Using Historical Data

Imagine you’re a data engineer, and you’ve just been asked to help detect and prevent fraud on your platform. This task calls for some serious real-time analytics.

As part of your event-driven architecture, you have a steady flow of transactions entering your system via Kafka, with each transaction being a potential crime scene. To detect possible fraud, you need to analyze these transactions as they arrive, in real time.

On the other hand, you’ve got historical transaction data stored in your data warehouse, like Snowflake. It contains patterns of legitimate and fraudulent transactions from the past.

You can use a real-time data platform as an online feature store, using historical data in your data warehouse to model fraudulent behavior while computing features on the fly via event streams.

For example, let’s assume you have historical transaction data stored in Snowflake, where each transaction is associated with a customer and has fields such as transaction_amount, transaction_location, etc. Also, let’s assume that there is a historical pattern of fraud where fraudulent transactions are typically significantly larger than a customer’s average transaction amount.

Now, let’s say a new transaction event comes in via your Kafka stream. You could use a SQL query like the one below to flag potentially fraudulent transactions based on comparing the incoming transaction amount with the customer’s historical average transaction amount:
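The sketch below shows what such a query might look like. The table and column names (`transactions_stream` for the Kafka-fed events, `customer_stats` for a per-customer aggregate derived from the Snowflake history) and the 3x threshold are illustrative assumptions, not a prescribed schema:

```sql
-- Flag incoming transactions that are much larger than the customer's
-- historical average. `transactions_stream` is fed by Kafka;
-- `customer_stats` is precomputed from warehouse history.
-- All names and the 3x threshold are hypothetical.
SELECT
    t.transaction_id,
    t.customer_id,
    t.transaction_amount,
    s.avg_transaction_amount
FROM transactions_stream AS t
JOIN customer_stats AS s
    ON t.customer_id = s.customer_id
WHERE t.transaction_amount > 3 * s.avg_transaction_amount
```

In practice you would tune the threshold (or replace it with a model-derived score) and add more features, such as `transaction_location` deviating from the customer's usual locations.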

To learn more about building a real-time fraud detection system, check this out

Use Case 2: Real-Time E-Commerce Personalization with Product Data 

In the fast-paced e-commerce industry, real-time insights are critical. You’re running an online store with transaction data flowing in via a Kafka stream and detailed product info stored in a Snowflake warehouse. The challenge? How to put the right offer before a shopper using data from these disparate sources.

Having a real-time analytics e-commerce dashboard can certainly help your business assess how sales are going in real time, but you don’t want to stop there. You want to actually increase conversion rates and average order value by updating your web store with a real-time personalization system.

A real-time data platform makes this possible. It can connect to your Kafka stream and data warehouse, pulling in live transaction data and product dimensions respectively. The advantage? You can write SQL queries to extract real-time insights such as top-selling products, shopper transaction history, sales trends and inventory status.

Here’s an example SQL query that takes transaction data from Kafka and hydrates it with product data from Snowflake to get a complete list of the 100 most recent orders on the system. The beauty of real-time data is that it gives you an instant snapshot of your business, enabling data-driven decision-making on the fly.
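A sketch of that join is below. The table and column names (`orders_stream` for the Kafka events, `products` for the Snowflake dimension table) are illustrative assumptions:

```sql
-- Hydrate the 100 most recent orders from the stream with product
-- dimensions from the warehouse. All names are hypothetical.
SELECT
    o.order_id,
    o.order_timestamp,
    o.product_id,
    p.product_name,
    p.category,
    o.quantity,
    o.quantity * p.unit_price AS order_value
FROM orders_stream AS o
JOIN products AS p
    ON o.product_id = p.product_id
ORDER BY o.order_timestamp DESC
LIMIT 100
```

The same join pattern powers personalization: group by `category` or `product_id` instead of listing raw orders, and you get live top-seller or affinity signals to feed into on-site recommendations.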

Check out a full example use case of an e-commerce dashboard using Kafka and Snowflake. 

Use Case 3: Real-Time Log Analytics Informed by Historical Data

In this scenario, you’re dealing with two types of data: streaming log events coming in hot from a streaming platform like Kafka, and historical log data chilling out in Snowflake, your data warehouse.

Yes, you want to monitor and analyze these log events in real time, but you also want to compare them with historical data to spot any trends or anomalies. That’s where you use a real-time data platform to bring together your streaming log events and historical data, and then use a SQL query to analyze this data in real time.

For example, suppose each log event has a log_type (such as “ERROR,” “WARNING,” “INFO”) and a timestamp. You want to know, in real time, how the count of each log type in the last hour compares to its average hourly count over the last 30 days.

Here’s a SQL query you might use for this purpose:
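The sketch below assumes two hypothetical tables — `log_events` holding the streaming events and `log_history` holding the warehouse archive — each with `log_type` and `timestamp` columns as described above:

```sql
-- Compare each log_type's count in the last hour against its average
-- hourly count over the previous 30 days. Table names are hypothetical.
WITH recent AS (
    SELECT log_type, COUNT(*) AS last_hour_count
    FROM log_events
    WHERE timestamp >= NOW() - INTERVAL 1 HOUR
    GROUP BY log_type
),
historical AS (
    SELECT log_type, COUNT(*) / (30.0 * 24) AS avg_hourly_count
    FROM log_history
    WHERE timestamp >= NOW() - INTERVAL 30 DAY
    GROUP BY log_type
)
SELECT
    r.log_type,
    r.last_hour_count,
    h.avg_hourly_count,
    r.last_hour_count / h.avg_hourly_count AS ratio
FROM recent AS r
JOIN historical AS h
    ON r.log_type = h.log_type
ORDER BY ratio DESC
```

A `ratio` well above 1 for "ERROR" events, for instance, signals an anomaly worth alerting on.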

Want to deep dive into log analytics? Check out this starter kit

Wrapping Up

Enriching streaming data with data warehouse data opens up a world of possibilities for creating nifty, data-savvy, real-time applications. Toss a real-time data platform like Tinybird into the mix, and the whole process becomes as breezy as a walk in the park. Tinybird makes it possible to unify many data sources and handle enormous volumes of data at lightning speed, and it’s pretty fun to build and publish new real-time data products.
