
Striim Tackles Data Integration Across Clouds, On-Premises

16 Jun 2021 2:22pm, by

Increasingly, applications rely on real-time data, often from multiple data stores in the cloud or on premises, and might even operate on that data before it ever reaches storage.

That has created added pressures for data integration as the volume and velocity of data that enterprises deal with have increased exponentially.

Striim co-founders Ali Kutay, Steve Wilkes, Alok Pareek, and Sami Akbay set out to tackle the challenges of data integration across multiple systems, both on premises and in the cloud.

The four have a long history together, previously building GoldenGate Software, which was acquired by Oracle in 2009. They founded Palo Alto, California-based Striim in 2012.

“When we launched the company, the mission was to provide an end-to-end streaming platform that could collect data in real-time, process that data, deliver that data to end systems and do that in a secure and reliable way,” said Wilkes.

“Over the last few years, use cases have transitioned to become predominantly cloud-oriented. A few years ago, there was a lot of kind of big data requests — people wanted to write data into a data lake and maintain a data lake, keep that up to date with real-time data. Not so much anymore. That has changed a lot. The use cases [now] are really around migrating databases to the cloud.”

But customers can’t just shut down a database for the cloud move. Because they have to keep running, a big aspect is change data capture: tracking all the changes taking place in the data in real time.
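As a rough illustration of the idea behind change data capture (not Striim’s implementation), the Python sketch below replays an ordered stream of insert, update and delete events against a copy of a table held in a dictionary. The `ChangeEvent` shape and the `apply_change` function are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChangeEvent:
    """One captured change. This shape is hypothetical, not Striim's format."""
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new column values; None for deletes


def apply_change(target: Dict[int, Dict[str, Any]], event: ChangeEvent) -> None:
    """Apply a single change event to an in-memory copy of the table."""
    if event.op == "insert":
        target[event.key] = dict(event.row or {})
    elif event.op == "update":
        # Merge only the changed columns into the existing row.
        target.setdefault(event.key, {}).update(event.row or {})
    elif event.op == "delete":
        target.pop(event.key, None)
    else:
        raise ValueError(f"unknown operation: {event.op}")


# The source database keeps running; its changes arrive as an ordered stream.
changes = [
    ChangeEvent("insert", 1, {"name": "Ada", "city": "London"}),
    ChangeEvent("insert", 2, {"name": "Lin", "city": "Taipei"}),
    ChangeEvent("update", 1, {"city": "Paris"}),
    ChangeEvent("delete", 2),
]

replica: Dict[int, Dict[str, Any]] = {}
for change in changes:
    apply_change(replica, change)

print(replica)
```

Because events are applied in the order they were captured, the replica ends up matching the source without the source ever being taken offline.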

Striim is a Java-based real-time data-streaming platform that incorporates built-in encryption, stream processing, integration and delivery. Using clustered server technology, you can spin up a single instance of Striim or a cluster of four or 10 servers, however many you need to handle your different workloads.

Striim continuously ingests data from myriad sources including databases, log files, messaging systems, cloud apps and internet of things devices. The real-time stream processing can include tasks such as filtering, transformations, aggregations, masking and metadata enrichment, taking place in-memory with continuous SQL-based queries.
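To give a sense of what filtering, masking and enrichment look like when applied record by record, here is a minimal Python sketch. It is only an analogy for the kind of processing described above: the field names, the `REGIONS` lookup table and the `transform` function are all hypothetical, and Striim itself expresses such steps as SQL-based queries rather than Python.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical reference data used to enrich each record.
REGIONS = {"US": "Americas", "FR": "EMEA"}


def mask_card(number: str) -> str:
    """Hide all but the last four characters of a card number."""
    return "*" * max(len(number) - 4, 0) + number[-4:]


def transform(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Filter, mask and enrich records one at a time as they arrive."""
    for record in records:
        if record["amount"] <= 0:
            continue  # filtering: drop non-positive amounts
        yield {
            **record,
            "card": mask_card(record["card"]),                  # masking
            "region": REGIONS.get(record["country"], "unknown"),  # enrichment
        }


incoming = [
    {"card": "4111111111111111", "amount": 25, "country": "US"},
    {"card": "5500000000000004", "amount": 0, "country": "FR"},
    {"card": "340000000000009", "amount": 80, "country": "JP"},
]

for out in transform(incoming):
    print(out)
```

Each record is handled as soon as it is read, so nothing has to be written to disk between the steps.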

“With a streaming continuous query, you create the query, and it sits there in memory. And then a data stream is feeding it. And every time there’s new data on that stream, it goes into the query and the results come out. So it’s continuous,” Wilkes explained.

“And one of the big differentiators between a real-time data integration platform and some of the old school data-integration platforms is there’s no notion of a batch. There’s no notion of a job. Everything is 24/7, continuously running.”
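The continuous query Wilkes describes can be approximated in a few lines of Python. This is a sketch of the concept only, not Striim’s engine or its SQL dialect: the `continuous_count` generator and the `status` field are assumptions for the example, which behaves roughly like a standing `SELECT status, COUNT(*) ... WHERE status <> 'ok' GROUP BY status`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def continuous_count(events: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit an updated per-status count every time a matching event arrives."""
    counts: Dict[str, int] = defaultdict(int)  # query state stays in memory
    for event in events:
        status = event["status"]
        if status == "ok":
            continue  # the WHERE clause: ignore healthy events
        counts[status] += 1
        yield status, counts[status]  # a result comes out per input event


# A real stream would be unbounded; a short list keeps the example finite.
stream = iter([
    {"status": "ok"},
    {"status": "error"},
    {"status": "timeout"},
    {"status": "error"},
])

results = list(continuous_count(stream))
print(results)
```

There is no job to schedule and no batch boundary: the generator simply waits for the next event and emits a fresh result when one arrives.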

Striim provides monitoring and validation features to help trace and confirm the collection and delivery of streaming data, as well as a wizard that helps customers define data flows and connections to build their own custom pipelines.

Out-of-the-box dashboards show table-level metrics and latency of data delivery. Its tools enable admins to configure performance and uptime alerts and create self-healing pipelines with remediation workflows. These dashboards also incorporate AI to produce metrics on connector components such as read and write rates, latency and CPU usage.

Striim also provides data collectors and, after transformation, delivers the data to targets such as AWS, Azure and Google Cloud.

Its customers include retailer Macy’s, the British broadcaster Sky, the French telecom Orange, Google, Gartner and grocer Albertsons.

“Striim gives us a single source of truth across domains and speeds our time to market, delivering a cohesive experience across different systems,” said Neel Chinta, IT Manager at Macy’s.

Big names such as IBM, SAP, Google, Oracle, Cloudera, Informatica and Talend, as well as newer players such as Matillion, Fivetran and Airbyte, all have their own take on enterprise data integration and analytics. And then there are the open source tools such as Kafka, Flink and Spark.

These open source tools tend to be more developer-oriented, more for message-oriented applications, event-driven applications, according to Wilkes. He maintains that continually reading and writing to Kafka topics causes disk I/O, which causes things to slow down. And that having third-party collectors and third-party targets in the mix, as with Kafka Connect, makes it hard to guarantee end-to-end data integrity and data delivery.

At the same time, Striim offers Kafka as an optional part of its products, though it has its own in-memory messaging system.

He said Striim customers typically are not developers, but database administrators and IT operations folks, though more data scientists are getting involved.

“Customers want to be able to say, ‘How do I get all my data into Snowflake? And I don’t want to have to build this myself. I don’t want to have to configure a whole bunch of different technologies. I don’t want to have to write any code, I just want to move this data from this database, or these database tables, and put it into these tables in Snowflake. And do that in an easy way,’” Wilkes said.

Striim is currently deployed on premises, through cloud marketplaces or as a containerized cloud product. The company recently announced a preview of its managed service, called StreamShift, which is meant to handle cloud data flows with no administration. It includes services such as automated database profiling, migration compatibility assessment, schema creation, transformation and data movement capabilities for both lift-and-shift database migrations and zero-downtime continuous online migrations.

It also announced a $50 million Series C round, bringing the 120-person company’s total funding to more than $100 million.

The company is heavily focused on the SaaS version right now, due for general availability in the next few months, Wilkes said, and on ease of use.

Image by Adrian Malec from Pixabay 
