DataOps: Lenses.io Drives Data with SQL, Kafka
You just jump in the car and drive from point A to point B. You don’t have to know all about whether your engine is gas-powered, diesel or electric. You don’t have to know all the inner workings of an engine. You just step inside a safe environment and go.
Too often technology is driving the business, something London-based Lenses.io wants to invert. Data projects tend to be expensive and require highly specialized skills. Co-founder Chalkiopoulos cites a lack of skilled labor among the reasons behind Gartner’s estimate that up to 85% of data projects fail.
“How can we [make it so that] you can really operate your data like operating a car? Make it simple, so you can get started immediately. You don’t need all the high-tech skills that are so hard to acquire. And you’re also agnostic and future-proof of technology?” he asked.
SQL on Top
Lenses’ answer is to provide a SQL layer on top of open source technologies, then use APIs to connect to other systems. It aims to give architects, data engineers, data officers and business analysts an easier way to build data infrastructure while the system takes care of monitoring, logging, authentication and security.
“Typically, in a modern kind of projects, what we see is that multiple vendors and technologies are involved. We like to enable our customers to get the benefits of using all those different technologies that they have already selected, and to deliver a secure environment for them to move and use that data in real-time. And that means technologies like Kafka or Kubernetes or Elasticsearch and in a way that doesn’t lock those customers with specific technologies,” he said.
The streaming data management platform operates on top of Apache Kafka with a web interface and features for creating and querying real-time data and creating and monitoring Kafka topologies.
Fully mastering the Kafka Streams API generally requires developer skills in Java, Scala, Kotlin or another Java Virtual Machine-based language, plus a steep learning curve, before a team can join, filter and aggregate data streams and get value from the incoming data, the company maintains.
Lenses employs REST or WebSocket APIs on top of the SQL layer to integrate with technologies like Kubernetes.
Chalkiopoulos describes the approach as a greater focus on business logic and a simpler way to implement it.
“In a layer above the technology, SQL is a very concise and precise language to do such operations, then you can move your systems across any technology,” he said.
The technology employs Continuous SQL processors to join, aggregate, filter and enrich streaming data, while Prometheus and Grafana provide monitoring and visibility.
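As a rough illustration of what a continuous filter-and-aggregate query does, the sketch below processes records one at a time and keeps a running total per key. It is plain Python rather than Lenses SQL, and the record fields (`country`, `amount`) and the function name `running_totals` are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_totals(
    records: Iterable[dict], min_amount: float
) -> Iterator[Tuple[str, float]]:
    """Filter a stream of records and emit an updated total per country.

    Hypothetical example: 'country' and 'amount' are assumed field names.
    """
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        # Filter: skip records below the threshold.
        if record["amount"] < min_amount:
            continue
        # Aggregate: update the running sum for this key.
        totals[record["country"]] += record["amount"]
        # Emit the new state for the key each time it changes.
        yield record["country"], totals[record["country"]]


stream = [
    {"country": "UK", "amount": 40.0},
    {"country": "US", "amount": 5.0},
    {"country": "UK", "amount": 60.0},
]
results = list(running_totals(stream, min_amount=10.0))
```

Unlike a query over a table at rest, the result is itself a stream: every qualifying record produces a fresh total for its key.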
Popular data technologies like Apache Spark, Apache Flink, Akka Streams and KStreams can all be displayed in a central topology screen regardless of the underlying technology and data format. Lenses supports AVRO, JSON, XML, CSV, STRING, INT and LONG and, via extensions, Google’s Protobuf as well as custom data formats. It provides Kafka Connect connectors for nearly all the major data systems using Kafka.
The Lenses SQL engine enables querying of streaming data or data at rest in tables. It uses the Kafka Connect framework for moving data in and out of Kafka. The SQL Streaming engine enables users to build real-time data pipelines quickly and easily.
Lenses also can deploy connectors and SQL processors from either inside or outside Kubernetes. The Lenses Docker image is available on DockerHub.
Chalkiopoulos and his wife Christina Daskalaki founded Lenses, originally named Landoop, and released version 1.0 in November 2017. They had previously written about 35 open source tools around Kafka and decided to put them all together into something bigger.
The company merged with Data Mountaineer, one of the biggest contributors of open source components for Kafka, in December 2017. It raised $7.5 million last September in a Series A funding led by 83North, bringing total funding to $8.5 million.
“We hear consistently of organizations failing or slow to deliver their data projects. This is due to the complexity of today’s data infrastructure technologies, the fast-evolving landscape and the skills shortage,” Laurel Bowden, partner at 83North, told The New Stack.
“We feel Lenses.io has a different approach in the market: Allowing organizations to select and match powerful open source technologies whilst making it possible for anyone to build real-time data applications without requiring the skills or being tied to the particular infrastructure. They are the only ones in the market who decouple the application and data layer from the data infrastructure in this way.
“Despite being very new, Lenses.io is already working closely with a number of Fortune 100 brands, and we feel they are in the best place to be leaders in DataOps,” she said.
The company recently rebranded as Lenses.io and opened a headquarters in New York. Its customer portfolio has grown to more than 100 companies including Barclays, Daimler and Ericsson.
Gaming platform Playtika found that as the company grew, visibility across various open source tools became a problem with developers and engineers having to request access to myriad systems.
“As our use of Kafka grew, production incidents became more difficult to manage without proper visibility,” said Ivan Vasyliev, systems architect at Playtika. Design, marketing and other teams also needed access, which created an even greater challenge with visibility.
Playtika uses Lenses with fine-grained access controls and auditing to provide visibility and monitoring for multiple teams and to enable quick investigation into issues. It now touts savings of 300 engineering hours per day across 600 developers, QA, operations staff and analysts.
Chalkiopoulos foresees data ethics growing in importance in the coming years, particularly the need to understand how sensitive information moves through different technologies and parts of an organization.
Lenses makes data security and privacy part of its “core DNA,” he said.
It builds on the authorization, authentication and encryption of underlying middleware to provide fine-grained role-based access, which administrators can control with whitelists, blacklists and namespaces.
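To make the whitelist and blacklist idea concrete, here is a minimal sketch of a topic-level access check. It is not Lenses’ implementation: the `Role` class, the `can_read` function, the dotted topic names and the rule that a blacklist match overrides the whitelist are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List


@dataclass
class Role:
    """Hypothetical role: topic patterns a user may or may not read."""
    whitelist: List[str] = field(default_factory=list)
    blacklist: List[str] = field(default_factory=list)


def can_read(role: Role, topic: str) -> bool:
    # Assumption: a blacklist match always denies access.
    if any(fnmatchcase(topic, pattern) for pattern in role.blacklist):
        return False
    # Otherwise the topic must match at least one whitelist pattern.
    return any(fnmatchcase(topic, pattern) for pattern in role.whitelist)


# Hypothetical namespace-style pattern: everything under "payments."
analyst = Role(whitelist=["payments.*"], blacklist=["payments.cards"])
```

Here `can_read(analyst, "payments.orders")` is allowed, while `payments.cards` and any topic outside the `payments.` namespace are denied.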
With a topic-centric security model and the use of namespaces, someone granted read access to a certain topic has access to all applications using that data, making it easier to understand all the places that data exists.
Using a complementary field-centric security model, it collects metadata and other information about where sensitive data resides. Data policies can then be applied, including redaction as required.
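A field-level redaction policy of this kind can be sketched in a few lines. The field names, the `***` mask and the `redact` function are hypothetical; how Lenses actually stores and applies its data policies is not described here.

```python
from typing import Dict, Set

# Hypothetical policy: field names treated as sensitive.
SENSITIVE_FIELDS: Set[str] = {"email", "card_number"}


def redact(record: Dict[str, object], sensitive: Set[str]) -> Dict[str, object]:
    """Return a copy of the record with sensitive fields masked."""
    return {
        key: "***" if key in sensitive else value
        for key, value in record.items()
    }


original = {"user": "u1", "email": "a@example.com"}
masked = redact(original, SENSITIVE_FIELDS)
```

The original record is left unchanged, so the same data can be served masked to one role and unmasked to another.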
It provides traceability — all queries are tracked and audited to help the organization know who accessed the data and when.