
Devo: Faster Time to Insights from Data

31 Jan 2020 6:00am

Data holds huge value for organizations, especially for security and operations teams, but with volumes increasing exponentially, managing it remains a challenge.

Data analytics platform Devo tackles the challenges that enterprises face in managing the explosion of data from various sources, focusing on observability rather than just data collection. (CA Technologies’ Peter Waterhouse previously outlined the difference at The New Stack. It’s also a topic close to the heart of Honeycomb Chief Technology Officer Charity Majors.)

“We are seeing a lot of customers who are dealing with the way in which they collect and operationalize their machine data and their logs; it’s become a real challenge for them in terms of scalability, the economics of it, particularly for security and IT operations groups,” said Dimitri Vlachos, Devo chief marketing officer.

Adroit Market Research recently forecast the global AIOps market — tools that use artificial intelligence to improve IT operations — will reach $237 billion by 2025, harnessing technologies including big data platforms, predictive analytics and machine learning.

The Adroit report cites Devo among the top players in the AIOps market. Most recently the company has been touting its technology as a next-generation security information and event management (SIEM) system, though it’s applicable to other data-intense use cases across the enterprise.

Devo Security Operations provides a central hub for security teams, enabling analysts to collect, store, and analyze any data type from any source. It also offers machine learning-based behavioral analytics, automation and collaboration to identify threats and respond quickly.

In addition, its ML workbench allows operations teams to put their own models into production at scale and test them with real data.

Telefonica, one of the largest telecommunications companies in the world, used Devo to isolate user-experience problems during the rollout of its IPTV service, enabling it to draw data from the user’s home through its whole delivery network.

In another use case, a major U.S. manufacturer used Devo to catch bots that were buying up inventory just as it was releasing new shipments and selling it on secondary markets, Vlachos said.

Micro-indexing

Devo provides analytics on streaming and historical data for any business unit.

Pedro Castillo, now the Devo chief technology officer, founded the Cambridge, Massachusetts-based company in 2011. Formerly called LogTrust, the company rebranded as Devo in 2018.

“When our founders wanted to solve this challenge, they looked at what was out there and in open source, there wasn’t a solution out there that could do this,” Vlachos said.

The proprietary technology was built from the ground up.

“We looked at the ELK stacks, we looked at Hadoop, we looked at what others have been doing in the open source area, and it wouldn’t meet the needs of the future of the enterprise that we saw,” he said.

Devo takes a different approach to big data. Rather than indexing data at ingest, which slows a system progressively as the index grows larger and harder to maintain, Devo relies on micro-indexing after data is stored. This reduces the CPU and memory required for indexing by up to 80 percent, freeing those resources for querying.

Running many micro-indexes in parallel delivers high performance and predictable fast response rates to real-time queries on large data sets, according to the company.

Data is immediately written to disk in its raw format, then compressed by 90 percent. The data always remains hot rather than being stored in a tiered infrastructure. An out-of-band tokenized index is created from the raw data and written to disk asynchronously. These micro-indexes sit side by side with the original raw data.

One micro-index per source data type is created each day, then made immutable, enabling massive parallelization.
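Devo’s implementation is proprietary, but the ingest pattern described above (raw data persisted immediately, a tokenized micro-index built out-of-band, one immutable index per source type per day) can be sketched in Python. Every name here is hypothetical; this illustrates the pattern, not Devo’s code.

```python
import threading, queue, zlib, datetime
from collections import defaultdict

class MicroIndexStore:
    """Illustrative sketch: raw events are written first; a tokenized
    micro-index per (source type, day) is built asynchronously."""

    def __init__(self):
        self.raw = defaultdict(list)    # (source, day) -> compressed raw events
        self.index = defaultdict(dict)  # (source, day) -> token -> event offsets
        self.sealed = set()             # immutable (source, day) partitions
        self.pending = queue.Queue()
        threading.Thread(target=self._index_worker, daemon=True).start()

    def ingest(self, source, event):
        day = datetime.date.today().isoformat()
        key = (source, day)
        offset = len(self.raw[key])
        self.raw[key].append(zlib.compress(event.encode()))  # raw written immediately
        self.pending.put((key, offset, event))               # indexing is out-of-band

    def _index_worker(self):
        while True:
            key, offset, event = self.pending.get()
            for token in event.split():
                self.index[key].setdefault(token, []).append(offset)
            self.pending.task_done()

    def seal(self, source, day):
        self.sealed.add((source, day))  # one immutable micro-index per source/day

    def query(self, token):
        # Each (source, day) micro-index can be scanned independently, so a
        # real system could fan queries out across them in parallel.
        # (Call pending.join() first so the index is not mutated mid-scan.)
        hits = []
        for key, idx in self.index.items():
            for off in idx.get(token, []):
                hits.append(zlib.decompress(self.raw[key][off]).decode())
        return hits
```

Because the raw write and the index write are decoupled, ingest never waits on indexing, which is the ingest-versus-query contention Vlachos describes eliminating.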

Query is independent of ingestion within the data node. The company maintains that this approach enables a single 64-core data node to query up to 48 million events per second. The system scales horizontally by adding data nodes.

A visual interaction model allows users to search and analyze data without knowledge of any specialized programming languages and without writing any code. More experienced users can run queries using LinQ or SQL directly in the UI or via the API.
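The article doesn’t show a query, but as a rough illustration of the SQL-style access it mentions, here is what an aggregation over log events might look like. Table and column names are invented, and SQLite stands in for Devo’s engine purely for demonstration.

```python
import sqlite3

# Hypothetical illustration only: a SQL-style aggregation over log events,
# in the spirit of the SQL access the article mentions (not Devo's API).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, source TEXT, status INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("2020-01-31T06:00:00", "web",  200),
    ("2020-01-31T06:00:01", "web",  500),
    ("2020-01-31T06:00:02", "auth", 401),
])
# Count error-level events per source
rows = conn.execute(
    "SELECT source, COUNT(*) FROM events "
    "WHERE status >= 400 GROUP BY source ORDER BY source"
).fetchall()
print(rows)  # [('auth', 1), ('web', 1)]
```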

Devo Activeboards enables users to create and share data visualizations.

“Being able to build and modify dashboards on the fly with Activeboards streamlines my analyst time because my analysts aren’t doing it across spreadsheets or five different tools to try to build a timeline out themselves. They can just ingest it all, build a timeline out across all the logging, and all the different information sources in one dashboard. So, it’s a huge time saver. It also has the accuracy of being able to look at all those data sources in one view. The log analysis, which would take 40 hours, we can probably get through it in about five to eight hours using Devo,” wrote Jay Grant, manager of security services at OpenText, in a review at IT Central Station.

Services sit on top of the platform’s core query engine and data model. They include a correlation engine, aggregation engine, machine learning engine, alerting functionality, data enhancement (lookups) functionality, APIs and a web user interface.

It is available as a SaaS offering, an on-premises solution, or a hybrid combination of the two.

Devo also correlates multiple metrics, so organizations can monitor an entire application stack for an overall health score rather than having to monitor individual components.
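As a toy illustration of that idea (weights and component names invented, not Devo’s scoring model), a stack-level health score might combine per-component metrics like this:

```python
# Hypothetical sketch: combining component metrics into one stack health
# score, instead of monitoring each component separately.
WEIGHTS = {"app": 0.5, "db": 0.3, "cache": 0.2}  # invented weights, sum to 1.0

def health_score(metrics):
    """metrics: component -> health in [0, 1]; returns weighted overall score."""
    return sum(WEIGHTS[component] * value for component, value in metrics.items())

score = health_score({"app": 1.0, "db": 0.5, "cache": 1.0})
# 0.5*1.0 + 0.3*0.5 + 0.2*1.0 = 0.85
```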

With customizable policies, alerts can be delivered by methods including email, Slack, JIRA, or PagerDuty.
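A minimal sketch of policy-based alert routing, assuming invented policy and channel names rather than Devo’s actual configuration:

```python
# Hypothetical sketch of routing alerts to delivery channels like those the
# article lists (email, Slack, JIRA, PagerDuty); policy shape is invented.
POLICIES = [
    {"match": lambda a: a["severity"] == "critical", "channels": ["pagerduty", "slack"]},
    {"match": lambda a: a["severity"] == "warning",  "channels": ["email"]},
]

def route_alert(alert):
    """Return the delivery channels for the first matching policy."""
    for policy in POLICIES:
        if policy["match"](alert):
            return policy["channels"]
    return ["email"]  # default fallback channel

print(route_alert({"severity": "critical", "msg": "disk full"}))
# ['pagerduty', 'slack']
```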

The New SIEM

Devo’s closest competitors would be the Splunks or Elastics of the world or legacy Security Information and Event Management (SIEM) providers, Vlachos said.

“The real differentiation for us is really twofold. One is the breadth of data we can easily bring into our system and the second is this: Historically, when you look at these solutions, they’ve either been optimized at bringing in data, and they’re doing techniques to really index that and store that in a way that writes quickly to disk. But then when they go to query that data, there’s a contention between, ‘Hey, am I going to focus on collecting this data? Or am I going to really put all my energy on querying that data?’ And so there’s been a very natural contention between those two,” he said.

“So, if you want to query and bring in a huge amount of log data or machine data, the amount of infrastructure you’d have to build around it, the amount of contention you have between query and collection, we’ve really eliminated that. … So it really comes down to the ability to bring data in and high rate, the ability to query that data at the same time as you’re ingesting it. And without the need to actually change that as the data changes.”

If a source changes data format, you don’t need to do any reprocessing. The system can deal with changes in format, enabling users to continue doing what they’re doing, he said.

Image by Myriam Zilles from Pixabay.
