In its bid to make data more easily accessible even across multiple databases and clouds, data virtualization vendor AtScale recently added time-series analysis capabilities.
It’s part of the industry trend toward creating “fit for purpose” processing and analysis of Big Data rather than a one-size-fits-all approach.
“It used to be all about performance, but the battleground is moving or has moved to agility,” said Matthew Baird, co-founder and chief technology officer of AtScale.
In what he refers to as applying the best tool for the job, the time-series capabilities include:
- Modeling live data using time-based calculations, metrics and relationships, through a single web interface.
- Standardized period-over-period comparisons across arbitrary definitions of time.
- Intelligent federation to the most suitable database regardless of location to enable complex time-based analysis at scale.
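To make the period-over-period idea concrete, here is a minimal Python sketch. It is not AtScale's implementation; the function names, the fiscal calendar starting in February and the sample figures are all assumptions chosen for illustration. The point is that the same comparison logic works for any definition of a period, as long as dates can be mapped to a sortable period key.

```python
from datetime import date


def fiscal_period(d, fiscal_year_start_month=2):
    """Map a calendar date to (fiscal_year, fiscal_quarter).

    Hypothetical calendar: the fiscal year starts in February by default.
    """
    months_into_year = (d.month - fiscal_year_start_month) % 12  # 0..11
    fiscal_year = d.year if d.month >= fiscal_year_start_month else d.year - 1
    return fiscal_year, months_into_year // 3 + 1


def period_over_period(rows, period_of):
    """Return the fractional change of each period versus the previous one.

    rows      -- iterable of (date, amount) pairs
    period_of -- any function mapping a date to a sortable period key
    Note: "previous" means the preceding period present in the data,
    so gaps in the data are not detected by this sketch.
    """
    totals = {}
    for d, amount in rows:
        key = period_of(d)
        totals[key] = totals.get(key, 0.0) + amount

    ordered = sorted(totals)
    changes = {}
    for prev, cur in zip(ordered, ordered[1:]):
        # Avoid dividing by zero when the earlier period had no sales.
        changes[cur] = None if totals[prev] == 0 else (totals[cur] - totals[prev]) / totals[prev]
    return changes


# Made-up sample data for illustration only.
sales = [
    (date(2019, 2, 10), 100.0),
    (date(2019, 4, 5), 50.0),
    (date(2019, 5, 20), 300.0),
    (date(2019, 8, 1), 150.0),
]

by_fiscal_quarter = period_over_period(sales, fiscal_period)
by_calendar_month = period_over_period(sales, lambda d: (d.year, d.month))
```

Swapping `fiscal_period` for a different mapping (a retail 4-4-5 calendar, a week number, a promotion window) changes the definition of time without touching the comparison itself.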
“What we’ve seen in the market — performance, security, agility are super important and always have been and always will be. But in the cloud world, there’s a cost element as well,” he said.
Virtualization means you don’t care where the data is or how it gets accessed; what you care about is getting your questions answered and letting the computer go get that answer in the optimal fashion, he said.
AtScale provides virtualization across various data platforms, including Teradata, Oracle, Snowflake, Redshift, BigQuery, Greenplum, and Postgres.
“The goal is to allow people to quickly merge data from several different sources, roll out a data service, then have their users start querying it. When they do, they’re sending their intent about what data is important to them, and AtScale responds by doing these data engineering jobs that improve the performance, security and agility of the system,” he said.
AtScale was originally built on Hadoop but has since built a data fabric on Spark. It includes a scalable dimensional calculation engine, a machine learning performance optimizer, a universal data abstraction layer and enterprise-grade security, governance and metadata management capabilities.
It connects business intelligence tools like Tableau, Excel and Power BI to live data sources without moving the data, eliminating extract, transform and load (ETL) and other manual processes.
In a demo, Baird pulled together five databases (two in physical clouds, three in different cloud databases), making 20 tables look like one big table.
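As a rough analogy for making tables in separate databases look like one, the Python sketch below uses SQLite's `ATTACH DATABASE` and a temporary view. This is only a stand-in: it says nothing about how AtScale federates queries, and the table names, schema names and rows are hypothetical.

```python
import sqlite3


def build_virtual_table():
    """Expose two tables living in separate SQLite databases as one view."""
    # isolation_level=None puts the connection in autocommit mode, so
    # ATTACH is never issued inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # A second, independent in-memory database (hypothetical name).
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse_b")

    conn.execute("CREATE TABLE main.orders_2018 (order_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE warehouse_b.orders_2019 (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO main.orders_2018 VALUES (?, ?)", [(1, 20.0), (2, 35.0)])
    conn.executemany("INSERT INTO warehouse_b.orders_2019 VALUES (?, ?)", [(3, 50.0)])

    # A TEMP view is used because it may reference tables in any
    # attached database.
    conn.execute(
        """
        CREATE TEMP VIEW all_orders AS
        SELECT order_id, amount, 'main' AS source FROM main.orders_2018
        UNION ALL
        SELECT order_id, amount, 'warehouse_b' AS source FROM warehouse_b.orders_2019
        """
    )
    return conn


conn = build_virtual_table()
# The user queries one logical table and never names the underlying databases.
order_count, total_amount = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM all_orders"
).fetchone()
```

A real virtualization layer would also have to decide where each part of the query runs and reconcile different SQL dialects, which this sketch does not attempt.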
Baird described the technology as having two parts: a user experience application where people design data services, and a server that accepts queries. It looks like a Hive database, but it’s more like an autonomous data engineering and higher-level cost-based optimizer, he said.
It models data as a virtual OLAP (Online Analytical Processing) cube, using familiar frameworks such as BusinessObjects Universe or Cognos Framework Manager. The model can then be published via ODBC, JDBC, and MDX for querying.
The Universal Semantic Layer allows users to connect any tool to AtScale and access models without having to join tables, create calculations or business rules. It adapts as it learns about the model and continually reviews how users interact with the data to automate the data engineering needed behind their work.
It’s key to only fetch the relevant data. Users simply ask for the data they need, and AtScale automatically rewrites the query to optimize performance in fetching the data.
In what it calls Adaptive Cache, it aggregates queries in real time, based on learning what data is important, how it’s used and how best to shape that data. These aggregates are stored on disk alongside the original data in a chosen data platform, speeding response times on subsequent queries.
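The general idea of learning from query patterns and then answering from a pre-built aggregate can be sketched in a few lines of Python. This is a toy model under stated assumptions, not AtScale's algorithm: the class name `AggregateAdvisor`, the `build_threshold` parameter and the sample rows are invented for illustration, and the aggregates here live in memory rather than on disk in a data platform.

```python
from collections import Counter, defaultdict


class AggregateAdvisor:
    """Toy model: watch which group-bys users ask for and keep a
    summary table for the ones that are requested repeatedly."""

    def __init__(self, fact_rows, build_threshold=2):
        self.fact_rows = fact_rows              # list of dicts (the raw data)
        self.build_threshold = build_threshold  # hypothetical tuning knob
        self.seen = Counter()                   # how often each query shape was asked
        self.aggregates = {}                    # query shape -> stored summary

    def _scan(self, dims, measure):
        """Answer the query the slow way, by scanning every raw row."""
        totals = defaultdict(float)
        for row in self.fact_rows:
            totals[tuple(row[d] for d in dims)] += row[measure]
        return dict(totals)

    def query(self, dims, measure):
        """Return (result, source), where source is 'aggregate' or 'raw'."""
        shape = (tuple(dims), measure)
        if shape in self.aggregates:
            return self.aggregates[shape], "aggregate"

        self.seen[shape] += 1
        result = self._scan(shape[0], measure)
        # Once a query shape has proven popular, keep its summary for reuse.
        if self.seen[shape] >= self.build_threshold:
            self.aggregates[shape] = result
        return result, "raw"


# Made-up rows for illustration only.
rows = [
    {"region": "east", "product": "saw", "sales": 10.0},
    {"region": "east", "product": "drill", "sales": 5.0},
    {"region": "west", "product": "saw", "sales": 7.0},
]
advisor = AggregateAdvisor(rows)
first = advisor.query(["region"], "sales")   # scans raw rows
second = advisor.query(["region"], "sales")  # scans raw rows, then stores the summary
third = advisor.query(["region"], "sales")   # served from the stored summary
```

A production system would also need to refresh or invalidate aggregates when the underlying data changes, and could answer coarser queries from a finer-grained aggregate; both are left out here.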
Its customers include Home Depot, Visa, UnitedHealthcare, Toyota and others.
Data workers typically use four to seven different tools to source, query, model and analyze data, taking up to 40% of their time, according to IDC.
Myriad companies are trying to make data more manageable, such as Pepperdata, which is running Spark on Kubernetes; the self-service platforms Qubole and Panoply; and most recently Amundsen, the project Lyft open sourced that helps users find data quickly and ranks how trustworthy it is.
AtScale employs federation to support multidimensional analytics. It can talk to more tools and better simplify the analysis, according to Baird. It natively supports tools like Excel, Hyperion, Cognos and BusinessObjects.
“Our approach to solving the problem is very different. Denodo and Dremio focus on federation and caching for performance at scale. We employ this autonomous data engineering approach: What would the data engineering team do?” Baird said.
It also centralizes security, which can quickly get out of date if you’re moving data around.
“We have a patent for end-to-end delegation and impersonation that we call true delegation. So the person that executes the query, sitting on a Windows desktop talking to Power BI or whatever tool they’re using, is the same identity used for the query on the back end. And no matter how we build the acceleration structures and store them, the security of the underlying data warehouse is applied to those queries,” he said.
He also cites a friendly UX for building use cases, which he said takes minutes or hours rather than weeks.
Feature image via Pixabay.