How Hybrid Analytics Improves Real-Time Data-Driven Insights
You’re hungry for pizza as you compose a post from your mobile phone to a group you recently found on a professional networking site. Then the phone buzzes with a Happy Hour coupon for your favorite pizza, good on your next order.
Why was the coupon for your most-ordered pizza presented to you around 3 p.m.? And before that, how did you even discover that this group existed on your professional business network site? And why did the coupon arrive as a push message and not an email?
Many of these nudges are the result of algorithms that have taken data scientists months, if not quarters, to refine and then equally long to deploy for their customer audiences.
Customer experience has quickly become the most important competitive differentiator and has ushered the business world into an era of monumental change. As part of this revolution, enterprises are interacting digitally — not only with their customers, but also with their employees, partners, vendors and even their products — at an unprecedented scale.
These interactions, which create mountains of data, are powered by the internet and other 21st-century technologies — and at the heart of the revolution are a company’s cloud, mobile, social media, big data and IoT applications.
There is little doubt that enterprises recognize the value of data emanating from emerging applications. And they are making strategic decisions to adopt technologies that allow them to glean insights from growing amounts and types of data to provide customers with a level of customization that we’ve never seen before.
However, the aspiration to be more data-driven is easier said than done and involves not only the adoption of new technologies that enable the use of machine learning models against operational data in real-time, but also new approaches to data management.
NoSQL Databases Provide Opportunities but Introduce New Challenges
To adapt to this new business landscape, organizations must identify a database technology that fits the needs of the business. One such solution is a NoSQL document database, which stores data as JSON documents instead of the traditional rows and columns used by relational databases. NoSQL stands for “not only SQL” rather than “no SQL” at all.
This means that a NoSQL JSON database can store and retrieve data using the ANSI SQL you’re familiar with, combined with the flexibility of JSON. It’s the best of both worlds. NoSQL databases are built to be performant, flexible, scalable and capable of rapidly responding to the data management demands of modern businesses.
Developers and customers face multiple modern data and analytics challenges when working with JSON data. Amid growing data volumes and variety, developers are expected to make the data available in real time for further analysis. Traditional data warehouses, meanwhile, require schema modifications and ETL (extract, transform, and load) pipeline changes whenever application data changes.
As a result, analysts cannot quickly obtain insights into what’s happening in their business, and they struggle to monitor and optimize their data without interfering with the workloads of their operational systems.
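To make the schema problem concrete, here is a minimal Python sketch of what happens when a fixed-column ETL step meets a JSON document that has gained a new field. The order documents, the `WAREHOUSE_COLUMNS` schema and the helper names are all hypothetical and chosen only for illustration.

```python
# Hypothetical order documents; the second one carries a new "coupon" field
# that the application started writing after the warehouse schema was fixed.
orders = [
    {"id": 1, "customer": "ana", "total": 18.5},
    {"id": 2, "customer": "raj", "total": 22.0,
     "coupon": {"code": "HAPPY3PM", "discount": 0.2}},
]

# Assumed fixed warehouse schema (rows and columns).
WAREHOUSE_COLUMNS = ("id", "customer", "total")


def to_row(doc, columns=WAREHOUSE_COLUMNS):
    """Flatten a document into a fixed-width row; unknown fields are dropped."""
    return tuple(doc.get(col) for col in columns)


def dropped_fields(doc, columns=WAREHOUSE_COLUMNS):
    """List the fields in the document that the fixed schema cannot hold."""
    return sorted(set(doc) - set(columns))


rows = [to_row(doc) for doc in orders]
lost = {doc["id"]: dropped_fields(doc) for doc in orders}

print(rows)
print(lost)
```

In this sketch the coupon data never reaches the analyst until someone alters the schema and the pipeline, which is the delay a document model is meant to avoid.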
Furthermore, there’s an increasing need for computations and complex ad-hoc analytical requests with large joins, aggregations and grouping. Against this backdrop, DevOps teams need the ability to offload the data processing requests from underlying transactional data services to analytical data processing engines that support parallel query processing and bulk data handling.
Addressing Data Pipeline Indigestion with Hybrid Analytics Processing
As underlying data grows, there is a need to manage various data flows from one system to another, increasing the number of data pipes needed to perform ETL processing. Often this involves pipelines to transform the operational data before the analytical tools can process it.
And, of course, all of this increases time and costs, adds more systems and processes to maintain, and makes the overall data platform and infrastructure more complex. In other words, it can lead to a serious case of data pipeline indigestion!
The traditional paradigm of thousands of users with known structures was and still is well served by the traditional relational database structure of rows and columns. But it does not serve the need for the schema-less flexibility, higher-volume throughput and agility demanded by web applications. NoSQL database technology evolved from these use cases and supports millions of transactions per second across millions of users.
Analytics at the Speed of Transactions
A hybrid operational and analytics processing (HOAP) architecture “breaks the wall” between transaction processing and analytics, enabling more informed and “in business real-time” decision-making.
In fact, a recent academic paper presented at the IEEE International Conference on Big Data stated that, “database system architectures with hybrid data management support — known as HTAP (hybrid transactional/analytical processing) or HOAP (hybrid operational/analytical processing) support — are appearing and increasingly gaining traction in both the commercial and research sectors.”
The commonly accepted practice for handling both transactional/operational workloads and analytical workloads has been to keep them separated, with each workload running in a separate system. The fact that one process may impede the other (long-running analytical queries can slow incoming transactions, for instance) is just one of the many reasons it has made sense to separate these two workloads. Hybrid analytics, by contrast, blends operational transactions and analytics within a single system or platform and eliminates ETL delays. Again: Analytics at the speed of transactions!
Optimizing Real-Time Analytics with a HOAP Data Platform
HOAP-full systems are systems capable of hybrid operational and analytic processing, with both OLTP (transactions) and OLAP (analytics) in a single implementation. Part of their appeal is more than just the efficiency of having fewer systems to maintain; it’s also the ability to do analytics on incoming operational transactions in near real-time and even to use your ML algorithms as part of your queries. Products that provide these capabilities include Couchbase Server, Microsoft SQL Server and SAP HANA. The first is a NoSQL database, and the latter two are structured (relational) databases.
Now let us consider how a HOAP data platform lets companies run near-real-time analytics on operational data quickly, reliably and efficiently, without the need for ETL. Skipping the data transfer also avoids lost time and the potential security risk of data leakage. With the resulting insights, companies can offer the right nudges to help their customers learn about opportunities or thwart unwanted risks, and improve their overall experience.
Much of the data that today’s data scientists need to analyze is in JSON. Before they can use it to develop ML algorithms, they must munge the JSON data from its document format into one that works for their OLAP tables (rows and columns), which takes time and technology. A NoSQL hybrid database combines the JSON operational data, and all the benefits it provides, with the ability to run analytics on that data using a structured query, without losing time to extracting, transforming and finally loading the data before the research can start.
The traditional model of training an ML algorithm is unchanged; what changes is that users can import and operationalize a trained ML model as a callable function within a query. This helps the data scientist not only to improve and test an algorithm by exploring it on queried subsets of the data, but also to deploy it for use on the operational data in near real-time on the analytic side of the HOAP system.
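The idea of calling a model from inside a query can be sketched in plain Python. This is not any vendor’s API: the `query` helper, the customer documents and the `coupon_propensity` function (a stand-in for an imported, trained model, with hand-set weights) are all assumptions made for illustration.

```python
def query(docs, where, select):
    """Tiny stand-in for a structured query: filter documents, then project."""
    return [select(doc) for doc in docs if where(doc)]


def coupon_propensity(doc):
    """Stand-in for a trained model; the weights are illustrative, not learned."""
    score = 0.05 * doc["orders_90d"] - 0.01 * doc["days_since_order"]
    return max(0.0, min(1.0, score))  # clamp to the range [0, 1]


# Hypothetical operational documents.
customers = [
    {"id": "c1", "region": "west", "orders_90d": 12, "days_since_order": 2},
    {"id": "c2", "region": "west", "orders_90d": 1, "days_since_order": 40},
    {"id": "c3", "region": "east", "orders_90d": 5, "days_since_order": 10},
]

# The model is called like any other function in the projection,
# and only on the subset of documents the query selects.
scored = query(
    customers,
    where=lambda doc: doc["region"] == "west",
    select=lambda doc: {"id": doc["id"], "score": coupon_propensity(doc)},
)
print(scored)
```

Because the model is just a function in the projection, the same call works for exploring a queried subset during testing and for scoring the latest operational data once deployed.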
Many use cases can benefit from the high transactional performance of a NoSQL database that can also run structured queries whose results can be exposed to further interrogation.
In the retail industry, for example, the ability to offer the right product recommendation or coupon for a customer as they progress through the checkout process has been shown to increase sales. Systems today tend to use static algorithms, but by replacing them with a Python ML model that can then be applied to the latest operational data, businesses can improve efficiency and further increase sales based on data-driven insights.
Across industries susceptible to fraud, including financial services, retail and travel, the need to identify anomalies in user behaviors and indicators of compromise is critical. However, reviewing thousands or millions of transactions per second with a structured query is unrealistic, particularly if you want to take action in less than a couple of seconds. A HOAP platform with the ability to use more insightful ML algorithms can help organizations analyze transactions quickly and at scale to more effectively identify fraud.
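As a rough illustration of scoring transactions rather than reviewing them one query at a time, the Python sketch below flags amounts that sit far from a customer’s recent history. It uses a simple z-score rule as a stand-in for the ML algorithms discussed here; the threshold, the sample amounts and the `flag_anomalies` name are assumptions.

```python
from statistics import mean, pstdev


def flag_anomalies(history, incoming, threshold=3.0):
    """Return incoming amounts more than `threshold` standard deviations
    away from the mean of the customer's historical amounts."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No variation in the history: anything different is unusual.
        return [amount for amount in incoming if amount != mu]
    return [amount for amount in incoming
            if abs(amount - mu) / sigma > threshold]


# Hypothetical recent purchase amounts for one customer.
history = [20, 22, 19, 21, 18, 20]
# Hypothetical transactions arriving now.
incoming = [21, 250, 19]

flagged = flag_anomalies(history, incoming)
print(flagged)
```

A production system would replace the z-score rule with a trained model, but the shape is the same: a function applied to each incoming transaction on the analytic side of the platform.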
Another use case we’re increasingly seeing is with banks and financial institutions needing to calculate risk scores, which requires organizations to perform complex analytical queries, computations and aggregations on JSON data, enriched with third-party data.
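A risk score of this kind boils down to a join with the third-party data followed by grouping and aggregation. The Python sketch below shows that shape under invented assumptions: the transaction documents, the `country_risk` lookup and the scoring formula (total amount multiplied by the worst country rating seen) are hypothetical and not a real risk model.

```python
from collections import defaultdict

# Hypothetical JSON transaction documents.
transactions = [
    {"customer": "c1", "amount": 1200, "country": "US"},
    {"customer": "c1", "amount": 300, "country": "US"},
    {"customer": "c2", "amount": 9000, "country": "XX"},
]

# Hypothetical third-party enrichment: a risk rating per country code.
country_risk = {"US": 1, "XX": 5}


def risk_scores(txns, ratings):
    """Join each transaction to its country rating, then aggregate per customer."""
    totals = defaultdict(int)  # sum of amounts per customer
    worst = defaultdict(int)   # highest country rating seen per customer
    for txn in txns:
        customer = txn["customer"]
        rating = ratings.get(txn["country"], 0)  # the join step
        totals[customer] += txn["amount"]
        worst[customer] = max(worst[customer], rating)
    # Illustrative formula only: exposure scaled by the worst rating.
    return {customer: totals[customer] * worst[customer] for customer in totals}


scores = risk_scores(transactions, country_risk)
print(scores)
```

On an analytics engine the same join and grouping would be expressed as a single query and run in parallel over the live JSON data, without first exporting it.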
Couchbase Server Optimizes Analytics Pipelines
Leading enterprises use Couchbase to solve the above use cases and many more by leveraging the Analytics service in Couchbase Server, a hybrid operational/analytics processing database. This helps modern businesses increase their revenues, reduce their risk profiles and improve their operational efficiency.
Additionally, customers have enjoyed the benefits of isolating their operational and analytical workloads within the same data platform instance: no ETL, analyses performed on the same data model as the application, no performance interference and, as a result, no data pipeline indigestion. This is the key to providing real-time insights on real-time operational data.