How to Solve the Performance Challenges of Web-Scale Applications

Oct 18th, 2018 9:12am by

Web-scale applications require high performance and massive scalability to support a large number of users with an outstanding user experience. They must scale on demand to serve a rapidly growing user base while delivering great performance, agility and resiliency. Web-scale applications are increasingly used in financial services, healthcare, online business services, media, telecommunications, and other industries where the number of system users is constantly growing and their needs are continually changing.

One challenge to developing web-scale applications is a reliance on traditional disk-based databases. These databases introduce unacceptable latency at scale. Further, the extract, transform and load (ETL) process to move data from an operational database into an analytical database means the data is stale before it’s analyzed. Today, the simplest, most efficient and most cost-effective strategy for eliminating latency caused by disk-based databases is to deploy an in-memory computing (IMC) platform.

In-Memory Computing 101

Nikita Ivanov, CTO of GridGain Systems
Nikita Ivanov, founder and CTO of GridGain Systems, has led GridGain in developing advanced and distributed in-memory data processing technologies. Nikita has more than 20 years of experience in software application development, building HPC and middleware platforms and contributing to the efforts of other startups and notable companies, including Adaptec, Visa and BEA Systems. In 1996, he was a pioneer in using Java technology for server-side middleware development while working for one of Europe’s largest system integrators. Nikita is an active member of the Java middleware community and a contributor to the Java specification. He is also a frequent international speaker with more than 50 talks at various developer conferences in the last 5 years.

In-memory computing is based on massively parallel processing across a distributed computing cluster that shares all the available memory and CPU power in the cluster. The cluster can be built using commodity servers and scaled simply by adding new nodes. When new nodes are added, the system automatically rebalances the data across the nodes, providing extreme scalability along with data redundancy. IMC platforms can deliver a 1,000x or more increase in processing speeds compared to applications built directly on disk-based databases. Some IMC platforms support ANSI SQL-99 and ACID transactions, which makes it simpler to integrate an IMC platform into an existing web-scale application and can allow the in-memory computing platform to serve as the system of record for the application.
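
To make the rebalancing step concrete, here is a minimal sketch of one common partitioning technique, rendezvous (highest-random-weight) hashing. It is an illustration under stated assumptions, not the implementation of any particular IMC platform: the node names, key format and the `owner` and `place` helpers are all hypothetical.

```python
import hashlib

def owner(key: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the node with the highest score for a key owns it."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

def place(keys: list[str], nodes: list[str]) -> dict[str, str]:
    """Map every key to its owning node."""
    return {k: owner(k, nodes) for k in keys}

# Hypothetical cluster and data set, for illustration only.
keys = [f"user:{i}" for i in range(10_000)]
before = place(keys, ["node-1", "node-2", "node-3"])
after = place(keys, ["node-1", "node-2", "node-3", "node-4"])

# Keys whose owner changed when the fourth node joined.
moved = [k for k in keys if before[k] != after[k]]
fraction_moved = len(moved) / len(keys)
```

With this scheme, a key changes owner only if the new node scores highest for it, so every moved key lands on `node-4` and roughly a quarter of the data is transferred. The rest of the cluster keeps serving its existing keys while the new node fills up.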

The speed and scalability of an IMC platform enable hybrid transactional/analytical processing (HTAP), also known as hybrid operational/analytical processing (HOAP) or translytical processing. HTAP is the ability to run analytics on the operational data set at scale without impacting the performance of the system. Web-scale applications built on HTAP solutions such as an in-memory computing platform allow companies to gain real-time insights into user behavior and respond in real time to any opportunities or threats arising from changes in that behavior.
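
The HTAP idea can be sketched on a very small scale with Python's built-in `sqlite3` module and an in-memory database. This is only a single-process stand-in for a distributed IMC platform, and the `orders` table and helper names are hypothetical. The point it illustrates is that the analytical query reads the same live table the transactions write to, with no ETL copy in between.

```python
import sqlite3

# Single-process, in-memory stand-in for an operational data set.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

def record_order(region: str, amount: float) -> None:
    # Transactional write: committed on success, rolled back on an exception.
    with db:
        db.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))

def revenue_by_region() -> dict:
    # Analytical read over the same live table the writes go to.
    rows = db.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
    return dict(rows.fetchall())

record_order("EU", 40.0)
record_order("US", 25.0)
snapshot = revenue_by_region()   # reflects the first two orders

record_order("EU", 10.0)
latest = revenue_by_region()     # already includes the third order
```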

The key capabilities and characteristics of an IMC platform include the following:

In-Memory Data Grid for Existing Applications

For existing applications, the IMC platform is used as an in-memory data grid (IMDG) inserted between the application and data layers, without the need to rip and replace the underlying database. The data in the underlying RDBMS, NoSQL or Hadoop database is loaded into the RAM of the IMC cluster. Collocated, massively parallel processing on the cluster nodes delivers a tremendous performance boost. If the IMDG supports SQL, communicating with the data grid can be as easy as using standard SQL commands to manipulate and analyze data.
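
The "inserted between the application and data layers" pattern is often implemented as a read-through, write-through cache. The sketch below assumes that pattern and uses hypothetical names throughout: a plain dictionary stands in for the cluster's pooled RAM, and a `sqlite3` database stands in for the existing underlying database, which stays in place.

```python
import sqlite3

class DataGrid:
    """Toy read-through / write-through cache in front of an existing database."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        self.ram = {}        # stands in for the cluster's pooled memory
        self.db_reads = 0    # how often a read fell through to the database

    def get(self, user_id: int):
        if user_id not in self.ram:
            # Read-through: load from the database once, then serve from RAM.
            self.db_reads += 1
            row = self.db.execute(
                "SELECT name FROM users WHERE id = ?", (user_id,)
            ).fetchone()
            self.ram[user_id] = row[0] if row else None
        return self.ram[user_id]

    def put(self, user_id: int, name: str) -> None:
        # Write-through: update the database first, then the in-memory copy.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )
        self.ram[user_id] = name

# Stand-in for the existing database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada')")
db.commit()

grid = DataGrid(db)
first = grid.get(1)    # falls through to the database
second = grid.get(1)   # served from memory
grid.put(2, "Grace")   # written to both layers
```

The application talks only to `grid`, so the database schema and contents remain the system of record while repeated reads avoid it entirely.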

In-Memory Database for New Applications

For new or rearchitected applications, some IMC platforms can serve as a standalone in-memory SQL database (IMDB). To protect the data in memory from loss during a reboot or power outage, a cost-effective strategy is to use a “persistent store.” This is discussed in the next section.

Persistent Store

A “persistent store” is a distributed, ACID-compliant and ANSI SQL-99-compliant disk store that can be deployed on spinning disks, solid-state drives (SSDs), Flash, 3D XPoint or other storage-class memory technologies. For IMDBs, the persistent store preserves the data in the event of a reboot or power outage and allows the application to access a larger data set than can be held in the total cluster RAM. For IMDGs, the persistent store lets an organization balance infrastructure costs and application performance by keeping the full operational data set on disk while keeping only a subset of it in memory. An important benefit of a persistent store is that it enables immediate data processing following a server reboot without waiting for all the data to reload into memory.
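
The trade-off described here, with the full data set on disk, a hot subset in RAM and no warm-up after a restart, can be sketched as follows. This is a simplified single-node illustration with hypothetical names, using a `sqlite3` file as the disk store and least-recently-used eviction as an assumed policy. It is not how any specific product implements persistence.

```python
import os
import sqlite3
import tempfile
from collections import OrderedDict

class PersistentCache:
    """Full data set on disk, bounded hot subset in RAM (LRU eviction)."""

    def __init__(self, path: str, ram_capacity: int):
        self.disk = sqlite3.connect(path)
        self.disk.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        self.ram = OrderedDict()
        self.ram_capacity = ram_capacity

    def _remember(self, key: str, value: str) -> None:
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)   # evict the least recently used entry

    def put(self, key: str, value: str) -> None:
        with self.disk:                    # durable write first
            self.disk.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
        self._remember(key, value)

    def get(self, key: str):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        row = self.disk.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            return None
        self._remember(key, row[0])
        return row[0]

    def close(self) -> None:
        self.disk.close()

path = os.path.join(tempfile.mkdtemp(), "store.db")
store = PersistentCache(path, ram_capacity=2)
for i in range(5):
    store.put(f"k{i}", f"v{i}")
ram_before_restart = len(store.ram)   # only the hot subset is held in memory
store.close()

# Simulated reboot: RAM starts empty, yet reads work immediately from disk.
restarted = PersistentCache(path, ram_capacity=2)
value_after_restart = restarted.get("k0")
```

Nothing is preloaded on restart. Entries return to memory only as they are read, which is why processing can resume right away.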

Machine Learning

Some IMC platforms now feature integrated, fully distributed machine learning (ML) and deep learning (DL) libraries that have been optimized for massively parallel processing. This enables each ML or DL algorithm to run locally against the data residing in-memory on each node of the IMC cluster, which allows for the continuous updating of the ML or DL model without impacting performance, even at petabyte scale.
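
As a rough illustration of running an algorithm locally against each node's data, the sketch below fits a simple linear regression from per-partition summary statistics. Each "node" reduces its own partition to five numbers, and only those numbers are merged. The `Stats` class, the partitions and the helper functions are hypothetical, and real distributed ML libraries use far more sophisticated algorithms.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    """Sufficient statistics for least squares on (x, y) pairs."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Stats") -> "Stats":
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)

def local_stats(partition) -> Stats:
    """Runs where the partition lives; only five numbers leave the node."""
    s = Stats()
    for x, y in partition:
        s = s.merge(Stats(1, x, y, x * x, x * y))
    return s

def fit(stats: Stats):
    """Closed-form least squares for y = slope * x + intercept."""
    slope = (stats.n * stats.sxy - stats.sx * stats.sy) / (
        stats.n * stats.sxx - stats.sx ** 2
    )
    intercept = (stats.sy - slope * stats.sx) / stats.n
    return slope, intercept

# Hypothetical data spread over three nodes; the points lie on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]

total = Stats()
for part in partitions:
    total = total.merge(local_stats(part))
slope, intercept = fit(total)
```

Because the statistics merge by simple addition, new data arriving on any node can be folded into `total` and the model refitted without moving or rescanning the raw records.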

Integrations with Other Applications

An IMC platform must operate as part of a complete architectural stack and must integrate easily with other useful solutions. For example, the following open source solutions all integrate seamlessly with one another: the Apache Ignite in-memory computing platform, the Apache Kafka stream-processing platform, the Apache Spark distributed general-purpose cluster-computing framework, and the Kubernetes open source container-orchestration system.

Open Source

It is not surprising that all the above solutions are open source. Open source solutions have become vital to enterprises launching digital transformation and omnichannel customer engagement initiatives and make developing web-scale applications practical for organizations of all sizes. Open source offers a reliable, proven strategy for developing applications with a much lower upfront investment. It provides organizations with greater control over their own destinies, since the standards-based approach of most open source projects reduces vendor lock-in. And open source projects deliver greater innovation much faster than the traditional proprietary vendor model.

For web-scale applications to deliver their expected benefits, organizations must achieve real-time application performance at massive scale. IMC offers the only practical, cost-effective path for achieving this, which is reflected in a Gartner prediction that by 2019, 75 percent of cloud-native application development will use in-memory computing or services using IMC to enable mainstream developers to implement high-performance, massively scalable applications (Gartner: “Predicts 2018: In-Memory Computing Technologies Remain Pervasive as Adoption Grows”). Architects, developers and CTOs who understand the importance of web-scale architectures to the future of the data center should immediately begin investigating the power, flexibility and scalability of in-memory computing solutions.

Feature image via Pixabay.
