For Robin Systems, Everything’s in a Container
While the buzz about containers has largely centered on how they can ease application development, Robin Systems is making containerization part of its strategy for an application-centric data center.
And it’s using containerization for applications such as Hadoop and MySQL that haven’t been considered good candidates for Docker. At the same time, Robin focuses on efficiency without performance loss through hardware multitenancy on Linux, a strategy DH2i also takes in Windows environments.
“It’s a next-generation platform based on containers as a core technology, but we do a lot of resource management, storage and data management as well as to enable people to consolidate applications on bare metal to eliminate VM sprawl,” said Sushil Kumar, Robin’s chief marketing officer.
The idea is to make the application and its needs the focal point of the data center, with the boundaries between servers and storage invisible to applications.
“Our technology takes the software-defined data center to the next level to an application-defined data center,” he said.
“In the old days, you’d say ‘This machine is for this application.’ You didn’t run another application there because you were unsure if they’d step on each other’s toes. Since we run containers for runtime applications in isolation, we don’t have to worry about which machine runs what,” Kumar said.
With Robin, the user can deploy multiple clusters on the same hardware, such as multiple Hadoop cluster nodes running on the same virtual machine, according to Kumar. “It’s dictated more by the needs of applications rather than you having to map your applications to the machine boundaries. With Robin, those boundaries are invisible.”
The platform offers the ability to:
- Consolidate databases and Big Data apps.
- Reduce VM sprawl.
- Deploy applications faster with portability; easily make clones for test and dev purposes.
- Optimize data capacity and performance through application-driven storage management.
- Eliminate data duplication by enabling data sharing across applications and clusters.
Founded in 2014, Robin Systems raised $15 million in October in a round led by a subsidiary of USAA and DN Capital, bringing its total funding to $22 million. Its customers include AutoWeb and USAA.
It’s been touting the experience of Walmart eCommerce, which in tests made ingesting 250 million files 8.5 times faster and queries 2.5 times faster. It also found it could shrink its hardware footprint from 16 servers with 320 cores to 10 servers with 160 cores.
Kohl’s Department Stores has been testing Robin Systems and is now looking to scale the technology out across its entire infrastructure, Kohl’s CTO Ratnakar Lavu said during an interview with SiliconAngle at BigDataSV 2016.
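The Walmart eCommerce footprint figures can be restated as percentages. This short Python calculation uses only the numbers reported above; the helper name `percent_reduction` is our own.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100

# Figures reported for Walmart eCommerce's test.
servers_before, cores_before = 16, 320
servers_after, cores_after = 10, 160

print(f"servers: {percent_reduction(servers_before, servers_after):.1f}% fewer")
print(f"cores:   {percent_reduction(cores_before, cores_after):.1f}% fewer")
print(f"cores per server: {cores_before // servers_before} -> {cores_after // servers_after}")
```

In other words, the core count halved while the server count dropped by 37.5 percent, and the average server went from 20 to 16 cores.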
How it works
Robin bills its platform as a complete out-of-the-box solution for enterprise data-driven distributed applications on a shared platform created out of commodity or cloud components.
It creates a compute and data continuum, so multiple applications can be deployed per machine for the best hardware utilization while each application stays isolated in a multi-tenant environment. Containers provide mobility across machines and clouds.
The platform decouples compute and storage layers and combines a distributed persistent storage layer with an integrated host-side distributed caching layer to provide high-performance storage access to applications. Its virtual storage pool serves as block storage to the compute layer. A RAID6 configuration provides high availability for the persisted data.
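RAID 6 survives the loss of any two drives in a stripe by storing two parity blocks, usually called P and Q. The Python sketch below illustrates that property using the widely documented P+Q construction: P is the XOR of the data blocks, and Q weights block i by 2^i in the finite field GF(2^8) with polynomial 0x11D. The article does not say how Robin computes its parity, so treat this as an illustration of the RAID 6 idea, not Robin's implementation; all function names are hypothetical.

```python
# Sketch of RAID 6-style P+Q parity over GF(2^8). Assumption: the textbook
# construction, not a description of Robin's internals.
POLY = 0x11D  # field polynomial x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less multiply, reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    # Every non-zero element satisfies a^255 == 1, so a^254 is its inverse.
    return gf_pow(a, 254)

def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(x, y))

def scale(c: int, block: bytes) -> bytes:
    """Multiply every byte of a block by the field element c."""
    return bytes(gf_mul(c, b) for b in block)

def compute_parity(data: list[bytes]) -> tuple[bytes, bytes]:
    """P is the XOR of all data blocks; Q weights block i by 2^i in GF(2^8)."""
    p = q = bytes(len(data[0]))
    for i, block in enumerate(data):
        p = xor_bytes(p, block)
        q = xor_bytes(q, scale(gf_pow(2, i), block))
    return p, q

def recover_two(data, p: bytes, q: bytes, x: int, y: int) -> tuple[bytes, bytes]:
    """Rebuild data blocks x and y (marked None in `data`) from P and Q."""
    pxy, qxy = p, q
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        pxy = xor_bytes(pxy, block)                       # leaves D_x ^ D_y
        qxy = xor_bytes(qxy, scale(gf_pow(2, i), block))  # leaves g^x*D_x ^ g^y*D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # (g^x ^ g^y) * D_x = qxy ^ g^y * pxy, so divide by (g^x ^ g^y).
    dx = scale(gf_inv(gx ^ gy), xor_bytes(qxy, scale(gy, pxy)))
    dy = xor_bytes(pxy, dx)
    return dx, dy
```

For example, after `p, q = compute_parity(blocks)`, setting any two entries of `blocks` to `None` and calling `recover_two` with their indices returns the two lost blocks. Losing one data block plus P or Q is handled by simpler variants of the same algebra.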
The compute layer, or “data acceleration layer,” is a distributed host-side caching layer that uses SSDs to provide read and write acceleration to the virtual storage pool. The data acceleration layer on each compute node presents a filesystem interface to the application layer. Reads and writes to this filesystem are accelerated by using SSDs as a data cache. Read and write bandwidth is maximized by striping data across SSDs.
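Striping spreads consecutive chunks of data round-robin across devices, so a large read or write keeps several SSDs busy at once. The Python sketch below shows generic RAID 0-style striping arithmetic; the 64 KiB stripe unit, the four-SSD count and the function names are hypothetical, since the article does not describe Robin's actual layout.

```python
STRIPE_UNIT = 64 * 1024  # hypothetical stripe unit: 64 KiB
NUM_SSDS = 4             # hypothetical number of cache SSDs per node

def locate(offset: int, stripe_unit: int = STRIPE_UNIT, num_ssds: int = NUM_SSDS) -> tuple[int, int]:
    """Map a logical byte offset to (ssd index, byte offset on that SSD)."""
    stripe_index, within = divmod(offset, stripe_unit)
    ssd = stripe_index % num_ssds   # round-robin across SSDs
    row = stripe_index // num_ssds  # how many full rows precede this chunk
    return ssd, row * stripe_unit + within

def split_range(offset: int, length: int,
                stripe_unit: int = STRIPE_UNIT, num_ssds: int = NUM_SSDS) -> list[tuple[int, int, int]]:
    """Split a logical I/O into per-SSD pieces: (ssd, ssd_offset, length)."""
    pieces = []
    end = offset + length
    while offset < end:
        ssd, ssd_offset = locate(offset, stripe_unit, num_ssds)
        # A piece never crosses a stripe-unit boundary.
        chunk = min(stripe_unit - offset % stripe_unit, end - offset)
        pieces.append((ssd, ssd_offset, chunk))
        offset += chunk
    return pieces
```

A 192 KiB read starting at offset 0 becomes three 64 KiB pieces on SSDs 0, 1 and 2, which can be issued in parallel.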
The Robin Central Manager (RCM) serves as the agent-based orchestrator for managing the deployment of “virtual clusters” of LXC or Docker containers.
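To make the idea of placing a "virtual cluster" of containers concrete, here is a hypothetical greedy placement sketch in Python. It is not RCM's algorithm, which the article does not describe. It assumes each container role asks for a fixed number of cores and gigabytes of memory, places each container on the node with the most free capacity, and keeps containers of the same cluster on different nodes (an anti-affinity rule we chose for availability). Different clusters may still share a node, which is the consolidation the platform aims for.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: int
    containers: list[str] = field(default_factory=list)

def place_cluster(cluster: str, roles: list[tuple[str, int, int]], nodes: list[Node]) -> dict[str, str]:
    """Greedily place one virtual cluster; roles are (role, cores, mem_gb).

    Hypothetical sketch: resources already reserved are not rolled back
    if a later role cannot be placed.
    """
    placement: dict[str, str] = {}
    used_by_cluster: set[str] = set()
    for i, (role, cores, mem_gb) in enumerate(roles):
        candidates = [
            n for n in nodes
            if n.name not in used_by_cluster  # anti-affinity within a cluster
            and n.free_cores >= cores and n.free_mem_gb >= mem_gb
        ]
        if not candidates:
            raise RuntimeError(f"no node can host {cluster}/{role}")
        # Prefer the node with the most free cores, then the most free memory.
        best = max(candidates, key=lambda n: (n.free_cores, n.free_mem_gb))
        best.free_cores -= cores
        best.free_mem_gb -= mem_gb
        name = f"{cluster}-{role}-{i}"
        best.containers.append(name)
        placement[name] = best.name
        used_by_cluster.add(best.name)
    return placement
```

Placing a three-container Hadoop cluster and then a MySQL container on the same three nodes spreads the Hadoop containers across all three nodes and co-locates MySQL with one of them.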
Containerize all the Things
“In our platform, every application gets deployed in a container,” said Kumar.
“A lot of awareness around containers is based on Docker. Docker has taken the technology to solve application-development problems,” he explained.
“We are approaching containers from an IT-optimization perspective. We use containers to provide zero-performance-impact virtualization on bare metal,” Kumar said. “People can take demanding enterprise applications, which don’t get deployed on Docker containers, and deploy them on containers for consolidation without performance impact. It’s about deployment, it’s about the quality of service, it’s about application lifecycle management. We handle the whole application end-to-end.
“Just as virtualization took a machine and made it more agile, we’ve taken that one step further. We’ve taken an application as a native entity and done everything from accelerating deployment to management of performance, quality of service.”
Each container is exposed to a raw device, so it looks like it has local storage, he explained. That raw device can only be seen by that container, which ensures data privacy.
“Because we control the I/O path all the way from the application to spindle, we can tag I/Os and ensure a predictable performance all the way from application to disk,” he said.
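One common way to turn tagged I/Os into predictable performance is to meter each tag separately. The Python sketch below uses a per-tag token bucket (rate in IOPS, plus a burst allowance) as a stand-in technique. The article does not say which scheduling method Robin uses, and the class names, tags and limits here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    rate: float      # tokens (I/Os) added per second
    capacity: float  # maximum burst size
    tokens: float    # tokens currently available
    last: float      # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class TaggedIOGate:
    """Admit or defer I/Os based on the tag attached to each request."""

    def __init__(self, limits: dict[str, tuple[float, float]]):
        # limits maps tag -> (IOPS rate, burst); each bucket starts full.
        self.buckets = {
            tag: TokenBucket(rate, burst, burst, 0.0)
            for tag, (rate, burst) in limits.items()
        }

    def submit(self, tag: str, now: float) -> bool:
        return self.buckets[tag].allow(now)
```

With a limit of 10 IOPS and a burst of 1 for a tag named `hadoop`, a second I/O at the same instant is deferred, but one issued half a second later is admitted. A noisy tenant therefore exhausts only its own bucket, leaving other tags' budgets intact.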
Storage is handled by the back-end centralized storage layer, so even though the compute layer sees a virtual storage layer, the data is striped across multiple nodes in the storage layer.
“If an application dies, our software can transparently deal with that by bringing the stateless containers, or any containers, up on surviving nodes, then rerouting them back to the storage volume. So it makes the compute layer more robust, elastic and fungible,” Kumar said.
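Because data lives in the back-end storage layer rather than on the failed machine, failover reduces to restarting containers elsewhere and reattaching the same volumes. The Python sketch below illustrates that bookkeeping under our own assumptions: a hypothetical `fail_over` function, a least-loaded choice of target node, and volumes identified by plain IDs. It is not Robin's actual recovery logic.

```python
def fail_over(placement: dict[str, str], volumes: dict[str, str],
              failed_node: str, surviving_nodes: list[str]):
    """Move every container off `failed_node` onto the least-loaded survivor.

    placement maps container -> node; volumes maps container -> volume ID.
    Returns the new placement and a list of (container, node, volume ID)
    actions. The volume ID is unchanged because the data lives in the
    shared storage layer; stateless containers have no volume (None).
    """
    new_placement = dict(placement)
    load = {n: 0 for n in surviving_nodes}
    for node in placement.values():
        if node in load:
            load[node] += 1
    actions = []
    displaced = sorted(c for c, n in placement.items() if n == failed_node)
    for container in displaced:
        # Least-loaded survivor; ties broken by node name for determinism.
        target = min(surviving_nodes, key=lambda n: (load[n], n))
        load[target] += 1
        new_placement[container] = target
        actions.append((container, target, volumes.get(container)))
    return new_placement, actions
```

If node `n1` fails while hosting a database container and a stateless web container, the database restarts on the emptiest survivor and reattaches its existing volume, while the web container simply restarts.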
Any piece of data is distributed across multiple disk drives to protect against machines going down.
“The key benefit in the case of the data-centric workload is that most distributed applications like Hadoop do serial replication, and they do that because the commodity hardware doesn’t have the smarts to protect data,” he said. “When you have petabytes of data, needing two more petabytes to hold two other copies is really expensive. Because we protect data at the storage layer using techniques such as erasure coding, our customers have been able to either turn off three-way replication completely or go back to two-way replication and get 50 percent of their storage capacity back.”
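The capacity argument comes down to simple overhead arithmetic. With n-way replication, every usable terabyte costs n raw terabytes and the data survives n − 1 lost copies. With a k+m erasure code, each stripe stores k data chunks plus m parity chunks, so the overhead is (k + m)/k and any m lost chunks can be tolerated. The Python sketch below compares the two. The 1,000 TB figure and the 8+2 code are hypothetical parameters, not Robin's actual settings.

```python
def raw_for_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity needed to store `usable_tb` with n-way replication."""
    return usable_tb * copies

def raw_for_erasure_code(usable_tb: float, k: int, m: int) -> float:
    """Raw capacity for a k+m erasure code (k data + m parity chunks per stripe)."""
    return usable_tb * (k + m) / k

usable = 1000.0  # hypothetical: 1 PB of usable data
print("3-way replication:", raw_for_replication(usable, 3), "TB raw, tolerates 2 lost copies")
print("2-way replication:", raw_for_replication(usable, 2), "TB raw, tolerates 1 lost copy")
print("8+2 erasure code: ", raw_for_erasure_code(usable, 8, 2), "TB raw, tolerates 2 lost chunks")
```

Under these assumptions, the 8+2 code tolerates as many failures as three-way replication while needing 1,250 TB of raw capacity instead of 3,000 TB.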