Applying Workload Orchestration to Experimental Biology
HashiCorp sponsored this post.
For many biology researchers, from small academic labs to large pharmaceutical companies, the process of physically running and documenting experiments is nearly as complex as the biology itself. Ph.D. biologists spend 40% of their time doing repetitive mechanical tasks, which distracts them from more impactful work like designing and analyzing experiments. Manual processes also present the risk of human error in the laboratory.
This issue came to the forefront of public consciousness during the ongoing coronavirus pandemic. Labs researching COVID hit bottlenecks when trying to line up the right number of machines and people to maximize daily testing capacity, while labs that weren’t COVID-focused halted work because of workplace restrictions. Figuring out how to allocate resources and schedule workflows is a familiar problem, whether you’re working on a manufacturing pipeline or on a pandemic.
Several forward-thinking organizations are reframing this as a computing problem. In particular, they’re realizing that building a compiler that targets biology labs will let biologists benefit from abstractions and optimizations automatically. This will address key operational and deep-rooted organizational issues within biotechnology companies.
The core ideas behind an operating system (process, memory, and storage abstractions over shared resources) also apply to biology labs. Radix Labs is building software to provide these abstractions.
Siloed Infrastructure Blocks Road to Lab Scaling
The Radix team approached this from first principles. We developed a full toolchain — including programming language, compiler, and (distributed) run-time — to compile lab protocols into robotic movements or (human) lab technician directives. Researchers can design, run, and analyze experiments automatically through an intuitive interface. Disparate lab machinery can be orchestrated under a single control plane. Laboratories using our tools can run more tests in a shorter time period, receive actionable advice on scaling and capacity planning, have correct-by-construction compliance, and more — all automatically exposed and optimized in our high-level language.
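To make the idea of compiling a protocol more concrete, here is a minimal Python sketch. It is not Radix's language or compiler: the `Step` type, the `AUTOMATED` device set, and the output strings are all hypothetical, invented for illustration. It only shows how one high-level list of steps could be lowered to either a robot instruction or a technician directive, depending on whether an automated device is available for that step.

```python
from dataclasses import dataclass

# Hypothetical: device types that have a robotic driver in this imaginary lab.
AUTOMATED = {"liquid_handler", "centrifuge"}


@dataclass(frozen=True)
class Step:
    action: str  # e.g. "transfer", "spin", "incubate"
    device: str  # device type the step needs
    params: str  # free-form parameters, kept as text for the illustration


def compile_protocol(steps):
    """Lower each high-level step to a robot command or a human directive."""
    out = []
    for i, s in enumerate(steps, start=1):
        if s.device in AUTOMATED:
            out.append(f"ROBOT {s.device}: {s.action} {s.params}")
        else:
            out.append(f"TECHNICIAN step {i}: {s.action} on {s.device} ({s.params})")
    return out


protocol = [
    Step("transfer", "liquid_handler", "50uL A1->B1"),
    Step("spin", "centrifuge", "3000rpm 5min"),
    Step("incubate", "incubator", "37C 30min"),
]
plan = compile_protocol(protocol)
for line in plan:
    print(line)
```

The same protocol description produces different output as devices are added to or removed from `AUTOMATED`, which is the kind of separation between what an experiment needs and how a given lab carries it out that a compiler makes possible.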
However, it wasn’t an easy road. We faced several challenges in centralizing laboratory data and workflow:
- Data Siloing: Laboratory equipment, such as DNA sequencers and pipettes, typically does not share or centralize its data. Each piece of equipment runs on its own on-premises bare-metal node. This precludes the network sharing that’s key to an automated laboratory.
- Machine Isolation: In the typical biology lab, there is no centralized plane to access equipment remotely. Equipment is often fragmented across several organizational units, siloed away from where users may demand capacity. Workflows become increasingly unproductive and inefficient as labs scale; at the size of a pharmaceutical company, the duplication of resources is astounding.
- Manual Intervention: Current set-ups typically require people to physically go to each piece of equipment and manually access the data or results. They have to manually reprogram equipment during experiment redesigns. Researchers spend more time in front of their equipment than they do on research and analysis.
What we needed was a simple way to orchestrate these disparate lab machines. In essence, it’s the opposite problem of big compute: bio labs have too many computers, not too few. Our aim is to empower users and make our software as accessible as possible, so we required a tool that could easily fit and deploy on a lab’s existing on-premises infrastructure, which is typically an air-gapped, bare-metal Windows environment with legacy machines.
Unifying Lab Workflow
We ultimately selected HashiCorp’s Nomad as our orchestration solution. It helped us arrive at an early sketch of our runtime system in under a week and has grown with us over time. Nomad’s flexible workload support and easy on-prem deployment allow us to support a variety of applications and fit any existing configuration. Our system runs several Docker containers as well as Java applications that access hardware resources and require root. Nomad makes it easy to add machines like mass spectrometers by first fingerprinting through the exec driver, and then rolling out more specific drivers. This gives us a unified control plane, enabling hardware and driver rollouts using vendors’ drivers, be it for a centrifuge, incubator, or mass spectrometer.
We generally add nodes only when adding a new physical device, such as a DNA sequencer. Such a node generally reads directly from Apache Kafka, and Kafka itself runs inside Nomad. All this is well integrated and hidden from users, running seamlessly in the background. Core services run in a three-node Consul/Nomad/Vault cluster, not on driver endpoints.
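As a rough illustration of what mixing workload types in one job can look like, the Python sketch below builds a job description containing a Docker task, a Java task, and an exec task. This is not Radix's actual configuration. The job ID, image name, jar path, command, and datacenter name are all hypothetical, and the field names are intended to follow Nomad's JSON job format, so they should be checked against the Nomad documentation before use. The sketch only builds and prints the payload; it does not contact a Nomad cluster.

```python
import json


def device_job(job_id, datacenter="lab1"):
    """Build a job payload that mixes docker, java, and exec tasks.

    All names and paths are hypothetical placeholders.
    """
    tasks = [
        {   # a containerized service
            "Name": "results-api",
            "Driver": "docker",
            "Config": {"image": "example/results-api:1.0"},
        },
        {   # a JVM application that talks to the instrument
            "Name": "instrument-bridge",
            "Driver": "java",
            "Config": {"jar_path": "local/instrument-bridge.jar"},
        },
        {   # a vendor-supplied binary run through the exec driver
            "Name": "vendor-driver",
            "Driver": "exec",
            "Config": {"command": "local/vendor-driver", "args": ["--port", "9000"]},
        },
    ]
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "service",
            "Datacenters": [datacenter],
            "TaskGroups": [{"Name": "device", "Count": 1, "Tasks": tasks}],
        }
    }


payload = device_job("mass-spec-01")
print(json.dumps(payload, indent=2))
```

Keeping all three task types in one description is what lets a single control plane roll out a container, a JVM process, and a vendor binary to the same device node together.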
Researchers Freed to Focus on Experiments
Workflow orchestration is core to our ability to offer customers better ways to manage labs and scientific experiments. Too many biologists treat lab equipment and machines like “pets” that must be manually managed and updated. We are trying to encourage biologists to treat lab resources like “cattle”: a distributed resource pool that runs without human intervention. Orchestration plays a key role in reducing waste and allocating resources better, especially in a biology lab, where machine utilization is in the 10-30% range.
Researchers themselves don’t need to worry about the computing side. All they see is their experiment running; things like individual node failure and dynamic failure checking are invisible. The data just shows up for further processing. From their side, they’re adding a mass spectrometer, or running a standard process that needs a gene sequencer. Our OS works in the background to make it possible and extremely efficient.
Researchers can, therefore, deploy applications and experiments, ingest data, and retrieve results remotely on any piece of lab equipment via a single unified control plane. Little physical presence or manual intervention is required, allowing them to cut time spent on collecting lab results and focus that time on understanding the complexities of biology.
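The pets-versus-cattle distinction can be illustrated with a toy scheduler. The sketch below is not Radix's or Nomad's scheduling algorithm: the device names, job durations, and the greedy least-loaded placement rule are all assumptions made for illustration. It shows jobs asking for a type of device and being placed on whichever matching device is least busy, rather than being tied to one named machine.

```python
def schedule(jobs, pool):
    """Greedily place each job on the least-loaded device of the required type.

    jobs: list of (job_name, device_type, hours)
    pool: dict mapping device_type -> list of device names
    Returns (placements, load), where load maps device name -> busy hours.
    """
    load = {d: 0.0 for devices in pool.values() for d in devices}
    placements = {}
    for name, dtype, hours in jobs:
        candidates = pool.get(dtype, [])
        if not candidates:
            raise ValueError(f"no device of type {dtype!r} in pool")
        # Ties go to the first device listed for that type.
        target = min(candidates, key=lambda d: load[d])
        load[target] += hours
        placements[name] = target
    return placements, load


# Hypothetical pool and workload.
pool = {"sequencer": ["seq-a", "seq-b"], "centrifuge": ["cf-a"]}
jobs = [
    ("run-1", "sequencer", 4.0),
    ("run-2", "sequencer", 2.0),
    ("run-3", "sequencer", 1.0),
    ("prep-1", "centrifuge", 0.5),
]
placements, load = schedule(jobs, pool)
for job_name, device in placements.items():
    print(job_name, "->", device)
```

Because jobs name a device type rather than a specific machine, adding a second sequencer to the pool spreads the load without any change to the jobs themselves.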
Lab research is hard. Having an operating system for its computer architecture will make it easier.