Traditionally, patients participating in clinical trials had to go to a lab or research hospital to submit data. It might be a blood pressure reading, a questionnaire about how a new medication affects their sleep, or a test of their respiratory function.
With internet-connected devices, however, patients can submit data from anywhere, so researchers can collect more data more frequently and aggregate it more effectively, Tom Hosford, vice president of engineering at Koneksa Health, pointed out at a recent session at MongoDB World 2017.
The three-year-old New York City-based startup has developed an analytics platform to monitor, aggregate and analyze continuous and high-volume data sources from wearables and devices such as blood pressure cuffs, heart rate patches or spirometers, which measure lung function.
The company announced a partnership with Japanese pharmaceutical firm Takeda to use Koneksa’s capabilities for remote data collection from biosensors and wearables in early-stage clinical trials. It’s also working with ActiGraph, which provides clinical-grade wearable activity and sleep monitoring technology.
Montefiore Einstein Center for Cancer Care in New York City has teamed with Koneksa Health to integrate data from Garmin Vivofit devices to track the activity of cancer patients during concurrent chemo-radiotherapy.
Koneksa runs Linux on Amazon Web Services and uses a number of Amazon services, such as CloudWatch for logging and EC2 for servers. It pairs MongoDB as the primary data store, holding all the application data and a summarized form of the data, with Amazon Simple Storage Service (S3) for the raw data. With a tracker watch, for instance, the raw data is stored in S3; the application then transforms and normalizes it into a JSON format used by MongoDB and, ultimately, by data scientists for analysis.
The architecture includes three replicas, each with primary and secondary servers. The identity server controls the authentication of users. The admin app is accessible only to administrators working on the trial, but a customer app is accessible to patients, who can see how their trial is going if it’s not a blind trial. The API server goes out to the devices to get the data through APIs or other sources.
The Koneksa application collects data from various devices, including a spirometer, which Hosford demonstrated. It measures data such as the maximum volume of air a person can expel in one second.
The patient blows into the device, which is synced to the patient’s phone. The API server retrieves the data, which is used to populate a dashboard. The research coordinator can see all the endpoints from which data has been collected and drill down over different periods of time and also into the data from individual endpoints, explained software developer Brenda Deverell Cortez.
Because Koneksa’s platform collects data from various devices, that data arrives in various formats.
“How do we choose a measurement model to write reusable code to say, ‘Grab all data points on a graph’ or one field per analysis within the same data model? Mongo provides for this with embedded document construct and flexible schema,” Hosford said.
There are some common fields, such as user ID, device type, location and time of measurement. Then there is an embedded array that stores all the data points in a reading in a one-to-many relationship, so a single record can represent many different values. After the API server fetches the data and puts it into a JSON format, it is processed and normalized to a consistent schema.
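As a rough illustration of that normalization step, the sketch below maps two device payloads onto one shared shape with common fields and an embedded array of data points. It is plain Python, not Koneksa’s code; the field names (`user_id`, `data_points`, `fev1_liters` and so on) and the raw payload formats are assumptions made up for the example.

```python
from datetime import datetime, timezone


def normalize_reading(user_id, device_type, raw, location=None):
    """Map a device-specific payload onto one shared measurement shape."""
    # Hypothetical vendor formats: each device reports its values differently.
    if device_type == "spirometer":
        points = [{"name": "fev1_liters", "value": raw["fev1"]}]
    elif device_type == "blood_pressure_cuff":
        points = [
            {"name": "systolic_mmhg", "value": raw["sys"]},
            {"name": "diastolic_mmhg", "value": raw["dia"]},
        ]
    else:
        raise ValueError(f"unsupported device type: {device_type}")

    return {
        # Common fields shared by every measurement.
        "user_id": user_id,
        "device_type": device_type,
        "location": location,
        "measured_at": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
        # Embedded array: one reading, many data points.
        "data_points": points,
    }


def all_data_points(measurement):
    """Reusable 'grab all data points' helper that works for any device."""
    return [(p["name"], p["value"]) for p in measurement["data_points"]]


spiro = normalize_reading("user-1", "spirometer", {"ts": 0, "fev1": 3.2})
cuff = normalize_reading(
    "user-1", "blood_pressure_cuff", {"ts": 60, "sys": 120, "dia": 80}
)
```

Because both records share the same outer shape, the same graphing or analysis helper can walk `data_points` without knowing which device produced the reading.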
Mongoose, an object modeling library for MongoDB on Node.js, adds functionality at the application layer.
“First off in a validation layer, we can set required fields — throw an error if a record has no association with a user. We can do validation in a set. We can say the device has to be a spirometer, a blood pressure cuff or an ECG patch. Likewise, we can set default fields, if the date is missing, set it to the creation time. This lets us fill in some gaps that are not provided natively by MongoDB, but we get a lot of value out of these things at the application layer,” he said.
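The three checks Hosford describes (required fields, membership in a set and defaults) can be sketched in a few lines. The example below is plain Python written for illustration, not Mongoose’s API; the names `validate_measurement`, `ALLOWED_DEVICES` and `ValidationError` are hypothetical.

```python
from datetime import datetime, timezone

# Assumed set of accepted devices, taken from the examples in the quote.
ALLOWED_DEVICES = {"spirometer", "blood_pressure_cuff", "ecg_patch"}


class ValidationError(Exception):
    """Raised when a record fails an application-layer check."""


def validate_measurement(record, now=None):
    """Return a validated copy of the record, or raise ValidationError."""
    # Required field: every record must be associated with a user.
    if not record.get("user_id"):
        raise ValidationError("user_id is required")

    # Validation in a set: the device must be one of the known types.
    if record.get("device_type") not in ALLOWED_DEVICES:
        raise ValidationError(
            f"device_type must be one of {sorted(ALLOWED_DEVICES)}"
        )

    # Default field: a missing date falls back to the creation time.
    validated = dict(record)
    if validated.get("measured_at") is None:
        validated["measured_at"] = now or datetime.now(timezone.utc)
    return validated
```

The `now` parameter exists only so the default can be pinned in tests; in normal use it is left out and the current UTC time is filled in.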
The default field also enables a “lazy migration.”
The company recently added international time zone support to its data model. In a relational database, first you’d run an “alter table” statement to add a new time-zone column to the user model, then you’d run a massive update to update all the users in the database. This could create performance problems for a large application with many users and many clients, Hosford said.
Mongoose default fields and the flexible schema enable a “lazy migration,” in which records are migrated over time as they are fetched and saved.
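A minimal sketch of the idea, assuming an in-memory dictionary in place of the database and a hypothetical `time_zone` field with a default of `"UTC"`: old records gain the new field when they are read, and the stored copy is updated only when that record is next saved.

```python
DEFAULT_TIME_ZONE = "UTC"  # assumed default, chosen for the example


def load_user(store, user_id):
    """Fetch a user, filling in the new field if the stored record lacks it."""
    user = dict(store[user_id])  # copy, so the stored record is untouched
    user.setdefault("time_zone", DEFAULT_TIME_ZONE)
    return user


def save_user(store, user):
    """Persist the user; this is the moment an old record is migrated."""
    store[user["_id"]] = dict(user)


# A record written before the time-zone field existed.
store = {"u1": {"_id": "u1", "name": "Ada"}}

user = load_user(store, "u1")   # the application already sees a time zone
save_user(store, user)          # the stored record now carries it as well
```

No bulk update runs at any point; records that are never touched simply stay in the old shape until someone loads and saves them.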
Data Retrieval and Performance
To find measurements for a specific user, you’d have to do the equivalent of a left join, which combines data from two tables in a relational database (two collections, in MongoDB’s terms), Cortez explained. You’d have to find the specific user and then find all of that user’s measurements.
You could do this at the application level with Mongoose, but MongoDB did not support it natively. As of MongoDB 3.2, the $lookup aggregation stage performs this left outer join natively. Version 3.4 adds $graphLookup, a recursive lookup stage, which means you can limit data to just the spirometer, for instance.
To limit calls back to the database, the company has adopted a cache-field approach, which denormalizes data onto the user record.
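For readers who have not used it, the sketch below shows what a $lookup stage looks like and emulates its left-outer-join behavior in plain Python so the result shape is visible without a database. The collection and field names (`measurements`, `user_id`) are assumptions, and the optional device filter is an illustration of narrowing the joined data, not a rendering of $graphLookup.

```python
# An aggregation pipeline of the kind described above. It is only data here;
# running it would require a MongoDB server and a driver.
LOOKUP_PIPELINE = [
    {"$match": {"_id": "u1"}},
    {
        "$lookup": {
            "from": "measurements",      # collection to join (assumed name)
            "localField": "_id",         # field on the user document
            "foreignField": "user_id",   # field on the measurement document
            "as": "measurements",        # output array field
        }
    },
]


def left_outer_join(users, measurements, device_type=None):
    """Emulate the join: every user is kept, with matches in an array."""
    joined = []
    for user in users:
        matches = [
            m
            for m in measurements
            if m["user_id"] == user["_id"]
            and (device_type is None or m["device_type"] == device_type)
        ]
        # Users with no measurements still appear, with an empty array.
        joined.append({**user, "measurements": matches})
    return joined


users = [{"_id": "u1"}, {"_id": "u2"}]
measurements = [
    {"user_id": "u1", "device_type": "spirometer", "value": 3.2},
    {"user_id": "u1", "device_type": "blood_pressure_cuff", "value": 120},
]
result = left_outer_join(users, measurements, device_type="spirometer")
```

Here `result` keeps both users: the first carries only the spirometer reading, and the second carries an empty `measurements` array, which is the defining behavior of a left outer join.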
“We denormalize the latest measurements into what we call the latest data field on the user object. When a coordinator loads the participant dashboard, they get all these user objects, and each of those has its own latest data,” Hosford explained. “So we can populate the dashboard without making any more queries. Then if the coordinator wants to see the detailed views with the specific graphs for those endpoints, they can do that and new queries will be made at that time.”
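The cache-field pattern Hosford describes might look something like the following sketch, which uses in-memory structures in place of MongoDB collections. The `latest_data` field name follows the quote; everything else, including the function names and the use of integer timestamps, is an assumption for illustration.

```python
def record_measurement(users, measurements, measurement):
    """Store a measurement and refresh the cache field on its user."""
    measurements.append(measurement)

    user = users[measurement["user_id"]]
    latest = user.setdefault("latest_data", {})
    current = latest.get(measurement["device_type"])

    # Keep only the newest reading per device on the user record.
    if current is None or measurement["measured_at"] >= current["measured_at"]:
        latest[measurement["device_type"]] = {
            "measured_at": measurement["measured_at"],
            "data_points": measurement["data_points"],
        }


def dashboard_rows(users):
    """Build the dashboard from user records alone, with no extra queries."""
    return [
        {"user_id": uid, "latest_data": user.get("latest_data", {})}
        for uid, user in users.items()
    ]


users = {"u1": {"name": "Ada"}}
measurements = []
record_measurement(users, measurements, {
    "user_id": "u1", "device_type": "spirometer",
    "measured_at": 100, "data_points": [{"name": "fev1_liters", "value": 3.1}],
})
record_measurement(users, measurements, {
    "user_id": "u1", "device_type": "spirometer",
    "measured_at": 200, "data_points": [{"name": "fev1_liters", "value": 3.4}],
})
rows = dashboard_rows(users)
```

The trade-off is the usual one for denormalization: each write does a little more work so that the most common read, the dashboard, touches only the user records. The full history remains in the measurements store for the detailed views.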