Tecton, the enterprise feature store for machine learning (ML), added low-latency streaming pipelines to its feature store this week, giving its users the ability to build real-time ML applications with ultra-fresh features on the order of sub-100 milliseconds.
A feature, in ML, is data that a machine learning model can train on and infer a signal from, while the feature store is the interface between that data and the model. The feature store handles a variety of tasks, from transforming and cleaning the data, to serving the features, to providing functionality around logging and monitoring the health of pipelines.
Companies typically adopt ML pipelines along a continuum of increasing complexity, starting with traditional analytics, such as daily dashboards, before moving to analytic ML, which powers the batch-generated ML predictions behind forecasts and marketing campaigns. The next step up in complexity is operational ML, which can power real-time predictions, such as product recommendations, in production applications, and is generally where feature stores come into the mix. At this point, feature stores can help provide the low latency and service level agreements (SLAs) required for in-production use. The final step is real-time ML, which no longer relies solely upon batch data, but instead requires access to these “ultra-fresh” features built with real-time and streaming data sources.
It is this final level of complexity that Tecton is addressing this week with its newest capability, which it says automates the process of transforming streaming and real-time data, including time aggregations, into ML features in less than one second. The innovation here is not in the ability to do so, but rather the ability to do so without a large team of dedicated engineers and potentially weeks and months of lead time.
“It’s extremely hard to find the talent that can implement and solve these types of problems. Even if you have the talent, it still takes that talent a long time to solve these problems, because every time a new machine learning feature is required by a data scientist, that engineer now has to spend weeks or months actually implementing it and making it production-ready,” explained Tecton co-founder and Chief Technology Officer Kevin Stumpf. “With our system, we basically make these capabilities available now for many more enterprises. They don’t need to hire as expensive an engineer, and the lead time is drastically reduced, because the platform automates all of this work that the engineer otherwise would have to hand implement in an error-prone way over weeks or months.”
Stumpf and the rest of the founding team behind Tecton are intimately familiar with the process of doing this all by hand, as they previously worked together at Uber to create Michelangelo, the machine learning platform used to build, deploy, and operate machine learning solutions at Uber’s scale. Their goal now is to bring this same sort of capability to companies that don’t want to make that same investment.
By serving these “ultra-fresh” features, Tecton users will have the ability to do things with ML that would otherwise be impossible, such as fraud detection. Stumpf explained that fraud detection requires the most up-to-date features possible, and that without them a model could easily miss important signals.
“One of the most important features in machine learning are these time window aggregations. For instance, how many transactions have been made with a given credit card over the last five minutes or over the last 30 minutes?” said Stumpf. “If you have a fraud detection application, you need to know in that very moment, ‘How many transactions has this credit card been used for over the last five minutes as of right now?’ Not as of a minute ago or as of five minutes ago. It’s much, much easier to support and serve features that are old, where you can easily calculate in the background, but as you can imagine, that’s not good enough. You really need to know in the moment in which you’re making a prediction, how many transactions have happened as of right now, because if you don’t know that information, you’re basically throwing away a really important signal that is required in order to accurately predict whether this transaction is a fraudulent transaction or not.”
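To make the idea of a time window aggregation concrete, here is a minimal Python sketch of the kind of feature Stumpf describes: a count of transactions per credit card over a trailing five-minute window, evaluated as of the moment of the prediction. It is an illustration only and does not reflect Tecton's implementation or API. The class name `SlidingWindowCounter`, its methods, and the card identifier are hypothetical, and the sketch assumes that events for each card arrive in timestamp order and that everything fits in memory on one machine.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowCounter:
    """Hypothetical helper: counts events per key within a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # Maps each key (e.g. a card ID) to its event timestamps, oldest first.
        self._events: Dict[str, Deque[float]] = {}

    def record(self, key: str, timestamp: float) -> None:
        """Store one event. Assumes timestamps arrive in ascending order per key."""
        self._events.setdefault(key, deque()).append(timestamp)

    def count(self, key: str, now: float) -> int:
        """Return how many events for `key` fall in the window (now - window, now]."""
        timestamps = self._events.get(key)
        if timestamps is None:
            return 0
        cutoff = now - self.window_seconds
        # Evict events that have aged out of the window as of `now`.
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        return len(timestamps)


# Usage: three transactions on one card, then the feature is read at request time.
counter = SlidingWindowCounter(window_seconds=300)  # five-minute window
for ts in (0.0, 100.0, 290.0):
    counter.record("card-1", ts)
transactions_last_five_minutes = counter.count("card-1", now=395.0)
```

The point of the sketch is that the count is computed against the request's own timestamp rather than a value precomputed in the background, which is the distinction Stumpf draws between serving stale features and serving features "as of right now." A production system would additionally have to handle out-of-order events, multiple window lengths, and state shared across machines, none of which are covered here.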