Add It Up: Machine Learning Developers Don’t Predict

20 Sep 2018 10:12am

A majority of developers involved with machine learning do not use models to generate predictions. Even among those who work with training data sets directly or via an API, only 10 percent work with more than a million rows. Technologists focused on real-time, Big Data analytics had better narrow their focus a bit. Those are just two findings from SlashData’s State of the Developer Nation 15th Edition, which asked detailed questions of more than 4,200 developers whose involvement with machine learning (ML) and data science goes beyond just using someone else’s algorithms, building frameworks, or working on artificial intelligence (AI) not related to ML.

When asked if their team uses models to generate predictions, 58 percent said no, that at most they use models to describe data. Twenty-eight percent use batch processing for predictions, while 21 percent make real-time predictions. The batch and real-time figures may be slightly higher because the question didn’t allow for multiple responses, but we still believe that this level of adoption represents proof that there is demand for stream processing to analyze data continuously and quickly.

As a reminder, a DZone survey of ML developers found that predictive analytics is by far the most likely reason an organization has adopted AI or ML. Yet the SlashData survey says a majority of ML developers don’t generate predictions. Business executives should be wary of big-budget proposals for ML technology that oversell exponential gains from predictive analytics, which have yet to be widely deployed even among companies that say they are already using ML.

The lack of suitable data is probably inhibiting the wide-scale use of predictive analytics. Sometimes, available data is too “dirty” to be useful. In other cases, the data sets continue to be very small. No matter the reason, more than half of ML developers who work with training data sets use fewer than 20,000 rows. Developers with speech recognition and image classification use cases work with smaller data sets than others. This means that ML’s impact in these areas is more likely due to automation than to deep learning models that become more powerful with increasing amounts of data.

Developers working with industrial maintenance and prognosis data are more likely to be using large data sets, with 18 percent working with more than a million rows. This data is often systems data or collected from IoT beacons. The availability of massive amounts of data means that developers involved with network security or performance train on some of the largest data sets, with 10 percent working with more than five million rows.

Lawrence Hecht has produced analysis and reports about enterprise IT markets for nearly two decades. He analyzes both distributed and decentralized technologies using surveys, interviews, and non-traditional market research techniques.

Feature image from the State of the Developer Nation 15th Edition’s cover art.