Machine Learning: Cutting Through the Hype
Every technology vendor, it seems, is taking advantage of the buzz around machine learning (ML), especially in cybersecurity, where vendors know that their clients need to keep pace with ever more intelligent threats. Including machine learning as a feature-benefit of a product has almost become “table stakes,” meaning that companies that don’t claim some element of machine learning may be at a huge disadvantage compared to competitors. As a result, marketing teams are stretching the limits of what “machine learning” means, and potential product users are left to separate fact from fiction on their own.
Machine learning is a large, active area of research that is developing quickly, which means that capabilities and outputs vary significantly from one company to the next. If you’re evaluating a product that promotes machine learning as a feature-benefit and want to be sure you get what you pay for, there are a few things you should know.
Ask Who Their Data Scientists Are — Good AI Requires Expert Human Participation
Yes, it’s true that in the future some jobs currently performed by humans will be taken over by machines. In their place, however, new types of jobs will be created, and machine learning is a perfect example. No ML system can operate without the direct input and expertise of a skilled human, preferably an expert data scientist. ML is much more than uploading a bunch of data, letting a computer “learn” from it and waiting for it to spit out results.
Machine learning uses complex algorithms to determine outcomes, and a human must select those algorithms. A human must also choose the input data and define the goals or outputs. There’s an old computer science saying that applies here: “Garbage in, garbage out.” It couldn’t be more true of machine learning. The results of any ML system are only as good as the data fed into it and the process used to normalize that data. In other words, data curation is an extremely important element of machine learning, and no machine is, or will be in the foreseeable future, capable of making those determinations on its own. Further, a data set built for one use case isn’t necessarily portable to another. Only a human can evaluate different use cases and decide on the appropriate types of data for each project.
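To make the curation point concrete, here is a minimal Python sketch, assuming a toy data set of network records stored as dictionaries. The field names `bytes_sent` and `label` and the function `curate_and_normalize` are hypothetical, and z-score normalization is just one common choice. A real pipeline involves far more human judgment about which records to keep and why.

```python
from statistics import mean, pstdev


def curate_and_normalize(records):
    """Drop unusable rows, remove duplicates, then z-score one numeric feature.

    `records` is a list of dicts with the hypothetical keys "bytes_sent"
    (a numeric feature) and "label" (for example "benign" or "malicious").
    Returns copies of the kept rows with an added "bytes_sent_z" field.
    """
    seen = set()
    cleaned = []
    for record in records:
        # Garbage in, garbage out: skip rows missing a label or a feature value.
        if record.get("label") is None or record.get("bytes_sent") is None:
            continue
        # Treat rows with the same feature value and label as duplicates,
        # since repeats would overweight whatever behavior they describe.
        key = (record["bytes_sent"], record["label"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(record))  # copy so the caller's data is untouched

    if not cleaned:
        return []

    values = [r["bytes_sent"] for r in cleaned]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid dividing by zero if all values match
    for r in cleaned:
        r["bytes_sent_z"] = (r["bytes_sent"] - mu) / sigma
    return cleaned
```

Every rule in that loop (what counts as a duplicate, which missing fields disqualify a row) is a decision a person has to make and defend for the specific use case.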
So ask potential vendors about their data science subject matter experts. Any organization running a legitimate program will be more than happy to share its experts’ backgrounds and experience. A good place to start? Ask the following questions: What data is used? How is the model trained? How is it tested?
How Old Is the Training Data?
There’s a lot of hype in the industry about machine learning’s ability to determine future decisions or actions, but the reality is that predictions or decisions made by today’s machine learning systems depend entirely on historical data. Even if a vendor feeds its most current data into its ML engine, by the time that data is processed and recommendations are delivered, the data is already “past.” “Past” may not be a perfect description for data that’s, say, 30 minutes old, but it is accurate to say that machine learning can never be 100% current, and it cannot yet account for variables that never appeared in the data it learned from.
As such, when evaluating a vendor’s technology that claims to incorporate machine learning, it’s important to ask about the “aging process” of the data inputs, how often the company’s data scientists adjust algorithms to correct for past mistakes, and how often new requirements are fed into the system. Machines are not “thinking,” per se; they are looking for patterns, so it’s important to understand how each organization handles the data and the logic.
Some reliance on history is inherent in machine learning, but a good engine can be adjusted to stay accurate as conditions change. You need to know how each vendor is tackling the past-present-future problem.
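One common way to account for aging data is to weight recent examples more heavily than old ones and to flag a model whose newest training data is older than an agreed limit. The sketch below assumes exponential decay with a hypothetical 30-day half-life and a hypothetical 7-day staleness limit; neither value comes from any particular vendor, and the function names are illustrative.

```python
def age_weights(ages_in_days, half_life_days=30.0):
    """Return one weight per training example that halves every half-life.

    An example observed today gets weight 1.0; one that is exactly one
    half-life old gets 0.5, two half-lives old gets 0.25, and so on.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def is_stale(newest_example_age_days, max_age_days=7.0):
    """Flag a model whose most recent training data exceeds the age limit."""
    return newest_example_age_days > max_age_days
```

With the default half-life, examples that are 0, 30 and 60 days old receive weights of 1.0, 0.5 and 0.25. Parameters like the half-life and the staleness limit are exactly what to ask a vendor about, because they encode how much the engine trusts the past.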
How Has the Vendor Defined the Goal of ML?
How a company establishes the goal functions for its machine learning system is critical. Suppose the company has a top-notch data scientist who chooses, finds or creates a vast and reliable data set, picks the right learning algorithms and selects appropriate inputs and outputs. If that person (again, see the first point above) fails to define the problem the learning system is solving with enough accuracy or in enough detail, the results will still be unreliable.
For instance, suppose the goal of the machine learning algorithm is defined as “determine secure connections to my MongoDB,” but the inputs don’t account for usability requirements. A reasonable outcome of a non-human decision based on an analysis of the network environment might then be: “No connections are secure.” That may be true in absolute terms, but a database that is 100% secure and unusable is no better than having no database at all.
In this example, the machine learning algorithm needs to balance usability and security, and the only way for that to happen is if the person managing, operating or administering the machine learning engine has carefully characterized:
- the problem;
- the goal; and
- the learning process, including what margin of trial and error is acceptable.
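A toy Python sketch shows why the goal definition matters. The three candidate policies and their scores below are invented for illustration, and a weighted sum is only one simple way to encode a trade-off between security and usability. With a goal function that rewards security alone, the “best” policy is to block everything; adding a usability term changes the answer.

```python
# Hypothetical candidate policies for connections to a database. Each records
# the fraction of attack traffic it blocks and the fraction of legitimate
# traffic it still allows. The numbers are made up for illustration.
POLICIES = {
    "allow_all": {"attacks_blocked": 0.00, "legit_allowed": 1.00},
    "require_tls_auth": {"attacks_blocked": 0.90, "legit_allowed": 0.97},
    "block_all": {"attacks_blocked": 1.00, "legit_allowed": 0.00},
}


def score(policy, usability_weight):
    """Goal function: security plus a weighted usability term."""
    return policy["attacks_blocked"] + usability_weight * policy["legit_allowed"]


def best_policy(policies, usability_weight):
    """Return the name of the policy that maximizes the goal function."""
    return max(policies, key=lambda name: score(policies[name], usability_weight))
```

Here `best_policy(POLICIES, 0.0)` returns `"block_all"`, the “no connections are secure” outcome, while `best_policy(POLICIES, 1.0)` returns `"require_tls_auth"`. Choosing that weight is a human decision about the problem and the goal, not something the engine can infer.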
Depending on the application and the use case, the overall accuracy of the solution may be “good enough” or not. It’s best to be clear beforehand about what level of accuracy (and tolerance to which kinds of mistakes) you can live with.
What Is the Vendor’s Testing Process?
Machine learning can be transformative, but just like any other engineering effort, it requires robust testing to be effective and accurate. Any organization using machine learning or buying a product with machine learning baked in must understand the logic flow behind the data, then test the system to weed out false positives wherever possible. Like all other security technologies, machine learning is not “set it and forget it,” so any company presenting a machine learning solution should be committed to testing and adjusting the logic flows, input data, and algorithms used to arrive at machine-learned conclusions.
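As a rough illustration of what that testing can look like, the sketch below measures false positive and false negative rates on a labeled hold-out set and checks them against tolerances agreed in advance, which is one way to pin down what level of accuracy, and which kinds of mistakes, you can live with. The 5% and 10% limits and the function names are placeholders, not recommendations.

```python
def error_rates(predictions, labels):
    """Compute (false positive rate, false negative rate) from boolean lists.

    True means "flagged as malicious" for a prediction and
    "actually malicious" for a label.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def passes_acceptance(predictions, labels, max_fpr=0.05, max_fnr=0.10):
    """Return True only if both error types are within the agreed tolerances."""
    fpr, fnr = error_rates(predictions, labels)
    return fpr <= max_fpr and fnr <= max_fnr
```

Because machine learning is not “set it and forget it,” a check like this would be rerun on fresh labeled data whenever the input data, logic flows or algorithms change, rather than once at purchase time.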
There is no doubt that machine learning is, and will continue to be, a viable and useful technological capability. That said, because it’s still a relatively new capability in many products, there is a huge amount of variation in the market, and savvy users and buyers need to do their homework on what’s real and what’s just marketing speak. Armed with a little research and plenty of questions, it’s relatively easy to separate the wheat from the chaff.