Machine Learning / Security / Sponsored / Contributed

Machine Learning and Beyond: Algorithmic Detection in Security

19 Mar 2018 8:37am, by

Vishwanath Raman, StackRox
Vishwanath Raman is Principal Scientist at StackRox, where he works on machine learning and detection. His experience covers a range of areas that include machine learning, automata learning, security, static program analysis, program exploration using symbolic methods, formal synthesis, image processing, and in-car driver assistive dialog systems. In his spare time, Vishwa enjoys cycling, photography, and cooking.

This article discusses the ways we can use algorithmic detection in all aspects of security. We define algorithmic detection as the ability to perceive threats, analyze them and contextualize them using algorithms that are either automata-theoretic or statistical.

For algorithmic detection to succeed, we need both first-class security expertise and a collection of algorithms that are sufficient to enable effective defense. Knowing that there is no one-size-fits-all approach, what we want is an ensemble of algorithms, each tuned to detect the largest set of attack phases with high detection accuracy; very few false positives and a high rate of successful detections.

In this article, we’ll describe several categories of algorithms, the importance of effective data collection and labeling and the central role that security expertise and research plays in what we call the “security virtuous cycle” to enable algorithmic detection.

Introduction

Algorithms are cheap but data is expensive. Over the last couple of decades, with the increasing prevalence of open-source software, we have mature code contributions with liberal licenses that cover the gamut of algorithms and languages; if you want a library in Golang for non-cryptographic hashing such as murmur hash, or a Java library for linear algebra, or a C++ library for discriminative training, or a Python library for deep neural networks, to name a few, open-source is your answer. This has lead to a democratization of algorithms, obviating the need to re-invent the wheel while enabling a focus on building more complex systems that use sophisticated implementations off-the-shelf.

Data, though, is a different story. If we have a backhoe, then we can grab as much data as we want from all the way up in the interwebs through all the way down to individual containers supporting a service in an application in a cloud. But what about labeled data? That is expensive! Look no further than Luis von Ahn and the ESP Game, a platform that crowdsourced labeling images via a brilliant piece of social psychology to vastly improve Google image search, to understand the work that it takes to collect labeled data.

But we are getting ahead of ourselves. Let’s look a bit deeper into algorithms for security. What do they look like and what are their categories? First, here is our perspective about AI for security. There is a lot of talk of AI for security in the community. On the one hand, one can take a broad definition of AI and include everything from rule-based systems with an inference engine to sophisticated statistical models under the category of AI. On the other hand, one can define it narrowly to mean systems that pass the Turing Test, which would be systems that are indistinguishable from humans in their ability to perceive, learn, abstract and reason. Our stance is to stay focused on the problem at hand and to use the most parsimonious and practical algorithms that give us high detection accuracy without impeding high-performance data collection and transformation.

Algorithmic Detection

At a high level, every algorithm we present is solving a search problem. In the case of generative statistical models, it is searching for model parameter values that give us precise summaries of observations. In the case of discriminative statistical models, it is searching for a model’s parameter values that help us discriminate between classes of observations by optimizing a loss function.

In the case of automata-theoretic models, it is searching for paths that lead to malicious behaviors, synthesizing attacker strategies on a model of a cloud ecosystem, searching for a sequence of Indicators of Compromise (IOCs) that indicate a particular phase of an attack, and so on. But what are these algorithms doing under the hood? To answer that question, let’s take a deeper look into these algorithm categories.

Automata-Theoretic Algorithms

Think of finite state automata as directed graphs with nodes representing states and edges depicting transitions with one or more terminal states. The terminal states may be accepting or rejecting. Checking if a sequence of IOCs form an alert amounts to walking an automaton from a starting state, taking an edge labeled by the next IOC in our sequence, generating an alert if we reach an accepting state, and discarding the sequence otherwise.

Algorithms that operate on such transition graphs fall under the category of automata-theoretic algorithms. In the above example, they rely on security expertise to build pattern recognizers for IOCs, and rules to recognize sequences of IOCs as shown in Figure 1. Algorithms in this category have wide applications, including security protocol synthesis, formal reachability analysis for vulnerable behaviors, and attacker strategy synthesis using zero-sum objectives over two-player turn-based game graphs.

Generative Algorithms

Generative algorithms fall under the category of statistical algorithms. They are used for anomaly detection by forming baselines of application behaviors, also known as generative models, and then detecting deviations from these baselines as anomalies. They rely on security expertise to determine which signals are, collectively, good proxies for system behaviors and the choice of data distribution that best models observations.

On a technical level, the parameters of the data distribution, e.g., mean and variance for normally distributed data, are modeled not as point estimates, but as coming from a distribution called the parameter distribution. A Bayesian update algorithm, Figure 2, updates the parameters of the data distribution using batches of observations during learning. In production, if an observation cannot be reasonably generated by the learned model, then it is considered an anomaly.

Discriminative Algorithms

Many discriminative algorithms also fall under the category of statistical algorithms and are used to build discriminative models. They are used to discriminate between multiple classes of behaviors. They rely on security expertise to determine which signals are proxies for system behavior and, more importantly, to create labeled datasets of observations, where each observation has a label giving its membership class. A deep neural network discriminative algorithm works on a network structure as shown in Figure 3.

The learning principal at play here is to feed observations forward through the network and propagate errors in the network’s ability to correctly classify observations, given their labels, back through the network. This backpropagation step tunes the network parameters to lessen the likelihood of making the same mistake again. Once a network is well trained, and continuously adapted to changes in data, it becomes effective at classifying production behaviors, and in particular, malicious behaviors.

What We Desire in Security Algorithms

Having covered some useful algorithm categories for security, inspired by a video posted by John Launchbury of the Defense Advanced Research Projects Agency (DARPA), the following are the capabilities that we desire for algorithms in cyber-security; Perception, Learning, Abstraction, and Reasoning.

The automata-theoretic algorithms cover perception and reasoning with the ability to model attacker intent and synthesize attacker strategies over security objectives that are zero-sum. The statistical algorithms cover perception and learning with generative algorithms useful for threat hunting and discriminative algorithms for identifying attack phases.

The confluence of these algorithm categories makes it possible for us to explore abstraction, not general purpose AI-like abstraction, but in the limited domain of security, to get us contextual adaptation as a security strategy. Contextual adaptation is the ability for systems to construct contextual explanatory models for classes of real-world phenomena. In the realm of security, this gives us the ability not only to detect but also to use alert context, which is the metadata around the IOCs that generate the alert, to anticipate attacker behavior and adapt our response accordingly. To make this all tick and work effectively and in unison, we need one last ingredient. We call it the security virtuous cycle.

The Security Virtuous Cycle

Take Google Voice text messaging. For every sentence that one speaks, the system performs transcription and repeats the transcript back to the speaker. When the speaker confirms that the transcription is correct, the message is sent. Otherwise, the speaker is prompted to repeat the message again. The self-interest of the speaker in getting her or his message right is the incentive by which Google collects labeled data at a global scale! In this case, and in the case of image labeling using the ESP Game, the incentive models in play ensure collection of high-quality labeled data.

In a security setting, this role is played by security researchers and white hat hackers. Their incentive model is to anticipate and outwit the continuously evolving attacker’s modus operandi. Toward this end, besides creating realistic application stacks, security research is responsible for building attacks on them, collecting signals, sifting through signals for those with discriminative power, transforming signals for use in learning, detection, alerting and reporting, response and finally mitigation via inoculating containers, services, applications, orchestrators and hosts. When all of these play in concert we achieve what we call the security virtuous cycle.

The lofty goals of contextual adaptation, where we are not just detecting, but also continuously anticipating attacker moves and thwarting them by a dynamic re-investment of resources and algorithms, cannot be achieved without this key ingredient. When we add to this the requirement to take as small a fraction of the available cluster resources as possible while still guaranteeing high detection accuracy, we not only need parsimony in algorithm choice, but also in collection strategies, data transformation strategies, and response strategies.

Conclusion

We leave the reader with two thoughts that are apt to this blog; Algorithms are cheap whereas data is expensive, and skewed training data creates maladaptation (a quote from John Launchbury). Strong security expertise with security research is key to address them both! The lofty goal of contextual adaptation in security via algorithmic detection will be impossible without this expertise.

What is important is that we optimize for performance, collection and transformation, and detection accuracy by a suitable choice of parsimonious algorithms with strong research contributing signals, labeled observations, and algorithm choice, for realistic application stacks and attacks. Adaptive strategies to thwart attackers by learning their intent via contextualization will be a strong driver for security in the years to come!

StackRox sponsored this story.

Feature image via Pixabay.

A newsletter digest of the week’s most important stories & analyses.

View / Add Comments

Please stay on topic and be respectful of others. Review our Terms of Use.