
Machine Learning Lends a Hand for Automated Software Testing

13 Sep 2017 2:00am, by

Automated testing is increasingly important in software development, especially for finding security issues. But fuzz testing requires a high level of expertise, and the sheer volume of code developers work with, from third-party components to open source frameworks and projects, makes it hard to test every line. Now a set of artificial intelligence-powered options, such as Microsoft’s Security Risk Detection service and Diffblue’s security scanner and test generation tools, aim to make these techniques easier, faster and accessible to a wider range of developers.

“If you ask developers what the most hated aspect of their job is, it’s testing and debugging,” Diffblue CEO and University of Oxford Professor of Computer Science Daniel Kroening told The New Stack.

The Diffblue tools use genetic algorithms to generate candidate tests, and reinforcement learning combined with a solver-based search to make sure the code they give you is the shortest possible program. Favoring the shortest program forces the machine learning system to generalize rather than stick to just the examples in its training set.
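
The article does not describe Diffblue’s algorithms in detail, so the following is only a generic illustration of how a genetic algorithm can search for test inputs: candidate test suites are scored by how many branches of a toy function they cover, and the fittest suites are recombined and mutated. The function `classify`, the coverage-count fitness and all parameters are hypothetical; the reinforcement learning and shortest-program search are not modeled.

```python
import random
from typing import List, Set, Tuple

Suite = List[Tuple[int, int]]  # a candidate test suite: a list of (x, y) inputs


def classify(x: int, y: int, cov: Set[str]) -> str:
    # Hypothetical code under test; records each branch it executes in `cov`.
    if x > 50:
        cov.add("x>50")
        if y == 2 * x:
            cov.add("y==2x")
            return "rare"
        return "big"
    cov.add("x<=50")
    return "small"


def fitness(suite: Suite) -> int:
    # Fitness = number of distinct branches covered by the whole suite.
    cov: Set[str] = set()
    for x, y in suite:
        classify(x, y, cov)
    return len(cov)


def mutate(suite: Suite, rng: random.Random) -> Suite:
    # Nudge one input pair by a small random amount.
    child = list(suite)
    i = rng.randrange(len(child))
    x, y = child[i]
    child[i] = (x + rng.randint(-10, 10), y + rng.randint(-10, 10))
    return child


def crossover(a: Suite, b: Suite, rng: random.Random) -> Suite:
    # One-point crossover: prefix of one parent, suffix of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations: int = 200, pop_size: int = 30, suite_len: int = 4, seed: int = 0):
    rng = random.Random(seed)
    pop = [[(rng.randint(0, 100), rng.randint(0, 200)) for _ in range(suite_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 3:  # every branch covered
            break
        parents = pop[: pop_size // 2]  # truncation selection; the elite survive
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Because the best half of each generation is carried over unchanged, the best fitness never decreases; the hard-to-hit `y == 2x` branch is what the search has to evolve toward.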

“What we have built is something that does that for you. If you give us a Java program, which could be class files or compiled Java bytecode, we give you back a bunch of Java code in the form of tests for your favorite testing framework,” Kroening said.

The results can also be used as JUnit tests together with a suitable mocking framework.
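
Diffblue emits Java tests for frameworks like JUnit. To show the general shape of such output without claiming to reproduce it, here is a hand-written Python analogue using the standard library’s `unittest` and `unittest.mock`: concrete inputs, a mocked dependency standing in for what a mocking framework provides, and exact assertions. `PriceService` and its rate provider are hypothetical.

```python
import unittest
from unittest import mock


class PriceService:
    """Hypothetical class under test; depends on an external rate provider."""

    def __init__(self, rates):
        self.rates = rates

    def price_in(self, amount: float, currency: str) -> float:
        if amount < 0:
            raise ValueError("negative amount")
        return round(amount * self.rates.get_rate(currency), 2)


class PriceServiceTest(unittest.TestCase):
    # Tests of the kind a generator might emit: fixed inputs, mocked collaborator.
    def test_converts_using_rate(self):
        rates = mock.Mock()
        rates.get_rate.return_value = 1.5  # the mock replaces the real provider
        self.assertEqual(PriceService(rates).price_in(10.0, "EUR"), 15.0)
        rates.get_rate.assert_called_once_with("EUR")

    def test_rejects_negative_amount(self):
        with self.assertRaises(ValueError):
            PriceService(mock.Mock()).price_in(-1.0, "EUR")
```

The mock isolates the class from its real dependency, which is what lets generated tests run without network services or databases.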

Diffblue started with Java because test automation is so common in the JUnit community, but it is adding support for Python, JavaScript (starting with Node.js), C# and eventually PHP. For all of these languages, the tools can work with both source code and binaries. You can keep the generated tests and continue using them throughout your development cycle. The security scanner also provides advice. “If it finds a SQL injection flaw, it will give you the input strings that trigger the vulnerability,” Kroening explained.
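
To illustrate what “input strings that trigger the vulnerability” means, the sketch below uses Python’s built-in `sqlite3` module with a hypothetical `users` table. The unsafe query splices user input into the SQL text, so a string like `x' OR '1'='1` changes the query’s logic and returns every row; the parameterized version treats the same string as plain data. This is a generic example, not output from Diffblue’s scanner.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # In-memory database with a hypothetical table of users and secrets.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # Vulnerable: user input becomes part of the SQL statement itself.
    return conn.execute(f"SELECT secret FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Fixed: the driver binds `name` as a value, never as SQL.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


# The kind of triggering input a scanner might report for the unsafe query.
ATTACK = "x' OR '1'='1"
```

With `ATTACK`, the unsafe query becomes `WHERE name = 'x' OR '1'='1'`, which is true for every row; the safe query looks for a user literally named `x' OR '1'='1` and finds none.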

Today the product is a web service running inside a Docker container; you give the service the URL of your code repo, which makes it easy to integrate with an existing DevOps process. “If you have something like Jenkins set up, it fits in there extremely well,” he noted. “In the longer term, I envision a lightweight environment that runs on a laptop, inside an IDE; think of it being integrated into Eclipse or Visual Studio. It’s extremely valuable to provide feedback immediately after you’ve written a piece of code, not the next day after the Jenkins build has failed.”

Fuzzing It for You

Microsoft Security Risk Detection (previously known as Project Springfield) takes a slightly different approach. It combines a suite of fuzzers that find security bugs in Windows applications (and, currently in private preview, Linux applications) with an automation platform that runs your binary against the many inputs these fuzzers create, inside an Azure virtual machine.
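
In miniature, the automation side of such a service comes down to running a target against many generated inputs and recording which ones crash it, as opposed to being rejected cleanly. The sketch below does that in-process in Python with a hypothetical `parse_header` target; the real service runs native binaries inside a VM, which this does not attempt to model.

```python
from typing import Callable, Iterable, List, Tuple


def parse_header(data: bytes) -> int:
    """Hypothetical target: reads a length byte, then indexes into the payload."""
    if not data.startswith(b"HD"):
        raise ValueError("bad magic")  # graceful rejection, not a bug
    length = data[2]                   # bug: assumes a length byte exists
    return data[3 + length - 1]        # bug: no bounds check against len(data)


def run_campaign(
    target: Callable[[bytes], object],
    inputs: Iterable[bytes],
    expected: Tuple[type, ...] = (ValueError,),
) -> List[Tuple[bytes, str]]:
    # Run every input; expected exceptions are clean rejections, anything else is a crash.
    crashes = []
    for data in inputs:
        try:
            target(data)
        except expected:
            continue
        except Exception as exc:
            crashes.append((data, type(exc).__name__))
    return crashes
```

For example, `run_campaign(parse_header, [b"XX", b"HD\x01A", b"HD\x05A", b"HD"])` skips `b"XX"`, which the target rejects by design, accepts `b"HD\x01A"`, and reports the last two inputs as `IndexError` crashes.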

“We’re packaging up some AI that we’ve been using inside Microsoft for over a decade into a cloud service that customers can use to spin up testing labs on demand and use them on the binaries they are about to ship,” Microsoft security researcher David Molnar explained to us. The idea: “How can we help people use a service on their binaries, on code they’re shipping and code they’re buying, on code that runs their business, that really helps them find bugs that they couldn’t find before?”

The AI in Springfield combines two techniques: time travel debugging and constraint solving. Molnar compared time travel debugging to working out how a car crash happened. “If you see a car crash, you can try to figure out what happened by looking at how the fender is bent, the way the wheel is turned or the state of the windshield, but you can only learn so much that way. Now imagine you had a video of the two minutes before the car crashed and you could step through it frame by frame. We can do that with your program as it’s executing. We combine that with a form of AI called constraint solving, which allows us to ask ‘what if’ questions. I want to know: what do I need to change about my input to increase my coverage and make my if statement true?”
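
Molnar’s “what if” question can be made concrete with a toy version of the idea: run the program on a concrete input while recording each branch condition symbolically, then solve for an input that flips a branch that was not taken. The sketch below handles only linear integer expressions of a single input (`a*x + b`) and equality branches, and solves them by simple arithmetic; a production system of the kind described would use a general constraint solver over full instruction traces. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Trace:
    # Each entry (a, b, c, taken) records a branch "a*x + b == c" and its outcome.
    constraints: List[Tuple[int, int, int, bool]] = field(default_factory=list)


class Sym:
    """An integer carrying a concrete value and a symbolic form a*x + b."""

    def __init__(self, concrete: int, a: int, b: int, trace: Trace):
        self.concrete, self.a, self.b, self.trace = concrete, a, b, trace

    def __add__(self, k: int) -> "Sym":
        return Sym(self.concrete + k, self.a, self.b + k, self.trace)

    def __mul__(self, k: int) -> "Sym":
        return Sym(self.concrete * k, self.a * k, self.b * k, self.trace)

    def __eq__(self, c: int) -> bool:  # type: ignore[override]
        # Follow the concrete path, but remember the symbolic condition.
        taken = self.concrete == c
        self.trace.constraints.append((self.a, self.b, c, taken))
        return taken


def target(x):
    # Hypothetical program under test: the bug sits behind one equality check.
    y = x * 3 + 7
    if y == 100:
        return "bug"
    return "ok"


def solve_equal(a: int, b: int, c: int) -> Optional[int]:
    # Integer solution of a*x + b == c, if one exists.
    if a == 0 or (c - b) % a != 0:
        return None
    return (c - b) // a


def explore(program: Callable, seed: int) -> Tuple[str, List[int]]:
    trace = Trace()
    result = program(Sym(seed, 1, 0, trace))  # the input starts as 1*x + 0
    new_inputs = []
    for a, b, c, taken in trace.constraints:
        if not taken:
            # "What if": which input would make this branch true?
            x = solve_equal(a, b, c)
            if x is not None:
                new_inputs.append(x)
    return result, new_inputs
```

Starting from the seed `0`, `explore(target, 0)` returns `("ok", [31])`: the recorded condition `3*x + 7 == 100` was false, and solving it yields `31`, which reaches the `"bug"` branch when fed back in. Random mutation would be unlikely to hit that exact value.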

The ideas behind solvers go back to the 1970s, but originally they only worked on small examples, he explained. “Now we can solve questions quickly and on a scale that would have seemed unimaginable before; we’re able to handle dynamic instruction traces of over a billion instructions.” The portfolio of fuzzing techniques in the service has also been developed over many years. (The term “fuzzing” itself comes from the dial-up line noise that inspired the first fuzzing technique: crashing software by injecting noisy, random inputs.)

“We’re adding more fuzzers on the back end and exploring how to increase the efficacy of the fuzzers,” Molnar said; there are new techniques both from Microsoft Research and the open source world that may make their way into the service in time.

Molnar runs the team behind Springfield. Previously he helped apply the same techniques to products like Windows and Microsoft Office, work that found a third of the security bugs discovered by fuzzing in the Windows 7 client. That required a lot of trial and error and hands-on work, as well as expertise in fuzzing.

“Now we’re able to hide most of that behind a GUI; you don’t have to turn any knobs, [that all happens] under the hood,” Molnar said.

You do need to have some initial test cases for the service to start with, he explained. “Those can be the same functional or regression tests you might have lying around, or you can search the web to find some initial test cases.”
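
Initial test cases like these typically serve as a seed corpus for mutation-based fuzzing: the fuzzer repeatedly picks a seed, applies small random edits (bit flips, insertions, deletions) and watches for crashes. The minimal sketch below assumes a hypothetical in-process `target` function and omits the coverage feedback and constraint solving that a production fuzzer would add.

```python
import random
from typing import Callable, List, Sequence, Tuple


def target(data: bytes) -> None:
    # Hypothetical bug: crashes on one specific 3-byte prefix.
    if data[:3] == b"BAD":
        raise RuntimeError("crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    # Apply one random edit: flip a bit, insert a byte, or delete a byte.
    buf = bytearray(data)
    op = rng.randrange(3)
    if op == 0 and buf:
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    elif op == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(
    target: Callable[[bytes], object],
    seeds: Sequence[bytes],
    iterations: int = 10_000,
    seed: int = 0,
) -> List[Tuple[bytes, str]]:
    rng = random.Random(seed)  # fixed seed makes a run reproducible
    corpus = list(seeds)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            target(candidate)
        except Exception as exc:
            crashes.append((candidate, type(exc).__name__))
    return crashes
```

Starting from `[b"BAE", b"CAD"]`, each seed is a single bit flip away from the crashing prefix `b"BAD"`, so a run of a few thousand iterations is very likely to find it; starting from unrelated bytes, the odds drop sharply, which is why realistic seeds matter.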

The information produced by the service is designed to be useful for testers as well as developers, so it does more than just find bugs; it also helps you triage them. “We make sure the bug is reproducible; we repro it five times. We have a part of the system that looks at every crash and tells you what kind of issue it is and if we think it’s exploitable or not.”
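
The reproduce-then-classify step can be sketched as follows: rerun a crashing input several times (five, matching the figure Molnar gives), accept it only if it fails the same way every time, then attach a rough label based on the failure type. The `SEVERITY` table and the `divide_first_byte` target are hypothetical stand-ins; exploitability assessment of native crashes typically inspects details such as the faulting instruction and memory state, which a Python exception cannot capture.

```python
from collections import Counter
from typing import Callable, Dict

REPRO_RUNS = 5  # the service described reproduces each crash five times

# Hypothetical labels for illustration only; not Microsoft's heuristics.
SEVERITY: Dict[str, str] = {
    "IndexError": "out-of-bounds read (investigate)",
    "RecursionError": "stack exhaustion (likely denial of service)",
    "ZeroDivisionError": "arithmetic fault (likely denial of service)",
}


def divide_first_byte(data: bytes) -> int:
    # Hypothetical target: divides by the first input byte without checking it.
    return 100 // data[0]


def triage(target: Callable[[bytes], object], data: bytes, runs: int = REPRO_RUNS) -> dict:
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            target(data)
            outcomes["no crash"] += 1
        except Exception as exc:
            outcomes[type(exc).__name__] += 1
    kind, count = outcomes.most_common(1)[0]
    # Only a crash that happens identically on every run counts as reproducible.
    reproducible = kind != "no crash" and count == runs
    return {
        "reproducible": reproducible,
        "kind": kind,
        "assessment": SEVERITY.get(kind, "unknown") if reproducible else "flaky or not a crash",
    }
```

Requiring every run to agree filters out flaky failures caused by timing or environment, so the reports a tester receives can be replayed on demand.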

He called the technique for assessing exploitability “a best effort,” but suggested that the service gives even security experts who are familiar with fuzzers a level of deduplication and triage that simplifies their task.
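
Deduplication is commonly done by bucketing crashes on a signature such as the exception type plus the innermost stack frames, so that many crashing inputs that hit the same bug collapse into one report. The sketch below applies that idea to Python tracebacks with a hypothetical two-bug `parse` function; the three-frame signature depth is an arbitrary choice, and this is not a description of how the Microsoft service buckets crashes.

```python
import traceback
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def read_field(data: bytes) -> int:
    return data[data[0]]          # hypothetical bug 1: unchecked index


def parse(data: bytes) -> int:
    if data[:1] == b"D":
        return 100 // data[1]     # hypothetical bug 2: unchecked divisor
    return read_field(data[1:])


def crash_signature(exc: BaseException, depth: int = 3) -> Tuple[str, ...]:
    # Drop the first frame (the harness that caught the exception), then
    # keep the exception type plus the innermost `depth` function names.
    frames = traceback.extract_tb(exc.__traceback__)[1:]
    return (type(exc).__name__,) + tuple(f.name for f in frames[-depth:])


def deduplicate(
    target: Callable[[bytes], object], crashing_inputs: Iterable[bytes]
) -> Dict[Tuple[str, ...], List[bytes]]:
    buckets: Dict[Tuple[str, ...], List[bytes]] = defaultdict(list)
    for data in crashing_inputs:
        try:
            target(data)
        except Exception as exc:
            buckets[crash_signature(exc)].append(data)
    return dict(buckets)
```

For the four inputs `b"D\x00"`, `b"D\x00\x01"`, `b"X\x05"` and `b"X\x09\x00"`, this yields two buckets, one per underlying bug, each holding two inputs.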

Running the service over your code means uploading the binary to a virtual machine on which you can install any necessary prerequisites. “What we like about giving customers a full VM is that we already see customers who have dependencies. If we told you to just upload a zip or give us access to your source repo, there are customers who wouldn’t be able to use us, who can use us because it’s a VM. If you need to install SQL or a kernel driver or make crazy registry key modifications [for your software], you can,” he said.

Because the service works on binaries, you can use it on third-party code for which you don’t have the source. “You might have a DLL that your company has built over many years or you might want to test new software that you’re thinking of using. Some customers say ‘I have server-side code powering my business and forget about security, it just can’t go down, I don’t want anything that can cause it to fall over.’ You don’t need to know the details of the library or the source code or how to call it. When we used this service on Windows, the Windows team wrote a wrapper around the libraries that make up parts of Windows but they didn’t have to write annotations or explain the source code to us.”
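
A wrapper of this kind is often called a fuzz harness: a small adapter that turns raw input bytes into a call on the library and separates documented rejections from genuine failures. In the hypothetical sketch below, Python’s standard `json` module stands in for a library you treat as a black box; wrapping a native DLL would instead go through something like `ctypes` or a C harness, which is not shown.

```python
import json


def harness(data: bytes) -> bool:
    """Hypothetical fuzz wrapper: adapts raw bytes to a library call.

    Returns True if the library accepted the input and False if it rejected
    it in a documented way. Any other exception propagates and counts as a
    finding for the fuzzer.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # the wrapper, not the library, filters non-UTF-8 input
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False  # documented rejection of malformed input
    return True
```

The harness needs no knowledge of the library’s internals, only of its entry point and its documented error behavior, which matches Molnar’s point that the Windows team did not have to annotate or explain their source code.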

The disadvantage of using a VM is that it’s harder to build into existing DevOps and testing workflows. The service would be an ideal companion to tools like Sonatype and WhiteSource that help you track which third-party components you’re using as part of your development. Closer integration with such workflows and tools is something Molnar hopes the service will offer in the future.

When you find bugs in third-party code that you can’t change, he suggested a range of mitigations, such as sandboxes, virtualization, EMET and Exploit Guard in Windows 10. “That’s why we call the service risk detection, because we help you figure out what the risk is, then offer a whole range of options to contain it.”
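
Sandboxes and virtualization contain a faulty component by running it somewhere its failure cannot spread. As a very small-scale illustration of that principle (not a security sandbox, and no substitute for EMET, Exploit Guard or a VM), the sketch below runs risky code in a separate Python process with a timeout, so a crash or hang is reported to the caller instead of taking it down. The `run_isolated` helper is hypothetical.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 5.0) -> str:
    """Run `code` in a child Python process and report how it ended.

    A crash or hang in the child cannot take down the calling process.
    This isolates failures only; it does not restrict what the code can do.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child if this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else f"crashed (exit {proc.returncode})"
```

A server that must not “fall over” could route calls into a buggy third-party library through a boundary like this, at the cost of process start-up overhead on each call.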

Beyond Bugs

Some researchers are trying to go beyond finding bugs to fixing them. The new Accurate Condition Synthesis (ACS) system from Microsoft Research, Peking University and the University of Electronic Science and Technology of China combines natural language analysis with statistical analysis of open source code available online to generate patches that aim to fix the underlying bug, not just pass a test suite.
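
To give a flavor of condition synthesis, the toy below repairs a division bug by trying candidate guards of the form `if <cond>: return 0`, ordered by a prior score, and keeping the first one that passes every test. In ACS the ranking is derived from document analysis and statistics mined from open source code; here the scores are invented numbers, the candidates are hand-written, and the whole thing is a hypothetical illustration rather than the system’s actual algorithm.

```python
from typing import Callable, List, Optional, Tuple


def original(x: int) -> int:
    # Hypothetical buggy function: raises ZeroDivisionError when x == 0.
    return 100 // x


# (source text, predicate, prior score); the scores are invented for illustration.
CANDIDATES: List[Tuple[str, Callable[[int], bool], float]] = [
    ("x == 0", lambda x: x == 0, 0.3),
    ("x < 0", lambda x: x < 0, 0.2),
    ("x is None", lambda x: x is None, 0.5),
    ("x <= 0", lambda x: x <= 0, 0.4),
]

TESTS: List[Tuple[int, int]] = [(0, 0), (5, 20), (-4, -25)]  # (input, expected)


def passes(fn: Callable[[int], int]) -> bool:
    # A candidate is acceptable only if every test returns the expected value.
    for arg, expected in TESTS:
        try:
            if fn(arg) != expected:
                return False
        except Exception:
            return False
    return True


def synthesize_guard(default: int = 0) -> Optional[str]:
    # Try guards from highest to lowest prior; return the first that passes.
    for text, cond, _score in sorted(CANDIDATES, key=lambda c: c[2], reverse=True):
        def patched(x: int, cond=cond) -> int:
            # Candidate patch: `if cond(x): return default` before the original body.
            return default if cond(x) else original(x)
        if passes(patched):
            return text
    return None
```

Here `synthesize_guard()` returns `"x == 0"`: the top-ranked `x is None` still crashes on `0`, and the over-broad `x <= 0` wrongly returns `0` for `-4`. The prior decides which passing patch is preferred, which is how ranking can favor fixes for the underlying bug over patches that merely satisfy the tests.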

Kroening also hopes Diffblue will eventually create code too. “The ultimate goal is to enable programming to be something that can be done by non-experts. I want to enable you to produce code from scratch, not debug existing code, by means of a learning engine.”

Feature image by Jairo Alzate on Unsplash.

