All software has bugs. But standing in front of the CEO of a major customer getting shouted at for the bugs in the software his team of consultants had delivered, Embold CEO Vishal Rai found himself wondering if software has to have quite so many bugs and whether automated tools could do better than human developers. After some research, he decided that giving developers access to static analysis tools that look at source code would be the single best way to improve their code.
“Software might have eaten the world, but now bugs are eating the software. If every company in the world is now a software company, then every company needs a software development stack that’s going to help them write better code,” Rai told The New Stack.
“We have the ‘what’, which is tools to write features in. Once you’ve written your software, we have the ‘where’, which is GitHub and Bitbucket. But in between is the ‘how’: how do you write software well?”
Embold aims to fill at least part of that gap. It analyses source code using natural language processing (NLP), machine learning and a set of algorithms to find design issues, bugs and anti-patterns. The output is visualizations of code quality and complexity, high-level metrics for code quality that you can track over time and a prioritized list of the most significant issues, along with machine learning-driven recommendations for fixing them.
There are dozens of static analysis tools, but Rai believes few of them are easy for “average developers” to understand.
“Either they’re super expensive, complicated tools or open source tools that give you metrics, but you still have problems in your code,” Rai said. “If they give you too much data, saying that there are 50 critical issues, 100 medium issues and a thousand minor issues, how does that help a developer who only has 15 days to ship their code? What they need is a static analysis tool that finds the 50 critical issues and shows which one is the most critical and will give you the most bang for the buck when you fix it.”
Scoring and ranking components is more helpful than just grouping issues as critical, major or minor, he suggests. “Every software has bugs and there are always some components that are better written than the others. We rank every component in terms of how well designed and coded it is; we score them from minus five to plus five and we aggregate that to a system score. So instead of saying your software has 8,000 classes and 50 are critical, we rank all of them so you can say ‘let’s go after the top ten worst components and fix them.’ We also tell them why a component is bad: if it’s designed poorly with massive dependencies. Developers don’t have to worry about how they should judge the problem; the platform ranks it for them so it’s consistent.”
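Rai describes two steps: each component gets a score between minus five and plus five, and those scores are aggregated into a system score, which lets a team go after the ten worst components. The Python sketch below illustrates that idea under stated assumptions. Embold has not published its scoring model, so the `Component` type, the use of lines of code as an aggregation weight and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    score: float  # rating on the -5 (worst) to +5 (best) scale Rai describes
    loc: int      # lines of code; used as an aggregation weight (an assumption)


def clamp(score: float) -> float:
    """Keep a score inside the -5..+5 range."""
    return max(-5.0, min(5.0, score))


def system_score(components: list[Component]) -> float:
    """Aggregate component scores into one system score.

    Hypothetical choice: a mean weighted by lines of code, so a large,
    badly rated component pulls the system score down more than a small one.
    """
    total_loc = sum(c.loc for c in components)
    if total_loc == 0:
        return 0.0
    return sum(clamp(c.score) * c.loc for c in components) / total_loc


def worst_components(components: list[Component], n: int = 10) -> list[Component]:
    """Return the n lowest-rated components, worst first."""
    return sorted(components, key=lambda c: clamp(c.score))[:n]


if __name__ == "__main__":
    demo = [
        Component("PaymentService", -3.5, 1200),
        Component("Logger", 4.0, 150),
        Component("OrderParser", -1.0, 600),
    ]
    print([c.name for c in worst_components(demo, n=2)])  # ['PaymentService', 'OrderParser']
    print(system_score(demo))
```

Weighting by size is only one option; a plain average would treat a ten-line helper and a thousand-line service as equally important to the overall score.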
Architecture and Antipatterns
At the highest level, Embold shows the overall score for the codebase, calculated from the scores for metrics like complexity, duplication and coupling between objects, the number and level of code issues and the design quality of components, along with the size of the codebase and the number of “hotspots” — areas with a large number of issues. Having metrics makes it easier to track the quality of a project over time and see whether you’re making improvements, just keeping up with issue reports or falling behind.
You don’t have to make any changes to your code to start using Embold; just point it at the master branch of your Git repository and start scanning.
You can have multiple projects in a dashboard, so you can see the state of all your projects and repos together to get an overview of where you have the most problems, and then click through to see more detail on specific projects.
The high-level dashboards show heat maps of the size and quality of all the components in the codebase, because critical quality issues in large components will probably take more resources to address than the same level of issues in a smaller piece of code. The dashboards also take into account information such as how frequently a component is updated.
“If a component churns frequently and there are hundreds of commits, the risk factor is high,” he suggests.
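One way to picture how size, quality and churn could combine into a risk factor is a simple product of the three. The formula below is hypothetical and chosen for illustration, not Embold's: `badness` rescales the -5..+5 score so that 1 is worst, and lines of code and commit counts are log-scaled so a single huge or very busy file doesn't drown out everything else.

```python
import math
from dataclasses import dataclass


@dataclass
class ComponentStats:
    name: str
    score: float  # -5 (worst) .. +5 (best)
    loc: int      # size of the component in lines of code
    commits: int  # number of commits touching it in the period analysed


def risk(c: ComponentStats) -> float:
    """Hypothetical hotspot risk: poor quality x size x churn."""
    badness = (5.0 - max(-5.0, min(5.0, c.score))) / 10.0  # 0 = best, 1 = worst
    return badness * math.log1p(c.loc) * math.log1p(c.commits)


def hotspots(stats: list[ComponentStats], threshold: float) -> list[ComponentStats]:
    """Components whose risk meets the threshold, riskiest first."""
    risky = [c for c in stats if risk(c) >= threshold]
    return sorted(risky, key=risk, reverse=True)
```

With this weighting, a poorly rated component with hundreds of commits outranks an equally poorly rated one that rarely changes, which matches Rai's point that frequent churn raises the risk.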
Use a slider to show only the components with low ratings and see where work is needed. “You can say ‘this is the worst component and it’s designed poorly, so we should give that to the architect, but these components only have these code issues, so the developers can fix them.’ It helps you make a judgement call on how to address things,” Rai explains. “Teams always have a finite amount of time to work on code, and this gives you better tools to manage that.”
You can walk through an annotated view of the components and see how they’re used in the code, along with a diagram of their dependencies. That helps a lot with understanding a codebase, because dependencies are usually hard to see.
Embold finds antipatterns: coding habits that tend to cause problems. “We developed about 22 antipatterns working from first principles in programming,” Rai explains. They include having a class with a lot of functions, having a lot of duplicated code and not having default labels on switch statements. They’re not language-specific, but many of them relate to object-oriented programming. “There are C++ antipatterns around global variables that everyone knows about, but people use them because they’re convenient.”
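Two of the antipatterns mentioned, classes with a lot of functions and reliance on global variables, can be detected by walking a syntax tree. Embold targets languages such as Java, C and C++; the sketch below uses Python's standard `ast` module on Python source purely to show the technique, and the 20-method threshold is an arbitrary assumption, not one of Embold's 22 rules.

```python
import ast

MAX_METHODS = 20  # hypothetical threshold, not Embold's actual limit


def find_antipatterns(source: str, max_methods: int = MAX_METHODS) -> list[str]:
    """Flag oversized classes and 'global' statements in Python source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Count only methods defined directly in the class body.
            methods = [n for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            if len(methods) > max_methods:
                findings.append(
                    f"line {node.lineno}: class {node.name} has "
                    f"{len(methods)} methods (limit {max_methods})")
        elif isinstance(node, ast.Global):
            # A 'global' statement lets a function rebind module-level state.
            names = ", ".join(node.names)
            findings.append(f"line {node.lineno}: function rebinds global {names}")
    return findings
```

A production analyser would parse each target language with its own front end, but the pattern is the same: walk the tree, match node shapes and report them with a location.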
Antipatterns can also expose security vulnerabilities. “Design has a strong correlation with bugs, and security is a consequence of designing well. If you’ve got too many dependencies, you’ll probably have a memory leak. If you’ve got a shotgun pattern, you’ve got latencies.”
There’s a built-in workflow for pull requests, and you can see code changes between versions side by side.
Use a slider to refactor code by increasing or decreasing granularity, and Embold creates new components with English names. “We use NLP to understand the intent of the class, based on variable and method names.” Even if you don’t let it handle the refactoring for you, you can see the connections between methods and functions, so you can discover the functional units in a class and extract them. If several of those new components get the same name, they may represent a common task that should be a single class you call, rather than five ways of doing the same thing.
Smarter Static Analysis
Static analysis checks code against a set of rules. Those rules are usually hand-written by experts, so unless you write rules for your own codebase, they won’t reflect what you’ve already learned about it or which bugs have been fixed in the past. Rai estimates that a high percentage of the bugs users report have been reported before, fixed and then reintroduced, usually by a different developer.
When a developer joins a team and accidentally recreates the same bug that was fixed in the codebase by someone else a year ago, why can’t a tool spot that and show them how it was fixed the last time before the code even fails a test? That’s what the latest tool in Embold aims to do.
“Imagine if one could learn from all the issues reported in a project and see how the issues are fixed and remember that as rules for future developers. Or if we could train a neural network on all the popular open source issue databases on the planet and how they were fixed. That would be very powerful.” For now, the recommendation engine trains on your Git repository or issue tracker and learns rules from that.
Embold’s Intelligent Analyzer tracks the changes in a repository over time (it works with GitHub and SVN, with support for Git, GitLab and Bitbucket planned) and connects to other tools such as JIRA and FindBugs to match tickets and issues to bugs and code changes.
“It sees which commits in your repo created bugs and it remembers those commits. When you’re writing code and it sees a similar pattern, it notes that this line looks very similar to this issue that was reported and here is how it was fixed.”
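Embold says its engine uses machine learning for this matching. As a much simpler illustration of the same idea, the sketch below remembers lines from bug-introducing commits together with their fixes and flags new lines whose token sets are similar, using Jaccard similarity. The `RememberedBug` record, the issue keys and the 0.6 threshold are hypothetical.

```python
import re
from dataclasses import dataclass

# Identifiers, or any single non-space character (operators, punctuation, digits).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")


def tokens(line: str) -> set[str]:
    return set(TOKEN.findall(line))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class RememberedBug:  # hypothetical record mined from issue tracker + commits
    issue: str        # issue-tracker key
    buggy_line: str   # line introduced by the bug-causing commit
    fixed_line: str   # how that line looked after the fix


def similar_bugs(new_line: str, memory: list[RememberedBug],
                 threshold: float = 0.6) -> list[tuple[float, RememberedBug]]:
    """Past bugs whose buggy line resembles new_line, most similar first."""
    t = tokens(new_line)
    scored = [(jaccard(t, tokens(b.buggy_line)), b) for b in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, b) for s, b in scored if s >= threshold]
```

For each hit, an editor integration could show the remembered `fixed_line` as the suggested repair, which is the "here is how it was fixed" step Rai describes.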
The tool also analyses newly reported issues, suggests areas of code that may be responsible for each bug and suggests which developers could work on them.
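Pointing at the code that may be responsible for a new issue can be sketched as a retrieval problem: compare the new report's words with past issues whose fixes are known, and rank the files those fixes touched. This sketch uses plain bag-of-words cosine similarity rather than Embold's model, and its input format (each past issue's text paired with the files its fix changed) is an assumption.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts for a piece of issue text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_files(new_issue: str,
                  past_fixes: list[tuple[str, list[str]]],
                  top_n: int = 3) -> list[tuple[str, float]]:
    """Rank files by how similar their past issues are to new_issue.

    past_fixes pairs each past issue's text with the files its fix changed
    (a hypothetical input format, e.g. built by linking tickets to commits).
    """
    query = bag_of_words(new_issue)
    best: dict[str, float] = defaultdict(float)
    for text, files in past_fixes:
        similarity = cosine(query, bag_of_words(text))
        for path in files:
            best[path] = max(best[path], similarity)  # keep the strongest match
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

The same linkage could also suggest developers, for example by looking up who authored the fixes behind the top-ranked files.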
Currently, the AI is still in beta, and is available for Java, C and C++. The emphasis on these languages isn’t coincidental. The company is working on having Embold certified as a functional safety tool for automotive and medical software development. A number of autonomous driving platforms are already using it for their C++ lidar and radar development.
Rai is also hoping that open source projects will start using Embold. While it’s commercial software, available as SaaS or to run on your own servers, the hosted SaaS version is free for open source projects. To show that it could be useful, he ran it over the Kafka repo, where the recommendation engine found a missing conditional check. “We hadn’t written the rule for that, but it found it from issues that had been reported previously in the project.”
Feature image via Pixabay.