Tackling AI’s Black Box: Howso Challenges PyTorch and JAX
An early employee and software engineer for the AI company Howso would, for fun, hook up the Howso AI Engine to video games like Rocket League and Grand Theft Auto Five. He was able to debug the data, much like debugging code, said Howso co-founder and CTO Chris Hazard.
“This particular individual wasn’t a data scientist by training, [he] was just a really good software engineer,” Hazard said. “We’re really trying to drop that bar where you put the data in, analyze, and if you just know a little bit of basic statistics, you know what uncertainty is, you know what plus or minus this is, you’re pretty much good to go.”
Developers don’t have an easy on-ramp for playing with AI: It’s expensive to train models and they tend to be a black box of proprietary magic. Howso AI Engine is an open-sourced machine learning engine that addresses these challenges by leveraging an instance-based learning approach.
Howso open sourced its ML engine in September and the AI engine is available on GitHub. There’s also a Howso AI playground: Developers want to be able to build something “cool” on their own, and this was one way to help the community do that, Hazard said.
“We are a very mission-oriented company and we want to change the way AI is done,” Hazard said. “We want to make things debuggable, understandable, fixable — because it’s hard to fix something if you can’t see inside, and so for us open sourcing is the way and it’s a sign of maturity of the company.”
Instance-Based Learning in Brief
Instance-based learning is one alternative to neural networks, the type of machine learning model behind large language models. Neural networks are made up of layers of interconnected nodes, or neurons, which process information and make predictions. By comparison, an instance-based learning algorithm learns simply by storing the training data. When faced with a new instance, the algorithm compares it to the stored instances, identifies the most similar one and assigns that instance’s class to the new one, according to Bard. It’s also known as memory-based learning or lazy learning.
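To make the idea concrete, here is a minimal sketch of a nearest-neighbor classifier in plain Python. It illustrates the general technique only, not Howso's implementation; the function name, feature values and labels are made up for the example.

```python
import math

def nearest_neighbor_predict(stored, query):
    """Return the label of the stored instance closest to `query`."""
    # `stored` is a list of (features, label) pairs; nothing is fitted in advance.
    _, closest_label = min(stored, key=lambda pair: math.dist(pair[0], query))
    return closest_label

# "Training" amounts to keeping the examples around (illustrative data).
stored = [
    ((1.0, 1.0), "slow"),
    ((1.2, 0.8), "slow"),
    ((5.0, 5.2), "fast"),
    ((4.8, 5.5), "fast"),
]

print(nearest_neighbor_predict(stored, (1.1, 0.9)))
print(nearest_neighbor_predict(stored, (5.1, 5.0)))
```

All of the work happens at prediction time, which is why the approach is called lazy learning.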
Instance-based learning also allows you to edit the data on the fly without retraining the AI. For instance, if a controller robot bumped into something because of bad data, it’s possible to just delete the data that’s causing the crash and fix it “on the fly,” Hazard told The New Stack.
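A rough sketch of what editing on the fly can look like: because the stored data is the model, deleting one bad record changes the next prediction immediately, with no retraining step. The `InstanceStore` class and the sensor readings below are hypothetical and are not part of the Howso API.

```python
import math

class InstanceStore:
    """Hypothetical in-memory store; the stored data is the model."""

    def __init__(self):
        self.instances = {}  # instance id -> (features, label)
        self._next_id = 0

    def add(self, features, label):
        instance_id = self._next_id
        self.instances[instance_id] = (tuple(features), label)
        self._next_id += 1
        return instance_id

    def remove(self, instance_id):
        del self.instances[instance_id]

    def predict(self, query):
        # 1-nearest-neighbor lookup over whatever is stored right now.
        _, label = min(
            self.instances.values(),
            key=lambda inst: math.dist(inst[0], query),
        )
        return label

store = InstanceStore()
store.add((0.5,), "stop")          # obstacle 0.5 m away
store.add((3.0,), "go")            # clear path
bad_id = store.add((0.6,), "go")   # mislabeled reading

before = store.predict((0.58,))    # closest record is the mislabeled one
store.remove(bad_id)               # delete the bad data point
after = store.predict((0.58,))     # closest record is now the 0.5 m "stop"
print(before, after)
```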
Howso was formerly known as Diveplane. Its GitHub repository explains that at the core of Howso is the concept of a Trainee, “a collection of data elements that comprise knowledge.”
“In traditional ML, this is typically referred to as a model, but a Trainee is original training data coupled with metadata, parameters, details of feature attributes, with data lineage and provenance,” it explained.
Unlike traditional machine learning approaches, Trainees are designed to achieve a number of functions after a single training, including:
- Perform online and reinforcement learning;
- Perform anomaly detection based on any set of features; and
- Measure feature importance for predicting any target feature.
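As an illustration of how one stored dataset can serve anomaly detection on any chosen set of features, the sketch below scores a record by its average distance to its nearest stored neighbors, measured only on the features passed in. This is a generic distance-based approach chosen for the example, not Howso's algorithm, and the transaction data is invented.

```python
import math

def anomaly_score(rows, candidate, features, k=2):
    """Mean distance from `candidate` to its k nearest stored rows,
    measured only on the chosen `features`."""
    def project(row):
        return tuple(row[name] for name in features)

    distances = sorted(
        math.dist(project(row), project(candidate)) for row in rows
    )
    return sum(distances[:k]) / k

# Illustrative transactions; the same stored rows answer every question below.
rows = [
    {"amount": 20.0, "hour": 9.0},
    {"amount": 25.0, "hour": 10.0},
    {"amount": 22.0, "hour": 11.0},
    {"amount": 24.0, "hour": 14.0},
]
typical = {"amount": 23.0, "hour": 10.0}
unusual = {"amount": 900.0, "hour": 10.0}

print(anomaly_score(rows, typical, ["amount"]))
print(anomaly_score(rows, unusual, ["amount"]))  # far from every stored amount
print(anomaly_score(rows, unusual, ["hour"]))    # unremarkable on hour alone
```

No second model is needed to switch from one feature set to another; only the distance calculation changes.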
Use cases include making predictions and identifying patterns such as fraud, Hazard noted.
Still, Hazard acknowledged that instance-based learning AI does have some catching up to do to compete better with neural networks.
“We’re tiptoeing towards language,” he said. “In some ways, we’re retracing some of the paths of neural networks, but our goal is to eventually make them obsolete, replace things with this transparent technique. If you look at the history of neural networks in the past 15-20 years, initially it was just used for supervised learning and some other cool things and it grew in terms of capability. We’re retracing a similar path, but in a very different way.”
Instance-based learning is primarily used for structured data and some semi-structured data, he said.
Another problem with neural networks is that they are a black box, Hazard noted.
“One way to think about machine learning is it’s just programming with data,” Hazard said. “You have all this data, you create a model and make some predictions. Great, except this prediction was wrong. There may be some hidden bias and there might be some issues. How do you fix it? And a lot of the tools today are just not cutting it in terms of that. It’s a black box.”
Rather than explainable AI, which is AI that can explain how it arrived at a prediction, Howso is focused on what Hazard called understandable AI. One tool for that is calibration, a notion in AI and machine learning in which the AI gives a measure of the uncertainty around its answer, Hazard said.
“Are we right 80% of the time, 90% of the time? And if we predict that we’re right, how accurate is our prediction?” he said.
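One simple way to check calibration, sketched here with made-up numbers, is to count how often the actual value lands inside the predicted range. If the ranges are meant to be right 80% of the time, the observed fraction should be close to 0.8. The function name and data are assumptions for illustration.

```python
def interval_coverage(intervals, actuals):
    """Fraction of actual values that fall inside their predicted interval."""
    hits = sum(
        low <= actual <= high
        for (low, high), actual in zip(intervals, actuals)
    )
    return hits / len(actuals)

# Illustrative predicted ranges and the values later observed.
intervals = [(1.0, 3.0), (2.0, 4.0), (0.0, 1.0), (5.0, 7.0), (3.0, 6.0)]
actuals = [2.0, 4.5, 0.5, 6.0, 3.5]

coverage = interval_coverage(intervals, actuals)
print(coverage)  # 4 of the 5 ranges contain the observed value
```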
For instance, Hazard explained, a developer working with a robotic sensor might want to predict how far the velocity can increase without crashing. Howso’s AI engine can present the range of speeds in which the robot is likely to crash. The robot can then be programmed so that if it ever gets outside of this uncertainty range, it must stop and ask a person for help.
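The stop-and-ask pattern might look something like the sketch below. It assumes a simple scheme in which the range is the mean of the nearest stored outcomes plus or minus two standard deviations; Howso's actual uncertainty calculation is not described in this article, and all names and numbers here are hypothetical.

```python
import statistics

def predict_with_range(history, query, k=3):
    """Estimate an outcome from the k nearest stored cases.

    Returns (estimate, low, high), where the range is the mean of the
    neighbors' outcomes plus or minus two standard deviations.
    """
    nearest = sorted(history, key=lambda case: abs(case[0] - query))[:k]
    outcomes = [outcome for _, outcome in nearest]
    estimate = statistics.mean(outcomes)
    spread = statistics.pstdev(outcomes)
    return estimate, estimate - 2 * spread, estimate + 2 * spread

def decide(speed, low, high):
    """Proceed only while the speed stays inside the predicted range."""
    return "proceed" if low <= speed <= high else "stop_and_ask"

# Illustrative history: (distance to obstacle in meters, speed that worked).
history = [(1.0, 0.5), (1.1, 0.6), (0.9, 0.4), (3.0, 2.0), (3.2, 2.2)]

estimate, low, high = predict_with_range(history, query=1.0)
print(decide(0.55, low, high))
print(decide(1.5, low, high))
```

The guard is deliberately conservative: anything the stored data does not support is handed to a person.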
Howso’s ML engine also deals with the understandability problem by providing details on the data and features that drove the results, Hazard explained. The math underlying the Howso ML engine leverages probability theory and information theory, he added.
“It’s all math, all the way down in ways that we can understand; so if you say, why did it make this decision, it can say ‘I used these seven data points,’” Hazard explained. “So if you’re like ‘Oh, I see this was a bad piece of data, this is biased in this way,’ you can fix it and debug it just like you would software.”
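In an instance-based system, reporting which data points were used is straightforward, because the prediction is computed directly from them. The sketch below returns the indices of the neighbors that voted alongside the prediction. It is a generic k-nearest-neighbor example with invented records, not Howso's method.

```python
import math

def predict_with_evidence(stored, query, k=3):
    """Majority vote among the k nearest instances, plus the indices used."""
    ranked = sorted(
        range(len(stored)),
        key=lambda i: math.dist(stored[i][0], query),
    )
    used = ranked[:k]
    votes = {}
    for i in used:
        label = stored[i][1]
        votes[label] = votes.get(label, 0) + 1
    prediction = max(votes, key=votes.get)
    return prediction, used

# Illustrative records; index 2 may be a bad piece of data worth reviewing.
stored = [
    ((1.0, 2.0), "approve"),
    ((1.1, 2.1), "approve"),
    ((0.9, 1.9), "deny"),
    ((6.0, 7.0), "deny"),
    ((6.2, 6.8), "deny"),
]

prediction, used = predict_with_evidence(stored, (1.0, 2.0))
print(prediction, sorted(used))
```

A reviewer who sees a suspect record among the returned indices can inspect or remove it, which is the debugging loop Hazard describes.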
That’s crucial to some industries, he added.
“Our customers are mainly in finance, healthcare, insurance and the like, who need to make decisions based on their data, but they really want to have that debug and that feedback loop,” Hazard said.
The Role of Synthetic Data
Synthetic data is data that’s created algorithmically and used to train ML models.
The Howso engine can synthesize data when there’s missing data or sparse data, Hazard said. For example, if data can’t be shared, or it’s incomplete, it can be synthesized, creating a new dataset that has all the statistical insights but none of the original data, he explained.
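As a deliberately naive illustration of the idea, the sketch below draws new rows from a normal distribution fitted to each column, so the synthetic set roughly preserves each column's mean and spread without copying any original row. This is not Howso's synthesis method, it ignores correlations between columns, and it provides no differential privacy guarantee; the data and function name are made up.

```python
import random
import statistics

def synthesize(rows, n, seed=0):
    """Sample n synthetic rows from per-column normal distributions."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    params = [(statistics.mean(col), statistics.stdev(col)) for col in columns]
    return [
        tuple(rng.gauss(mu, sigma) for mu, sigma in params)
        for _ in range(n)
    ]

# Illustrative records: (age, systolic blood pressure).
rows = [
    (34.0, 120.0),
    (45.0, 135.0),
    (29.0, 118.0),
    (52.0, 140.0),
    (41.0, 128.0),
]

synthetic = synthesize(rows, n=2000)
original_mean_age = statistics.mean(row[0] for row in rows)
synthetic_mean_age = statistics.mean(row[0] for row in synthetic)
print(original_mean_age, synthetic_mean_age)
```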
“It is very strong, private with regard to privacy, differential privacy, etc.,” he said. “Now, you can share that with your colleague. So it really is sort of a transformable or transformative debuggable flywheel for data in general.”
Synthetic data can be combined with time series forecasting for business predictions, he added.
“We’ve had a managed service provider …. take a look at like p&l [profit and loss] records — where should I invest in my business to improve sales, improve revenue the best — basically plop it all in and using our time series, forecasting in a very understandable way,” Hazard said. “We can say, hey, here’s the uncertainties.”