Finnish Startup Valohai Wants to be the ‘GitHub of Machine Learning’
In Finnish, a valohai is a lanternshark, a species that dwells deep in the ocean and provides illumination through bioluminescence. That image captures the vision of machine-learning-as-a-service startup Valohai.
The Turku, Finland-based company bills itself as the “GitHub of machine learning,” setting itself apart by not only offering machine learning infrastructure as a service but also focusing on collaboration and ML team workflows.
“Every company out there is trying to reinvent the wheel instead of focusing on their actual machine learning solution. We are automating all the necessary grunt work and empowering developers to take us into the future faster,” said co-founder and CEO Eero Laaksonen. “So the name is an allusion to working with deep learning.”
Its machine-learning platform automates the training infrastructure and provides tools for collaboration.
“Many people feel AI and automation are dehumanizing society; we think it’s important to note that we’re working to achieve the opposite,” Laaksonen said.
Infrastructure and More
The company was founded in October 2016, but Laaksonen said he and his cofounders — Otso Rasimus, Aarni Koskela and Ruksi Laine — go back at least 10 years.
“Before Valohai we’ve learned a lot, built multiple companies, worked in San Francisco. Valohai presented us the perfect opportunity to work together on something meaningful,” he said.
Many machine learning algorithms are nothing new, he said, but now we have the computing power and data available to actually use them.
Forrester Research, for one, has said 2017 would be the year when businesses gain direct access to powerful customer insight through new cognitive interfaces and other AI-related tech. It predicted the global artificial intelligence market will reach $1.2 trillion by 2020, while research firm Markets and Markets forecast the machine learning-as-a-service market, in particular, will reach $3.75 billion by 2021.
Instead of months of DevOps work, training your model on GPUs requires only a simple configuration file. The service maintains automatically scaled clusters of worker servers on AWS for client use, which it touts as costing 50 percent less than market prices, billed per second. It’s also available for private environments via installable agent software.
“We are doing to machine learning what continuous integration and version control have done to programming,” the team wrote in a Hacker News post. It’s seeking open beta users.
Valohai supports frameworks such as TensorFlow, Keras, Torch and Caffe — actually, it can run any language or machine learning library that can be packaged into a Docker image, according to the company.
The company faces big-time competition. McKinsey reports that tech giants such as Baidu and Google spent between $20 billion and $30 billion on AI in 2016, with machine learning attracting nearly 60 percent of overall AI investment.
Laaksonen names Amazon, Google and Microsoft among the “scariest” competitors.
“However their solutions are proprietary, black box and lower level,” he said. “For example, Amazon Web Services’ current machine learning offering doesn’t let you decide which algorithm to use and the end result is a web API. On Valohai, you decide which framework and algorithm to use, and the resulting model can be deployed wherever you want, be it a web server or an IoT device.”
It also faces off against startups such as FloydHub and Nexosis.
Laaksonen stresses that Valohai provides more than just infrastructure. Beyond collaboration in a GitHub-like space for teams, its focus is on real-time results, record keeping and repeatability.
You can run multiple parallel experiments. Experiment metadata is available in real time and visualized alongside other concurrently running experiments. One or more linked Git repositories define what kinds of “runs” or “tasks” can be executed within a project; each repository contains a valohai.yaml file that defines these execution templates.
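The article doesn’t reproduce a valohai.yaml, but based on its description, an execution template might look roughly like this sketch (the step name, Docker image and command are illustrative, not taken from Valohai’s documentation):

```yaml
# Hypothetical valohai.yaml sketch -- step name, image and command are
# illustrative only. Any Docker image that packages your framework works.
- step:
    name: train-model
    image: tensorflow/tensorflow:latest
    command: python train.py
```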
Everything that an execution ingests and produces is recorded and accessible through a web browser or command line. You can also access the data via a REST API.
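As a rough illustration of scripting against such a REST API, the sketch below builds an authenticated request with Python’s standard library. The endpoint URL and token scheme are assumptions for illustration; the article only says the data is reachable over a REST API, and the real paths and auth details may differ.

```python
import urllib.request

# Hypothetical endpoint -- the real API paths are not given in the article.
API_URL = "https://app.valohai.com/api/v0/executions/"

def build_request(token):
    """Build an authenticated GET request for an executions listing.

    The token-based Authorization header is an assumption; the actual
    auth scheme may differ.
    """
    return urllib.request.Request(
        API_URL,
        headers={"Authorization": "Token " + token},
    )

# Fetching the recorded execution data would then be roughly:
#   import json
#   with urllib.request.urlopen(build_request("my-token")) as resp:
#       executions = json.load(resp)
```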
If log output looks like JSON, it is interpreted as chartable metadata that you can compare between executions.
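In practice, that means a training script only needs to print one JSON object per line for its metrics to become chartable. A minimal sketch (the metric names and values are stand-ins, not a prescribed schema):

```python
import json

def metadata_line(epoch, loss, accuracy):
    """Format one training step as a JSON log line.

    Per the article, log output that looks like JSON is interpreted as
    chartable metadata that can be compared between executions.
    """
    return json.dumps({"epoch": epoch, "loss": loss, "accuracy": accuracy})

# Emit one JSON object per line as training progresses.
for epoch in range(3):
    loss = 1.0 / (epoch + 1)          # stand-in for a real training loss
    print(metadata_line(epoch, loss, 1.0 - loss / 2))
```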
Feature image by Dawn Sobieski, via FreeRange Stock.