It produced a list of customers likely to cancel, and the company’s customer relations department called these customers in an effort to retain them. The calls, however, gave those customers an easier way to cancel, only accelerating the churn.
The predictions were accurate, though the consequences were unintended, Puget noted to laughter in the room.
“We need to have a feedback loop. We need to measure the business consequence of the predictions,” he said.
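Puget's point about measuring the business consequence of predictions can be sketched in a few lines of Python. This is a hypothetical illustration, not IBM code; the record fields (`was_called`, `cancelled`) are made up.

```python
# A minimal sketch of closing the feedback loop: joining churn
# predictions with observed outcomes to measure the business effect
# of acting on them. Field names here are hypothetical.

def churn_rate(records):
    """Fraction of customers in `records` who actually cancelled."""
    if not records:
        return 0.0
    return sum(r["cancelled"] for r in records) / len(records)

def measure_intervention(predictions):
    """Compare outcomes for predicted churners who were called vs. not."""
    called = [r for r in predictions if r["was_called"]]
    not_called = [r for r in predictions if not r["was_called"]]
    return {
        "called_churn": churn_rate(called),
        "not_called_churn": churn_rate(not_called),
    }

preds = [
    {"customer": "a", "was_called": True,  "cancelled": True},
    {"customer": "b", "was_called": True,  "cancelled": True},
    {"customer": "c", "was_called": False, "cancelled": False},
    {"customer": "d", "was_called": False, "cancelled": True},
]
print(measure_intervention(preds))
# With this toy data, the calls correlate with higher churn,
# exactly the effect Puget described.
```

The essential move is that outcome data flows back into the same system that made the predictions, so the next model can learn from the intervention itself.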
And with models so much easier to create, he told the group, many of the pain points for companies trying to use analytics effectively are internal.
In another case, the data science team built a great model, then gave it to developers who recoded it in a different language for implementation. A few months later, the data scientists came up with a better model, but the developers had been reassigned and weren’t available for recoding, so the refined model was never implemented.
“It has to be a DevOps story,” he said, pointing out that the tools for creating a model might not be the same as the tools needed to deploy and operate the model, but there needs to be a continuous process.
Applying Machine Learning
Pentaho’s platform is aimed at addressing bottlenecks in four areas:
- Data and feature engineering.
- Model training, tuning and testing.
- Deploying and operating models.
- Updating models regularly.
The company touts its Pentaho Data Integration package for automating data onboarding, data transformation and data validation — eliminating the drudgery of manual data-prep processes.
“You can’t just feed data into models,” said Wael Elrifai, Pentaho director of worldwide enterprise data science. “[You] have to do feature engineering and data engineering… making sure data is consistent and meaningful and differentiating between data that’s scalar and vector and so on.
“You’re not using machine learning to do data prep, you’re doing data prep to do machine learning,” he said.
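As a rough illustration of the kind of validation and feature engineering Elrifai describes, here is a minimal Python sketch; the column names and checks are hypothetical, not part of Pentaho Data Integration.

```python
# A hypothetical sketch of data prep before modeling: validate rows
# for consistency, then scale a scalar feature so features are
# comparable. Field names are illustrative only.

def validate_row(row, required=("age", "tenure_months")):
    """Reject rows with missing or non-numeric required fields."""
    for field in required:
        value = row.get(field)
        if not isinstance(value, (int, float)):
            return False
    return True

def min_max_scale(values):
    """Scale a scalar feature into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

rows = [
    {"age": 34, "tenure_months": 12},
    {"age": None, "tenure_months": 3},   # dropped by validation
    {"age": 51, "tenure_months": 48},
]
clean = [r for r in rows if validate_row(r)]
ages = min_max_scale([r["age"] for r in clean])
print(clean, ages)  # two valid rows; ages scaled to [0.0, 1.0]
```

Real pipelines add type coercion, deduplication and outlier handling, but the ordering is the point: prep comes first so the model only ever sees consistent, meaningful inputs.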
With integrations for languages like R and Python, and for machine learning packages like Spark MLlib and Weka, data scientists can continue using their favorite tools for training, tuning and testing. But working inside the Pentaho data analysis platform, it’s easier for different teams to collaborate because everyone’s using a common platform and they’re building data pipelines that can be used for different purposes, said Arik Pelkey, Pentaho senior director of product marketing.
In Pentaho’s Weka Knowledge Flow environment, for instance, machine learning can be used to assess variable importance, find key variable interactions, and make initial predictive evaluations.
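Weka's attribute-evaluation tools are more sophisticated, but a crude stand-in for variable-importance scoring can be sketched in plain Python by ranking features on their absolute correlation with the target; the data and feature names below are invented.

```python
# A simple stand-in for variable-importance assessment: score each
# feature by its absolute Pearson correlation with the target.
# This is not Weka's method, just an illustration of the idea.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features(features, target):
    """Return feature names sorted by |correlation| with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

features = {
    "tenure":        [1, 2, 3, 4, 5, 6],
    "support_calls": [5, 4, 4, 2, 1, 1],
    "random_noise":  [3, 1, 4, 1, 5, 9],
}
churned = [1, 1, 1, 0, 0, 0]
print(rank_features(features, churned))
# The noise column ranks last; the correlated columns rank first.
```

Rankings like this help decide which variables merit feature engineering before any heavier modeling begins.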
The drag-and-drop interface makes it simple to deploy and run models within existing applications via embeddable APIs, and to update them regularly.
“You can’t have static models; models have to evolve,” Elrifai said.
Yet updating is an area in which companies tend to fall short. Only 31 percent of organizations use an automated process to update their models, according to Ventana Research.
With Pentaho, data engineers and scientists can re-train existing models with new data sets or update features using custom execution steps for R, Python, Spark MLlib and Weka. Pre-built workflows can update models and archive existing ones automatically.
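The retrain-and-archive pattern can be sketched generically in Python. This is a toy stand-in, not Pentaho's pre-built workflow: the "model" is just a threshold, and the archive is an in-memory dict keyed by timestamp.

```python
# A hypothetical sketch of retraining on new data, archiving the old
# model, and promoting the candidate only if it scores at least as
# well on a holdout set. The "model" is a trivial stand-in.
import datetime

def train(rows):
    """Toy 'model': flag churn when support_calls meet the average
    seen among churners in the training data."""
    churner_calls = [r["support_calls"] for r in rows if r["churned"]]
    threshold = sum(churner_calls) / len(churner_calls)
    return {"threshold": threshold}

def accuracy(model, rows):
    """Fraction of holdout rows the model classifies correctly."""
    hits = sum((r["support_calls"] >= model["threshold"]) == r["churned"]
               for r in rows)
    return hits / len(rows)

def update_model(current, new_rows, holdout, archive):
    """Retrain, archive the old model, keep whichever scores better."""
    candidate = train(new_rows)
    archive[datetime.datetime.now().isoformat()] = current
    if accuracy(candidate, holdout) >= accuracy(current, holdout):
        return candidate
    return current

archive = {}
current = {"threshold": 5}
new_rows = [{"support_calls": 4, "churned": True},
            {"support_calls": 2, "churned": False}]
holdout  = [{"support_calls": 4, "churned": True},
            {"support_calls": 1, "churned": False}]
current = update_model(current, new_rows, holdout, archive)
print(current, len(archive))  # promoted model; one archived version
```

Archiving the superseded model keeps a rollback path, which matters once updates run automatically on a schedule rather than by hand.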
Building an Insights Team
In the years ahead, just having a lot of data about your business won’t translate into competitive advantage because everyone will have lots of data, Forrester Research analyst Brian Hopkins told an audience at the AnacondaCON conference in February.
Companies that capture more and better data and use closed-loop systems that iterate on feedback will come out on top. He calls these “systems of insight.”
“Companies that bake an evolved data science into systems of insight will win,” he said.
— The New Stack (@thenewstack) February 8, 2017
Insight-driven companies are growing at a compound annual rate of about 27 percent, while the overall economy grows at only about 3 percent. Startups are growing even faster.
The trick is to find the right data and then use that data to take action.
“Anytime you apply an insight, there’s data being produced about the results of that insight, and firms are not capturing that data back into the process so they can learn and optimize around it,” he said.
Effective companies create insights from data, then test the insight and carefully instrument software and processes to iterate on that insight and capture more data for the next round of iteration.
He describes a cross-functional group of five to 15 data scientists, developers, data engineers and domain experts in an “insight-ops”-type of team supported by a technology manager whose job it is to get the right technology and the right data for the team, and who works for a business leader who has an outcome to own and a budget to spend on that outcome.
It has to be a team sport, he said.
“[Data scientists] are not separate teams that you toss requirements over the fence to; they are embedded into insight-driven businesses,” he said. “They’re the team driving the difference that creates the brand and value that drives corporate valuation.”
And they don’t start with, “Hmmm. I have a lot of data, what can I do with it?” They start by identifying the outcome and the metrics, and sometimes the interim metrics, that they believe will give them a competitive advantage.
They test that insight and toss it if it doesn’t pan out. They keep refining the insight using data management, analytics and insight execution in a single platform that supports iterative, agile development, he said.
That platform has a Big Data foundation and a data pipeline built on an analytics runtime, one place to run all your models, plus an insight execution framework of tools or microservices that let you deploy the components of an insight application. That flow can be combined with insight caches, graph databases and other stores that hold the results of analytics. These teams use applications and software instrumented specifically to collect the data needed for the insight loop, he said.
Artificial intelligence or advanced machine learning can enhance the effort, he said: it can help you figure out what’s important about the data and prepare it for analytics, apply more advanced algorithms to feature engineering, greatly reduce the work of running tests and selecting the models that best drive outcomes, and simplify customizing generic models to specific situations.