Sometimes Less Is More: How Simple ML Models Win More Often
Machine learning (ML) is not just hype; it’s here to stay. While modern ML techniques such as deep neural networks (DNNs) are extremely powerful and can capture the nuances of complex data, their complexity and black-box nature can make them difficult to trust and even more difficult to explain.
One of the biggest challenges for a data scientist or ML engineer is explaining these models to stakeholders who may not have a technical background. This is where intuitive, simpler models come into play. While they might not capture the full complexity of your data the way a DNN can, they are much easier to explain. They frequently yield results that are more than good enough for making informed decisions, which can significantly ease the process of getting buy-in from decision-makers, including your boss.
The Need for Interpretable Models
ML powers everything from customer recommendations to fraud detection, giving companies the edge they need to stay competitive.
However, properly harnessing the power of ML requires that stakeholders understand these models. This understanding is key to aligning the models with the company’s strategic objectives, which is essential for cohesive and informed decision-making.
Involving stakeholders early on in the modeling process with interpretable models fosters collaboration between data scientists and decision-makers. This approach leads to decisions that are both data-driven and business-savvy.
Exploring Intuitive ML Modeling Methods
So, what do we mean by “intuitive” machine learning models? These are models that prioritize transparency, simplicity and interpretability. They are the models that you can explain to your boss without needing a whiteboard full of equations. They favor being easily explainable over capturing every tiny variation in the data. In this post, we will explore five of these intuitive models, discussing in simple terms how they work and where they apply: linear regression, logistic regression, decision trees, Naive Bayes and k-means clustering.
Linear regression is the trusty ruler of your ML toolkit. Imagine trying to predict the price of a house based on its square footage. You plot this data on a graph, and linear regression is the process of drawing the best-fitting straight line through the plotted points. The idea is simple: as the size of a house (in square feet) increases, we expect its price to increase as well, and we assume this relationship is linear.
As the visualization above shows, the linear regression model fits the straight line that best captures the trend in the data. We can then use this line to predict the likely price of a house from its square footage. But life isn’t always a straight line. While linear regression is easy to understand and explain, it assumes a constant rate of change. This can be a limitation when real-world data has relationships between variables that are not strictly linear. And while it is useful for predicting real numbers like the price of a house, linear regression is not quite the right tool when you want to classify examples into a distinct set of classes.
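The line-fitting idea described above can be sketched in a few lines of plain Python. This is only a minimal illustration using ordinary least squares with a single feature; the house sizes and prices are made-up numbers, and the names fit_line and predict_price are hypothetical helpers rather than part of any library.

```python
# Hypothetical toy data (made up for illustration): square footage and sale price.
sqft = [1000, 1500, 2000, 2500, 3000]
price = [200_000, 290_000, 410_000, 500_000, 600_000]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sqft, price)


def predict_price(square_feet):
    """Read the predicted price off the fitted line."""
    return slope * square_feet + intercept


print(f"Each extra square foot adds about ${slope:.0f} to the predicted price")
print(f"Predicted price for 1,800 sq ft: ${predict_price(1800):,.0f}")
```

The slope is the part that is easy to explain to a stakeholder: it is the expected change in price for each additional square foot, under the assumption that the relationship really is linear.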
Logistic regression is a sibling of linear regression geared toward classifying inputs into a distinct set of classes, such as spam or not spam for emails. Imagine estimating the chance that it will rain tomorrow. Logistic regression takes various factors (like humidity, wind speed, etc.) and combines them into a probability between 0 and 1. In its basic form it’s a “yes” or “no” kind of model where 0 means “No” and 1 means “Yes,” and a probability above a chosen threshold (commonly 0.5) is read as “Yes.”
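To make the rain example concrete, here is a rough sketch of logistic regression trained with plain gradient descent. The weather readings are invented for illustration, and the function names, learning rate and number of epochs are assumptions chosen for this toy case, not recommended settings.

```python
import math

# Hypothetical toy data (made up): (humidity as a fraction, wind speed in tens of km/h)
# and whether it rained the next day (1) or not (0).
features = [(0.9, 1.0), (0.8, 2.0), (0.85, 0.5), (0.3, 1.5), (0.2, 0.5), (0.4, 1.5)]
rained = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of rain for one observation x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def train(xs, ys, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(xs[0])
    bias = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(features, rained)
p_humid = predict_proba(weights, bias, (0.95, 1.0))  # a very humid day
p_dry = predict_proba(weights, bias, (0.25, 1.0))    # a dry day
print(f"P(rain | humid) = {p_humid:.2f}, P(rain | dry) = {p_dry:.2f}")
```

The learned weights show which factors push the probability up or down, which is what makes this model comparatively easy to walk a stakeholder through.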
The result of a logistic regression model is a probability curve, so we can use the model not only to classify points but also to estimate the probability that a given point belongs to a particular class. However, it is limited when handling highly complex relationships in data, where the classes cannot be separated by a simple, roughly linear boundary. What if we could model our decision-making process the way we make decisions ourselves?
Think of decision trees as a flowchart-like structure where a series of yes-or-no questions guides you systematically toward a decision. For example, deciding what to wear based on the weather might start with the question: “Is it cold?” If the answer is “no,” the next question might be “Is it windy?” and so on. Each question narrows down your options until you reach a conclusive decision — like choosing a raincoat on a cold or windy day.
Decision trees are intuitive and easy to visualize, making them an appealing choice for both data scientists and stakeholders. They can map out complex decision-making processes in a way that’s straightforward.
However, their simplicity can be both a strength and a weakness. For datasets with intricate, subtle patterns, decision trees can sometimes be too blunt an instrument. They tend to carve the data into broad, clear-cut categories, which can mean they miss the finer nuances. In trying to capture these nuances, they can become overly complex, resembling a tangled web of questions more than a clear, branching tree. This complexity, while aiming for precision, often leads them to “memorize” the data, which is known as overfitting. In such cases, the tree is so tailored to the specific dataset it was trained on that it performs poorly on new, unseen data.
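The flowchart picture of a decision tree can be written down directly as a small data structure. The sketch below hand-codes the clothing example rather than learning a tree from data; the question names, the “t-shirt” leaf and the helper functions decide and depth are all hypothetical choices made for illustration.

```python
# Each internal node asks a yes/no question; each leaf (a plain string) is a decision.
# The questions follow the weather example; the "t-shirt" leaf is a made-up placeholder.
tree = {
    "question": "is_cold",
    "yes": "raincoat",
    "no": {
        "question": "is_windy",
        "yes": "raincoat",
        "no": "t-shirt",
    },
}


def decide(node, answers):
    """Walk from the root, following yes/no answers, until a leaf is reached."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


def depth(node):
    """Number of questions on the longest path from this node to a leaf."""
    if not isinstance(node, dict):
        return 0
    return 1 + max(depth(node["yes"]), depth(node["no"]))


print(decide(tree, {"is_cold": False, "is_windy": True}))
print("Questions on the longest path:", depth(tree))
```

Depth is a useful number to watch: a tree with two or three levels of questions fits on a slide, while a very deep one is both harder to explain and more likely to have memorized its training data.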
Naive Bayes is a simple probabilistic classifier that is especially popular for text. Suppose you run an online store and want to automatically sort customer reviews into positive or negative based on the text. Naive Bayes calculates the probability that a given review is positive or negative by analyzing the frequencies of words and phrases within the reviews.
It’s a model known for its speed and efficiency, making it a popular choice for text classification tasks. Because of its straightforward approach, it can quickly sift through vast amounts of text and make relatively reliable and easy-to-understand predictions.
So, in our example, Naive Bayes would calculate the probability that a new review, based on its words, belongs in either the positive reviews or negative reviews category. It does this under the assumption that the words in the review are conditionally independent given the review’s sentiment — a simplification that often works well in practice, despite its “naive” nature.
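The review example can be sketched as a tiny word-count classifier. This is a simplified illustration, assuming whitespace tokenization and add-one (Laplace) smoothing; the four training reviews are invented, and train and classify are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical toy reviews (made up for illustration).
training_reviews = [
    ("great product works well", "positive"),
    ("love it great value", "positive"),
    ("terrible quality broke fast", "negative"),
    ("bad product waste of money", "negative"),
]


def train(reviews):
    """Count how often each word appears under each label."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of reviews
    vocabulary = set()
    for text, label in reviews:
        words = text.split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(text, word_counts, label_counts, vocabulary):
    """Return the label with the highest log probability for the text."""
    total_reviews = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior, then add one log likelihood per word.
        # Adding the per-word terms is where the independence assumption enters.
        score = math.log(label_counts[label] / total_reviews)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing out the product.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_reviews)
print(classify("great value", *model))
print(classify("terrible waste", *model))
```

Working in log probabilities avoids multiplying many small numbers together, and the per-word counts double as an explanation of why a review landed in a given category.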
K-means clustering groups similar data points together into clusters based on their features. Imagine you have a basket of fruits and you want to sort them based on characteristics like size, color and shape. You could use k-means clustering to determine the best groupings.
In this example, the k-means clustering algorithm tries to find the best way to group the data points into two clusters based on their features. It does this by iteratively adjusting the cluster centers (the red markers) until it finds the most cohesive and well-separated groups.
One challenge lies in defining what “similar” means and determining the right number of groups, which is not always straightforward, especially with complex, high-dimensional data.
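The iterative adjustment of cluster centers can be sketched as two alternating steps: assign each point to its nearest center, then move each center to the average of its points. The fruit measurements below are made up, Euclidean distance is assumed as the meaning of “similar,” and the starting centers are picked by hand to keep the example deterministic.

```python
import math

# Hypothetical fruit measurements (made up): (diameter in cm, weight in units of 100 g).
fruits = [(2.0, 0.1), (2.5, 0.15), (3.0, 0.2), (7.0, 1.5), (7.5, 1.7), (8.0, 1.6)]


def closest(point, centers):
    """Index of the center nearest to the point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))


def k_means(points, centers, max_iterations=100):
    """Alternate assignment and update steps until the centers stop moving."""
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [closest(p, centers) for p in points]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for i, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old_center)  # keep a center that lost all its points
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Two clusters, seeded with the first and last fruit as initial centers.
labels, centers = k_means(fruits, centers=[fruits[0], fruits[-1]])
print(labels)
print(centers)
```

Both the number of clusters and the distance measure are inputs here, not outputs, which is exactly the judgment call described above. In practice features on different scales are usually standardized first so that no single measurement dominates the distance.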
Striving for “Good Enough” Quickly
Time is of the essence. Striving for the “perfect” model often has diminishing returns. Simpler models can be developed, understood and deployed quickly. They offer a “good enough” solution that can drive decisions without waiting for the perfect answer.
It’s about striking a balance — between accuracy and interpretability, complexity and speed. In many cases, a model that is “good enough” and ready now is far more valuable than a perfect model that takes months to develop.
Empowering Stakeholders with Understanding
While simpler models might not capture every intricate detail, they provide a view into your data that is easy to understand and act upon. As we embrace the age of data-driven decision-making, the importance of these simple models cannot be overstated. In the end, it’s all about balance — between the simplicity that fosters trust and understanding and the complexity that real-world data often demands. We encourage you to explore these intuitive methods, keeping in mind the goal of achieving timely, actionable and transparent results.