
A Gentle Introduction to Machine Learning

14 Apr 2017 1:00am, by

This piece is the latest in a series, called “Machine Learning Is Not Magic,” covering how to get started in machine learning, using familiar tools such as Excel, Python, Jupyter Notebooks and cloud services from Azure and Amazon Web Services. Check back here each Friday for future installments. 

I was fascinated by a demo I saw at one of the Microsoft conferences. The presenter uploaded candid pictures of people to a website called how-old.net, and it detected the age of every individual in the photo with great accuracy. Microsoft claimed that the site was powered by Machine Learning, which naturally piqued my curiosity. Since then, I have come across many amazing use cases and scenarios based on Machine Learning.

From predicting the weather to enabling an autonomous car to drive itself, ML is everywhere. As a technologist, I always wondered what goes on behind the scenes of such powerful applications. Most of the available guides introduce various ML concepts without really demystifying them for developers. My objective is to connect the dots and show how the concepts of ML are applied in these complex use cases.

Before we delve into the details, it is important to understand the expected goal of an ML implementation. At a very high level, the aim of any Machine Learning program is either to predict an outcome or to classify an item. For example, as developers, we may be expected to use ML to predict the stock price of a specific company based on historical data, predict how much of a raise an employee would get in five years, categorize a set of resumes into architect, developer, and admin profiles, or determine whether a credit card transaction is genuine or fraudulent.

Predicting a value based on existing data.

Classifying an object based on existing data.

Microsoft’s how-old.net website uses a classification technique of ML to place the person within a specific age bracket.

Finding the age of a person based on existing data.

You may argue that it is entirely possible to achieve these goals without learning and implementing ML.

For both prediction and classification tasks, it is possible to write programs that use multiple nested if conditions to arrive at a specific outcome. This approach has two problems. First, it is brittle: when the program encounters unexpected input that doesn't match the predefined conditions, it falls flat. Second, it is not maintainable. With all the hard-wired logic that went into the program, it is extremely difficult to modify the logic to accommodate changes.
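To make the brittleness concrete, here is a minimal sketch of a hard-coded resume sorter in Python. The keywords and profile names are illustrative assumptions, not taken from any real system. Any resume the rules did not anticipate simply falls through to `None`, and adding a new profile means editing the nested conditions by hand.

```python
from typing import Optional


def classify_resume(text: str) -> Optional[str]:
    """Hypothetical hand-written rules that sort a resume into a profile."""
    text = text.lower()
    if "architecture" in text:
        # Nested condition: architecture experience only counts if it is cloud work.
        if "cloud" in text:
            return "architect"
    elif "python" in text or "java" in text:
        return "developer"
    elif "linux" in text and "backup" in text:
        return "admin"
    # Anything the rules did not foresee ends up here.
    return None


print(classify_resume("Senior Python developer"))                  # developer
print(classify_resume("Designed on-premises systems architecture"))  # None
print(classify_resume("Go and Rust engineer"))                     # None
```

The last two resumes are plausible inputs, yet the rules have no answer for them; covering them means writing yet another branch.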

The key insight is that data already exists, and we often have access to quite a lot of it that we can use to draw inferences. Machine Learning solves problems by relying on exactly this: the availability of existing data. It uses historical data to build a kind of dynamic rules engine that can be rebuilt as new data arrives. Unlike fragile code with multiple if-then conditions, this approach is robust and can handle a wide range of input.

At the risk of oversimplification, let me tell you that Machine Learning can be thought of as an intelligent and dynamic rules engine that doesn’t use conventional decision-making techniques often found in programming.

We are now a step closer to understanding the core concepts and terminology of Machine Learning. It's time for us to look at a real-world problem.

The table below shows a subset of salary data from the Stack Overflow salary calculator for developers working in the New York office.

A subset of salary data from Stack Overflow.

Based on the above dataset, if I asked you to predict the salary of a candidate with six years of experience, what would your approach be?

Does this remind you of the viral memes that we often see on Facebook?

Facebook Maths Meme.

There is a similarity between the two problems above: in both, we are expected to find how one parameter influences another. But I digress.

Let's come back to the problem statement. Where do we begin? To keep it simple, we assume that all other parameters, such as position, skill level, and location, are constant. It all starts with finding the correlation between a developer's years of experience and their salary.

Let's start by finding the difference in salary between two developers whose experience differs by just one year. Subtracting 103,100 from 104,900 gives the difference between the salary of a developer with no experience and a developer with one year of experience: 1,800. Repeating this for each pair of consecutive rows gives us a sense of the raise an employee gets with each additional year of experience.
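The row-by-row subtraction can be sketched in a few lines of Python. This assumes the salaries are stored in a list where index `i` holds the salary for `i` years of experience; only the first two rows of the calculator table are quoted in the text, so the list below is deliberately incomplete and should be extended with the remaining rows.

```python
def yearly_increments(salaries):
    """Return the raise between each pair of consecutive years.

    salaries[i] is assumed to be the salary for i years of experience.
    """
    # Pair each salary with the next one and subtract the earlier from the later.
    return [later - earlier for earlier, later in zip(salaries, salaries[1:])]


salaries = [103_100, 104_900]  # 0 and 1 years of experience, as quoted above
print(yearly_increments(salaries))  # [1800]
```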

The increase in salary with each additional year of experience.

It turns out that with each additional year of experience, Stack Overflow increases the salary by between $1,700 and $1,900. That's not a bad deal.

It is reasonable to assume that the yearly increment is the average of 1,700 and 1,900, which is 1,800 ((1,700 + 1,900) / 2). Going by this logic, a developer with six years of experience makes about $114,100, and with another year of experience, would get $115,900.
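The averaging step is simple, but the parentheses matter: division binds more tightly than addition, so leaving them out halves only the second value. A quick Python check shows the difference.

```python
low, high = 1_700, 1_900  # smallest and largest yearly raise observed

average_increment = (low + high) / 2  # sum first, then halve: 1800.0
unparenthesized = low + high / 2      # only 1,900 is halved: 2650.0

print(average_increment, unparenthesized)  # 1800.0 2650.0
```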

Let’s now compare it with the real data from the calculator.

Stack Overflow salary for a Developer with six years of experience.

Stack Overflow salary for a Developer with seven years of experience.

Though it is not 100 percent accurate, we are close. It gives us some confidence in going ahead with the hypothesis.

At this point, we have two important assumptions:

  1. Starting salary = $103,100
  2. Increase per year = $1,800
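Taken together, the two assumptions describe a straight line: salary = 103,100 + 1,800 × years of experience. Here is a minimal Python sketch of that hypothesis; the constant and function names are illustrative, not part of any library.

```python
STARTING_SALARY = 103_100  # assumption 1: salary with zero years of experience
INCREASE_PER_YEAR = 1_800  # assumption 2: fixed raise per additional year


def predict_salary(years: int) -> int:
    """Predict salary as the starting salary plus a fixed raise per year."""
    return STARTING_SALARY + INCREASE_PER_YEAR * years


print(predict_salary(0))   # 103100
print(predict_salary(10))  # 121100
```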

Please make a note of these assumptions. We will revisit them in future sections of this tutorial.

Let’s pause for a moment and recap the concept of prediction in ML. Though we are experimenting with a known set of data, the premise is the same — based on certain assumptions, we attempt to make a prediction.

Predicting the salary based on assumptions.

We can now apply these assumptions to extrapolate the dataset and predict the salary for ten or even 20 years of experience. We do this by simply adding $1,800 for each additional year of experience. Let's also compare our predictions with the existing data.

The table below shows the actual salary and the predicted salary side by side.
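The extrapolation is just repeated addition, which a short loop captures. This sketch assumes the same two constants as the hypothesis (starting salary $103,100, raise $1,800 per year); `extrapolate` is a hypothetical helper name.

```python
STARTING_SALARY = 103_100
INCREASE_PER_YEAR = 1_800


def extrapolate(max_years: int) -> list[tuple[int, int]]:
    """Build (years, predicted salary) rows from 0 up to max_years inclusive."""
    rows = []
    salary = STARTING_SALARY
    for years in range(max_years + 1):
        rows.append((years, salary))
        salary += INCREASE_PER_YEAR  # add one year's raise for the next row
    return rows


for years, salary in extrapolate(10):
    print(f"{years:>2} years: ${salary:,}")
```

The last line printed is `10 years: $121,100`; passing 20 instead of 10 extends the table to 20 years.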

Actual salary vs. predicted salary for 10 years.

Though we are close to the actual data, each prediction is off by $100. The discrepancy becomes clear from the table below.
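The comparison table can be produced by subtracting each prediction from the matching actual value. This is a sketch under the same two assumptions; only the two rows quoted earlier in the text are filled in, so the remaining rows from the calculator would need to be added to reproduce the full table. A positive difference means the model underestimates the salary.

```python
STARTING_SALARY = 103_100
INCREASE_PER_YEAR = 1_800


def prediction_errors(actual: dict[int, int]) -> dict[int, int]:
    """Map years of experience to (actual salary - predicted salary)."""
    return {
        years: salary - (STARTING_SALARY + INCREASE_PER_YEAR * years)
        for years, salary in actual.items()
    }


actual = {0: 103_100, 1: 104_900}  # rows quoted in the text; extend as needed
print(prediction_errors(actual))  # {0: 0, 1: 0}
```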

The difference between actual and predicted salary.

To summarize, we attempted to infer the correlation between experience and salary. Based on our assumptions, we went ahead and predicted the salary, albeit with some inaccuracy.

We have reached an important milestone in our journey towards learning ML. Without using the jargon, we managed to learn the core premise of Machine Learning — to predict the value of an unknown parameter from a known dataset.

In the next part of this tutorial, we will explore the key concepts of Machine Learning. I will explain the differences between Supervised and Unsupervised Machine Learning. We will also look at what an algorithm is and how to create a Machine Learning model by combining existing data with algorithms. Finally, we will map the official ML terminology to the concepts we have learned so far. Stay tuned!

Feature image via Pixabay.
