
Can Self-Supervised Learning Teach AI Systems Common Sense?

16 Sep 2021 10:00am, by
Pieter Buteneers
Pieter Buteneers is an industrial and ICT-electronics engineer. He started his career in academia, first as a Ph.D. student and later as a postdoc, where he did research on Machine Learning, Deep Learning, Brain Computer Interfaces and Epilepsy. Together with a team of machine learners from Ghent University, he won first prize in the biggest Deep Learning competition of 2015, the National Data Science Bowl. In the same year, he gave a TEDx talk on Brain Computer Interfaces. In 2019 he became the CTO of a platform for building multilingual chatbots that communicate on a human level. In 2020 that company was acquired by Sinch, and Pieter now leads all Machine Learning efforts at Sinch as Director of Engineering in ML & AI.

Imagine having an artificial intelligence (AI) system that is capable of mimicking human language and intelligence. Given AI’s capabilities, it seems simple, right? Not quite. Despite recent advancements in AI, especially in natural language processing (NLP) and computer vision, mastering the unique complexities of human language continues to be one of AI’s biggest challenges.

According to IDC, worldwide revenues for the AI market are forecast to grow 16.4 percent year over year in 2021, and the market is expected to break the $500 billion mark by 2024.

As companies continue to develop and deploy AI solutions to automate processes, solve complex problems and enhance customer experiences, many are realizing its shortcomings, including the amount of data required to train machine learning (ML) algorithms and the limited flexibility of these algorithms in understanding human language.

The ability of computers to effectively understand all human language would completely transform how we engage with brands and businesses on a global scale. As businesses transition away from high-frequency, one-way communications and toward two-way conversations, it will be critical for organizations to gain a deeper understanding of human language as they look to improve customer interactions.

Think about it: If AI systems can gain a deeper understanding of language than traditional means of analyzing data allow, they could exceed human performance in language tasks and move one step closer to human-level intelligence. The big question is: Is this level of performance achievable?

Yes — and the secret lies within self-supervised learning.

Self-Supervised Learning: How Can It Improve AI?

We learn most of what we know about the world, especially as babies, through observation and trial and error. As we learn, we develop common sense and the ability to master complex tasks such as driving a car.

While ML algorithms can’t directly mimic the way babies learn, self-supervised learning can help systems predict what comes next. With conventional supervised learning, getting AI to act more like humans requires vast amounts of high-quality labeled data.

Self-supervised learning, by contrast, allows ML algorithms to train on low-quality unlabeled data. The technique typically involves taking an input dataset and concealing part of it. The algorithm must then analyze the visible data and learn to predict the hidden remainder. In this way, the process creates its own labels from the data, and the system learns to fill in the blanks.
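To make the "conceal and predict" idea concrete, here is a minimal sketch in Python of how training labels can be derived from unlabeled text. The function name `make_masked_example`, the `[MASK]` placeholder and the 30 percent masking rate are assumptions chosen for illustration; they are not taken from any particular model or library.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is an assumption for this sketch


def make_masked_example(tokens, mask_rate=0.15, seed=None):
    """Hide a random subset of tokens and return (visible_input, labels).

    labels maps each hidden position to the original token, so the
    training targets come from the data itself, not from an annotator.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Always hide at least one token so there is something to predict.
    n_hidden = max(1, round(len(tokens) * mask_rate))
    hidden_positions = sorted(rng.sample(range(len(tokens)), n_hidden))
    visible = list(tokens)
    labels = {}
    for pos in hidden_positions:
        labels[pos] = visible[pos]  # remember the answer
        visible[pos] = MASK         # conceal it from the model
    return visible, labels


sentence = "the cat sat on the mat".split()
visible, labels = make_masked_example(sentence, mask_rate=0.3, seed=0)
print(visible)  # the sentence with two tokens replaced by [MASK]
print(labels)   # hidden position -> original word
```

A real system would feed `visible` to a model and score its guesses against `labels`. The point of the sketch is only that no human had to write those labels.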

Self-supervised learning removes most of the need for manual data labeling, opening up a huge opportunity for organizations to make better use of unlabeled data and streamline data processes. It creates a data-efficient AI system that can analyze and process data with far less human intervention, so full “supervision” is no longer required.

While self-supervised learning is a relatively new concept in the world of AI, it has already enabled major advancements in NLP. For example, Google introduced the BERT model in 2018. Its engineers repurposed the Transformer, an architecture originally developed for machine translation, and trained it to learn the meaning of a word in relation to its context in a sentence.
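As a rough illustration of predicting a word from its context, the sketch below counts which word appears between each pair of neighboring words in a tiny unlabeled corpus, then uses those counts to fill in a blank. This is a deliberately simple counting stand-in, not how BERT works internally: BERT uses a deep neural network that looks at the whole sentence. The corpus and the helper names `train_context_model` and `fill_blank` are invented for this example.

```python
from collections import Counter, defaultdict


def train_context_model(sentences):
    """Count which word appears between each (left, right) neighbor pair."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            # Treat tokens[i] as hidden; its neighbors are the visible context.
            model[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return model


def fill_blank(model, left, right):
    """Return the word most often seen between left and right, or None."""
    candidates = model.get((left, right))
    return candidates.most_common(1)[0][0] if candidates else None


corpus = [
    "she sat by the river bank today",
    "he sat by the river bank yesterday",
    "they walked to the river bank slowly",
    "she opened a savings account online",
]
model = train_context_model(corpus)
print(fill_blank(model, "the", "bank"))        # river
print(fill_blank(model, "savings", "online"))  # account
print(fill_blank(model, "purple", "bank"))     # None (context never seen)
```

The training signal again comes only from the raw sentences. Replacing the lookup table with a neural network is what lets models such as BERT generalize to contexts they have never seen verbatim.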

Facebook later took this a step further and trained a BERT-like model on more than 100 languages simultaneously. In 2020, Google pushed the approach behind BERT to its limits by training a much larger network on even more data. The language-agnostic mT5 model, a multilingual variant of Google’s T5, performs better than humans in labeling sentences and finding the right answers to a question.

But with all these recent advancements, why aren’t we seeing these algorithms everywhere?

What’s Holding Us Back?

First and foremost, training the T5 algorithm is costly. While Google publicly shared these models, they can’t be used for anything specific without fine-tuning them for the task at hand, which adds further cost. Furthermore, even once these models are optimized for your specific problem, they still require a lot of time and computing power to run.

Most deep learning algorithms and workflows remain inefficient. While deep learning has made significant strides in recent years, it still requires large amounts of data to produce useful outputs.

Reducing AI’s dependency on data and moving beyond the limitations of deep learning will require self-supervised learning, both for AI projects to succeed and for AI systems to learn common sense.

Over time, as companies invest in advancing AI systems and fine-tuning their efforts, I expect that new applications will emerge. We could see more complex applications in the coming years, but I also foresee new models emerging to outperform the T5 algorithm.

