Machine Learning - A Definition


Machine learning is a subset of Artificial Intelligence.

Machine learning is a subset of Artificial Intelligence

Traditional Programming

But before we dive deeper into what that means, we will take a look at “traditional” programming:

Traditional software development approach

In traditional programming, a programmer/analyst/designer creates a program that manipulates data to get to a result.

As a concrete example, let us look at a money transfer in a banking app.

Example: A banking app

The analyst knows exactly which actions need to be performed to successfully transfer money from Alice to Bob:

  • authenticate the user
  • fetch her account(s)
  • check the balance
  • get all the details for the money transfer
  • successfully wire the money to Bob.

To achieve a successful money transfer, the programmer programs the exact sequence of steps and data manipulations for the computer to perform.

Machine Learning

In machine learning, however, we will use the data, to produce a ‘program’ that is able to create an output from inputs it has not seen before, by learning relationships in the data.

machine learning approach

Let’s read that again:

  • in machine learning
  • we use data
  • to learn relationships
  • which produce a “program”
  • that is able to take inputs it has never seen and create an output.

Here is an informal defintion:

Machine Learning - an informal definition

This gives you a bit of an idea.

Here is the formal definition:

Machine Learning - a formal definition

If you had to read that formal definition a couple of times and it still makes little sense, that is more than ok.

To make it more digestible, we will have a look at two examples:

Example 1: playing checkers

example: a game of checkers

If we want to have a computer play checkers (=task) then the more the computer plays (= experience), the better it should become at winning the game (=performance).

Example 2: recognizing cats

example: recognize cats

The same goes for recognizing cats in pictures (=task). The more the ‘program’ looks at pictures of cats (=experience), the better it should become at recognizing whether or not there is a cat in the picture (=performance).

Some common tasks

In machine learning, there are some common tasks:

  • regression
  • classification
  • extracting knowledge
  • recommenders
  • generation

Regression: predict a value

With regression we try to predict a value given a number of inputs.

For example:

  • given the day of week, temperature, and chance of rain: how many ice creams will be sold?
  • given the size, location, and type of real estate: how much is it worth?
  • given the time of day and type of post: how many likes will a post receive?
A linear regression animation

Classification: predict a class

With classification we try to predict to which class an input belongs to.

For example:

  • is an incoming email spam or not?
  • is a credit card transaction fraudulent?
  • is tumor malginent or benign?
examples of image classification

Extracting knowledge

Knowledge extraction can be used to extract structured data from an unstructured input.

For example:

  • for text: get the name, birth date, nationality from a Wikipedia biography;
  • for images: locate and label all cats.


Recommenders are used to make content or product recommendations based on similar content or products, or based on what others like you have enjoyed.

For example:

  • a list of Netflix shows to watch after finishing the last episode;
  • given your preference for romantic comedies, a list of what other people with that same taste enjoyed.


Generative models create entirely new content, sometimes based on some user input.

  • given a few words, write an article or story;
  • generate new images that you can use on your site.

Common techniques

Decision Tree

One way to classify inputs is to create a decision tree, which, by using a set of “if-then-else” evalutions, guides you to which class an input belongs.

A decision tree

However, instead of crafting it ourselves, we have a computer learn it (construct the three itself) from examples.

Random Forest

If we combine a number of decision trees, we get what is known as a “random forest”.

The program learns multiple decision trees to make a suggestion for the output (class or value) and in the final step we use a way (“voting”, “averaging”,…) to combine the results of individual trees to one final output.

A random forest

Neural Networks

A technique that won a lot of attention the last few years - although the idea has been around for decades - is “neural networks”.

Neural Networks

In essence a neural network builds a mapping

from inputs

  • an image
  • text
  • numerical data

to outputs

  • “malignent” or “benign”;
  • “spam” or “not spam”;
  • 74 ice creams sold;
  • a product recommendation;
  • a newly created image.

In a lot of introductions to “neural networks” you will find they are modeled after the brain, and then show you the picture below:

Neural networks are modelled after the brain

The only thing this image is trying to explain, is that our brain is made up of neurons that:

  1. take inputs;
  2. do some form of processing;
  3. produce an output.

And that is basically the end of the comparison to our human brain.

In practice, and without going into too much detail, this results into a basic building block consisting of two parts:

  • a linear part;
  • a non-linear part.

The linear part is multiplication and addition, starting from the bits of your image or words from your texts, these value will be multiplied by a set of numbers (weights) and then summed up together.


Secondly, the non-linear part makes use of an ‘activation function’, which transforms an input non-linearly. Although, that might sound scary, below are two examples of popular activation functions.

The one on the right (ReLU) can be translated to:

  • negative input: output ‘0’;
  • positive input: output the input.
Activation functions

Not scary at all…

Note: Granted, this is a bit technical, but do not worry, you do not need this kind of specialized knowledge to use the current state of the art. However, it might later help you understand what is really going on, and, most important, it should show you there is little magic involved.

Now, if we stack a lot of these building blocks (multiplication/addition + non-linear) on top of each other, we get what is called: “a neural network”.

Stacking building blocks

The “input layer” is the inputs we described above (image, text…), the output layer is the prediction the network makes (e.g. “spam or not”). In between we find the “hidden layers” where the computation is going on.

Neural networks have existed for a long time, but due to data and compute limitations, only small stacks of building blocks could but put on top of each other, resulting into what is called “shallow” neural networks, with limited performance.

In essence

Machine Learning gives the computers the ability to learn (gain experience) from data to perform a certain task without explicitely being told what to do.

Where to next

This post is part of our “Artificial Intelligence - A Practical Primer” series.

Or have a look at: