As a machine learning professional specialising in computational linguistics (helping machines to extract meaning from human text), I have confused people on multiple occasions by suggesting that their document processing problem could be solved by neural networks trained using a Graphics Processing Unit (GPU). You’d be well within your rights to be confused. To the uninitiated, what I just said was “Let’s solve this problem involving reading lots of text by building a system that runs on specialised computer chips designed specifically to render images at high speed.”

Well, it turns out that GPUs are good for more than playing Doom in high definition or rendering the latest Pixar movie. GPUs are great at doing maths. As it happens, they’re great at exactly the kind of maths needed for training neural networks and other kinds of machine learning models. So what I’m trying to say here is that in the age of the neural network, Graphics Processing Unit is one of the biggest misnomers of our time. Really, they should be called a “Tensor-based linear algebra acceleration unit” or something like that (this is probably why I’m a data scientist and not a marketer).

Where did GPUs come from?

One of the earliest known uses of the term GPU is from a 1986 book called “Advances in Computer Graphics”. Originally, GPUs were designed to speed up the process of rendering computer games to the user’s display. A traditional processor (CPU), the kind of chip that runs your computer’s operating system and applications, broadly works through its instructions one at a time, in sequence. Digital images are made up of thousands or millions of pixels arranged in a grid. Left to its own devices, a CPU has to render an image by running calculations on each pixel, one at a time: row by row, column by column. GPUs accelerate this process by working on many pixels at once, building the image in parallel. The video below explains the key difference quite well.
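The same contrast can be sketched in a few lines of Python. This is only an illustration under some assumptions: `shade` is a made-up per-pixel calculation, all of the function names are hypothetical, and Python threads merely stand in for the many small cores on a GPU (they won’t actually make this toy example faster). The point is the shape of the work: no pixel depends on any other, so the rows can be handed out to separate workers.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(x, y, width, height):
    """Toy per-pixel calculation (hypothetical): a diagonal gradient from 0 to 255."""
    return (255 * (x + y)) // max(1, width + height - 2)


def render_sequential(width, height):
    """CPU-style: visit every pixel in turn, row by row, column by column."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            row.append(shade(x, y, width, height))
        image.append(row)
    return image


def render_row(y, width, height):
    """Compute one full row; it needs nothing from any other row."""
    return [shade(x, y, width, height) for x in range(width)]


def render_parallel(width, height, workers=4):
    """GPU-style in spirit: independent rows are handed to separate workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(lambda y: render_row(y, width, height), range(height)))


if __name__ == "__main__":
    sequential = render_sequential(8, 4)
    parallel = render_parallel(8, 4)
    print(sequential == parallel)
```

Both functions produce exactly the same image. The only difference is who does the work and in what order, which is the freedom a GPU exploits on a much larger scale.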

So I know what you’re thinking… “If GPUs are so magical and can do all this cool stuff in parallel, why don’t we just use them all the time instead of CPUs?” Am I right?

Well, here’s the thing… GPUs are specialised for really fast maths and CPUs are generalised for many tasks. That means that GPUs are actually pretty rubbish at a lot of things CPUs are good at: they’ve traded flexibility for speed. Let me try to explain with a metaphor.

The Patisserie Chef and the Cake Factory

Let’s step away from the computer for a second and take a few moments to think about a subject very close to my heart… food.

A patisserie chef is highly efficient at making yummy cakes and pastries to delight their customers. They can only really pay attention to one cake at a time but they can switch between tasks. For example, when their meringue is in the oven they can focus on icing a cake they left to cool earlier. Trained chefs are typically very flexible and talented and they can make many different recipes — switching between tasks when they get time.

At some point in history, Mr Kipling and those guys who make Twinkies got so many orders that human bakers would never have been able to keep up with demand. They had to add cake machines. A factory contains machines that spit out thousands of cakes in parallel. The key difference is that these machines are not flexible in the way that a human chef would be. What if we’re churning out a batch of 10,000 French Fancies when we get a call to stop production and make Country Slices instead? Imagine how long it would take to go around and stop all the machines, put the new ingredients in and then start the process again! A human chef could just throw out the contents of their oven and get started on the new order right away! The factory probably can’t even handle doing lots of different jobs. I bet they have different machines for the different products or, at the very least, have to significantly alter the production line. In contrast, the patisserie chef can just change what they do with their hands! By the way, this post is not in any way sponsored by Hostess or Mr Kipling. I just like cake.

Did you spot the metaphor here? The slower but more flexible chef is a CPU: plodding along one order at a time, switching tasks whenever they get a spare moment. The cake factory is a GPU: designed to churn out thousands of similar things as quickly as possible, at the cost of flexibility. This is why GPUs aren’t a one-size-fits-all solution for all of our computing needs.


Like I said earlier, GPUs are great at maths. They can be employed to draw really pretty pictures, but they can also be used for all sorts of real-world mathematical operations where you need to run the same calculations on large batches of data. Training a neural network involves a huge number of these repetitive mathematical operations, regardless of whether the network is being trained to detect cats, play Go or pick out nouns, verbs and adjectives in text. Using a trained neural network to make predictions is less computationally expensive, but you might still benefit from running it on a GPU if you are trying to make a lot of predictions very quickly!
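To show what “the same calculations on large batches of data” looks like, here is a minimal sketch of a single neural network layer written in plain Python. The function name, the weights and the inputs are all made up for illustration. A real framework would express this as one matrix multiplication and hand it to the GPU; the nested loops below just spell out the multiply-and-add steps that get repeated for every example in the batch.

```python
def dense_layer(batch, weights, bias):
    """Apply one layer to every example in a batch (hypothetical toy example).

    batch:   list of examples, each a list of input values
    weights: weights[i][j] connects input i to output j
    bias:    one bias value per output
    """
    outputs = []
    for example in batch:  # every example gets the identical treatment
        row = []
        for j in range(len(bias)):
            total = bias[j]
            for i, value in enumerate(example):
                total += value * weights[i][j]  # one multiply-and-add
            row.append(max(0.0, total))  # ReLU: negative results become zero
        outputs.append(row)
    return outputs


if __name__ == "__main__":
    batch = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # three made-up examples
    weights = [[1.0, -1.0], [0.5, 2.0]]            # two inputs, two outputs
    bias = [0.0, 1.0]
    print(dense_layer(batch, weights, bias))
    # → [[2.0, 4.0], [0.0, 0.0], [3.5, 0.0]]
```

Even this tiny example needs 12 multiply-and-add steps (3 examples × 2 inputs × 2 outputs), and none of the examples depends on the others. Scale that up to millions of weights and thousands of examples and you have exactly the kind of job the cake factory was built for.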