Concurrency and parallelism in Python

In some cases, the script we are working on may be slower than expected. While every algorithm has a lower bound imposed by its computational complexity, there may still be ways to improve performance by leveraging concurrency and parallelism.

Let’s investigate some of the possible workarounds, starting by understanding the tradeoffs between the different approaches.

Difference between Concurrency and Parallelism

The first step is to understand the difference between concurrency and parallelism.

Concurrency

Concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome.

Concurrent tasks usually share the same memory (e.g. data structures, allocated storage), making communication easier. They may or may not run simultaneously, depending on the underlying architecture and the way the program is implemented.

Parallelism

Parallelism is the ability of a program to execute multiple parts or units simultaneously, each being run by a separate core or processor. Parallel tasks are independent for the most part, but can communicate and synchronize with each other if needed.

Different kinds of bottlenecks

Before diving into the solutions, it is also important to understand the different kinds of bottlenecks that can arise in a program.

CPU-bound

A CPU-bound bottleneck occurs when the CPU is the limiting factor in the performance of the program.
This can happen when the script is computationally intensive, meaning that the CPU is constantly busy performing calculations and has little time to do anything else.

import time

# Running this function with a large value of n
# will require the CPU to perform a lot of operations,
# taking a long time to complete
def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

I/O-bound

An I/O-bound bottleneck occurs when the input/output operations are the limiting factor in the performance of the program.
This can happen when the script requires a lot of data to be read from or written to disk, or when it needs to communicate with other systems over a network. In this case, the CPU has to wait for the I/O operation to complete before it can continue with the next task.

import requests

# Running this function will require the program to wait
# for the response from the server, which can take a long time
# and is inherently orders of magnitude slower than a CPU cycle
def fetch_url(url: str) -> str:
    response = requests.get(url)
    return response.text

Concurrency and Parallelism in Python

When trying to achieve concurrency or parallelism in Python (CPython, to be precise), there is a notable obstacle one must overcome: the Global Interpreter Lock (GIL).

The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. In other words, no matter how many threads are spawned, only one of them can execute Python code at a time. As a consequence, it is not possible to achieve parallelism through threads.

Threads in Python

This limitation has its merits, as it greatly simplifies dealing with race conditions and the other complications that arise in concurrent programming. Threads can still be useful for handling blocking I/O operations: since the GIL is released while a thread waits on I/O, other threads can run in the meantime instead of busy-waiting for the operation to complete.
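
As a minimal sketch of this pattern, a thread pool can overlap several blocking HTTP requests, reusing the fetch_url function from the I/O-bound example above (the URL list here is just a placeholder):

import requests
from concurrent.futures import ThreadPoolExecutor

def fetch_url(url: str) -> str:
    response = requests.get(url)
    return response.text

# Placeholder URLs; any slow endpoints would do
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

# While one thread blocks waiting for a response, the GIL is
# released and another thread can start or finish its own request
with ThreadPoolExecutor(max_workers=3) as executor:
    pages = list(executor.map(fetch_url, urls))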

On the other hand, if the bottleneck is CPU-bound, using threads may even degrade performance: only one of them runs at any given time, and the overhead of context switching between them is added on top.
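
A quick way to see this is to time two CPU-bound calls run back to back and then on two threads; a sketch (actual numbers depend on the machine):

import time
from threading import Thread

def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Sequential: two calls, one after the other
start = time.perf_counter()
fibonacci(30)
fibonacci(30)
print(f"sequential: {time.perf_counter() - start:.2f}s")

# Threaded: two threads, but the GIL lets only one execute
# Python code at a time, so this takes about as long
# (or slightly longer, due to context-switching overhead)
start = time.perf_counter()
threads = [Thread(target=fibonacci, args=(30,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threaded: {time.perf_counter() - start:.2f}s")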

Multiprocessing

To achieve true parallelism in Python, one must use the multiprocessing module, which allows the creation of separate sub-processes that can run in parallel.
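
A minimal sketch, spreading the CPU-bound fibonacci calls from the earlier example across a pool of worker processes (the inputs are arbitrary):

from multiprocessing import Pool

def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter
    # and its own GIL, so the calls truly run in parallel
    with Pool(processes=4) as pool:
        results = pool.map(fibonacci, [30, 31, 32, 33])
    print(results)

The if __name__ == "__main__" guard matters here: on platforms that spawn rather than fork, each worker re-imports the module, and the guard prevents it from recursively creating new pools.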

Asyncio

There is a third way to achieve concurrency in Python, which is through the asyncio module.

This approach closely resembles the thread-based concurrency model, but with a few key differences. Instead of threads, asyncio uses coroutines: functions that can be paused and resumed at specific points. This allows for more fine-grained control over the execution of the program, as the programmer explicitly marks the points where execution yields control to another coroutine with the await keyword. Moreover, the operating system is not involved in the context switch, which makes switching between coroutines much more lightweight than switching between threads.
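
A minimal sketch of the model, using asyncio.sleep as a stand-in for a real network call:

import asyncio

async def fetch_data(delay: float) -> str:
    # await suspends this coroutine and hands control back
    # to the event loop until the "I/O" completes
    await asyncio.sleep(delay)  # stand-in for real I/O
    return f"done after {delay}s"

async def main() -> None:
    # The three coroutines are interleaved on a single thread:
    # total runtime is about 2 seconds, not the 4.5-second sum
    results = await asyncio.gather(
        fetch_data(1.0), fetch_data(1.5), fetch_data(2.0)
    )
    print(results)

asyncio.run(main())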

How to speed up slow scripts

Now that we have a better understanding of the different kinds of bottlenecks and the tools available to overcome them, let’s sum up the possible solutions.

Bottleneck type | Solution
No bottleneck | Use standard single-threaded programming; the GIL is optimized for this use case.
I/O-bound | Use asyncio to achieve concurrency.
CPU-bound | Use multiprocessing to achieve parallelism.
I/O-bound, little control over the blocking calls | Use threading to achieve concurrency.