Concurrency in Python: Threads and Multiprocessing Explained
Table of Contents
- Fundamental Concepts
  - Concurrency vs. Parallelism
  - Threads
  - Multiprocessing
- Usage Methods
  - Using Threads in Python
  - Using Multiprocessing in Python
- Common Practices
  - Thread Synchronization
  - Process Communication
- Best Practices
  - When to Use Threads
  - When to Use Multiprocessing
- Conclusion
- References
Fundamental Concepts
Concurrency vs. Parallelism
- Concurrency: Concurrency is about making progress on multiple tasks during overlapping time periods. It doesn’t necessarily mean that tasks are executed simultaneously. A single-core CPU can achieve concurrency by rapidly switching between different tasks. For example, a web server can handle multiple client requests concurrently by switching between them.
- Parallelism: Parallelism is the actual simultaneous execution of multiple tasks. It requires multiple processing units, such as multiple CPU cores; a multi-core CPU can run different tasks in parallel.
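To make the distinction concrete, here is a minimal sketch of concurrency without parallelism. It uses a hypothetical round-robin scheduler (`task` and `run_round_robin` are illustrative names, not library functions) that interleaves two generator-based tasks on a single thread, switching after every step, much as a single-core CPU switches between tasks.

```python
from collections import deque

# Illustrative only: task and run_round_robin are hypothetical helpers, not part of any library.


def task(name, steps):
    """A cooperative task: it hands control back to the scheduler after each step."""
    for i in range(steps):
        yield f"{name} step {i}"


def run_round_robin(tasks):
    """Interleave tasks on one thread, switching after every step (concurrency, not parallelism)."""
    queue = deque(tasks)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))  # run one step of the current task
        except StopIteration:
            continue  # the task has finished, so drop it
        queue.append(current)  # re-queue it so the other tasks get a turn
    return log


if __name__ == "__main__":
    for line in run_round_robin([task("A", 3), task("B", 2)]):
        print(line)
```

Running it prints `A step 0`, `B step 0`, `A step 1`, `B step 1`, `A step 2`. Both tasks make progress during the same period, yet only one step ever executes at a time.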
Threads
A thread is the smallest unit of execution within a process. Multiple threads can exist within a single process and share the same memory space. Threads are lightweight compared to processes because they require less overhead to create and manage. However, because of the Global Interpreter Lock (GIL) in CPython, the standard Python interpreter, only one thread can execute Python bytecode at a time, which prevents threads from running CPU-bound Python code in parallel.
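A small sketch of shared memory between threads, using only the standard threading module: every thread writes into the same dictionary, and the main thread sees all of those writes once `join()` returns. The names `results` and `square` are illustrative.

```python
import threading

# One dictionary shared by every thread in this process (illustrative name)
results = {}


def square(n):
    # Each thread writes directly into the shared dict; no copying between threads is needed.
    # Every thread uses a different key, so the writes do not conflict.
    results[n] = n * n


threads = [threading.Thread(target=square, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
```

After the joins, `results` holds all five squares, written by five different threads into the same object.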
Multiprocessing
Multiprocessing involves running multiple processes, where each process has its own independent memory space. Since each process runs its own Python interpreter with its own GIL, CPU-bound tasks in different processes can execute in parallel. However, creating and managing processes is more resource-intensive than creating and managing threads.
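By contrast, processes do not share ordinary Python objects. In this minimal sketch (the `counter` global and the `bump` function are hypothetical), the child process changes its own copy of a module-level variable, and the parent's value is unaffected.

```python
import multiprocessing

# Illustrative module-level global (hypothetical name)
counter = 0


def bump():
    """Runs in the child process and changes only the child's copy of counter."""
    global counter
    counter += 100
    print(f"Child sees counter = {counter}")


if __name__ == "__main__":
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    # The parent's counter is untouched because the child has its own memory space
    print(f"Parent sees counter = {counter}")
```

The child reports 100 while the parent still reports 0. Sharing data between processes requires explicit tools such as a `Queue` or `Pipe`.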
Usage Methods
Using Threads in Python
The threading module in Python provides a high-level interface for working with threads. Here is a simple example:
import threading
def print_numbers():
    for i in range(5):
        print(f"Thread: {i}")
# Create a new thread
thread = threading.Thread(target=print_numbers)
# Start the thread
thread.start()
# Wait for the thread to finish
thread.join()
print("Main thread finished")
In this example, we define a function print_numbers that prints numbers from 0 to 4. We then create a new thread using the threading.Thread class, start the thread, and wait for it to finish using the join method.
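The same API also accepts positional arguments for the target through `args`, and an optional `name` that is available from `threading.current_thread().name`. In this sketch, `label` and `count` are hypothetical parameters added to `print_numbers` for illustration.

```python
import threading


def print_numbers(label, count):
    # label and count are hypothetical parameters added for this illustration
    # current_thread() returns the Thread object running this code
    name = threading.current_thread().name
    for i in range(count):
        print(f"{name} ({label}): {i}")


# args supplies positional arguments to the target; name labels the thread in output and logs
thread = threading.Thread(target=print_numbers, args=("demo", 3), name="worker-1")
thread.start()
thread.join()
print(f"{thread.name} alive after join: {thread.is_alive()}")
```

After `join()` returns, `is_alive()` reports `False` because the thread has finished.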
Using Multiprocessing in Python
The multiprocessing module in Python allows us to work with processes. Here is a simple example:
import multiprocessing
def print_numbers():
    for i in range(5):
        print(f"Process: {i}")
if __name__ == '__main__':
    # Create a new process
    process = multiprocessing.Process(target=print_numbers)
    # Start the process
    process.start()
    # Wait for the process to finish
    process.join()
    print("Main process finished")
In this example, we define a function print_numbers and create a new process using the multiprocessing.Process class. We start the process and wait for it to finish using the join method. Note that the if __name__ == '__main__': guard is required when processes are started with the "spawn" method, which is the default on Windows and, since Python 3.8, on macOS. With "spawn", the child re-imports the main module, and without the guard it would try to start new processes of its own and fail.
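A variant of the example above, as a sketch: it adds a hypothetical `count` parameter to `print_numbers`, passes it through `args`, prints `os.getpid()` to show that the work runs in a separate process, and checks `exitcode` after `join()`.

```python
import multiprocessing
import os


def print_numbers(count):
    # count is a hypothetical parameter added for this illustration
    # os.getpid() returns the ID of the process running this code
    for i in range(count):
        print(f"Process {os.getpid()}: {i}")


if __name__ == "__main__":
    # args passes positional arguments to the target in the child process
    process = multiprocessing.Process(target=print_numbers, args=(3,))
    process.start()
    process.join()
    # exitcode is 0 if the target returned normally
    print(f"Parent {os.getpid()} saw exit code {process.exitcode}")
```

The child's PID differs from the parent's, and an exit code of 0 indicates that the target function returned without raising an exception.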
Common Practices
Thread Synchronization
When multiple threads access shared resources, there is a risk of race conditions. A race condition occurs when the behavior of a program depends on the relative timing of events in different threads. To avoid race conditions, we can use synchronization mechanisms such as locks. Here is an example:
import threading
# Create a lock that guards shared_variable
lock = threading.Lock()
shared_variable = 0
def increment():
    global shared_variable
    for _ in range(100000):
        # Acquire the lock; the with block releases it automatically, even if an exception occurs
        with lock:
            shared_variable += 1
# Create two threads
thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)
# Start the threads
thread1.start()
thread2.start()
# Wait for the threads to finish
thread1.join()
thread2.join()
print(f"Shared variable: {shared_variable}")
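In this example, we use a threading.Lock to ensure that only one thread at a time can read and modify shared_variable. The lock is needed because shared_variable += 1 is not atomic: it reads the value, adds one, and writes the result back, so two unsynchronized threads can interleave those steps and lose updates. With the lock, the program reliably prints Shared variable: 200000.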
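A common refinement is to keep the lock next to the data it protects, so callers cannot forget to acquire it. The `SafeCounter` class below is a hypothetical helper written for illustration, not a standard-library type.

```python
import threading


class SafeCounter:
    """Hypothetical helper: bundles a value with the lock that protects it."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, n=1):
        # Only one thread at a time runs this read-modify-write
        with self._lock:
            self._value += n

    @property
    def value(self):
        with self._lock:
            return self._value


counter = SafeCounter()


def worker():
    for _ in range(100_000):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

Four threads each add 100,000, and the program prints 400000 because every increment happens while holding `self._lock`.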
Process Communication
When working with multiple processes, we often need to communicate between them. The multiprocessing module provides several ways to achieve this, such as Queue and Pipe. Here is an example using a Queue:
import multiprocessing
def producer(queue):
    for i in range(5):
        queue.put(i)
def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"Received: {item}")
if __name__ == '__main__':
    # Create a queue
    queue = multiprocessing.Queue()
    # Create producer and consumer processes
    producer_process = multiprocessing.Process(target=producer, args=(queue,))
    consumer_process = multiprocessing.Process(target=consumer, args=(queue,))
    # Start the processes
    producer_process.start()
    consumer_process.start()
    # Wait for the producer to finish
    producer_process.join()
    # Send a sentinel value to the consumer to signal the end
    queue.put(None)
    # Wait for the consumer to finish
    consumer_process.join()
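In this example, the producer process puts numbers into the queue, and the consumer process retrieves and prints them. After the producer finishes, the main process puts None on the queue as a sentinel; when the consumer reads it, the consumer leaves its loop and exits.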
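The section above also mentions Pipe. Here is a minimal sketch, assuming a hypothetical `child` function that sums a list it receives. `multiprocessing.Pipe()` returns two connected `Connection` objects (duplex by default), and each side uses `send()` and `recv()`, which pickle and unpickle the objects passed through.

```python
import multiprocessing


def child(conn):
    # Hypothetical worker: receive a list from the parent, reply with its sum, then close this end
    numbers = conn.recv()
    conn.send(sum(numbers))
    conn.close()


if __name__ == "__main__":
    # Pipe() returns two connected ends; both can send and receive by default
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3, 4])
    total = parent_conn.recv()  # blocks until the child replies
    print(f"Sum from child: {total}")
    p.join()
```

A `Pipe` connects exactly two endpoints, whereas a `Queue` can be shared by many producers and consumers.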
Best Practices
When to Use Threads
- I/O-bound tasks: Threads are well-suited for I/O-bound tasks such as reading from or writing to files and making network requests. Because CPython releases the GIL while a thread waits on blocking I/O, other threads can run in the meantime, so the waits overlap.
- Low resource overhead: If you need to create a large number of concurrent tasks and resource usage is a concern, threads are a better choice because they are lightweight compared to processes.
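A rough sketch of why threads help with I/O-bound work. It uses `time.sleep` as a stand-in for a blocking I/O call, which is an assumption: real network or disk calls behave similarly in that they release the GIL while waiting. `fake_io` is a hypothetical name.

```python
import threading
import time


def fake_io(delay):
    # Hypothetical stand-in for a blocking I/O call; the GIL is released while sleeping
    time.sleep(delay)


start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"5 waits of 0.2 s took {elapsed:.2f} s")
```

Because the five waits overlap, the total should be roughly 0.2 seconds rather than 1 second; exact timings depend on the machine.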
When to Use Multiprocessing
- CPU-bound tasks: Multiprocessing is ideal for CPU-bound tasks such as numerical computations and image processing. Since each process has its own Python interpreter instance, and therefore its own GIL, CPU-bound tasks can be executed in parallel across multiple processes.
- Isolation and safety: If you need to isolate different parts of your program for security or stability reasons, using multiple processes can provide that isolation.
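For CPU-bound work, `multiprocessing.Pool` spreads calls to a function across worker processes. The sketch below uses a deliberately slow, hypothetical `count_primes` function as a stand-in for real computation; `pool.map` splits the list of inputs among the workers and returns the results in input order.

```python
import multiprocessing


def count_primes(limit):
    """Hypothetical CPU-heavy task: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]
    # Pool() with no arguments starts one worker per available CPU
    with multiprocessing.Pool() as pool:
        results = pool.map(count_primes, limits)
    print(dict(zip(limits, results)))
```

Because each worker has its own interpreter and GIL, the four calls can run on different cores at once; on a single core they would simply run one after another.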
Conclusion
In Python, both threads and multiprocessing are powerful tools for achieving concurrency. Threads are lightweight and suitable for I/O-bound tasks, while multiprocessing is better for CPU-bound tasks. Understanding the fundamental concepts, usage methods, common practices, and best practices of threads and multiprocessing can help you write more efficient and responsive Python programs.
References
- Python official documentation: threading and multiprocessing
- “Python Cookbook” by David Beazley and Brian K. Jones
By following these guidelines and examples, you should be able to make informed decisions when choosing between threads and multiprocessing in your Python projects.