By 苏剑林 | January 14, 2024
In this article, we discuss a programming topic: how to implement retries in Python more elegantly. In the post "Happy New Year! Recording the Development Experience of Cool Papers," I shared some experiences from developing Cool Papers, specifically mentioning the network communication steps required. Whenever network communication is involved, there is a risk of failure (no one can guarantee the network won't occasionally act up), so retrying is a fundamental operation in network communication. Furthermore, when dealing with multi-processing, databases, hardware interactions, etc., a retry mechanism is usually necessary.
Implementing retries in Python is not difficult, but there are certain techniques to doing it more simply without losing readability. Next, I will share my own attempts.
A complete retry process generally includes parts like loop iteration, exception handling, delay waiting, and follow-up operations. Its standard implementation uses a for loop with try ... except ... to catch exceptions. A reference code snippet is as follows:
import time
from random import random

allright = False  # Success flag

for i in range(5):  # Retry up to 5 times
    try:
        # Code that might fail
        x = random()
        if x < 0.5:
            yyyy  # yyyy is undefined, so it will raise an error
        allright = True
        break
    except Exception as e:
        print(e)  # Print error message
        if i < 4:
            time.sleep(2)  # Delay for two seconds

if allright:
    # Perform some operations
    print('Execution successful')
else:
    # Perform other operations
    print('Execution failed')
Our goal is to simplify the code before if allright:. You can see that it has a relatively fixed format: a for loop combined with a try ... break ... except ... sleep ... template. It's easy to imagine there is significant room for simplification.
The problem with for loops is that if you need to retry in many places and the exception handling logic is the same, rewriting the except code every time becomes tedious. In such cases, the standard recommended approach is to wrap the error-prone code into a function and write a decorator to handle the exceptions:
import time
from random import random

def retry(f):
    """Retry decorator: adds retry functionality to the wrapped function
    """
    def new_f(*args, **kwargs):
        for i in range(5):  # Retry up to 5 times
            try:
                return True, f(*args, **kwargs)
            except Exception as e:
                print(e)  # Print error message
                if i < 4:
                    time.sleep(2)  # Delay for two seconds
        return False, None
    return new_f

@retry
def f():
    # Code that might fail
    x = random()
    if x < 0.5:
        yyyy  # yyyy is undefined, so it will raise an error
    return x

allright, _ = f()  # Returns execution status and result
if allright:
    # Perform some operations
    print('Execution successful')
else:
    # Perform other operations
    print('Execution failed')
When multiple different code blocks need retries, you only need to write them as functions and add the @retry decorator to implement the same retry logic. Thus, the decorator approach is indeed a concise solution and is quite intuitive, which explains why it has become the standard. Most mainstream retry libraries, such as tenacity, or the older retry and retrying, are based on the decorator principle.
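For comparison, such libraries typically expose this pattern as a parameterized decorator. Below is a rough, stdlib-only sketch of the idea; the parameter names (`max_tries`, `delay`, `exceptions`) and the `RuntimeError` on exhaustion are my own choices, not any particular library's API:

```python
import time
from functools import wraps

def retry(max_tries=5, delay=2, exceptions=(Exception,)):
    """Decorator factory: retries the wrapped function on the given
    exception types, sleeping `delay` seconds between attempts."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            for i in range(max_tries):
                try:
                    return f(*args, **kwargs)
                except exceptions as e:
                    print(e)
                    if i < max_tries - 1:
                        time.sleep(delay)
            raise RuntimeError('all %d attempts failed' % max_tries)
        return wrapper
    return decorator

@retry(max_tries=3, delay=0)
def flaky(counter=[0]):
    # Fails on the first two calls, succeeds on the third
    counter[0] += 1
    if counter[0] < 3:
        raise ValueError('not yet')
    return counter[0]
```

The decorator factory returns a decorator, which is why two levels of nesting are needed; calling `@retry(...)` with arguments is what distinguishes this from the simple `@retry` above.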
However, while the decorator approach is standard, it isn't perfect. First, you have to encapsulate the retry target in a separate function, which in many cases disrupts the natural flow of the code. Second, because the code is wrapped in a function, local variables from the surrounding scope cannot be used directly; anything needed must be explicitly passed in or returned, which feels roundabout. Overall, while decorators simplify retry code, something still feels missing.
The perfect retry code I imagine would be based on a context manager syntax, similar to:
with Retry(max_tries=5) as retry:
    # Code that might fail
    x = random()
    if x < 0.5:
        yyyy  # yyyy is undefined, so it will raise an error

if retry.allright:
    # Perform some operations
    print('Execution successful')
else:
    # Perform other operations
    print('Execution failed')
However, after researching the principles of context managers, I realized that this ideal syntax is destined to be unachievable, because a context manager can only manage the "context": it cannot manage the main block of code (the "code that might fail" in this article). Specifically, a context manager is a class with __enter__ and __exit__ methods; __enter__ runs before the block (setup) and __exit__ runs after it (teardown), but neither can control the block in between (e.g., make it run multiple times).
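To make the limitation concrete, here is a minimal (hypothetical) context manager: its __exit__ can swallow an exception raised in the body, but there is no hook that could make the body execute a second time:

```python
class Suppress:
    """Minimal context manager: __exit__ returning True swallows the
    exception raised in the body, but no hook exists that could make
    the body run again."""
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        return True  # suppress any exception

ran = 0
with Suppress():
    ran += 1
    raise ValueError('boom')  # swallowed by __exit__

print(ran)  # the body still executed exactly once
```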
So, the attempt to implement a one-liner retry based purely on a context manager failed.
The good news, however, is that while a context manager cannot implement a loop, its __exit__ method can handle exceptions. So it can at least replace try ... except ... for exception handling. Thus, we can write:
import time
from random import random

class Retry:
    """Custom context manager to handle exceptions
    """
    def __enter__(self):
        self.allright = False
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.allright = True
        else:
            print(exc_val)
            time.sleep(2)
        return True

for i in range(5):  # Retry up to 5 times
    with Retry() as retry:
        # Code that might fail
        x = random()
        if x < 0.5:
            yyyy  # yyyy is undefined, so it will raise an error
        break

if retry.allright:
    # Perform some operations
    print('Execution successful')
else:
    # Perform other operations
    print('Execution failed')
This latest version is actually very close to the ideal syntax mentioned in the previous section. There are two differences: 1. You have to write an extra for loop, but this is unavoidable because, as stated earlier, a context manager cannot initiate a loop, so you must start it externally with for or while; 2. You need to explicitly add a break, which is something we can try to optimize away.
Additionally, this version has a minor flaw: if all retries fail, then after the final attempt fails, it will still trigger a sleep, which is theoretically unnecessary. We should find a way to remove that.
To optimize away the break, the loop needs to "learn" to stop itself. There are two ways to do this: the first is to switch to a while loop and let the stop condition change based on the retry result, which leads to results similar to those in "Handling exceptions inside context managers"; the second is to keep the for loop but replace range(5) with an iterator that changes according to the retry results. This article focuses on the latter.
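For the record, the first option (a while loop whose stop condition reads the retry result) might look like the sketch below; the failing demo body and the zero delay are placeholders of my own:

```python
import time

class Retry:
    """Exception-swallowing context manager, as in this article."""
    def __init__(self):
        self.allright = False

    def __enter__(self):
        self.allright = False
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.allright = True
        return True  # swallow the exception

calls = 0
retry, tries, max_tries = Retry(), 0, 5
while not retry.allright and tries < max_tries:
    tries += 1
    with retry:
        calls += 1
        if calls < 3:  # the first two attempts fail
            raise ValueError('fail')
    if not retry.allright and tries < max_tries:
        time.sleep(0)  # delay between attempts (0 here for the demo)
```

Note that the loop itself now decides when to stop, so no explicit break is needed; the price is the slightly busier loop header.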
Upon analysis, I found that by using the built-in methods __call__ and __iter__, the retry object can act as both a context manager and a dynamic iterator, also solving the unnecessary sleep problem after the final failure:
import time
from random import random

class Retry:
    """Context manager + iterator for handling exceptions
    """
    def __call__(self, max_tries=5):
        self.max_tries = max_tries
        return self

    def __iter__(self):
        for i in range(self.max_tries):
            yield i
            if self.allright or i == self.max_tries - 1:
                return
            time.sleep(2)

    def __enter__(self):
        self.allright = False

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.allright = True
        else:
            print(exc_val)
        return True

retry = Retry()

for i in retry(5):  # Retry up to 5 times
    with retry:
        # Code that might fail
        x = random()
        if x < 0.5:
            yyyy  # yyyy is undefined, so it will raise an error

if retry.allright:
    # Perform some operations
    print('Execution successful')
else:
    # Perform other operations
    print('Execution failed')
Attentive readers might wonder: you managed to delete one break, but added retry = Retry(), so the total amount of code remains the same (and the context manager got more complex). Is it worth the effort? In fact, retry here is reusable: the user only needs to define retry = Retry() once, and each subsequent retry only needs:
for i in retry(max_tries):
    with retry:
        # Code that might fail
This is enough. Thus, although the context manager is slightly more complex, we are now essentially as close as we can get to the ideal syntax.
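Concretely, a self-contained reuse demo might look like this (the class is repeated from above, with the 2-second delay set to 0 so the demo runs instantly; the two loop bodies are placeholders of my own):

```python
import time

class Retry:
    """Context manager + iterator, as defined above."""
    def __call__(self, max_tries=5):
        self.max_tries = max_tries
        return self

    def __iter__(self):
        for i in range(self.max_tries):
            yield i
            if self.allright or i == self.max_tries - 1:
                return
            time.sleep(0)  # 0 instead of 2 so the demo runs instantly

    def __enter__(self):
        self.allright = False

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.allright = True
        return True

retry = Retry()  # defined once

calls = 0
for i in retry(3):  # first retried block
    with retry:
        calls += 1
        if calls < 2:
            raise ValueError('first block fails once')
first_ok = retry.allright

for i in retry(3):  # second retried block reuses the same object
    with retry:
        pass  # succeeds immediately
second_ok = retry.allright
```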
However, "defining retry = Retry() once and reusing retry" is only suitable for single-process environments. If multi-processing is involved, you still have to define retry = Retry() separately. Additionally, such reuse gives the feeling that "different retries are not completely isolated." Is it possible to remove that line entirely? I thought about it again and found it is possible! Here is the reference code:
import time
from random import random

class Retry:
    """Context manager + iterator for handling exceptions
    """
    def __init__(self, max_tries=5):
        self.max_tries = max_tries

    def __iter__(self):
        for i in range(self.max_tries):
            yield self
            if self.allright or i == self.max_tries - 1:
                return
            time.sleep(2)

    def __enter__(self):
        self.allright = False

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.allright = True
        else:
            print(exc_val)
        return True

for retry in Retry(5):  # Retry up to 5 times
    with retry:
        # Code that might fail
        x = random()
        if x < 0.5:
            yyyy  # yyyy is undefined, so it will raise an error

if retry.allright:
    # Perform some operations
    print('Execution successful')
else:
    # Perform other operations
    print('Execution failed')
This time, the change was to swap __call__ for __init__, and then in __iter__, yield i was changed to yield self, returning the object itself. This way, there is no need to write a separate line for retry = Retry(5) to initialize. Instead, initialization and alias assignment happen simultaneously in for retry in Retry(5):. Furthermore, since a new instance is initialized for every retry loop, complete isolation between retries is achieved—killing two birds with one stone.
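As a possible extension (my own addition, not part of the original design), this final version also takes parameters naturally, e.g. a configurable delay and a whitelist of exception types worth retrying, with any other exception propagating as usual:

```python
import time

class Retry:
    """Final design, extended (hypothetically) with a configurable
    delay and a tuple of exception types to retry on."""
    def __init__(self, max_tries=5, delay=2, exceptions=(Exception,)):
        self.max_tries = max_tries
        self.delay = delay
        self.exceptions = exceptions

    def __iter__(self):
        for i in range(self.max_tries):
            yield self
            if self.allright or i == self.max_tries - 1:
                return
            time.sleep(self.delay)

    def __enter__(self):
        self.allright = False

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.allright = True
            return True
        if issubclass(exc_type, self.exceptions):
            return True  # swallow and let the loop retry
        return False  # unexpected exception type: propagate

calls = 0
for retry in Retry(max_tries=4, delay=0, exceptions=(ValueError,)):
    with retry:
        calls += 1
        if calls < 3:  # the first two attempts fail
            raise ValueError('transient failure')
```

Here a KeyError, say, would not be swallowed by __exit__ and would abort the loop immediately, which is usually the desired behavior for non-transient bugs.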
This article has relatively comprehensively explored how to write retry mechanisms in Python, attempting to reach what I consider the perfect implementation of retry code. The final result has more or less achieved what I had in mind.
However, I must admit that the motivation for this article was essentially a touch of perfectionism; it brings no substantive improvement in algorithmic efficiency. Spending too much time on such programming details is, to some extent, a distraction from more important work, and not necessarily something worth emulating.