What should Python dictionaries be used for?

Whenever I use a package and an API call returns a dictionary, I struggle to know what is in it, what the values mean, and whether those values are always present or only appear in specific circumstances. Dictionaries have their place and are the only sensible choice for certain use cases, but they also have limitations and shouldn’t be used everywhere. Below I’ll discuss a few example scenarios where dictionaries are useful and where they should be avoided.

Let’s start with the use case they are best for: looking up values based on a key. For example, if you have a bunch of users and need to look up their address by their id, use a dictionary. A list would be inefficient here, since finding an item in a list generally means scanning it, which is O(N), whereas a dictionary lookup by key is O(1) on average.
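As a rough sketch of that difference, here is the same lookup written against a list and against a dictionary. The addresses and the helper names (find_in_list, find_in_dict) are made up for illustration.

```python
# Hypothetical sample data: user id -> address.
addresses_by_id = {
    1: "1 Example Street",
    2: "2 Sample Avenue",
}

# The same data stored as a list of (id, address) pairs.
address_pairs = list(addresses_by_id.items())


def find_in_list(user_id):
    """Scan the pairs one by one until the id matches: O(N)."""
    for candidate_id, address in address_pairs:
        if candidate_id == user_id:
            return address
    return None


def find_in_dict(user_id):
    """Hash the key and go straight to the value: O(1) on average."""
    return addresses_by_id.get(user_id)
```

Both functions return the same answer. The difference only shows up as the number of users grows, because the list version has to walk through the entries while the dictionary version does not.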

The next example is a case where dictionaries are commonly used and lead to a poor developer experience: returning them from API calls. For example, let’s imagine a function which returns information about a user:

def get_user():
    return {
        "id": 1,
        "name": "David Andersson",
        "location": "Australia",
        "products": [
            "apple",
            "chocolate",
            "pencil"
        ]
    }

Without knowing the implementation details, the caller doesn’t get any information about what is contained in the dictionary, what the values mean and whether they are always there or not. You might be aware of TypedDict, which is indeed an improvement over providing no information at all. There is an even better alternative: returning a dataclass or NamedTuple. The advantage is that there are well-established ways of documenting these kinds of data structures using docstrings, and they provide the same type safety as TypedDict. For example:

import typing


class User(typing.NamedTuple):
    """Information about a user.

    Attributes:
         id: Unique identifier for the user.
         name: The first and last name of the user.
         location: Where the user is located.
         products: The products the user has purchased.
    """

    id: int
    name: str
    location: str | None
    products: list[str]

def get_user() -> User:
    return User(
        id=1,
        name="David Andersson",
        location="Australia",
        products=[
            "apple",
            "chocolate",
            "pencil"
        ]
    )

This means that users of the API know which fields are returned and what type each one has, get autocomplete in their IDE, and can read a description of each attribute in the docstring.

So in conclusion, dictionaries are best used for key-value lookups, while other data structures, like dataclass and NamedTuple, are more appropriate for most APIs as they are more developer friendly.
