Have you ever tried to understand a new project by reading its source code, only to find that the code isn’t clear on its own and lacks documentation such as docstrings? I have had that experience a few times. It slowed down fixing bugs and adding new features, and it frequently also meant that the code wasn’t well structured. In this post we’ll look at best practices for documenting in code and the benefits of doing so, such as helping you be clear on what you actually need to code and reminding yourself and others how the code works when you come back from an extended holiday. We will also look at a linter that checks your docstrings to make sure they are complete, so that your future self is far less likely to be frustrated by a lack of documentation in code!
A useful place to start is with thinking about the target audience of docstrings. One obvious group of readers is the consumers of your modules, classes and functions. The docstrings will show up, for example, when hovering over an imported module, class or function in a code editor such as VSCode.
Consumers may also go to your source code and read your docstrings. There are some tools that will even generate package documentation based on docstrings. Other readers of the docstrings, who are not as frequently considered, are developers who will later come and make changes to the code and reviewers who will read your code and help improve it. Docstrings can go a long way towards helping them understand the purpose and potentially the implementation details of the module, class or function.
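To make this concrete, here is a minimal sketch of how a docstring can be read programmatically using the standard library inspect module. The area function is a hypothetical example and is not part of the linter discussed later in this post.

```python
import inspect


def area(width: float, height: float) -> float:
    """Calculate the area of a rectangle.

    Args:
        width: The horizontal extent of the rectangle.
        height: The vertical extent of the rectangle.

    Returns:
        The area covered by the rectangle.
    """
    return width * height


# inspect.getdoc returns the docstring with its indentation cleaned up.
# help(area) displays this text as well.
doc = inspect.getdoc(area)

# The first line is the short description that a reader sees first.
summary = doc.splitlines()[0]
print(summary)
```

The short description is what a reader sees first, which is one reason to keep it to a single informative line.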
Now that we understand who may read the docstrings, let’s consider where in your Python code it can be valuable to include documentation. By convention (PEP257 and the Google Python Style Guide), docstrings are in the following locations:
- At the module level describing the module at a high level
- At the function level describing the purpose of the function, arguments, return values and so on
- At the class level describing the purpose of the class, its functions, attributes etc.
The most important modules, functions and classes to include docstrings for are those that are used by others. As we will see later, there are additional benefits to docstrings besides telling users how modules, functions and classes work, which means that it can still be worthwhile to add docstrings to private modules, classes and functions. Presumably private modules, classes and functions are used somewhere in your code, so even they have users who will probably appreciate documentation. Documentation on test functions is also important. This post doesn’t discuss test functions, but I have written a post about writing great test documentation which may also be of interest to you.
The following paragraphs include examples from the code of a docstring linter that automates docstring checks. There is a brief introduction to that linter at the end of this post.
Next, to help us understand what we should put in each of the module, function and class docstrings, we will discuss the purpose of these docstrings and provide examples for writing them. Starting with the module docstring, it should describe the module at a high level. You can consider it as the introduction or overview of a chapter in a book describing your code. It is always useful to start with a short description which is no longer than one line. This is similar to the elevator pitch for the module which describes why it is useful or what it does. It is tempting to just restate the name, such as “The attrs module.” It is fairly obvious that this is the attrs module from its name. Instead of stating the obvious, expand on the name of the module. For example: “Class docstring arguments section checks.” That makes it obvious what the module can do and what you might expect to find in that module. I often find I don’t write much more detail into the docstring, although more information can be useful. For example:
- A detailed description of the module if it warrants more detail than provided by the short description.
- A short description of each of the public classes and functions in the module. This is like an index for what to read next. You can think about it in terms of, now that you are interested in using this module based on the description, here are the useful classes and functions available which provide the functionality outlined in the description.
- A short description of the constants in the module. This is useful if the constants are not self-explanatory and to highlight certain important constants, e.g., those that you expect to be used a lot.
- Usage examples of classes and functions. This can be especially beneficial for classes and functions made available by a package. Consider carefully whether these will be useful as it will be more expensive to maintain this documentation as you evolve the module. Automated testing of this documentation can help ensure the examples keep working.
Here is an example module docstring which includes a short description of the module and of all the public functions and classes:

    """Class docstring arguments section checks.

    Functions:
        check: Check a class docstring attributes section.
        is_property_decorator: Checks whether a decorator implies a method is a property.

    Classes:
        VisitorWithinClass: visit AST nodes of a class to gather all of its attributes.
    """
Now that we have looked at module docstrings, we will next turn our attention to functions. I usually tend to put the most effort into writing function docstrings because they typically encapsulate all of the important logic of a project. Function docstrings also benefit from a short and long description similar to modules. In addition to that:
- Describe the arguments the function takes. This is a key part of the interface and it should at least be clear why those arguments are needed and potentially how they will be used. Again, refrain from just restating the name of the argument in the description. If you find yourself using the name of the argument and not providing much additional information (e.g., “the node” for a node argument), you can probably enhance the description by providing more information (e.g., “The top level node of the AST of the class to check.”). Since Python has introduced type hints, instead of describing the type of the argument in the docstring, add a type hint to the argument in the function signature!
- Describe the return/ yield value if the function returns/ yields a value. This informs the caller what they will get back from the function.
- Describe the exceptions raised within the function. This tells the caller the exceptions they may need to handle when using the function. I tend not to include exceptions raised by functions called within the function since that can explode in complexity. The function should gracefully handle those exceptions anyway. This is also an important section along with the arguments and will make it easy for your users to make sure they handle all the exceptions you raise. I have rarely seen this done well in documentation and it can make it difficult to know which exceptions could be raised without a deep dive into the code.
- Usage examples of the function. Consider carefully whether these will be useful as it will be more expensive to maintain this documentation as you evolve the function. Automated testing of this documentation can help ensure the examples keep working.
- Although this is less common, I also sometimes describe the algorithm or more information on how the function works. Whilst this is not as useful to the caller, it helps with the design process and will be handy for reviewers. Similar to including examples in the docstring, carefully consider whether it is useful to include such implementation details in the docstring as you will have to maintain that documentation. An alternative is to write it as a comment just beneath the docstring so that it doesn’t pollute the docstring seen by the user of the function.
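As a rough sketch of how the points above can come together, the hypothetical parse_port function below uses type hints instead of describing types in the docstring, documents the exception it raises itself, keeps an implementation note in a comment beneath the docstring and includes a usage example that is checked with the standard library doctest module. The function and its behaviour are assumptions made for illustration only.

```python
import doctest


def parse_port(value: str) -> int:
    """Convert the text form of a TCP port into an integer.

    Leading and trailing whitespace is ignored.

    Args:
        value: The port as entered by the user, e.g., read from a config file.

    Returns:
        The port number, which is in the range 1 to 65535.

    Raises:
        ValueError: If value is not a whole number or is outside the valid range.

    Examples:
        >>> parse_port(" 8080 ")
        8080
    """
    # Implementation note kept out of the docstring: validate the text first so
    # that every ValueError is raised explicitly by this function.
    text = value.strip()
    if not text.isdigit():
        raise ValueError(f"port must be a whole number, got {value!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return port


# Run the examples embedded in the docstring so they cannot silently go stale.
runner = doctest.DocTestRunner(verbose=False)
finder = doctest.DocTestFinder()
for test in finder.find(parse_port, "parse_port", globs={"parse_port": parse_port}):
    runner.run(test)
print(f"examples tried: {runner.tries}, failed: {runner.failures}")
```

Running the docstring examples as part of the test suite is one way to keep the maintenance cost of usage examples down.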
Here is an example of a function docstring for the check function which includes a short and long description and a description of the arguments and yield value:

    """Check that all class attributes are described in the docstring.

    Check the class has at most one attrs section.
    Check that all attributes of the class are documented.
    Check that no attributes the class doesn't have are documented.
    Check that a class without attributes does not have an attrs section.

    Args:
        docstr_info: Information about the sections of the docstring, such as the attributes described in the attributes section.
        docstr_node: The AST node of the docstring, used to target messages related to the docstring.
        class_assign_nodes: The attributes of the class assigned at the class level.
        method_assign_nodes: The attributes of the class assigned in its methods.

    Yields:
        All the problems with the attributes section of the docstring.
    """
Next we will discuss class docstrings. Class docstrings are similar to module level docstrings. You can consider the class docstring to be an introduction to the class, just as the module docstring is an introduction to the module. The short and long descriptions for classes are quite similar to those of functions and modules. Additionally, include:
- A short description of each of the public methods in the class. This is like an index for what to read next. You can think about it in terms of, now that you are interested in using this class, here are the useful methods available which provide the functionality outlined in the description.
- A short description of each of the attributes of the class. Attributes can be defined on the class (outside of methods), in the __init__ method and within any method or classmethod through assigning to cls. Properties using the property decorator on functions should also be considered attributes. The attribute description should include the purpose of the attribute and the data it provides. Since Python has introduced type hints, describe the type of an attribute using a type hint rather than in the docstring!
- Usage examples of the class. Consider carefully whether these will be useful as it will be more expensive to maintain this documentation as you evolve the class. Automated testing of this documentation can help ensure the examples keep working.
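The list above can be illustrated with a small, hypothetical Rectangle class. It is only a sketch: the section names (Attrs and Functions) follow the examples in this post, the types are given as type hints and the area property is documented as an attribute.

```python
class Rectangle:
    """A rectangle with a fixed width and height.

    Attrs:
        width: The horizontal extent of the rectangle.
        height: The vertical extent of the rectangle.
        area: The surface covered by the rectangle, derived from width and height.

    Functions:
        scale: Create a new rectangle with both sides multiplied by a factor.
    """

    # Types are documented with type hints rather than in the docstring.
    width: float
    height: float

    def __init__(self, width: float, height: float) -> None:
        """Construct the rectangle.

        Args:
            width: The horizontal extent of the rectangle.
            height: The vertical extent of the rectangle.
        """
        self.width = width
        self.height = height

    @property
    def area(self) -> float:
        """The surface covered by the rectangle."""
        return self.width * self.height

    def scale(self, factor: float) -> "Rectangle":
        """Create a new rectangle with both sides multiplied by a factor.

        Args:
            factor: The multiplier applied to both the width and the height.

        Returns:
            A new rectangle, the original is left unchanged.
        """
        return Rectangle(self.width * factor, self.height * factor)


rectangle = Rectangle(2, 3)
print(rectangle.area, rectangle.scale(2).area)
```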
Here is an example class docstring which includes a short description of the class, its attributes and its functions:

    """Visits AST nodes within a class but not nested class and functions nested within functions.

    Attrs:
        class_assign_nodes: All the assign nodes and properties encountered within the class.
        method_assign_nodes: All the assign nodes encountered within the class methods.

    Functions:
        visit_assign: triggered for each assign node which records the node on the instance.
        visit_any_function: triggered for each function node.
        visit_once: ensures that the visit functionality for a node is only executed once.
        visit_top_level: ensures that the visit functionality for a node is not executed for nested nodes.
    """
Now that we know about all the useful docstrings in Python and their purpose, let’s think about the benefits of documentation in code. A clear benefit is that your users will know how and why to use the modules, functions and classes you create. In addition to that:
- Writing the documentation will require you to think about what it is you are doing. You can do this without the documentation, but the documentation acts as a forcing function, which will lead you to write better code. Similar to TDD, it can be beneficial to write the docstrings even before you implement the tests. The docstrings may evolve as you are writing the tests and implementing the function or class.
- If you struggle writing the documentation, it could be an indication to rethink the overall design. For example, if a function or its arguments are difficult to explain, your users will struggle with understanding how to use it.
- Reviewers will find it easier to understand your code. This will make PRs easier and require fewer questions from reviewers. Questions from reviewers in a PR can be an indication that more information in a docstring could be beneficial for future readers.
You can see that docstrings are not only useful to your users. This is why I usually write docstrings even on private functions, classes and modules.
Finally, now that we have established when, how and why to write docstrings, let’s consider a few tools that can help improve the quality of docstrings. Firstly, the pydocstyle linter checks for compliance with some of the PEP257 standards. I always use this linter to make sure I have docstrings where they should be defined and to check the overall structure of the docstring. In addition to pydocstyle, I have been wanting a tool that checks, for example, that all of the arguments of a function are documented and that the names of the arguments in the docstring match the names in the signature. Even with the best of intentions, it is easy to forget to update a docstring or to make a spelling mistake with the name of a variable in the docstring. The flake8-docstrings-complete linter addresses this use case and, at the time of writing, checks the following:
- If a function/ method has arguments, that the arguments section is included.
- If a function/ method has arguments, that all function/ method arguments are in the argument section.
- If an arguments section is in the function/ method docstring, the argument section contains no arguments the function/ method doesn’t have.
- If a function/ method has a return statement with a value, the return value section is included.
- If a function/ method has a yield statement with a value, the yield value section is included.
- If a function/ method raises an exception, the raises section is included with a description for each exception that is raised.
- If a class has public attributes, that the attributes section is included.
- If a class has public attributes, that all public attributes are in the attributes section.
- If an attributes section is in the class docstring, the attributes section contains no attributes the class doesn’t have.
- Any of the sections being checked are not present multiple times.
For example, consider the following code which defines a function foo with the bar and baz arguments and fails to document the baz argument in the docstring:

    # source.py
    def foo(bar, baz):
        """Perform foo action on bar.

        Args:
            bar: The value to perform the foo action on.
        """

The linter warns:

    flake8 source.py
    source.py:2:14: DCO023 "baz" argument should be described in the docstring, more information: https://github.com/jdkandersson/flake8-docstrings-complete#fix-dco023

The code can be fixed by including a description of the baz argument in the docstring:

    # source.py
    def foo(bar, baz):
        """Perform foo action on bar.

        Args:
            bar: The value to perform the foo action on.
            baz: The modifier to the foo action.
        """

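To give a feel for how this kind of check can work, here is a deliberately simplified sketch based on the standard library ast module. It is not the implementation of flake8-docstrings-complete. The helper names are hypothetical, and it assumes a Google style Args section with four space indentation and only looks at regular positional arguments.

```python
import ast

SOURCE = '''
def foo(bar, baz):
    """Perform foo action on bar.

    Args:
        bar: The value to perform the foo action on.
    """
'''


def documented_args(docstring: str) -> set[str]:
    """Collect the argument names listed in the Args section of a docstring."""
    names: set[str] = set()
    in_args = False
    for line in docstring.splitlines():
        if not line.startswith(" "):
            # Unindented lines are section headers, description text or blank.
            in_args = line.strip() == "Args:"
        elif in_args and ":" in line:
            indent = len(line) - len(line.lstrip())
            # Assumption: entries are indented by four spaces, deeper lines
            # are continuations of the previous description.
            if indent == 4:
                names.add(line.split(":", 1)[0].strip())
    return names


def undocumented_args(source: str) -> dict[str, list[str]]:
    """Map each function name to the arguments missing from its docstring."""
    result: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            described = documented_args(ast.get_docstring(node) or "")
            missing = [
                arg.arg
                for arg in node.args.args
                if arg.arg not in ("self", "cls") and arg.arg not in described
            ]
            if missing:
                result[node.name] = missing
    return result


print(undocumented_args(SOURCE))  # {'foo': ['baz']}
```

A real linter has to handle many more cases, such as keyword only arguments, alternative section names and nested functions, which is why I prefer to rely on a dedicated tool.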
That concludes our discussion on code documentation and docstrings. In summary, docstrings should be defined for modules, functions and classes with the most effort spent on public modules, classes and functions. Docstrings should be treated as an introduction to the class, module or function for users, developers and reviewers. The pydocstyle tool checks for compliance with PEP257 and the flake8-docstrings-complete tool checks that your docstrings include all arguments, attributes, raised exceptions and return and yield values.