Documenting via docstrings

To avoid simulating the entire python interpreter in our minds, it is often easier to document the (intended) behavior of our code in a human readable format. Python offers the builtin function help() to display the documentation for a given function. For example, if we want to know what the numpy.sum function does we can just ask:

>>> import numpy as np
>>> help(np.sum)
Help on function sum in module numpy:

sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Sum of array elements over a given axis.
    
    Parameters
    ----------
    a : array_like
        Elements to sum.
    axis : None or int or tuple of ints, optional
        Axis or axes along which a sum is performed.  The default,
        axis=None, will sum all of the elements of the input array.  If
        axis is negative it counts from the last to the first axis.
    
        .. versionadded:: 1.7.0
    
        If axis is a tuple of ints, a sum is performed on all of the axes
        specified in the tuple instead of a single axis or all the axes as
        before.
    dtype : dtype, optional
        The type of the returned array and of the accumulator in which the
        elements are summed.  The dtype of `a` is used by default unless `a`
        has an integer dtype of less precision than the default platform
...

Where does help() get all this information from? In part, the information provided by help is part of the docstring for the enumerate function. We can view the docstring by viewing the __doc__ attribute of the function as follows:

>>> print(np.sum.__doc__)

    Sum of array elements over a given axis.

    Parameters
    ----------
    a : array_like
        Elements to sum.
    axis : None or int or tuple of ints, optional
        Axis or axes along which a sum is performed.  The default,
        axis=None, will sum all of the elements of the input array.  If
        axis is negative it counts from the last to the first axis.

        .. versionadded:: 1.7.0

        If axis is a tuple of ints, a sum is performed on all of the axes
        specified in the tuple instead of a single axis or all the axes as
        before.
...

Documentation vs commenting

There are two ways in which you can and should describe your code – documentation and commenting. These two ways of describing code have two audiences (which may overlap) – documentation is for the people who will use your code, whilst comments are for people who will develop your code. Both of these audiences include you, the original developer, some 6 months in the future when you have forgotten all the details about what you were doing. Quite simply:

Documentation is a love letter that you write to your future self.
Damian Conway

Comments

Comments should include design decisions, or explanations of difficult to interpret code chunks. Comments can include known/expected bugs or shortcomings in the code. Things that are not yet implemented, or hacks that deal with bugs in other modules, should also be in comments. Python comments come in two flavours: a single or part line comment which begins with a #, or a multiline comment which is any string literal.

'''
A comment that covers more than one line
because it is just so long
'''

def my_func(num):
    # assume that num is some numeric type, or at the very least
    # an object which supports division against an integer
    ans = num / 2 # A partial line comment
    return ans

The partial-line comment plus multi-line commands can be used to great effect when defining functions, dictionaries, or lists:

dict = {'key1': 0, # note about this item
        'key2': 1, # another note
       }

def my_func(num,
            ax, # a matplotlib axes object
            verbose=True, # TODO update to be logger.isEnabledFor(logging.DEBUG)
            **kwargs)

When python is interpreted (or compiled to byte-code), the interpreter will ignore the comments. The comments therefore only exist in the source code. Commenting your code has no effect on the behavior of your code, but it will (hopefully) increase your ability to understand what you did. Because the comments are ignored by the python interpreter only people with access to your source code will read them (developer usually), so this is a bad place to describe how your code should be used. For notes about code usage we instead use documentation.

Docstrings

Python provides a way for use to document the code inline, using docstrings. Docstrings can be attached to functions, classes, or modules, and are defined using a simple syntax as follows:

def my_func():
  """
  This is the doc-string for the function my_func.
  I can type anything I like in here.
  The only constraint is that I start and end with tripe quotes (' or ")
  I can use multi-line strings like this, or just a single line string if I prefer.
  """
  return

Docstrings can be any valid string literal, meaning that they can be encased in either single or double quotes, but they need to be triple quoted. Raw and Unicode strings are also fine.

Docstrings can be included anywhere in your code, however unless they immediately follow the beginning of a file (for modules) or the definition of a class or function, they will be ignored by the compiler. The docstrings which are defined at the start of a module/class/function will be saved to the __doc__ attribute of that object, and can be accessed by normal python introspection.

Docstring formats

While it is possible to include any information in any format within a docstring it is clearly better to have some consistency in the formatting.

There are, unfortunately, many ‘standard’ formats for python documentation, though they are all similarly human readable so the difference between the formats is mostly about consistency and automated documentation.

Scipy, Numpy, and astropy, all use the numpydoc format which is particularly easy to read. We will be working with the numpydoc format in this workshop.

Let’s have a look at an extensive example from the numpydoc website.

"""Docstring for the example.py module.

Modules names should have short, all-lowercase names.  The module name may
have underscores if this improves readability.

Every module should have a docstring at the very top of the file.  The
module's docstring may extend over multiple lines.  If your docstring does
extend over multiple lines, the closing three quotation marks must be on
a line by itself, preferably preceded by a blank line.

"""
from __future__ import division, absolute_import, print_function

import os  # standard library imports first

# Do NOT import using *, e.g. from numpy import *
#
# Import the module using
#
#   import numpy
#
# instead or import individual functions as needed, e.g
#
#  from numpy import array, zeros
#
# If you prefer the use of abbreviated module names, we suggest the
# convention used by NumPy itself::

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# These abbreviated names are not to be used in docstrings; users must
# be able to paste and execute docstrings after importing only the
# numpy module itself, unabbreviated.


def foo(var1, var2, *args, long_var_name='hi', **kwargs):
    r"""Summarize the function in one line.

    Several sentences providing an extended description. Refer to
    variables using back-ticks, e.g. `var`.

    Parameters
    ----------
    var1 : array_like
        Array_like means all those objects -- lists, nested lists, etc. --
        that can be converted to an array.  We can also refer to
        variables like `var1`.
    var2 : int
        The type above can either refer to an actual Python type
        (e.g. ``int``), or describe the type of the variable in more
        detail, e.g. ``(N,) ndarray`` or ``array_like``.
    *args : iterable
        Other arguments.
    long_var_name : {'hi', 'ho'}, optional
        Choices in brackets, default first when optional.
    **kwargs : dict
        Keyword arguments.

    Returns
    -------
    type
        Explanation of anonymous return value of type ``type``.
    describe : type
        Explanation of return value named `describe`.
    out : type
        Explanation of `out`.
    type_without_description

    Other Parameters
    ----------------
    only_seldom_used_keywords : type
        Explanation.
    common_parameters_listed_above : type
        Explanation.

    Raises
    ------
    BadException
        Because you shouldn't have done that.

    See Also
    --------
    numpy.array : Relationship (optional).
    numpy.ndarray : Relationship (optional), which could be fairly long, in
                    which case the line wraps here.
    numpy.dot, numpy.linalg.norm, numpy.eye

    Notes
    -----
    Notes about the implementation algorithm (if needed).

    This can have multiple paragraphs.

    You may include some math:

    .. math:: X(e^{j\omega } ) = x(n)e^{ - j\omega n}

    And even use a Greek symbol like :math:`\omega` inline.

    References
    ----------
    Cite the relevant literature, e.g. [1]_.  You may also cite these
    references in the notes section above.

    .. [1] O. McNoleg, "The integration of GIS, remote sensing,
       expert systems and adaptive co-kriging for environmental habitat
       modelling of the Highland Haggis using object-oriented, fuzzy-logic
       and neural-network techniques," Computers & Geosciences, vol. 22,
       pp. 585-588, 1996.

    Examples
    --------
    These are written in doctest format, and should illustrate how to
    use the function.

    >>> a = [1, 2, 3]
    >>> print([x + 3 for x in a])
    [4, 5, 6]
    >>> print("a\nb")
    a
    b
    """
    # After closing class docstring, there should be one blank line to
    # separate following codes (according to PEP257).
    # But for function, method and module, there should be no blank lines
    # after closing the docstring.
    pass

The example above is intentionally extensive, but you should be able to see what is going on. There are a few parts to the documentation format, some of which are considered essential, good practice, or optional. See the numpy doc guide for a more gentle yet more complete discussion on the numpydoc standard.

Good practice documentation

The main goal of documentation is to describe the desired behavior or intended use of the code. As such every docstring should contain at least a one line statement that shows the intent of the code.

It is good practice to describe the expected input and output (or behavior) of your functions.

In the numpydoc format we put these into two sections:

Parameters: for the input
Returns: for the output

There is no “Modifies” section for the documentation (though you could add one if you like). If the function modifies an input but does not return the modified version as an output then this should be included as part of the long form description.

The generate_positions function from the example skysim module has the following docstring:

def generate_positions(ref_ra='00:42:44.3',
                       ref_dec='41:16:09',
                       radius=1.,
                       nsources=1000):
    """
    Create nsources random locations within radius of the reference position.

    Parameters
    ----------
    ref_ra, ref_dec : str
        Reference position in "HH:MM:SS.S"/"DD:MM:SS.S" format.
        Default position is Andromeda galaxy.

    radius : float
        The radius within which to generate positions. Default = 1.

    nsources : int
        The number of positions to generate

    Returns
    -------
    ra, dec : numpy.array
        Arrays of ra and dec coordinates in degrees.
    """

Optional documentation

The type of errors that are raised, and under what conditions, can be documented in the Raises section.

Notes, References, and Examples, are also useful sections but not usually applicable to all functions or classes that you will be writing. If I have used code snippets from stack-overflow or similar, then I find Notes/References section to be a good place to acknowledge and link to those resources.

The Examples section can be used to show intended use. There is an automated testing suite called doctest which will scan your docstrings looking for segments starting with >>> and then run those segments in an interactive python interpreter. A solid test suite will typically contain many tests for a single function, thus trying to embed all the tests into your docstrings just makes for very long docstrings. It is preferable to keep your testing code in the tests module/directory of your python module, and to use the Examples section only for demonstrating functionality to the end user.

Making use of documentation

Some IDEs (the good ones) provide syntax highlighting, linting, and inline help as you write code. By providing docstrings for all your functions you can make use of the linting and inline help. Below is an example from VSCode in which the docstring for a function is being shown:

You can use the help from the python console like this:

>>> from skysim import sim
>>> help(sim.generate_positions)

Help on function generate_positions in module skysim.sim:

generate_positions(ref_ra='00:42:44.3', ref_dec='41:16:09', radius=1.0, nsources=1000)
    Create nsources random locations within radius of the reference position.
    
    Parameters
    ----------
    ref_ra, ref_dec : str
        Reference position in "HH:MM:SS.S"/"DD:MM:SS.S" format.
        Default position is Andromeda galaxy.
    
    radius : float
        The radius within which to generate positions. Default = 1.
    
    nsources : int
        The number of positions to generate
    
    Returns
    -------
    ra, dec : numpy.array
        Arrays of ra and dec coordinates in degrees.
...

Additionally you can compile all the documentation into a website or other document using an automated documentation tool as described in the next section.

Automated Documentation

f your docstrings are formatted in a regular way then you can make use of an automated documentation tool. There are many such tools available with a range of sophistication.

The simplest to use is the pdoc package which can be obtained from pypi.org. The packaged can be installed via pip install pdoc, and then run on our test module using pdoc skysim.

By default pdoc will start a mini web sever with the documentation on it. This should be opened in your browser by default but if it isn’t you can navigate to localhost:8080 or 127.0.0.1:8080. Use <ctrl>+C when you want to stop the web server. For the example project this is the website that is generated:

To make documentation that is less ephemeral you can use the the -d docs option to cause all the documentation to be built and then placed into the docs folder. pdoc only supports html output, however other auto-documentation packages such as sphinx can write latex (and thus pdf), ePub, man pages, or plain text.

Documentation as part of your development cycle

A typical development cycle will consist of writing code, testing code, and writing documentation. The order in which this is done depends on the software development strategies that you set out for your project, or simply personal preference. At the end of the day the process is cyclic – with the end goal of having code, tests, and documentation that are all in agreement. Once your code/tests/documentation are consistent then you can package your code into a module and publish it for others to use.

Collaborative Software Development