Testing recommendations

In this guide, we will provide a roadmap and best practices for creating test suites for Python projects.

We will describe the most important types of test suites, the purposes they serve, and the differences between them. They are presented in Outside -> In order, which is our recommended approach. We start with Public Interface tests, which test your code from the perspective of your users, focusing on the behavior of the public interface and the features that your project provides. Then we cover Project Level Integration tests, which check that the various parts of your package work together, and work with the other packages it depends on. Finally, we cover the venerable Unit Test, which tests the correctness of your code from a perspective internal to your codebase, tests individual units in isolation, and is optimized to run quickly and often.

These 3 test suites will cover the bulk of your testing needs and help get your project to a reliable and maintainable state. We will also discuss some more specialized and advanced types of test cases in our Taxonomy of Test Cases section.

Advantages of Testing

Any test case is better than none

When in doubt, write the test that makes sense at the time.

While you are learning, and writing your first test suites, try not to get bogged down in the taxonomy of test types. As you write and use your test suite, the reason for classifying and sorting some types of tests into different test suites will become apparent.

As long as that test is correct

It can be surprisingly easy to write a test that passes when it should fail, especially when using complicated mocks and fixtures. The best way to avoid this is to deliberately break the code you are testing, hard-code a failure, and run the test-case to make sure it fails when the code is broken.
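A minimal sketch of that sanity check, using a hypothetical add function and a deliberately broken copy of it. The names add, broken_add, and check_add are invented for illustration only:

```python
def add(a, b):
    """Hypothetical code under test."""
    return a + b


def broken_add(a, b):
    """The same function with a deliberately introduced bug."""
    return a - b


def check_add(func):
    """The test logic, parameterized so it can run against either version."""
    assert func(2, 3) == 5


# The test passes against the real implementation...
check_add(add)

# ...and it must fail against the broken one, otherwise it verifies nothing.
try:
    check_add(broken_add)
except AssertionError:
    test_detects_breakage = True
else:
    test_detects_breakage = False
```

In day-to-day work you would normally just edit the source temporarily, run the test, watch it fail, and then revert the change.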

Public Interface Tests

A good place to start writing tests is from the perspective of a user of your module or library, as described in the Test Tutorial and the Testing with pytest guide. These tests follow the “Detroit School”: they focus on behavior, avoid testing private attributes, and minimize the use of mocks/patches/test-doubles.

Test Suites

Not all test cases are the same. In the following sections we will discuss many kinds of tests, which serve different purposes and provide different benefits. Tests should be divided up into different Suites, which can be run independently of one another.

Tests which “Fail Fast” save both developer and compute time. Some tests are by necessity very slow. Unit Tests should run extremely quickly, in just a few seconds at most, while end-to-end tests require time to set up and may depend on slow and unreliable external services. By organizing tests into suites based on execution time, you can run fast suites first and stop if an error is encountered before running slower suites.

Advantages of Test Suites

Guidelines for Test Suites

Creating Test Suites

The simplest way to start is to separate tests into directories inside the tests/ directory:

tests/
    |- unit/
    |- integration/
    |- e2e/

These suites can be run directly with pytest, for example: pytest tests/unit/.
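The fail-fast ordering described earlier can be sketched as a small driver script. This is only an illustration: the suite paths mirror the directory layout above, and run_pytest and run_suites are hypothetical helper names, not part of pytest.

```python
import subprocess
import sys

# Fastest suite first, slowest last (paths follow the layout shown above).
SUITES = ["tests/unit", "tests/integration", "tests/e2e"]


def run_pytest(path):
    """Run pytest on one directory and return its exit code."""
    return subprocess.run([sys.executable, "-m", "pytest", path]).returncode


def run_suites(suites, run=run_pytest):
    """Run suites in order, stopping at the first one that fails."""
    for path in suites:
        code = run(path)
        if code != 0:
            return code  # skip the slower suites entirely
    return 0


# Typical use from a script: sys.exit(run_suites(SUITES))
```

The run parameter exists only so the ordering logic can be exercised without launching pytest. Many CI systems can express the same ordering with dependent jobs instead of a script.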

Markers provide an additional layer of organization by labeling individual tests. This lets us create specialized suites based on markers, independent of directory structure or type of test. For example, we can mark extremely slow or flaky tests that are conceptually part of a larger suite, and skip them when needed.

First, define a new marker in pyproject.toml:

[tool.pytest]
markers = [
    "unit: marks unit tests",
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "online: tests which require internet access",
]

To mark an individual test, decorate the test case:

import pytest


@pytest.mark.slow
def test_something_slow(): ...

To mark every test in a directory, add the following to the conftest.py in the target folder:

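# tests/unit/conftest.py
from pathlib import Path

import pytest

UNIT_DIR = Path(__file__).parent


def pytest_collection_modifyitems(session, config, items):
    # This hook receives every collected item, not only those in this
    # directory, so filter by location (item.path requires pytest 7+).
    for item in items:
        if UNIT_DIR in item.path.parents:
            item.add_marker(pytest.mark.unit)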

Project Level Integration Tests

The term “Integration Test” is unfortunately overloaded, and used to describe testing that various components integrate with each other, at many levels of the system. These tests will loosely follow the “Detroit School” of test design, focusing on behavior, and the way components and dependencies interact.

These tests can be a good place for more extensive edge-case and fuzzy input testing, which may include “private” functions/classes/attributes that require more extensive validation.

The intended audience for these tests is developers working on the project.

Unit Tests

Unit tests loosely follow the “London School” of testing, where the smallest unit of code is tested in isolation.

These tests are written from an internal perspective, so they are a good place to test aspects of the codebase which are “private” (not directly exposed to users), but which still need to be tested. Some examples of units are: a single function, an attribute of an object, a method or property of a class.

They test only the code in the project, not code imported from other projects (or even other modules within the same project).

Advantages of unit testing

Unit tests ensure that the code, as written, is correct, and executes properly. They communicate the intention of the creator of the code, how the code is expected to behave, in its expected use-case.

Unit tests should be simple, isolated, and run very quickly. This allows us to run them frequently, while we make changes to the code (even automatically, each time we save a file for example) to ensure our changes did not break anything... or only break what we expected to.

Writing unit tests can reveal weaknesses in our implementations, and lead us to better design decisions:

When to write unit tests

Not all projects need full unit test coverage; some may not need unit tests at all.

Guidelines for unit testing

Importing in test files

Keep things local! Prefer to import only from the file-under-test when possible. This helps keep the context of the unit tests focused on the file-under-test.

Consider what happens when code we rely on from other modules is refactored and moved.

# src/project/lib.py
from project.cursed_utils import MyClass


def func(my_class: MyClass): ...


# src/project/tests/test_lib.py
from project.lib import MyClass, func


def test_func():
    ret = func(MyClass())
    ...

If our test file imported MyClass directly from cursed_utils, and MyClass was later moved to bettername, the test would fail on import and need to be updated. The test case does not care where MyClass is defined or imported from; it only cares about the MyClass symbol that is used in the source code it is testing. By importing MyClass from the file-under-test, the test does not need to change at all when unrelated code is refactored.

# src/project/lib.py
from project.bettername import MyClass


def func(my_class: MyClass): ...


# src/project/tests/test_lib.py
from project.lib import MyClass, func


def test_func():
    ret = func(MyClass())
    ...

Importing from other source files is a code smell (for unit tests). It indicates that the test is not well isolated.

Prefer to import only the object that you actually use, not the entire library.

It is common practice to import and alias all of a library, such as import numpy as np. However, as we develop our unit tests, this can cause difficulty with mocking, and complicate refactoring.

To patch out numpy.sum in your test, you need to patch it through the alias in the module namespace. Because np refers to the numpy module object itself, this replaces numpy.sum for every caller for the duration of the patch, which can have unintended side-effects. It also results in absurdly long namespace paths, with a logical break where the path crosses from the local module namespace into an imported dependency’s namespace, like so:

def test_func(mocker):
    mock_sum = mocker.patch(
        "project.lib.np.sum",
        autospec=True,
    )  # you'll need to patch the alias'd namespace
    ...

Consider the benefits of refactoring your imports like so:

from numpy import sum as np_sum, ndarray as NpArray

Now you simply need to patch the imported function in the context of the file-under-test:

def test_func(mocker):
    np_sum = mocker.patch("project.lib.np_sum", autospec=True)
    ...
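One practical difference between the two import styles can be demonstrated with the standard library alone. The sketch below uses math in place of numpy and builds two throwaway in-memory modules (demo_aliased and demo_direct, both hypothetical stand-ins for project.lib), so nothing here is specific to a real project:

```python
import math
import sys
import types
from unittest.mock import patch

# Stand-in for a module that does "import numpy as np".
aliased = types.ModuleType("demo_aliased")
exec("import math as m\n\ndef root(x):\n    return m.sqrt(x)\n", aliased.__dict__)
sys.modules["demo_aliased"] = aliased

# Stand-in for a module that does "from numpy import sum as np_sum".
direct = types.ModuleType("demo_direct")
exec(
    "from math import sqrt as math_sqrt\n\ndef root(x):\n    return math_sqrt(x)\n",
    direct.__dict__,
)
sys.modules["demo_direct"] = direct

original_sqrt = math.sqrt

# Patching through the alias resolves to the math module object itself,
# so math.sqrt is replaced for everyone while the patch is active.
with patch("demo_aliased.m.sqrt", return_value=-1):
    aliased_result = aliased.root(4)
    leaked_globally = math.sqrt is not original_sqrt

# Patching the directly imported name only rebinds it inside demo_direct.
with patch("demo_direct.math_sqrt", return_value=-1):
    direct_result = direct.root(4)
    stayed_local = math.sqrt is original_sqrt
```

Both patches make root return the mocked value, but only the second one leaves the library untouched for other code that runs during the test.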

Finally, consider giving the imported symbols aliases that are meaningful to your code, regardless of the module they are imported from:

# from numpy import sum as numeric_sum
from bettermath import superfast_numeric_sum as numeric_sum

total = numeric_sum(some_numeric_values)

Running unit tests

We recommend using pytest for running tests in your development environments. To run unit tests in your source folder, from your package root, use pytest {path/to/source}. To run tests from an installed package (outside of your source repository), use pytest --pyargs {package name}.

You can set the default test path in pyproject.toml, see: Configuring pytest

We recommend configuring pytest to run ONLY your fastest, least demanding test suite by default.

Mocking and Patching to Isolate the code under test

When the unit you are testing touches any external unit (usually something you imported, or another unit that has its own tests), the external unit should be Patched, replacing it with a Mock for the duration of the test. The unit test will:

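from unittest.mock import Mock

# Placeholder: dotted import path of the module under test.
SRC = "path.to.module.under.test"
# myfunction is assumed to be imported from that module.


def test_myfunction(mocker):
    # Replace the external unit with a Mock for the duration of the test.
    patchme: Mock = mocker.patch(f"{SRC}.patchme", autospec=True)
    ret = myfunction()
    patchme.assert_called_with("input from myfunction")
    assert ret is patchme.return_value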

Consider what needs to be mocked, and the level of isolation your unit test really needs.

Excessive mocking is a code smell! Consider ways to refactor the code, so that it needs fewer mocks, less setup, and fewer assertions in a single test case. This frequently leads us to write more readable and maintainable code.
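One possible refactoring, sketched here with hypothetical names (parse_settings and load_settings are invented for this example), is to separate pure logic from I/O so that most of the behavior can be tested with no mocks at all:

```python
import json
from pathlib import Path


def parse_settings(text):
    """Pure logic: takes a string, returns a dict, touches nothing external."""
    settings = json.loads(text)
    return {key.lower(): value for key, value in settings.items()}


def load_settings(path):
    """Thin I/O wrapper; the only part that would ever need a patch or a temp file."""
    return parse_settings(Path(path).read_text())


def test_parse_settings():
    # No mocks, fixtures, or setup needed for the interesting behavior.
    assert parse_settings('{"DEBUG": true}') == {"debug": True}
```

The wrapper is small enough that it may be better covered by an integration test against a real temporary file than by a heavily mocked unit test.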

It is worth cultivating a deep understanding of how Python’s imports work. The interactions between imports and patches can sometimes be surprising, and cause us to write invalid tests... or worse, tests that pass when they should fail. These are a few of the cases that cause the most confusion.

When patches and imports are both used in a test case, the patch only applies to the specific context in which it is called, and does not override the import used elsewhere in the test file. In the following example:

# project.lib
def dangerous_sideffects():
    raise RuntimeError("BOOM")


def say_hello():
    dangerous_sideffects()
    return "hello world"


# test file (a separate module)
from project.lib import say_hello, dangerous_sideffects


def test_pytest(mocker):
    # Given this context
    mock_dangerous_sideffects = mocker.patch("project.lib.dangerous_sideffects")
    # When we run the code
    ret = say_hello()
    # Then we expect the result
    assert ret == "hello world"
    mock_dangerous_sideffects.assert_called_once()

    # But this will still raise an exception!
    dangerous_sideffects()
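The reason is name binding: from project.lib import dangerous_sideffects creates a second name, in the test file's own namespace, that still points at the original function, while the patch only rebinds the name inside project.lib. The following sketch reproduces this with the standard library's unittest.mock and an in-memory module (demo_lib is a hypothetical stand-in for project.lib):

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway module so the sketch is self-contained.
source = '''
def dangerous_sideffects():
    raise RuntimeError("BOOM")


def say_hello():
    dangerous_sideffects()
    return "hello world"
'''
demo_lib = types.ModuleType("demo_lib")
exec(source, demo_lib.__dict__)
sys.modules["demo_lib"] = demo_lib

# Binds local names to the objects that exist in demo_lib right now.
from demo_lib import dangerous_sideffects, say_hello

with patch("demo_lib.dangerous_sideffects") as mock_dangerous:
    # say_hello looks the name up in demo_lib's namespace and finds the mock.
    greeting = say_hello()
    mock_dangerous.assert_called_once()

    # The local name was bound before the patch and is unaffected by it.
    local_is_original = dangerous_sideffects is not mock_dangerous
    try:
        dangerous_sideffects()
        raised = False
    except RuntimeError:
        raised = True
```

If a test needs to call the patched function directly, calling it through the module (demo_lib.dangerous_sideffects() in this sketch) or using the mock object returned by the patch avoids the surprise.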

Extensive Input Testing

The range of inputs that test cases validate is an important decision.

When the need for extensive testing starts to conflict with the readability of test cases and their usefulness as documentation for users and other developers, the tests should be re-organized into public-facing (concise, expressive, easily readable), and technical (complex, extensive) test files.

In Public Interface Tests

These are the most appropriate place to test certain invalid inputs and dependencies. Public Interface tests act like a contract with users; each behavior that is tested is like a promise that users can rely on, and expect that it will not change without warning (and probably a major version bump). So any input/output and side-effects included in these tests should be considered officially supported behavior and given careful consideration.

In project level integration tests

These are a good place to handle more extensive input testing. Integration tests already tend to be more verbose, with a lot of setup and teardown, and much more behavior to cover than other kinds of tests. These are the kinds of tests that should focus on edge cases.

In Unit Tests

Unit Tests should focus on the “happy-path” of execution. In most cases one representative example of the expected input is sufficient. The test case should illustrate how the unit is expected to be used.

Invalid input should only be tested when the unit itself includes logic to handle that invalid input.

For example, this code:

def foo(x: int):
    return x + 1

should not be tested with a string: the type annotation already documents that only integers are expected.

This code should be tested with a string, to cover the exception path.

def bar(x):
    if type(x) is str:
        raise RuntimeError("invalid input")
    return x + 1
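A minimal sketch of the two unit tests this suggests for bar: one representative happy-path case and one case for the explicit error branch. The test names are illustrative; in a pytest suite, pytest.raises would be the usual way to write the second test, and plain try/except is used here only to keep the sketch dependency-free.

```python
def bar(x):
    if type(x) is str:
        raise RuntimeError("invalid input")
    return x + 1


def test_bar_happy_path():
    # One representative input shows how the unit is meant to be used.
    assert bar(1) == 2


def test_bar_rejects_strings():
    # bar contains logic for this case, so the exception path gets a test.
    try:
        bar("1")
    except RuntimeError as err:
        assert str(err) == "invalid input"
    else:
        raise AssertionError("bar should raise RuntimeError for str input")
```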

Additional Types of Test Suites

A non-exhaustive discussion of some common types of tests.

Don't Panic!

Depending on your project, you may not need many, or most of these kinds of tests.

Behavioral, Feature, or Functional Tests

High-level tests, which ensure a specific feature works. These are placed in a location like ‘project_root/tests/behavioral/’. Used for testing things like:

Fuzz Tests

Fuzz tests attempt to test the full range of possible inputs to a function. They are good for finding edge-cases, where what should be valid input causes a failure. Hypothesis is an excellent tool for this, and a lot of fun to use.
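To illustrate the idea without extra dependencies, here is a hand-rolled randomized test built on the standard library's random module. It is a stand-in for, not an example of, Hypothesis, which additionally shrinks failing inputs and explores edge cases more systematically. The clamp function and fuzz_clamp helper are hypothetical:

```python
import random


def clamp(value, low, high):
    """Hypothetical function under test: limit value to the range [low, high]."""
    return max(low, min(value, high))


def fuzz_clamp(trials=1000, seed=0):
    """Check properties of clamp over many pseudo-random inputs."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        low = rng.randint(-1000, 1000)
        high = rng.randint(low, 1000)
        value = rng.randint(-10_000, 10_000)
        result = clamp(value, low, high)
        # Property 1: the result always lies inside the range.
        assert low <= result <= high, (value, low, high, result)
        # Property 2: values already inside the range are unchanged.
        if low <= value <= high:
            assert result == value
    return trials
```

Note that the assertions describe properties that must hold for any input, rather than specific expected outputs.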

Integration Tests

The word “Integration” is a bit overloaded, and can refer to many levels of interaction between your code, its dependencies, and external systems.

End to End Tests

The slowest, and most brittle, of all tests. Here, you set up an entire production-like system, and run tests against it. Some examples are:

Fuzz Tests and other slow tests

Testing random input, using tools like Hypothesis, is similar to testing edge cases, but running these tests can take a very long time, and they can often be much more complex and difficult to read for new developers.

Diagnostic Tests

Diagnostic tests are used to verify the installation of a package. They should be runnable on production systems, like when we need to ssh into a live server to troubleshoot problems.

A diagnostic test suite may contain any combination of tests you deem pertinent. You could include all the unit tests, or a specific subset of them. You may want to include some integration tests, and feature tests. Consider them Smoke Tests, a select sub-set of tests meant to catch critical errors quickly, not to perform a full system check of the package. Good diagnostic tests:

Advantages of Diagnostic Tests

Guidelines for Diagnostic Tests

Mocking and Patching to Isolate the code under test

Test isolation is less necessary in diagnostic tests than in unit tests. We often want diagnostic tests to execute compiled code, or run a test on GPU hardware. In cases where we do need to mock some part of our code, the standard library's unittest.mock.patch works much like the mocker fixture provided by the pytest-mock plugin.

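import unittest
from unittest.mock import Mock, patch

# Placeholder: dotted import path of the module under test.
SRC = "mymodule.path.to.source"
# myfunction is assumed to be imported from that module.


class TestMyFunction(unittest.TestCase):
    @patch(f"{SRC}.patchme", autospec=True)
    def test_myfunction(self, patchme: Mock):
        ret = myfunction()
        patchme.assert_called_with("input from myfunction")
        self.assertIs(ret, patchme.return_value)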

Running Diagnostic Tests

stdlib’s unittest can be used in environments where pytest is not available:
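As one possible illustration, a diagnostic suite can be loaded and run programmatically with only the standard library. The DiagnosticTests class and run_diagnostics helper below are hypothetical, and the single check is a trivial placeholder; a real suite would exercise your own package.

```python
import json
import unittest


class DiagnosticTests(unittest.TestCase):
    """Hypothetical smoke test suite."""

    def test_json_round_trip(self):
        # Placeholder check: replace with imports and calls into your package.
        self.assertEqual(json.loads(json.dumps({"ok": True})), {"ok": True})


def run_diagnostics(stream=None):
    """Run the diagnostic suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiagnosticTests)
    # stream=None makes TextTestRunner write its report to sys.stderr.
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    return runner.run(suite)
```

Calling run_diagnostics().wasSuccessful() gives a simple pass/fail signal. From a shell, python -m unittest can run the same kind of tests without any custom runner code.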