Analytical Code Assurance

Suggested reading

AF Duck book: Modular code

NHSBSA DDaT playbook: Development - Coding

At a glance

Modular code is code that is broken down into smaller, independent, reusable chunks. This guide outlines the benefits of writing modular code and how to put it into practice within the NHSBSA.

What is modular code?

Complex analysis projects and workflows often produce a large amount of code, much of which performs repetitive tasks. Modular code breaks that code down into smaller, independent, reusable chunks (modules) that can be easily maintained and shared across a series of analyses.

Each module should do one specific job and work with other modules through clear interfaces (known inputs, known outputs). You should be able to use a module without needing to know how it works inside.
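As a minimal sketch of a clear interface, the Python function below takes one known input and returns one known output. The function name, the formatting rule and the sample values are illustrative assumptions rather than an NHSBSA standard, and the function does no validation of its input.

```python
def standardise_postcode(raw: str) -> str:
    """Return a postcode in upper case with one space before the last three characters.

    Input: a postcode string in any casing or spacing.
    Output: the normalised postcode string.
    Hypothetical helper for illustration only; it does not check that the
    postcode is valid.
    """
    compact = "".join(raw.split()).upper()  # remove all whitespace, then upper-case
    return f"{compact[:-3]} {compact[-3:]}"


# The caller relies only on the documented input and output,
# not on how the function does its work internally.
cleaned = [standardise_postcode(p) for p in ["ab1 2cd", " AB12CD ", "ab1  2cd"]]
```

If the formatting rule later changes, only the body of the function needs editing; code that calls it stays the same.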

Languages such as R, Python and SQL, and workflow tools like Alteryx, provide ways to break up code into smaller logical units, including modules, classes, functions, macros, libraries, and packages.

Why should code be modular?

Modular code:

is better organised and easier to navigate
promotes readability by preventing long, complex walls of code that are hard to understand
minimises repetition, which reduces the risk of errors and inconsistencies
reduces the likelihood of code conflicts when multiple team members are working on the same codebase
simplifies testing of the code and its outputs
is easier to maintain or improve in future
speeds up the peer review process
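To illustrate the point about testing, here is a small sketch in Python. Because the function does one job, a handful of assertions at the band boundaries covers its behaviour. The function name and the band boundaries are assumptions made for this example.

```python
def age_band(age: int) -> str:
    """Map an age in years to a reporting band (band boundaries are illustrative)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


def test_age_band() -> None:
    # Check the values either side of each boundary, where mistakes are most likely.
    assert age_band(0) == "0-17"
    assert age_band(17) == "0-17"
    assert age_band(18) == "18-64"
    assert age_band(64) == "18-64"
    assert age_band(65) == "65+"


test_age_band()
```

The same logic buried in the middle of a long script could only be checked by running the whole script and inspecting its output.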

How do we write modular code?

A recommended workflow for producing modular code is:

Start by splitting complex code into multiple ordered scripts grouped by concern, for example data build, cleaning, analysis and visualisation. Organise scripts so that the intended run order is clear – this alone improves navigability and makes peer review easier.
Identify repetitive or reusable logic and extract it into functions. Keep functions small and focused on a single responsibility, with clear inputs and outputs.
Group related functions into modules, for example a dedicated file or script for data validation helpers. If you find yourself scrolling endlessly to locate a particular piece of logic, that’s a signal to break it into smaller modules. Equally, if you need many tabs open just to follow a single change, you may have over-fragmented – aim for balance. Avoid overly complex solutions where simpler ones exist.
As the codebase matures, consider packaging related modules so they can be shared and reused across projects.
Starting from monolithic code and refactoring incrementally is a valid strategy – you don’t need to get it right first time.
Submit for peer review, which will surface opportunities to improve structure that are hard to spot yourself.
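The first three steps of the workflow above might look something like the following Python sketch. All function names and the sample data are hypothetical. Everything is shown in one block so that it runs as written; in a real project the cleaning and analysis functions would more likely sit in separate files.

```python
from statistics import mean
from typing import Optional


# Reusable logic: each function has a single responsibility,
# with clear inputs and outputs.
def drop_missing(values: list[Optional[float]]) -> list[float]:
    """Return the values with None entries removed."""
    return [v for v in values if v is not None]


def summarise(values: list[float]) -> dict[str, float]:
    """Return the count and mean of a non-empty list of numbers."""
    return {"count": len(values), "mean": mean(values)}


# Ordered stages: the run order reads top to bottom, one stage per concern.
def run_pipeline(raw: dict[str, list[Optional[float]]]) -> dict[str, dict[str, float]]:
    cleaned = {name: drop_missing(vals) for name, vals in raw.items()}  # cleaning
    return {name: summarise(vals) for name, vals in cleaned.items()}  # analysis


# Made-up sample data for illustration.
raw_data = {"items": [10.0, None, 20.0], "cost": [1.5, 2.5, None]}
results = run_pipeline(raw_data)
```

Each column is cleaned and summarised by the same two functions, so a fix to either one applies everywhere it is used.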