What types of testing should be done?
Testing is fundamental to assuring the quality of analytical processes. It helps us ensure that the things we’ve written to generate our analyses (e.g. code, workflows, Excel workbooks) work as expected, and it reduces the risk of errors in our results. Testing also builds confidence that results are reproducible and makes it easy for anyone to verify this.
There are many different types of test, each addressing a different aspect of quality assurance. In analytical work, the following are often the most important to consider:
| Type | Description | Benefits |
|---|---|---|
| Unit Tests | Small isolated tests to ensure each individual component works as expected | Help you locate and identify the root cause of any issues or errors |
| Integration Tests | Testing interactions between two or more individual components (for example, can the data output by one component be used as the input to another?) | Verify that different components or modules work together as expected |
| End-to-end Tests | Testing the entire analysis from start to finish | Verify that the whole process produces the expected results, given certain inputs |
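To make these concrete, the sketch below shows what a unit test and an integration test might look like in Python using pytest. The functions `clean_ages` and `summarise_ages` are hypothetical stand-ins for steps in an analytical pipeline, not part of any particular project.

```python
# test_pipeline.py -- a minimal sketch, assuming pytest is installed.
# `clean_ages` and `summarise_ages` are hypothetical pipeline steps,
# defined here only so the example is self-contained.

def clean_ages(ages):
    """Drop missing values and negative ages."""
    return [a for a in ages if a is not None and a >= 0]


def summarise_ages(ages):
    """Return the mean of a non-empty list of ages."""
    return sum(ages) / len(ages)


# Unit test: checks a single component in isolation.
def test_clean_ages_removes_missing_and_negative_values():
    assert clean_ages([30, None, -1, 45]) == [30, 45]


# Integration test: checks that the output of one component works
# as the input to the next.
def test_cleaned_ages_can_be_summarised():
    cleaned = clean_ages([30, None, -1, 45])
    assert summarise_ages(cleaned) == 37.5
```

An end-to-end test would follow the same pattern, but would run the whole analysis against a small, known input and compare the final output to an expected result.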
Why should we test our analysis?
- Verify Correctness: Ensure our processes behave as intended and meet the requirements.
- Provide Confidence: Increase confidence in the code’s reliability and the reproducibility of results.
- Catch Bugs (Early): Identify and fix issues during development, when they are cheapest and easiest to resolve.
- Improve Design: Writing testable processes often leads to better, more modular, and loosely coupled designs.
- Enable Refactoring: Act as a safety net to enable us to make improvements while knowing that our results are not impacted.
- Serve as Documentation: Well-written tests can illustrate intended use and expected behaviour.
- Facilitate Collaboration: Ensure that contributions from different team members integrate correctly.
Alignment with analytical code key principles
- Transparent: Others can place more trust in the results, knowing the code has been tested.
- Reproducible: Testing the code makes it easier to find issues that could make it less reproducible.
- Quality assured: Testing is inherently an activity that increases assurance of quality.
How do we test our analysis?
- Focus on Behaviour: Test what the logic should do, rather than how it does it.
- Risk-based Approach: Write more tests for logic that is very new, more complex, or business critical.
- Test Edge Cases and Errors: Include tests for boundary conditions, invalid inputs, and expected failure modes (see the sketch after this list).
- Write Testable Processes: Design and implement analytical processes with testing in mind.
- Keep Tests Independent and Fast: Ensure tests can run independently and quickly to provide rapid feedback.
- Write Clear and Readable Tests: Tests should be easy to understand, indicating what is being tested and why.
- Maintain Tests: As your analysis evolves, be sure to update your tests, refactor them, remove obsolete tests and add new ones.
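To illustrate a few of these points, the sketch below uses pytest to test behaviour at the boundaries and an expected failure mode. The `proportion` function is a hypothetical example, defined only so the tests are self-contained.

```python
# test_proportion.py -- a minimal sketch, assuming pytest is installed.
# `proportion` is a hypothetical helper, used only to illustrate
# edge-case and error testing.
import pytest


def proportion(count, total):
    """Return count / total, rejecting a zero or negative total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


# Edge cases and a typical value, each run as its own independent test.
@pytest.mark.parametrize(
    "count, total, expected",
    [
        (0, 10, 0.0),   # boundary: nothing counted
        (10, 10, 1.0),  # boundary: everything counted
        (1, 4, 0.25),   # typical value
    ],
)
def test_proportion_returns_expected_value(count, total, expected):
    assert proportion(count, total) == expected


# Expected failure mode: invalid input should raise a clear error.
def test_proportion_rejects_zero_total():
    with pytest.raises(ValueError):
        proportion(5, 0)
```

Note that each test checks what `proportion` should return or raise, not how it calculates the result, so the tests remain valid if the implementation changes.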
Testing code
If you are producing analysis using code, the following points can help you go even further.
- Use Testing Frameworks: Use established testing frameworks appropriate for the coding language.
- Automate Test Execution: Run tests automatically, especially as part of a Continuous Integration (CI) pipeline, to get fast feedback on changes.
It is also important to ensure that new changes or bug fixes have not introduced bugs into code that previously passed its tests. This is a good reason to regularly run all existing unit, integration and end-to-end (E2E) tests, and to do so via CI.
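One common pattern here is to pin each fixed bug with a regression test, so that re-running the full suite (locally or in CI) will flag the problem if it is ever reintroduced. The sketch below assumes a hypothetical `parse_percentage` function that once failed on values with surrounding whitespace.

```python
# test_regression.py -- a sketch of a regression test.
# `parse_percentage` is hypothetical: imagine it previously crashed on
# values containing whitespace, and that bug has since been fixed.

def parse_percentage(text):
    """Convert a string such as ' 45% ' to the float 45.0."""
    return float(text.strip().rstrip("%"))


def test_parse_percentage_handles_surrounding_whitespace():
    # Regression test for a previously fixed bug. Keeping it in the suite
    # means CI will catch the bug if it ever comes back.
    assert parse_percentage(" 45% ") == 45.0
```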
How do we define success?
- Teams feel confident about their code.
- Bugs discovered after release of code are rare.
- Automated tests in CI quickly inform developers of potential issues.
- Test quality and relevance are high, and regressions are detected.
- The tests are understandable and do not hinder refactoring of the code.
- Tests clearly demonstrate that the code meets its specified requirements.
Look out for:
- Apply the 80/20 rule: the most effective use of test-writing time is usually the first 20% of tests, which cover 80% of the key logic.
- Plan for long-term maintenance: you’ll need to maintain your tests, so design them carefully and stop before you write more than you can realistically keep up to date.
- Avoid testing implementation details: focusing on the what rather than the how will improve your designs, make refactoring easier and save you from updating tests every time you change your code (see the sketch after this list).
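For the last point, the sketch below contrasts a behaviour-focused test with a test of implementation details. The `deduplicate` function and its private helper `_seen_before` are hypothetical, included only to make the contrast concrete.

```python
# test_behaviour_not_implementation.py -- a sketch contrasting two styles.
# `deduplicate` and `_seen_before` are hypothetical examples.

def _seen_before(value, seen):
    """Private helper: record a value and report whether it has appeared."""
    if value in seen:
        return True
    seen.add(value)
    return False


def deduplicate(values):
    """Return the unique values, preserving their first-seen order."""
    seen = set()
    return [v for v in values if not _seen_before(v, seen)]


# Behaviour-focused: checks *what* the function does. This test survives
# any refactor that keeps the results the same.
def test_deduplicate_keeps_first_occurrence_only():
    assert deduplicate([3, 1, 3, 2, 1]) == [3, 1, 2]


# Implementation-focused (avoid): checks *how* the function works.
# Renaming or removing the private helper would break this test even
# though the behaviour of `deduplicate` is unchanged.
def test_deduplicate_private_helper_tracks_seen_values():
    seen = set()
    assert _seen_before(1, seen) is False
    assert _seen_before(1, seen) is True
```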