What is open source?
Open source is a way of developing and distributing software in which the source code is publicly available for anyone to view, download, use and modify under the terms of its licence.
Why publish analytical code as open source?
Alignment with analytical code key principles
- Transparent: Others can see how analyses are performed, increasing trust in results.
- Reproducible: Open code makes it easier to verify and reproduce results, and to reuse code.
- Quality assured: Open review and feedback help identify issues and improve code.
How to publish analytical code openly
There are two main routes to open-sourcing analytical code:
- Work in the open from the start: this is the ideal scenario, as it maximises the benefits of open source, but it is usually only suitable when you are working with open, published datasets, for example data from the NHSBSA Open Data Portal.
- Retrospectively publish closed-source codebases: many of the datasets we analyse at the NHSBSA are not publicly available, so to protect sensitive and confidential data we often work within secure internal environments. We still recommend publishing this code retrospectively, although it may need additional review (see Retrospective open sourcing below).
To publish analytical code:
- Include a full LICENCE file: Add a file named LICENCE (British English spelling) containing the complete licence text to your repository, not just a statement in the README (see below to learn more about the licences we use at the NHSBSA).
- Document your code: Explain its purpose, list its dependencies and give clear instructions for use.
- Host on GitHub: Publish your code on the NHSBSA Data Analytics GitHub.
- Follow NHSBSA guidance: Adhere to NHSBSA policies and approval processes for open source coding (only visible to internal NHSBSA colleagues).
- Avoid storing secret keys or credentials in source code: Use secret management systems and keep credentials out of repositories. Make sure you check the history as well as the current version of the codebase! See GOV.UK guidance.
- Check for sensitive information: Ensure no personal, confidential, or proprietary data is included. Make sure you check the history as well as the current version of the codebase!
See the GOV.UK Service Manual and NHS Digital RAP Community guidance for detailed steps and best practices.
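To keep credentials out of source code, one common pattern is to read them at run time from environment variables that your secret management system populates. The Python sketch below illustrates this under stated assumptions: the variable name (for example a hypothetical DB_PASSWORD) and the error class are illustrative, not an NHSBSA-mandated approach.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential has not been provided."""


def get_credential(name: str) -> str:
    """Read a credential from an environment variable instead of source code.

    `name` (for example the hypothetical "DB_PASSWORD") is an assumption:
    use whatever variable your secret management system actually sets.
    """
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than falling back to a hard-coded default.
        raise MissingCredentialError(
            f"{name} is not set; provide it via your secret manager, "
            "not in the repository"
        )
    return value
```

A script would then call get_credential("DB_PASSWORD") when opening a connection, so the repository only ever contains the variable's name, never its value. Any local .env file used during development should be listed in .gitignore so that it is never committed.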
Retrospective open sourcing
There are additional considerations if you are retrospectively open sourcing your code (that is, publishing code you originally developed closed source):
- Follow the NHSBSA Retrospective Open Sourcing Guidance (only visible to internal NHSBSA colleagues).
- Consider whether you would benefit from using a ‘fit-for-publishing checklist’ to ensure your code is ready for release, including internal and external review steps. See the NHS Digital RAP Community Fit for publishing checklist.
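Before a retrospective release, it helps to scan the whole commit history rather than just the current files, because a secret deleted in a later commit can still be recovered from earlier ones. The Python sketch below runs git log --all -p and checks each line of its output against a handful of regular expressions. The patterns are assumptions chosen for illustration and will miss many kinds of secret; dedicated secret-scanning tools maintain much larger rule sets, so treat this as a supplement to review rather than a replacement for it.

```python
import re
import subprocess
from typing import List, Tuple

# Illustrative patterns only (an assumption for this sketch): real
# secret-scanning tools use far larger, regularly updated rule sets.
SECRET_PATTERNS = {
    "password assignment": re.compile(
        r"""(password|passwd|pwd)\s*[:=]\s*['"][^'"]+['"]""", re.IGNORECASE
    ),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


def full_history_diff(repo_path: str = ".") -> str:
    """Return the patches for every commit reachable from any branch or tag.

    Scanning this text catches secrets that were committed and later
    deleted, which a scan of the current files alone would miss.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, find_secrets(full_history_diff(".")) lists each flagged line of the history output with the rule that matched it; any hit means the history needs attention before publication.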
Licensing open code
When publishing code or content openly, it is essential to include a clear licence that specifies how others can use, modify and share your work. Open code should include a LICENCE file containing a copyright notice: the year should be the year of first publication, or a range of years if the code has since been significantly updated.
The NHSBSA uses two licences, for different purposes:
- For software and code, use the Apache License 2.0. This is generally the licence to include in the code repository, via the LICENCE file.
- For published content, documentation and data, use the Open Government Licence v3.0 (OGL v3). Include a link to the OGL v3 in the footer or main documentation of published outputs. The NHSBSA uses a standard footer format which includes the appropriate licensing information and can be reused across NHSBSA products:
For more details on licensing, see the NHSBSA Digital Playbook.
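The copyright-year rule for the LICENCE file (year of first publication, or a range if the code has been significantly updated) can be captured in a small helper. This is a minimal sketch under two assumptions: that the notice follows the "Copyright [yyyy] [name of copyright owner]" pattern from the appendix of the Apache License 2.0, and that you decide for yourself what counts as a significant update. The holder name is a parameter, not a confirmed NHSBSA wording.

```python
from typing import Optional


def copyright_years(first_published: int, significant_update: Optional[int] = None) -> str:
    """Return the year, or year range, for a copyright notice.

    Uses the year of first publication, or a range ending in the year of
    the latest significant update.
    """
    if significant_update is None or significant_update == first_published:
        return str(first_published)
    if significant_update < first_published:
        raise ValueError("an update cannot precede first publication")
    return f"{first_published}-{significant_update}"


def copyright_notice(holder: str, first_published: int,
                     significant_update: Optional[int] = None) -> str:
    """Build a one-line copyright notice for a LICENCE file."""
    years = copyright_years(first_published, significant_update)
    return f"Copyright {years} {holder}"
```

For example, copyright_notice("Example Organisation", 2021, 2024) returns "Copyright 2021-2024 Example Organisation", while omitting the update year gives "Copyright 2021 Example Organisation".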
How do we define success?
- The majority of NHSBSA analytical codebases are available in the open.
- Decisions to keep code closed source are justified and clearly recorded.
- Published outputs (reports, data, dashboards, etc) are linked to the published code that produced them.
- All outputs include the appropriate licence in line with the guidance above: a LICENCE file for code, and an OGL v3 link for publications.
- Sensitive or confidential information never appears in open repositories.
- Bugs or issues discovered after publication are rare.
Look out for:
- sensitive or confidential information accidentally included in the repository or its commit history.
- missing or unclear documentation on how to set up and use the code.
- lack of key repository files, including LICENCE and CONTRIBUTE.
- code that is difficult to understand or maintain.
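Some of these checks can be automated. The Python sketch below reports which key files are missing from the top level of a repository and whether the LICENCE file appears to contain the Apache License 2.0 text. The file names mirror the list above and are assumptions you may need to adjust (many repositories, for instance, name their contribution guide CONTRIBUTING); the licence test is a simple text match, not a legal check.

```python
from pathlib import Path
from typing import List, Optional

# Names mirror the checklist above; matching ignores case and file extension,
# so README.md or LICENCE.txt also count (an assumption about naming).
KEY_FILES = ("LICENCE", "README", "CONTRIBUTE")


def find_by_stem(repo: Path, stem: str) -> Optional[Path]:
    """Return the first top-level file whose name (minus extension) matches."""
    for path in sorted(repo.iterdir()):
        if path.is_file() and path.stem.upper() == stem.upper():
            return path
    return None


def missing_key_files(repo: Path) -> List[str]:
    """List the key files that are absent from the repository root."""
    return [name for name in KEY_FILES if find_by_stem(repo, name) is None]


def licence_looks_like_apache2(repo: Path) -> bool:
    """Rough check that the LICENCE file contains the Apache License 2.0 header."""
    licence = find_by_stem(repo, "LICENCE")
    if licence is None:
        return False
    text = licence.read_text(encoding="utf-8", errors="replace")
    return "Apache License" in text and "Version 2.0" in text
```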