Analysing sensitive data at scale doesn’t have to be a headache
Sharing software-based tools or pieces of analysis between local authorities can be hard. Once a tool becomes too complex for an Excel file and needs a code-based solution written in a language like Python, several new barriers stand in the way of sharing it with a wider audience.
Historically, there have been two routes to giving access to such tools.
Local authorities sharing their data
In some cases, a complex tool will be held on a server, and local authorities will need to upload their data to wherever the tool is hosted. This can lead to lengthy information governance processes, depending on the personal or sensitive data involved.
Local authorities running the tool locally
In other cases, the code itself might be shared for local authorities to run themselves. This will usually require new software to be installed and similarly introduces a lengthy sign-off process from internal IT departments.
We faced the same problems when we set out to build a data cleaning tool for the SSDA903 data return, the statutory return used to report on looked after children to the Department for Education. Through a discovery process we had identified that data issues accumulated over the year, creating more work, because local authority analysts were only able to check their data for errors during a three-month window each year. With funding from the Local Digital Collaboration Unit, we aimed to build a version of this tool that would be accessible year-round, collaborating closely with Wigan Council and the Data to Insight community of children’s services analysts.
Creating a tool like this is complicated. Most of the validation rules were too complex for Excel, some required interaction with external data, and, on top of that, we needed to build the tool in a way that would let multiple collaborators contribute. With Excel ruled out, we decided to use Python, but then faced the same two problems described above. Users had to be able to run the tool without sharing their data beyond their own networks (avoiding GDPR constraints and data sharing risk), and without installing any external programs on their computers (avoiding the need for IT sign-off).
This is how we came across Pyodide.
Pyodide is a way to run a local version of Python in the browser. This means that complex analysis and tools can be built without the need to install any software locally. The tool loads in the browser, but once loaded it no longer requires an internet connection: users can disconnect and input their data for validation, without the data leaving the user’s system.
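To make this concrete, here is a minimal sketch of the sort of check that could run in the browser this way. It is not taken from the 903 tool: the rule (an end date must not fall before its start date), the column names `start_date` and `end_date`, and the sample rows are all made up for illustration. It relies only on Python's standard `csv`, `io` and `datetime` modules, so it should not need any extra packages to be loaded.

```python
import csv
import io
from datetime import date, datetime
from typing import List, Optional


def parse_date(value: Optional[str]) -> Optional[date]:
    """Parse a DD/MM/YYYY string; return None if it is blank or malformed."""
    try:
        return datetime.strptime((value or "").strip(), "%d/%m/%Y").date()
    except ValueError:
        return None


def find_end_before_start(csv_text: str) -> List[int]:
    """Return 1-based data row numbers where end_date precedes start_date.

    Hypothetical rule and column names, for illustration only.
    Rows with a missing or unreadable date are skipped, not flagged.
    """
    failing_rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        start = parse_date(row.get("start_date"))
        end = parse_date(row.get("end_date"))
        if start and end and end < start:
            failing_rows.append(row_number)
    return failing_rows


# Invented sample data: the second row ends before it starts.
sample = """child_id,start_date,end_date
A1,01/04/2021,30/06/2021
A2,15/05/2021,01/05/2021
A3,10/01/2021,
"""

print(find_end_before_start(sample))  # prints [2]
```

Because a check like this takes the file contents as text and returns plain row numbers, it has no reason to send anything over the network, which is the property that matters for sensitive data.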
The launch of our 903 data cleaning tool in December 2021 was the first time we put a Pyodide-enabled tool into the hands of analysts, and the feedback has been great. It is already being used by 48 local authorities and counting.
Whilst this is a great first application, what really excites us is the potential of Pyodide to unlock further tools and pieces of analysis that were previously held back because they could not easily be used across local authorities. If you are considering whether Pyodide could be useful for a project you are working on, here are three questions to consider:
- Is your tool too complicated to build in Excel?
- Could it benefit from the additional potential that Python-based tools enable?
- Does your tool or analysis depend on the use of personal data that is otherwise difficult and time-consuming to get permission to share?
If you answer yes to any of these questions, Pyodide might be a route worth exploring.
We would be happy to discuss any questions you may have about Pyodide, our tool, or the project more broadly – please email firstname.lastname@example.org.