Creating an open source tool with a community of data analysts
This blog is part of a series exploring our work building a data validation tool to help children’s services analysts file the SSDA903 stat return used to report on looked after children.
Previously in this series:
Analysing sensitive data at scale doesn’t have to be a headache
Better children’s services data at lower cost
Tasked with creating a data validator for the SSDA903 statistical return, Social Finance and Data to Insight (D2I) created a plan to build a browser-based tool written largely in Python. We did this to make use of Python’s powerful data science libraries and charming syntax while sidestepping the obstacles children’s services analysts often face in getting new software installed or arranging the sharing of sensitive data.
While considering the challenges involved in putting the plan into practice, and seeing the success of D2I’s suite of community-maintained Excel-based tools, we found ourselves examining a novel hypothesis: could analysts build a Python-based tool themselves? If they could play a similar role to the one the community already plays with the Excel-based tools, not only would it address many of the challenges we’d identified, but it could also benefit the project itself, the local authorities involved, and the wider children’s services community.
As well as analysts’ contributions to the codebase, the project would benefit from their knowledge of the SSDA903 dataset: they could draw on experience to resolve ambiguities or typos in the guidance, and would know how the user interface should be designed to fit their workflows. Analysts would also be able to bring Python skills back to their own local authorities, enabling them to perform data manipulation and analysis tasks more quickly. Those skills, and the network of analysts involved, would also leave behind the groundwork for further coordination between local authorities’ children’s services departments on creating tools that would benefit them all.
However, there wasn’t an active community of analysts creating, sharing and using Python code in children’s services – so we started to think about whether we could help such a community start emerging, embedding the relevant expertise to code the rules, maintain the codebase, and potentially continue to build and share tools into the future.
What we did
The two main challenges we needed to overcome were:
- Training the analysts in the use of Python and GitHub (without requiring them to install any new software on their computers)
- The coordination of the group around the tool’s development (allowing for their unpredictable work schedules)
We created a series of training materials – videos, exercises, and documentation – to teach analysts the Python programming language itself, the data science skills needed to code the validation checks, and the process for contributing these to the shared project on GitHub. We also used a platform called Replit, which let analysts write code and push it to GitHub from within the browser.
The D2I projects have enabled us to link into relevant training [which] has provided individuals in my team with programming skills, as well as ‘softer’ skills linked to networking with colleagues in other local authorities.
Local authority performance manager
Analysts proceeded through the training videos and exercises at their own pace, with Social Finance providing tutoring and support at twice-weekly drop-in sessions, as well as by email or one-to-one pair programming sessions. We also had weekly meetings to discuss progress and set out a plan for the week.
Once analysts had completed the training, they could start submitting validation checks. Each check had to be reviewed before being added to the site, which provided a natural opportunity to share tips, tricks, and best practices. The nature of the project lent itself well to this, as the meat of it consisted in writing the 250 validation checks, which essentially formed self-contained programming assignments. Overall, the approach turned out to be a huge success, with analysts coding the majority of rules within the tool and learning some Python in the process.
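To give a flavour of what one of these self-contained assignments might look like, here is a minimal sketch of a validation check written with pandas. It is an illustration rather than code from the tool itself: the function name, the rule (an episode should not end before it starts), the date format, and the column names `CHILD`, `DECOM` and `DEC` are all assumptions made for the example.

```python
import pandas as pd


def validate_episode_dates(episodes: pd.DataFrame) -> list:
    """Return the row indices where an episode ends before it starts.

    Hypothetical rule for illustration only. The column names (DECOM for the
    episode start date, DEC for the episode end date) and the dd/mm/yyyy
    date format are assumptions, not taken from the tool's codebase.
    """
    # Unparseable or missing dates become NaT rather than raising an error
    start = pd.to_datetime(episodes["DECOM"], format="%d/%m/%Y", errors="coerce")
    end = pd.to_datetime(episodes["DEC"], format="%d/%m/%Y", errors="coerce")

    # Comparisons involving NaT evaluate to False, so episodes with no
    # end date (still open) are not flagged by this check
    failing = end < start
    return episodes.index[failing].tolist()


# Made-up sample data: the second episode ends before it starts
sample = pd.DataFrame(
    {
        "CHILD": ["101", "102", "103"],
        "DECOM": ["01/04/2020", "15/06/2020", "01/09/2020"],
        "DEC": ["01/05/2020", "01/06/2020", None],
    }
)

print(validate_episode_dates(sample))  # [1]
```

Because a check like this takes a table in and returns the failing rows, it can be written, tested and reviewed on its own, which is what made the checks suitable as individual assignments.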
What we learned
In spite of this success, we did also learn some valuable lessons, which we hope to build on in the next phase of the project, as we build further tools for the children’s services community using a similar approach.
More practical skills: One piece of feedback we got from the group was that while overall they felt comfortable writing code to perform the validation checks, they didn’t feel they’d be able to start doing their everyday work using Python just yet. In retrospect, the course was very much based around the skills needed to code and submit rules for the project. While these certainly include transferable skills, people would benefit from more general-purpose knowledge so that they can use Python regularly in their own work.
In the next iteration of this project, we will include a module to teach skills such as installing and running Python locally, loading data from files, and creating visualisations.
Knowledge sharing: The group found it helpful seeing the code others had written, to learn different approaches to solving problems. We realised that our meetings had been a bit one-directional, and it would be helpful to emphasise sharing of knowledge among the group. This could help to turn the group into a more cohesive community and make the most of the shared insight around common problems.
Expect capacity constraints: The main challenge in terms of establishing a community around the work was people’s ability to consistently set time aside to take part. Initially around 30 people were involved, but over time more and more people dropped out due to capacity pressures.
Learning to code is a big challenge, and it can be hard to pick up where you left off after an interruption of a few weeks. To mitigate this, we ended up splitting the group into different levels of involvement depending on the amount of time people felt they would be able to commit. Next time we intend to make involvement in the development group and the advisory group independent of each other, and to make it easier for people to vary the amount of time they put in while still being kept in the loop.
Overall, the novel approach proved a success – the tool has now been released, and was used by over 50 local authorities in its first month. The analysts went from having no Python experience at all to coding the majority of the tool’s validation checks, and they remain involved in the tool’s ongoing development.
However, we learned many valuable lessons about how the next iteration could provide more benefits to the analysts involved and draw more effectively on their expertise. We will be putting these lessons into practice in a further set of tools co-developed with analysts, including a similar data validation site for the CIN Census stat return. If you are a children’s services analyst or manager who would be interested in taking part, either as a developer, or as part of the advisory group, please contact Data to Insight’s Alistair Herbert.