Agents will look for info elsewhere unless official sources sharpen up
The UK's hopes of fueling cutting-edge AI development and applications with a National Data Library (NDL) could be dashed unless it makes datasets easier to use.
With misleading titles and non-existent metadata, the data currently available cannot support any meaningful analysis, a study from the Open Data Institute (ODI) found.
In the Autumn Budget of 2024, the government confirmed plans for the NDL, promising researchers and businesses "powerful insights that will drive growth and transform people's quality of life through better public services and cutting‑edge innovation, including AI." In January, it published an update, saying the plan was backed by a £100 million investment as part of £1.9 billion being provided to the Department for Science, Innovation and Technology (DSIT) through 2028/29.
DSIT said it had completed an extensive discovery phase to map out "the biggest opportunities and priorities" and "test approaches to systemic reform" across the public sector.
However, the ODI has published an "NDL-Lite" prototype, with access to more than 100,000 public datasets.
It found some of the datasets – particularly on data.gov.uk – are badly labelled, out of date, or effectively invisible to AI tools.
When authoritative data is hard to access, AI systems turn to other sources, such as news reports or commercial data, which do not always give accurate information, the ODI warned.
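To make the kind of problem the ODI describes concrete, here is a minimal sketch of a metadata audit that flags records with blank descriptive fields or no recent update. It is not the ODI's method: the field names, the `DatasetRecord` type, and the one-year staleness threshold are all assumptions chosen for illustration, and real catalogue schemas will differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical set of fields a catalogue entry should carry (an assumption,
# not taken from data.gov.uk or the ODI prototype).
REQUIRED_FIELDS = ("title", "description", "publisher", "licence")


@dataclass
class DatasetRecord:
    """Illustrative catalogue entry; every field may be absent."""
    title: Optional[str] = None
    description: Optional[str] = None
    publisher: Optional[str] = None
    licence: Optional[str] = None
    last_updated: Optional[date] = None


def missing_fields(record: DatasetRecord) -> list:
    """Return the required fields that are absent or blank."""
    return [
        name for name in REQUIRED_FIELDS
        if not (getattr(record, name) or "").strip()
    ]


def is_stale(record: DatasetRecord, today: date, max_age_days: int = 365) -> bool:
    """Treat a record as stale if it has no update date or is older than the threshold."""
    if record.last_updated is None:
        return True
    return (today - record.last_updated).days > max_age_days


def audit(record: DatasetRecord, today: date) -> dict:
    """Summarise the metadata problems found in one record."""
    return {"missing": missing_fields(record), "stale": is_stale(record, today)}
```

Running `audit` over every record in a catalogue would give a rough count of entries that an AI agent is likely to skip, though what counts as "usable" in practice depends on the agent and the data.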
The prototype gathered 38 GB of data from six public sector sources, processing and standardizing more than 100,000 files into a single resource.
While the study showed the NDL could be built at relatively low cost, it also highlighted the work needed to make the data AI-ready.
Professor Elena Simperl, director of research at the ODI, told The Register that the findings highlight a growing gap between the volume of public data available and its practical usability.
"For crime statistics, the AI agents then went and tried to find crime statistics from somewhere else.
If you don't update your data, if your metadata is not good quality and has lots of missing values, we could see from our experiments with the AI agent we built that they would just circumvent the available data.
It would go elsewhere on social media and other places to try to find that information in a report somewhere, because it's much easier for them," she said.
"The government's National Data Library has huge potential, but much of the data it would rely on is not yet usable by modern AI systems.
If that doesn't change, there is a risk that AI tools will increasingly rely on sources that are easier to access, rather than those that are most reliable."
A government spokesperson told us it wants to "maximise the benefits of public sector data" in a bid to make services "more efficient and grow the economy."
"Reflecting these findings, we're already overhauling the UK's digital public infrastructure through our Roadmap for Modern Digital Government."
The National Data Library is the latest project designed to help researchers and data scientists find all the publicly held data they need.
Launched in 2004, the Secure Research Service (SRS) offers curated, research-ready datasets to accredited researchers.
In 2020, the government planned to replace this system with the Integrated Data Service (IDS) from the ONS.
However, some of its budget of £240.8 million was used – with approval from His Majesty's Treasury – to fund more general tech and data costs as the ONS struggled to get off legacy IT systems.
Funding for the IDS was effectively cut in March, although existing services will continue to be available, largely within the ONS, missing one of the major objectives.
The NDL is the new plan for national data sharing to support research, machine learning, and AI.
The ODI's study shows the work needed if the NDL is to avoid becoming another missed opportunity.
®
Source: This article was originally published by The Register