Share-PSI Best Practice: Enable quality assessment of open data

Outline

Improve the trust in government by enabling assessment and/or providing evidence related to the quality of the published information/data.

Links to the Revised PSI Directive

Data Quality

Challenge

Whether a dataset is of sufficient quality to be used for a specific task will depend entirely on the task in question. There is no single objective measure of quality. However, work carried out in the European Commission's Open Data Support project suggests nine aspects to consider:

Accuracy: Does the data correctly represent the real-world entity or event?
Consistency: Is the data free of contradictions?
Availability: Can the data be accessed now and over time?
Completeness: Does the data include all data items representing the entity or event?
Conformance: Does the data follow accepted standards?
Credibility: Is the data based on trustworthy sources?
Processability: Is the data machine-readable?
Relevance: Does the data include an appropriate amount of data?
Timeliness: Does the data represent the actual situation, and is it published soon enough?

Users of data are likely to assess whether the available information and data are sufficient for their needs against these criteria. The challenge is to understand what is required in terms of data quality and to define a set of basic, measurable metrics that determine data quality in an objective way.
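As an illustration of what "basic and measurable metrics" could look like, the following minimal Python sketch computes one possible completeness score and one possible timeliness check. The function names, the sample records and the 30-day threshold are assumptions made for this example; they are not defined by the Open Data Support project.

```python
from datetime import date


def completeness(records, required_fields):
    """Share of required cells that are filled (neither None nor an empty string)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def is_timely(last_modified, today, max_age_days):
    """True if the dataset was updated within the allowed number of days."""
    return (today - last_modified).days <= max_age_days


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "name": "Station A", "reading": 12.5},
    {"id": 2, "name": "", "reading": 11.0},
    {"id": 3, "name": "Station C", "reading": None},
    {"id": 4, "name": "Station D", "reading": 9.75},
]

completeness_score = completeness(records, ["id", "name", "reading"])
timely = is_timely(date(2016, 1, 10), date(2016, 1, 31), max_age_days=30)
```

Whether a given score is good enough still depends on the task; the point of such metrics is only to give re-users a comparable figure to judge against their own needs.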

Solution

Implement a dataset publication pipeline that publishes quality assessment and provenance information alongside the data. The W3C Data Quality Vocabulary provides the means to make such information available in a machine-readable and interoperable manner, covering:

annotations that describe the data's quality;
computed quality metrics;
certificates that describe the dataset production pipeline;
provenance information using the W3C PROV standards.
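The items above could be published, for instance, as a small Turtle document. The sketch below builds one computed quality measurement with the Data Quality Vocabulary terms dqv:QualityMeasurement, dqv:computedOn, dqv:isMeasurementOf and dqv:value, plus one prov:wasDerivedFrom link. The example.org IRIs, the metric, the measurement identifier and the helper function name are hypothetical and chosen only for this illustration.

```python
def quality_measurement_turtle(dataset_iri, metric_iri, value, source_iri):
    """Return a Turtle snippet with one DQV measurement and one PROV link."""
    return f"""@prefix dqv: <http://www.w3.org/ns/dqv#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<{dataset_iri}>
    dqv:hasQualityMeasurement <{dataset_iri}#measurement-1> ;
    prov:wasDerivedFrom <{source_iri}> .

<{dataset_iri}#measurement-1>
    a dqv:QualityMeasurement ;
    dqv:computedOn <{dataset_iri}> ;
    dqv:isMeasurementOf <{metric_iri}> ;
    dqv:value "{value}"^^xsd:double .
"""


# Hypothetical IRIs, for illustration only.
turtle = quality_measurement_turtle(
    dataset_iri="http://example.org/dataset/air-quality",
    metric_iri="http://example.org/metrics/completeness",
    value=0.83,
    source_iri="http://example.org/source/sensor-export",
)
print(turtle)
```

A production pipeline would more likely generate this with an RDF library and also describe the metric itself; plain string formatting is used here only to keep the example self-contained.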

Best Practice Identification

Why is this a Best Practice?

As a result of this practice, re-users will have greater trust in the published data and will not need to carry out, or pay others to carry out, a quality assessment. By following this best practice, publishers support re-use, in particular the creation of innovative commercial services.

How do I implement this Best Practice?

To assess the publishing process, consider the steps described by ODI Certificates (or similar).

Enable potential re-users to assess the quality of the dataset itself: provide provenance information and annotations using the W3C Data Quality Vocabulary.

To assess the quality of the dataset itself prior to publishing, a validator can be used; for example, statistical data published in RDF format can be checked with an RDF Data Cube validator (PDF).
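To give an idea of what such a validation involves, the sketch below checks a single integrity constraint from the W3C RDF Data Cube Vocabulary, IC-1 (every qb:Observation has exactly one associated qb:DataSet). It is not the method of the validator referenced above: triples are represented as plain Python tuples, and the example.org IRIs and the function name are assumptions made for this illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"


def observations_violating_ic1(triples):
    """Return observations that do not link to exactly one data set."""
    observations = {
        s for s, p, o in triples if p == RDF_TYPE and o == QB + "Observation"
    }
    dataset_links = {}
    for s, p, o in triples:
        if p == QB + "dataSet":
            dataset_links.setdefault(s, set()).add(o)
    return sorted(
        obs for obs in observations if len(dataset_links.get(obs, ())) != 1
    )


# Hypothetical triples, for illustration only: obs2 has no qb:dataSet link.
triples = [
    ("http://example.org/obs1", RDF_TYPE, QB + "Observation"),
    ("http://example.org/obs1", QB + "dataSet", "http://example.org/dataset1"),
    ("http://example.org/obs2", RDF_TYPE, QB + "Observation"),
]

violations = observations_violating_ic1(triples)
```

A full validator would evaluate all integrity constraints of the specification, typically as SPARQL queries against the published graph.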

To enrich the data with quality assessment information and to track provenance during the RDF integration process, a tool such as UnifiedViews can be used.

Where has this best practice been implemented?

Country	Implementation	Contact Point
UK	ODI Certificate for the Westminster City Council	Westminster City Council
Serbia	Validating RDF Data Cube Models	Valentina Janev, Mihailo Pupin Institute, University of Belgrade, Belgrade, Serbia

References

David Corsar, Peter Edwards, Enhancing Open Data with Provenance, dot.rural Digital Economy Hub, ProvenanceWeek 2014
Giorgos Flouris, Yannis Roussakis, María Poveda-Villalón, Pablo N. Mendes, Irini Fundulaki, Using Provenance for Quality Assessment and Repair in Linked Open Data, 2nd Joint Workshop on Knowledge Evolution and Ontology Dynamics (EvoDyn-12) at ISWC 2012
Makx Dekkers, AMI Consult, How good is good enough?
Amanda Smith & Sumika Sakanishi, ODI, Publishing and improving the quality of open data with Open Data Certificates, United Kingdom

Contact Info

Valentina Janev, Institut Mihajlo Pupin