Best Practice: Enable quality assessment of open data

Draft: 14 February 2016

This version: http://www.w3.org/2013/share-psi/bp/eqa-20160214/
Latest version: http://www.w3.org/2013/share-psi/bp/eqa/
Previous version: http://www.w3.org/2013/share-psi/bp/eqa-20151012/

This is one of a set of Best Practices developed by the Share-PSI 2.0 Thematic Network.

Share-PSI Best Practice: Enable quality assessment of open data by Share-PSI 2.0 is licensed under a Creative Commons Attribution 4.0 International License.


Outline

Improve trust in government by enabling assessment of, and/or providing evidence about, the quality of the published information and data.

Challenge

Whether a dataset is of sufficient quality to be used for a specific task depends entirely on the task in question. There is no single objective measure of quality. However, work carried out in the European Commission's Open Data Support project suggests the following aspects to consider:

  • Accuracy: Does the data correctly represent the real-world entity or event?
  • Consistency: Is the data free of contradictions?
  • Availability: Can the data be accessed now and over time?
  • Completeness: Does the data include all data items representing the entity or event?
  • Conformance: Does the data follow accepted standards?
  • Credibility: Is the data based on trustworthy sources?
  • Processability: Is the data machine-readable?
  • Relevance: Does the data include an appropriate amount of data?
  • Timeliness: Does the data represent the actual situation, and is it published soon enough?

Users of data are likely to judge against these criteria whether the available information and data are sufficient for their needs. The challenge is to understand what is required in terms of data quality and to define a set of basic, measurable metrics that determine data quality objectively.
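
As a hedged illustration of what "basic and measurable" could mean in practice, the following Python sketch computes two simple scores for a small tabular dataset: completeness (the share of expected cells that hold a value) and timeliness (whether the last update is recent enough). The field names, the sample records and the 30-day threshold are assumptions made for this example only; they are not taken from the Open Data Support work.

```python
from datetime import date


def completeness(records, fields):
    """Return the fraction of expected cells that hold a non-empty value."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def is_timely(last_modified, today, max_age_days):
    """Return True when the last update is at most max_age_days old."""
    return (today - last_modified).days <= max_age_days


# Hypothetical sample data: three records, two of them with a missing value.
records = [
    {"name": "Library A", "postcode": "W1 1AA", "opening_hours": "9-17"},
    {"name": "Library B", "postcode": "", "opening_hours": "10-18"},
    {"name": "Library C", "postcode": "W2 2BB", "opening_hours": None},
]
fields = ["name", "postcode", "opening_hours"]

score = completeness(records, fields)
fresh = is_timely(date(2016, 2, 1), date(2016, 2, 14), max_age_days=30)
print(f"completeness={score:.2f} timely={fresh}")
```

Which metrics are appropriate, and which thresholds count as acceptable, still depends on the task the re-user has in mind.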

Solution

Implement a dataset publication pipeline that publishes quality assessment and provenance information alongside the data. The W3C Data Quality Vocabulary provides the means to make such information available in a machine-readable and interoperable manner, covering:

  • annotations that describe the data's quality;
  • computed quality metrics;
  • certificates that describe the dataset production pipeline;
  • provenance information using the W3C PROV standards.
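
To make the items above more concrete, here is a minimal Python sketch that describes one computed quality metric and one provenance link as JSON-LD, using terms from the Data Quality Vocabulary (dqv:QualityMeasurement, dqv:computedOn, dqv:isMeasurementOf, dqv:value) and from PROV (prov:wasDerivedFrom). All example.org IRIs, the function name and the value 0.78 are hypothetical; a real pipeline would substitute its own identifiers and might prefer a dedicated RDF library over plain dictionaries.

```python
import json

# Namespace IRIs of the vocabularies used; the prefixes are the usual ones.
CONTEXT = {
    "dqv": "http://www.w3.org/ns/dqv#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "prov": "http://www.w3.org/ns/prov#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def describe_quality(dataset_iri, metric_iri, value, source_iri):
    """Build a JSON-LD description of a dataset with one DQV measurement."""
    return {
        "@context": CONTEXT,
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        # Provenance: the published dataset was derived from a source file.
        "prov:wasDerivedFrom": {"@id": source_iri},
        # Quality: one measurement of one metric, computed on this dataset.
        "dqv:hasQualityMeasurement": {
            "@type": "dqv:QualityMeasurement",
            "dqv:computedOn": {"@id": dataset_iri},
            "dqv:isMeasurementOf": {"@id": metric_iri},
            "dqv:value": {"@value": str(value), "@type": "xsd:decimal"},
        },
    }


# Hypothetical identifiers, for illustration only.
doc = describe_quality(
    dataset_iri="http://example.org/dataset/libraries",
    metric_iri="http://example.org/metrics/completeness",
    value=0.78,
    source_iri="http://example.org/source/libraries.csv",
)
print(json.dumps(doc, indent=2))
```

Publishing such a description next to the dataset lets re-users read the quality claims with standard tooling instead of re-computing them.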

Best Practice Identification

Why is this a Best Practice?

As a result of this practice, re-users will have greater trust in the published data and will not need to carry out, or pay others to carry out, a quality assessment. By following this best practice, publishers support re-use, in particular the creation of innovative commercial services.

How do I implement this Best Practice?

To assess the publishing process, consider the steps described by ODI Certificates (or similar).

To enable potential re-users to assess the quality of the dataset itself, provide provenance information and annotations using the W3C Data Quality Vocabulary.

To assess the quality of a dataset before publishing it, use a validator suited to its format. For example, statistical data published in RDF can be checked with an RDF Data Cube validator (PDF).
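
As a rough idea of the kind of check such a validator performs, the Python sketch below tests a single rule modelled on the "unique data set" integrity constraint (IC-1) of the RDF Data Cube vocabulary: every qb:Observation has exactly one associated qb:DataSet. It is not the validator referred to above; the triples are plain tuples, and the example.org IRIs and the function name are hypothetical.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def observations_violating_ic1(triples):
    """Return observations that do not link to exactly one data set."""
    observations = {
        s for s, p, o in triples if p == RDF_TYPE and o == QB + "Observation"
    }
    linked = {obs: set() for obs in observations}
    for s, p, o in triples:
        if p == QB + "dataSet" and s in linked:
            linked[s].add(o)
    return sorted(obs for obs, datasets in linked.items() if len(datasets) != 1)


# Hypothetical triples: obs/1 is valid, obs/2 has no data set, obs/3 has two.
EX = "http://example.org/"
triples = [
    (EX + "obs/1", RDF_TYPE, QB + "Observation"),
    (EX + "obs/1", QB + "dataSet", EX + "dataset/a"),
    (EX + "obs/2", RDF_TYPE, QB + "Observation"),
    (EX + "obs/3", RDF_TYPE, QB + "Observation"),
    (EX + "obs/3", QB + "dataSet", EX + "dataset/a"),
    (EX + "obs/3", QB + "dataSet", EX + "dataset/b"),
]

for obs in observations_violating_ic1(triples):
    print("IC-1 violation:", obs)
```

A full validator would cover the remaining integrity constraints and would typically run them as SPARQL queries against the data.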

To enrich the data with quality assessment information and to track provenance during an RDF integration process, a tool such as UnifiedViews can be used.

Where has this best practice been implemented?

Country | Implementation | Contact Point
UK | ODI Certificate for the Westminster City Council | Westminster City Council
Serbia | Validating RDF Data Cube Models | Valentina Janev, Mihailo Pupin Institute, University of Belgrade, Belgrade, Serbia

Contact Info

Valentina Janev, Institut Mihajlo Pupin.
