
Best Practice: Enable quality assessment of open data

Draft: 12 October 2015

This version
http://www.w3.org/2013/share-psi/bp/eqa-20151012/
Latest version
http://www.w3.org/2013/share-psi/bp/eqa/

This is one of a set of Best Practices developed by the Share-PSI 2.0 Thematic Network.

Share-PSI Best Practice: Enable quality assessment of open data by Share-PSI 2.0 is licensed under a Creative Commons Attribution 4.0 International License.


Outline

Improve trust in government by enabling assessment of, and providing evidence for, the quality of published information and data.

Management Summary

Whether a dataset is of sufficient quality to be used for a specific task will depend entirely on the task in question. There is no single objective measure of quality. However, work carried out in the European Commission's Open Data Support project suggests nine aspects to consider:

  • Accuracy: Does the data correctly represent the real-world entity or event?
  • Consistency: Is the data free of contradictions?
  • Availability: Can the data be accessed now and over time?
  • Completeness: Does the data include all data items representing the entity or event?
  • Conformance: Does the data follow accepted standards?
  • Credibility: Is the data based on trustworthy sources?
  • Processability: Is the data machine-readable?
  • Relevance: Does the data include an appropriate amount of data?
  • Timeliness: Does the data represent the current situation and is it published soon enough?

Users of data are likely to judge whether the available information and data are sufficient for their needs against these criteria.
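
As an illustration of how two of these aspects might be turned into measurable numbers, the following Python sketch computes a simple completeness ratio and the age of a dataset in days. The metric definitions, field names and sample records are assumptions made for this example; they are not prescribed by the Open Data Support project or by this Best Practice.

```python
from datetime import date

# Hypothetical list of fields that every record is expected to carry.
REQUIRED_FIELDS = ["station", "pm10", "measured"]


def completeness(records, required_fields):
    """Return the fraction of required cells that are present and non-empty."""
    total = len(records) * len(required_fields)
    if total == 0:
        # Assumption: a dataset with no records has nothing missing.
        return 1.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def age_in_days(last_modified, today):
    """Return how many days have passed since the dataset was last modified."""
    return (today - last_modified).days


# Invented sample data: three of the twelve required cells are missing or empty.
records = [
    {"station": "A1", "pm10": 21.5, "measured": "2015-10-01"},
    {"station": "A2", "pm10": None, "measured": "2015-10-01"},
    {"station": "", "pm10": 18.0, "measured": "2015-10-02"},
    {"station": "A4", "pm10": 30.2},
]

print("completeness:", completeness(records, REQUIRED_FIELDS))
print("age in days:", age_in_days(date(2015, 10, 2), date(2015, 10, 12)))
```

Whether a completeness of 0.75 or an age of ten days is acceptable still depends on the task, so a publisher would normally report such values rather than a pass or fail verdict.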

Challenge

Understand what is required in terms of data quality and define a set of basic and measurable metrics to determine data quality in an objective way.

Solution

Implement a dataset publication pipeline that includes quality assessment and provenance information alongside the published data. The W3C Data Quality Vocabulary provides the means to make such information available in a machine readable and interoperable manner, covering:

  • annotations that describe the data's quality;
  • computed quality metrics;
  • certificates that describe the dataset production pipeline;
  • provenance information using the W3C PROV standards.
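
A computed quality metric could, for example, be published as a Data Quality Vocabulary measurement. The sketch below builds such a description as JSON-LD using only the Python standard library. The `example.org` IRIs, the helper name `quality_measurement` and the value 0.75 are hypothetical; the `dqv:` terms are taken from the Data Quality Vocabulary namespace and should be checked against the current version of that specification before use.

```python
import json

DQV = "http://www.w3.org/ns/dqv#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def quality_measurement(dataset_iri, metric_iri, value, measurement_iri):
    """Build a JSON-LD description of one quality measurement (illustrative helper)."""
    return {
        "@context": {"dqv": DQV, "xsd": XSD},
        "@id": measurement_iri,
        "@type": "dqv:QualityMeasurement",
        # The dataset the metric was computed on.
        "dqv:computedOn": {"@id": dataset_iri},
        # The metric definition this value belongs to.
        "dqv:isMeasurementOf": {"@id": metric_iri},
        "dqv:value": {"@value": str(value), "@type": "xsd:double"},
    }


# All IRIs below are placeholders for this example.
measurement = quality_measurement(
    dataset_iri="http://example.org/dataset/air-quality",
    metric_iri="http://example.org/metric/completeness",
    value=0.75,
    measurement_iri="http://example.org/measurement/air-quality-completeness",
)

print(json.dumps(measurement, indent=2))
```

A publication pipeline could emit one such document per metric and publish it next to the dataset's catalogue record, so that re-users and harvesters can read the values without rerunning the assessment.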

Best Practice Identification

Why is this a Best Practice? What’s the impact of the Best Practice?

As a result of this practice, re-users will have greater trust in the published data and will not need to carry out, or pay others to carry out, a quality assessment.

Why is there a need for this Best Practice?

To support re-use, in particular the creation of innovative commercial services.

What do you need for this Best Practice?

To assess the publishing process, consider the steps described by ODI Certificates (or similar).

To enable potential re-users to assess the quality of the dataset itself, provide provenance information and annotations using the W3C Data Quality Vocabulary. Feedback from existing users is always of interest to potential users.
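
One possible way to combine provenance information with feedback from an existing user is sketched below as a small JSON-LD graph. It assumes the W3C PROV-O, Data Quality Vocabulary and Web Annotation namespaces; all `example.org` IRIs and the helper name `describe_dataset` are hypothetical, and the choice of properties is a minimal illustration rather than a recommended profile.

```python
import json

CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "dqv": "http://www.w3.org/ns/dqv#",
    "oa": "http://www.w3.org/ns/oa#",
}


def describe_dataset(dataset_iri, source_iri, publisher_iri,
                     feedback_iri, feedback_body_iri):
    """Build a JSON-LD graph with provenance and one user feedback annotation."""
    provenance = {
        "@id": dataset_iri,
        "@type": "prov:Entity",
        # Where the data came from and who is responsible for it.
        "prov:wasDerivedFrom": {"@id": source_iri},
        "prov:wasAttributedTo": {"@id": publisher_iri},
        # Link from the dataset to the feedback about it.
        "dqv:hasQualityAnnotation": {"@id": feedback_iri},
    }
    feedback = {
        "@id": feedback_iri,
        "@type": "dqv:UserQualityFeedback",
        "oa:hasTarget": {"@id": dataset_iri},
        "oa:hasBody": {"@id": feedback_body_iri},
        "oa:motivatedBy": {"@id": "dqv:qualityAssessment"},
    }
    return {"@context": CONTEXT, "@graph": [provenance, feedback]}


# All IRIs below are placeholders for this example.
description = describe_dataset(
    dataset_iri="http://example.org/dataset/air-quality",
    source_iri="http://example.org/source/sensor-network-export",
    publisher_iri="http://example.org/org/environment-agency",
    feedback_iri="http://example.org/feedback/1",
    feedback_body_iri="http://example.org/feedback/1/comment",
)

print(json.dumps(description, indent=2))
```

Keeping the provenance statements and the feedback annotation in one graph lets a potential re-user see both the origin of the data and what earlier users reported about it.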

Applicability by other Member States

The approach is applicable to any Member State.

Contact Info

Valentina Janev, Institut Mihajlo Pupin.

$Id: Overview.html,v 1.4 2016/08/19 09:12:48 phila Exp $