Catch the quality line to Quality St.

joe leach
data architect
enterprise architecture team

Quality street (see “The Government Data Quality Framework: Guidance - Gov.uk”)

🔎 Identify critical data

  • Find data driving operational success
  • Find data that is vital for decision-making
  • Find data where there is a high impact of low quality

…sometimes it is all three at once - like in the waste emergency!

Each point in the background represents a typical single service attached to a council function (see “GitHub - Joel-Lbth/Council-Data-Network: Reference Data Network Based on Lginform Function-Service Mappings - Github.com”)

🔎 Find data driving operational success - waste emergency example

(see “Local Authority Data Explorer - DLUHC Data Dashboards - Oflog.data.gov.uk”)

🔎 Find data driving operational success - waste emergency example

Successful operations have clarity on provision (data is authoritative):

  • 🧑🏽 person (a user of a service)
  • 📍 location (a place where services happen)
  • 🏛️ thing (assets involved in service delivery)

…having a clear link to the original source, in the form of master or reference data, helps services to plan

🔎 Find data driving operational success and 🏆 win 🎉 awards 🏆

“The Winner of the 2023 Data Linking Award was Lewisham Council for integrating its Waste and Recycling Service to the Local Land and Property Gazetteer … the council has ensured that households eligible for food and garden waste collection can access the appropriate services, but do so without incurring additional costs.” Shout-out to William for pulling this off with PostGIS and QGIS 🦄

(see “GeoPlace Announces Winners of the 2023 Exemplar Awards - Geoplace.co.uk”)

🔎 Find data that is vital for decision-making

The 2022-26 strategic plan records 52 key performance indicators, and key political decisions are made based on this data.

Example KPIs showing that in y2 recycling is below the minimum acceptable level, and that missed collections data is missing for the first two quarters of y2.

🔎 Find data where there is a high impact of low quality

  • enforcement?
  • planning?

📃 Identify your data quality rules

  1. Completeness
  2. Uniqueness
  3. Consistency
  4. Timeliness
  5. Validity
  6. (In)accuracy

(see Askham et al. 2013)

✅ Completeness

A school collects forms from parents on emergency contact telephone numbers.

There are 300 students, but 294 responses are collected and recorded.

completeness = 294/300 x 100 = 98%
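
The same sum as a minimal Python sketch, assuming the responses sit in a plain list where a missing number is None or blank. The function name and the phone numbers are made up for illustration.

```python
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[str]]) -> float:
    """Percentage of values that are populated (not None and not blank)."""
    if not values:
        raise ValueError("cannot measure completeness of an empty data set")
    populated = sum(1 for v in values if v is not None and v.strip())
    return populated / len(values) * 100


# Hypothetical data: 300 students, 294 with a recorded emergency contact number.
contacts = ["07700 900000"] * 294 + [None] * 6
print(f"completeness = {completeness(contacts):.0f}%")
```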

✅ Uniqueness

A school has 120 current students and 380 former students (i.e. 500 in total).

The student database shows 501 different student records.

This includes Bob Tables and Bobby Tables as separate records, despite only one student at the school named Bob Tables.

uniqueness = 500/501 x 100 = 99.8%
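
A rough Python sketch of the measure, plus one possible way to surface near-duplicates like Bob and Bobby. The fuzzy match uses the standard library's difflib, and the 0.85 threshold and the extra name are assumptions for illustration, not a recommended setting.

```python
from difflib import SequenceMatcher
from itertools import combinations


def uniqueness(real_world_count: int, record_count: int) -> float:
    """Percentage of records that correspond to a distinct real-world thing."""
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    return real_world_count / record_count * 100


def likely_duplicates(names, threshold=0.85):
    """Pairs of names whose similarity ratio is at or above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


print(f"uniqueness = {uniqueness(500, 501):.1f}%")
# "Ada Example" is a made-up record that should not be flagged.
print(likely_duplicates(["Bob Tables", "Bobby Tables", "Ada Example"]))
```

Flagged pairs still need a human to confirm they are the same student before any records are merged.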

✅ Consistency

In a school, a student’s date of birth has the same value and format in the school register as that stored within the student database.

This is an example where reference data may be used, if it exists for a person.
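
A small Python sketch of a consistency check, assuming both sources can be exported as a mapping from student ID to the stored date-of-birth string. The IDs and dates are invented.

```python
def inconsistent_ids(register: dict, database: dict) -> list:
    """IDs present in both sources whose stored values are not identical."""
    shared = register.keys() & database.keys()
    return sorted(sid for sid in shared if register[sid] != database[sid])


# Hypothetical extracts from the school register and the student database.
register = {"S001": "2015-03-09", "S002": "2014-11-30", "S003": "2016-01-15"}
database = {"S001": "2015-03-09", "S002": "2014-11-03", "S003": "15/01/2016"}

# S002 differs in value, S003 only in format; both count as inconsistent here.
print(inconsistent_ids(register, database))
```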

✅ Timeliness

A school has a service level agreement that a change to an emergency contact will occur within 2 days.

A parent gives an updated emergency contact number on 1 June.

It is entered into the student database on 4 June.

It has taken 3 days to update the system, which breaches the agreed data quality rule.
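
The rule can be sketched in a few lines of Python. The year is an assumption, since the example only gives day and month.

```python
from datetime import date, timedelta

SLA = timedelta(days=2)  # agreed maximum delay from the example


def breaches_sla(reported: date, recorded: date, sla: timedelta = SLA) -> bool:
    """True when the update took longer than the agreed service level."""
    return (recorded - reported) > sla


reported_on = date(2023, 6, 1)  # parent gives the new number (year assumed)
recorded_on = date(2023, 6, 4)  # number reaches the student database
delay = recorded_on - reported_on
print(f"took {delay.days} days, breach: {breaches_sla(reported_on, recorded_on)}")
```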

✅ Validity

Primary and Junior School applications capture the age of a child. This age is entered into the database and the age checked to ensure it is between 4 and 11. Any values outside of this range are rejected as invalid.
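
As a Python sketch, the range check might look like this. The submitted values are invented to show both accepted and rejected cases.

```python
def is_valid_age(age) -> bool:
    """True when age is a whole number between 4 and 11 inclusive."""
    return isinstance(age, int) and not isinstance(age, bool) and 4 <= age <= 11


# Hypothetical values captured on application forms.
submitted = [3, 4, 7, 11, 12, "seven", None]
accepted = [a for a in submitted if is_valid_age(a)]
rejected = [a for a in submitted if not is_valid_age(a)]
print("accepted:", accepted)
print("rejected:", rejected)
```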

✅ (In)accuracy

A school receives applications for its annual September intake and requires students to be aged 5 before 31 August of the intake year.

Someone completes the Date of Birth (D.O.B) on the application in the US date format. The student is accepted in error as the date of birth given is 09/08/YYYY rather than 08/09/YYYY.
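
A Python sketch of how the same string passes or fails the rule depending on which date format is assumed. The birth year and intake year are invented, as the example leaves them as YYYY.

```python
from datetime import date, datetime


def eligible(dob: date, intake_year: int) -> bool:
    """True when the child turns 5 before 31 August of the intake year."""
    # Comparing (year, month, day) tuples avoids building a 29 February date.
    return (dob.year + 5, dob.month, dob.day) < (intake_year, 8, 31)


as_entered = "09/08/2018"  # hypothetical: the child was born on 8 September 2018

read_as_uk = datetime.strptime(as_entered, "%d/%m/%Y").date()  # 9 August 2018
read_as_us = datetime.strptime(as_entered, "%m/%d/%Y").date()  # 8 September 2018

print("UK reading eligible:", eligible(read_as_uk, 2023))
print("US reading eligible:", eligible(read_as_us, 2023))
```

The value is valid under either format, which is why a validity check alone does not catch it.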

Deliberate inaccuracy can also matter, for example when anonymising or de-identifying data so that it stays linkable across datasets.
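
One way to read the de-identification point is pseudonymisation: swap the real identifier for a token that is the same wherever that identifier appears. Below is a sketch using a keyed SHA-256 digest from the Python standard library. It is not the implementation of any particular tool, and the salt and identifier are invented.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_salt: bytes) -> str:
    """Keyed SHA-256 digest of a normalised identifier."""
    normalised = identifier.strip().upper().encode("utf-8")
    return hmac.new(secret_salt, normalised, hashlib.sha256).hexdigest()


# Assumption: the salt is agreed between data owners and never published.
SALT = b"hypothetical-shared-secret"

# The same made-up identifier arriving from two different datasets.
waste_token = pseudonymise("100012345678", SALT)
housing_token = pseudonymise(" 100012345678 ", SALT)  # stray spaces

print(waste_token == housing_token)
```

Because both datasets produce the same token, records can still be linked without either side holding the original identifier.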

🤹🏽 User needs and trade-offs

The six rules may require juggling at times:

In 2018 the Office for National Statistics (ONS) introduced a new model for publishing Gross Domestic Product (GDP). This enabled monthly estimates of GDP to be published. However, there was a trade-off between timeliness and accuracy of the data.

👩🏽‍🔬 Assess initial KPIs

  • percentages: measuring the whole data set, or a part of it - percentages can indicate the scale of a problem
  • count: typically counts are used to measure incorrect data
  • true or false: things that will compromise the data set if they are wrong
  • ratio: the ratio of errors or problems to data without errors or problems
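
The four kinds of measure above can be sketched from a single list of pass/fail flags. The class and field names are invented for illustration, and the data reuses the 6-in-300 figures from the completeness example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class InitialAssessment:
    percent_failing: float  # percentage: scale of the problem
    failing_count: int      # count: number of incorrect records
    any_failing: bool       # true or false: is the rule broken at all?
    error_ratio: float      # ratio: records with errors to records without


def assess(failed: Sequence[bool]) -> InitialAssessment:
    """Summarise one quality rule, given a True flag for each failing record."""
    total = len(failed)
    if total == 0:
        raise ValueError("nothing to assess")
    bad = sum(failed)
    good = total - bad
    return InitialAssessment(
        percent_failing=bad / total * 100,
        failing_count=bad,
        any_failing=bad > 0,
        error_ratio=bad / good if good else float("inf"),
    )


# Hypothetical: 6 of 300 records fail the rule.
result = assess([True] * 6 + [False] * 294)
print(result)
```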

✍🏽 Document your findings 🕵🏽

  • understand previous data quality problems
  • know where improvements may need to be made in the future
  • get information about where data quality may limit the use of the data

🌱 Root cause analysis

(see “The Government Data Quality Framework: Guidance - Gov.uk”)

  1. Log data quality problems
  2. Understand the data journey
  3. Estimate the cost of fixing and not fixing
  4. Fix as close to source as possible
  5. Is it correct for its original purpose?
  6. Continue to monitor your data
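
A problem log with cost estimates could be as simple as the Python sketch below. The fields, figures and the "fix first" rule are all assumptions for illustration, not part of the framework.

```python
from dataclasses import dataclass


@dataclass
class QualityProblem:
    description: str
    source_system: str                # where in the data journey it starts
    cost_to_fix: float                # estimated one-off cost
    yearly_cost_of_not_fixing: float  # estimated ongoing cost


def worth_fixing_first(problems):
    """Problems where not fixing costs more than fixing, biggest saving first."""
    candidates = [p for p in problems if p.yearly_cost_of_not_fixing > p.cost_to_fix]
    return sorted(
        candidates,
        key=lambda p: p.yearly_cost_of_not_fixing - p.cost_to_fix,
        reverse=True,
    )


# Hypothetical log entries with made-up costs.
log = [
    QualityProblem("free-text addresses", "web form", 5_000, 20_000),
    QualityProblem("duplicate customer records", "CRM", 8_000, 12_000),
    QualityProblem("legacy date formats", "archive", 10_000, 1_000),
]
for problem in worth_fixing_first(log):
    print(problem.description, "->", problem.source_system)
```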

⚖️ Plan improvements

UPRN (Unique Property Reference Number), always UPRN! This is sometimes achieved with systems integration (e.g. postcode to address completion), and sometimes with data matching and linking (Extract Transform Load pipelines).
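
A toy version of the matching-and-linking route, in Python. It assumes an exact match on normalised address and postcode against a gazetteer extract. All addresses, postcodes and UPRNs are invented, and real address matching needs far more care than this.

```python
import re


def normalise(text: str) -> str:
    """Upper-case and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def link_to_uprn(records, gazetteer):
    """Attach a UPRN where address and postcode match; return (matched, unmatched)."""
    lookup = {
        (normalise(g["address"]), normalise(g["postcode"])): g["uprn"]
        for g in gazetteer
    }
    matched, unmatched = [], []
    for rec in records:
        key = (normalise(rec["address"]), normalise(rec["postcode"]))
        uprn = lookup.get(key)
        if uprn is None:
            unmatched.append(rec)  # leave for manual review
        else:
            matched.append({**rec, "uprn": uprn})
    return matched, unmatched


# Hypothetical gazetteer extract and service records.
gazetteer = [
    {"uprn": "100000000001", "address": "1 Example Street", "postcode": "E1 1AA"},
    {"uprn": "100000000002", "address": "2 Example Street", "postcode": "E1 1AA"},
]
waste_rounds = [
    {"address": "1 example street", "postcode": "e11aa", "bin": "food"},
    {"address": "Flat 9, Example St.", "postcode": "E1 1AA", "bin": "garden"},
]

matched, unmatched = link_to_uprn(waste_rounds, gazetteer)
print(len(matched), "matched,", len(unmatched), "for review")
```

The unmatched pile is where the data quality work is: each entry is either a gazetteer gap or a source-system address that needs fixing.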

📈 Define goals for data quality improvement

e.g. creation of systems and/or data integrations that implement data standards (yep UPRN… again)

📰 Report on your data quality

Convert the quality measurements into Key Performance Indicators (KPIs) that can be reliably monitored (hence the use of numeric metrics in the initial investigation).

♻️ Repeat measurements of data quality over time

Automate the monitoring of the KPIs derived in the initial assessment: make some dashboards
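
Before the dashboard, the monitoring logic itself can be tiny. Here is a Python sketch that turns repeated measurements into a status per period. The thresholds, period labels and values are invented.

```python
from typing import Mapping, Optional


def kpi_status(value: Optional[float], minimum: float, target: float) -> str:
    """Classify one measurement against a minimum acceptable level and a target."""
    if value is None:
        return "missing"
    if value < minimum:
        return "below minimum"
    if value < target:
        return "below target"
    return "on target"


def monitor(series: Mapping[str, Optional[float]], minimum: float, target: float) -> dict:
    """Status for every period in a series of repeated measurements."""
    return {period: kpi_status(v, minimum, target) for period, v in series.items()}


# Hypothetical quarterly completeness percentages; None means nothing was reported.
quarterly = {"y2 q1": None, "y2 q2": None, "y2 q3": 91.0, "y2 q4": 98.5}
report = monitor(quarterly, minimum=95.0, target=98.0)
for period, status in report.items():
    print(period, status)
```

Treating "missing" as its own status keeps gaps in reporting visible instead of silently dropping them from the chart.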

Once you’ve got to Quality St. there are more lines to ride

📚 References

“38,380b - N06.” https://www.flickr.com/photos/82567656@N06/53159412872.
Askham, Nicola, Denise Cook, Martin Doyle, Helen Fereday, Mike Gibson, Ulrich Landbeck, Rob Lee, Chris Maynard, Gary Palmer, and Julian Schwarzenbach. 2013. “The Six Primary Dimensions for Data Quality Assessment.” DAMA UK Working Group, 432–35.
“Bascule Bridge, Shadwell Basin - Flickr.com.” https://www.flickr.com/photos/londonlesstravelled/53133947647.
Desai, Tanvi, Felix Ritchie, Richard Welpton, et al. 2016. “Five Safes: Designing Data Access for Research.” Economics Working Paper Series 1601: 28.
“Digital, Data and Technology Profession Capability Framework.” https://ddat-capability-framework.service.gov.uk.
“GeoPlace Announces Winners of the 2023 Exemplar Awards - Geoplace.co.uk.” https://www.geoplace.co.uk/press/2023/geoplace-announces-winners-of-the-2023-exemplar-awards.
“GitHub - Joel-Lbth/Council-Data-Network: Reference Data Network Based on Lginform Function-Service Mappings - Github.com.” https://github.com/joel-lbth/council-data-network.
“Graffiti (Gee), Mile End, East London, England. - Flickr.com.” https://www.flickr.com/photos/64joe/9542221590.
“Graffiti (Jimmy C), Old Ford, East London, England. - Flickr.com.” https://www.flickr.com/photos/64joe/9544645679/.
“Graffiti (Joe), Bethnal Green, East London, England. - Flickr.com.” https://www.flickr.com/photos/64joe/9584318907/in/photostream/.
“Graffiti (Various Artists), Off Brick Lane, East London, England. - Flickr.com.” https://www.flickr.com/photos/64joe/9580259019/.
“GWL - N00.” https://www.flickr.com/photos/24987280@N00/20173201553.
“Helicopter Ambulance, Royal London Hospital - Flickr.com.” https://www.flickr.com/photos/londonlesstravelled/53137459375.
Hippisley-Cox, Julia. 2011. “OpenPseudonymiser - Openpseudonymiser.org.” https://www.openpseudonymiser.org.
“Local Authority Data Explorer - DLUHC Data Dashboards - Oflog.data.gov.uk.” https://oflog.data.gov.uk/waste-management?local_authority=Tower+Hamlets.
“Metro Designer - Tennessine - Tennessine.co.uk.” https://tennessine.co.uk/metro/.
“Proud 🌈 - N06.” https://www.flickr.com/photos/154427287@N06/53019378298.
“Street Art - 4490 - Flickr.com.” https://www.flickr.com/photos/padraiccollins/50903104071.
“Supermoon - N07.” https://www.flickr.com/photos/65798210@N07/53224848439.
“The Government Data Quality Framework: Guidance - Gov.uk.” https://www.gov.uk/government/publications/the-government-data-quality-framework/the-government-data-quality-framework-guidance.
Wilkinson, Mark D, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 1–9.