
Wikidata data validation

From Wikimedia Belgium

Wikidata data must be validated.

Even so, it is only eventually complete, which is a theoretical status: completeness will never actually be attained.

Characteristics

  • Items and properties have unique and immutable keys
  • Triplestore database whose statements are subject-property-value triples (the property is the predicate, the value is the object), comparable to entity relationships in a relational database
  • Easily updatable (simple concept; user-driven data model)
  • Never 100% complete or correct (eventually consistent)
  • Multilingual:
    • Labels, Descriptions, and Aliases are language sensitive
    • Properties are described like items (Property namespace)
      • New properties can be defined (proposed by users, created after community approval)
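
The data model described above can be sketched in a few lines of Python. This is a simplified, hypothetical in-memory model: the Statement and Item classes are illustrative and not part of any Wikidata library. An item is keyed by its ID, has language-keyed Labels, Descriptions and Aliases, and stores its statements as subject-property-value triples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One subject-property-value triple; frozen, so it cannot be changed after creation."""
    subject: str  # item ID, e.g. "Q42"
    prop: str     # property ID, e.g. "P31" (instance of)
    value: str    # another item ID or a literal value


@dataclass
class Item:
    id: str  # unique key of the item
    labels: dict[str, str] = field(default_factory=dict)         # language code -> label
    descriptions: dict[str, str] = field(default_factory=dict)   # language code -> description
    aliases: dict[str, list[str]] = field(default_factory=dict)  # language code -> aliases
    statements: list[Statement] = field(default_factory=list)

    def add(self, prop: str, value: str) -> None:
        # Every statement uses this item's own ID as its subject.
        self.statements.append(Statement(self.id, prop, value))

    def values(self, prop: str) -> list[str]:
        # All values recorded for one property (an item may have several).
        return [s.value for s in self.statements if s.prop == prop]


# Douglas Adams (Q42) is an instance of (P31) human (Q5); the description is illustrative.
adams = Item("Q42",
             labels={"en": "Douglas Adams", "nl": "Douglas Adams"},
             descriptions={"en": "English writer"})
adams.add("P31", "Q5")
print(adams.values("P31"))  # ['Q5']
```

Keeping statements as plain triples mirrors how the data is exposed in a triplestore; the multilingual parts are simply dictionaries keyed by language code.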

Tips

  • Items must be unique
    • Unique Label-Description combination per language (to distinguish homonyms)
    • Description must be different from Label
    • Label should not be repeated as an Alias
  • Items must have either an instance of (P31) or a subclass of (P279) statement
  • Notability:
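    1. The item has a sitelink to a Wikipedia (or other Wikimedia) page
    2. The entity is clearly identifiable and can be described using serious, publicly available references
    3. The item is needed to make statements about other items (structural need)
  • Labels in lowercase (common nouns) or with an initial capital (proper names); conventions differ per language, e.g. German capitalizes all nouns
  • Add sources (references) to statements (for more information and as proof of validity)
  • Use the property maintained by WikiProject (P6104) to list related items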
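
The per-item rules above can be checked mechanically. The following is a minimal sketch, assuming a simplified, hypothetical dictionary form of an item (the real Wikibase JSON nests labels and statements more deeply). It compares text case-insensitively, which is an assumption of the sketch rather than a Wikidata rule. Notability and sourcing need human judgement and are not checked here.

```python
INSTANCE_OF = "P31"
SUBCLASS_OF = "P279"


def item_problems(item: dict) -> list[str]:
    """Return human-readable problems for one item in a simplified dict form:
    {"labels": {lang: str}, "descriptions": {lang: str},
     "aliases": {lang: [str]}, "claims": {property ID: [values]}}."""
    problems = []
    labels = item.get("labels", {})
    descriptions = item.get("descriptions", {})
    aliases = item.get("aliases", {})
    for lang, label in labels.items():
        key = label.strip().casefold()
        desc = descriptions.get(lang)
        # The Description must differ from the Label.
        if desc is not None and desc.strip().casefold() == key:
            problems.append(f"{lang}: description equals label")
        # The Label should not be repeated among the Aliases.
        if key in {a.strip().casefold() for a in aliases.get(lang, [])}:
            problems.append(f"{lang}: label repeated as alias")
    claims = item.get("claims", {})
    # Every item should be an instance of something or a subclass of something.
    if not claims.get(INSTANCE_OF) and not claims.get(SUBCLASS_OF):
        problems.append("no instance of (P31) or subclass of (P279) statement")
    return problems


# Hypothetical item that breaks all three rules.
example = {"labels": {"en": "Foo"}, "descriptions": {"en": "foo"},
           "aliases": {"en": ["FOO"]}, "claims": {}}
print(item_problems(example))
```

The function returns a list instead of raising on the first problem, so one pass over a batch of items can report everything that needs fixing.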

Techniques

  • Constraints:
    • additional classes might be added
    • symmetric (reciprocal) properties
    • inverse properties
  • Detection of missing labels per language
  • Homonym detection and registration with different from (P1889)
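
The techniques above can be sketched as batch checks over a set of items. This is a minimal sketch, not the Wikidata constraint system itself: it assumes a simplified, hypothetical dictionary form (item ID mapped to {"labels": ..., "claims": {property ID: [item IDs]}}), and all function names are illustrative. A symmetric property such as spouse (P26) is treated as its own inverse; part of (P361) and has part(s) (P527) are an example of an inverse pair. Homonyms are items sharing a label in one language; pairs already linked with different from (P1889) are skipped.

```python
from collections import defaultdict
from itertools import combinations

DIFFERENT_FROM = "P1889"


def _values(items: dict, qid: str, prop: str) -> list:
    # Values of `prop` on item `qid`; empty if the item or property is absent.
    return items.get(qid, {}).get("claims", {}).get(prop, [])


def inverse_violations(items: dict, prop: str, inverse_prop: str) -> list[tuple[str, str]]:
    """Pairs (a, b) where a has prop -> b but b lacks inverse_prop -> a.
    Targets that are not in `items` are reported as well."""
    return [(qid, target)
            for qid in items
            for target in _values(items, qid, prop)
            if qid not in _values(items, target, inverse_prop)]


def symmetric_violations(items: dict, prop: str) -> list[tuple[str, str]]:
    # A symmetric property is its own inverse.
    return inverse_violations(items, prop, prop)


def missing_labels(items: dict, langs: list[str]) -> dict[str, list[str]]:
    """Map item ID -> languages (from `langs`) in which it has no label."""
    report = {}
    for qid, item in items.items():
        missing = [lang for lang in langs if lang not in item.get("labels", {})]
        if missing:
            report[qid] = missing
    return report


def unregistered_homonyms(items: dict, lang: str) -> list[tuple[str, str]]:
    """Pairs of items sharing a label (case-insensitive) in `lang`
    that are not yet linked with different from (P1889) in either direction."""
    by_label = defaultdict(list)
    for qid, item in items.items():
        label = item.get("labels", {}).get(lang)
        if label:
            by_label[label.strip().casefold()].append(qid)
    return [(a, b)
            for qids in by_label.values()
            for a, b in combinations(qids, 2)
            if b not in _values(items, a, DIFFERENT_FROM)
            and a not in _values(items, b, DIFFERENT_FROM)]
```

Each homonym pair found this way still needs a human decision: either the items are distinct and get a different from (P1889) statement, or they are duplicates.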

Tools


Known problems

  • Constraints are not enforced when editing: violations are reported, but edits are not blocked
    • This results in duplicates and other data quality problems
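
Because violations are only reported, duplicates have to be found after the fact. One common signal is two items sharing the value of an identifier property that should be unique per item, such as VIAF ID (P214); on Wikidata this is expressed as a distinct-values constraint. A minimal sketch, assuming a simplified, hypothetical dictionary form of items; the function name is illustrative and the identifier values in the example are made up:

```python
from collections import defaultdict


def shared_identifier_values(items: dict, prop: str) -> dict[str, list[str]]:
    """Return {value: [item IDs]} for values of `prop` that occur on more
    than one item; each group is a candidate duplicate for human review."""
    seen = defaultdict(list)
    for qid, item in items.items():
        # dict.fromkeys removes repeated values on one item while keeping order.
        for value in dict.fromkeys(item.get("claims", {}).get(prop, [])):
            seen[value].append(qid)
    return {value: qids for value, qids in seen.items() if len(qids) > 1}


items = {
    "Q1": {"claims": {"P214": ["12345"]}},           # made-up VIAF IDs
    "Q2": {"claims": {"P214": ["12345", "67890"]}},
    "Q3": {"claims": {}},
}
print(shared_identifier_values(items, "P214"))  # {'12345': ['Q1', 'Q2']}
```

A shared value does not prove a duplicate (the identifier itself may be wrong), so the result is a list of candidates to merge or correct by hand.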

See also
