Structured, linked, open data

The interlinking of the Wikimedia projects.

The Wikimedia projects are built on connections: you follow wiki-links to find out more. The projects also interlink with one another.

Click through to view Robert Louis Stevenson’s Wikipedia page.

We found a text called Edinburgh (1914). It was recently uploaded to Wikisource from a DjVu scan on the Internet Archive, then OCR-ed and proofread by two Wikisource users to ensure it was correct. It is now 100% searchable HTML, and the images have been cropped out so they can be shared individually as openly licensed images on Wikimedia Commons.

As a result we now have:

  1. The illustrated text described as “to the Scot it ought to be a sort of Bible” in 100% searchable HTML on Wikisource.
  2. Illustrations shared to Wikimedia Commons for anyone to share and reuse.
  3. A new Wikipedia article created on the book with a link to these images and to the text on Wikisource. 1 click away!
  4. A link to the text on Wikisource added to the Wikipedia page for Edinburgh so that the text is surfaced on a relevant page where people can discover it.

Don’t believe me about the 100% searchable HTML? Type “moist eyebrows” into the search bar on Wikisource and see if it can find where Stevenson uses it in one of his novels. Make sure you use the speech marks so it can find the exact phrase.
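The same exact-phrase search can also be requested through the MediaWiki search API rather than the search bar. The sketch below only builds the request URL and does not send it; the helper name exact_phrase_search_url and the result limit are illustrative choices, not part of the original exercise.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public MediaWiki API endpoint for English Wikisource.
WIKISOURCE_API = "https://en.wikisource.org/w/api.php"


def exact_phrase_search_url(phrase: str, limit: int = 5) -> str:
    """Build a full-text search URL for an exact phrase (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "search",
        # The surrounding speech marks ask the search engine for the exact phrase.
        "srsearch": f'"{phrase}"',
        "srlimit": limit,
        "format": "json",
    }
    return f"{WIKISOURCE_API}?{urlencode(params)}"


url = exact_phrase_search_url("moist eyebrows")
print(url)
```

Opening the printed URL in a browser should return the matching pages as JSON, assuming the API is reachable.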

How is data structured on Wikidata?

Every item on Wikidata represents a unique entity, so it is given a unique Q number to identify it.

Douglas Adams is Q42. Can you guess what Q13 is?

Within these item pages, information is stored in a series of statements with each statement being a triplet of the form “subject – predicate – object”, for example:

  • Edinburgh is a capital city
  • Leith is located in Edinburgh

Things that can be used as subject or object are called items, and things that can be used as predicate are called properties. Each property has a unique P number, and any new property has to be agreed by the community. New items, each with a unique Q number, can be created by anyone. There are a few thousand properties in Wikidata and many millions of items.
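As a rough illustration of the triple structure, here is a minimal Python sketch. The Triple class and the two helper functions are hypothetical names made up for this example, and Q5 (human) is an identifier added here that is not mentioned in the text above.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject - predicate - object."""
    subject: str    # an item
    predicate: str  # a property
    obj: str        # another item, or a plain value


# Items are "Q" plus a number, properties are "P" plus a number.
ITEM_ID = re.compile(r"Q[1-9]\d*")
PROPERTY_ID = re.compile(r"P[1-9]\d*")


def is_item(identifier: str) -> bool:
    return bool(ITEM_ID.fullmatch(identifier))


def is_property(identifier: str) -> bool:
    return bool(PROPERTY_ID.fullmatch(identifier))


# The two examples from the text, written with labels only.
labelled = [
    Triple("Edinburgh", "is a", "capital city"),
    Triple("Leith", "is located in", "Edinburgh"),
]

# The same shape with identifiers: Douglas Adams (Q42), instance of (P31), human (Q5).
adams = Triple("Q42", "P31", "Q5")

print(is_item(adams.subject), is_property(adams.predicate), is_item(adams.obj))
```

Wikidata itself stores identifiers rather than labels, which is what lets the same statement be displayed in any language.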

Properties and Values on Wikidata (Slide by Andrew Lih)

The most important property is P31 (instance of), as this is the all-important ‘what is it?’ statement.

The value of a statement can be another Wikidata item, and in this way links are built up between items.

Statements should be referenced so the information is verifiable. Adding the Reference URL (P854) is important because it records where the data was sourced from.

Qualifiers may also be needed to ensure the information is accurate and true. For example, Jane Belson was only Douglas Adams’ wife for a certain period of time, so we need to add a qualifier on the dates for the statement about his spouse to be true.
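One way to picture how references and qualifiers attach to a statement is the sketch below. It is a simplified model, not Wikidata's actual data format: the Statement class is hypothetical, the property IDs P26 (spouse), P580 (start time) and P582 (end time) are additions for this example, the years are illustrative and should be checked against the item itself, and the reference URL is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A simplified statement with optional qualifiers and references."""
    subject: str
    prop: str
    value: str
    qualifiers: dict[str, str] = field(default_factory=dict)
    references: list[dict[str, str]] = field(default_factory=list)

    def is_referenced(self) -> bool:
        # True when at least one reference carries a reference URL (P854).
        return any("P854" in ref for ref in self.references)


spouse = Statement(
    subject="Q42",        # Douglas Adams
    prop="P26",           # spouse
    value="Jane Belson",  # a real statement would hold her Q number here
    qualifiers={
        "P580": "1991",   # start time (illustrative value)
        "P582": "2001",   # end time (illustrative value)
    },
    references=[{"P854": "https://example.org/source"}],  # placeholder URL
)

print(spouse.is_referenced())
```

Without the two qualifiers the statement would claim the marriage held for all time, which is the inaccuracy the text warns about.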

The GeneWiki example.

GeneWiki

Wikidata as linking hub of the internet.

Wikidata as a universal (library) thesaurus – Presentation by Koninklijke Bibliotheek, National Library of the Netherlands

The Possibilities are Endless

  1. Descendants of Genghis Khan.
  2. Timeline of works by Robert Louis Stevenson.
  3. Map of places of birth of female saxophonists.
  4. Map showing the birthplace of over 6,000 notable women born in Latin America and listed on Wikipedia but with no article in the Spanish language edition.
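Lists and timelines like these are typically produced with SPARQL queries against the Wikidata Query Service. The sketch below builds one such query for a timeline of an author's works; it is not the query behind any of the examples above. It assumes P50 (author) and P577 (publication date) as the relevant properties, uses Q42 (Douglas Adams) as the author because that is the item named earlier, and only builds the request URL without sending it. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Public endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_author_query(author_qid: str) -> str:
    """Return a SPARQL query listing an author's works ordered by publication date."""
    return f"""
SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P50 wd:{author_qid} ;
        wdt:P577 ?date .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?date
""".strip()


query = works_by_author_query("Q42")
request_url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"

print(query)
```

Swapping in a different Q number gives the same timeline for another author, and the printed query can be pasted into the query service's web interface to run it.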

Some more example use cases

1. The GeneWiki project – queries. (video)

2. If Voltaire had used Wikipedia – timeline of Voltaire’s works.

3. The collections of the National Library of Wales – Histropedia timeline. (video)

4. Scholia – create on-the-fly scholarly profiles. (video) Author disambiguator tool.

5. The EveryPolitician project. (video)

6. The Sum of All Paintings project – a WikiProject to get an item for every notable painting. InteGraality tool. Worklists.

7. Crotos – a search and display engine for visual artworks powered by Wikidata and Wikimedia Commons.

8. Filter results on Crotos to only show images that have particular things depicted, e.g. images with boats. Add depicts statements using Art Explorer.

9. IIIF Cropper on Crotos – crop parts of images to show only what you are interested in, e.g. kisses. Use the Image Positions tool.

10. The WikiCite project – an initiative (and a series of events) aiming to build a bibliographic database in Wikidata to serve free knowledge. WikiProject Source MetaData is the place on Wikidata where these efforts are coordinated, and published research papers can be imported to Wikidata by entering DOIs or PubMed IDs into the Source MD tool.

11. The Zika Corpus (timeline).

12. MPs’ occupations and MPs’ place of education.