
Wikidata in the Classroom and the WikiCite project

The following post was presented by Wikimedian in Residence, Ewan McAndrew, at the Repository Fringe Conference 2018 held on 2nd & 3rd July 2018 at the Royal Society of Edinburgh.

 

Hi, my name’s Ewan McAndrew and I work at the University of Edinburgh as the Wikimedian in Residence.

My talk’s in two parts;

The first part is on teaching data literacy with the Survey of Scottish Witchcraft database and Wikidata.

Contention #1: since the City Region deal, there is a pressing need to implement data literacy in the curriculum to produce a workforce equipped with the data skills necessary to meet the needs of Scotland’s growing digital economy, and this therefore presents a massive opportunity for educators, researchers, data scientists and repository managers alike.

Wikidata is the sister project of Wikipedia and it is the backbone to all the Wikimedia projects: a centralised hub of structured, machine-readable, multilingual linked open data. An introduction to Wikidata can be found here.

I was invited along with 13 other ‘problem holders’ to a ‘Data Fair’ on 26 October 2017 hosted by course leaders on the Data Science for Design MSc. We were each afforded just five minutes to pitch a dataset for the 45 students on the course to work on in groups as a five week long project.

The ‘Data Fair’ held on 26 October 2017 for Data Science for Design MSc students. CC-BY-SA, own work.

Two groups of students were enthused to volunteer to help surface the data from the Survey of Scottish Witchcraft database, a fabulous piece of work carried out at the University of Edinburgh from 2001 to 2003. It chronicles information about accused witches in Scotland from the period 1563-1736, their trials and the individuals involved in those trials (lairds, sheriffs, prosecutors etc.), but it has remained somewhat static and unloved in a Microsoft Access database since the project concluded in 2003. So students at the 2017 Data Fair were invited to consider what could be done if the data was exported into Wikidata with attribution, linked back to the source database to provide verifiable provenance, given multilingual labels and linked to other complementary datasets. Beyond this, what new insights & visualisations of the data could be achieved?

There were several areas of interest: course leaders on the Data Science for Design MSc were keen for the students to work with ‘real world’ datasets in order to give them practical experience ahead of their dissertation projects.

 “A common critique of data science classes is that examples are static and student group work is embedded in an ‘artificial’ and ‘academic’ context. We look at how we can make teaching data science classes more relevant to real-world problems. Student engagement with real problems—and not just ‘real-world data sets’—has the potential to stimulate learning, exchange, and serendipity on all sides, and on different levels: noticing unexpected things in the data, developing surprising skills, finding new ways to communicate, and, lastly, in the development of new strategies for teaching, learning and practice.”

Towards Open-World Scenarios: Teaching the Social Side of Data Science by Dave Murray Rust, Joe Corneli and Benjamin Bach.

Beyond this, there were other benefits to the exercise. Tim Berners-Lee, the inventor of the Web, has suggested a 5-star deployment scheme for Open Data (illustrated in the picture and table below). Importing data into Wikidata makes it 5 star data!

By Michael Hausenblas, James G. Kim, five-star Linked Open Data rating system developed by Tim Berners-Lee. (http://5stardata.info/en/) [CC0], via Wikimedia Commons
Number of stars | Description | Properties | Example format
★ | make your data available on the Web (whatever format) under an open license | Open license | PDF
★★ | make it available as structured data (e.g., Excel instead of image scan of a table) | Open license; Machine readable | XLS
★★★ | make it available in a non-proprietary open format (e.g., CSV instead of Excel) | Open license; Machine readable; Open format | CSV
★★★★ | use URIs to denote things, so that people can point at your stuff | Open license; Machine readable; Open format; Data has URIs | RDF
★★★★★ | link your data to other data to provide context | Open license; Machine readable; Open format; Data has URIs; Linked data | LOD
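
The cumulative nature of the five-star scheme above can be sketched in a few lines of Python. This is only an illustration: the criterion names (`open_license`, `machine_readable` and so on) are labels I have made up for the table's "Properties" column, not part of any official vocabulary.

```python
from typing import Iterable

# Criteria in the order the five-star scheme adds them (names are illustrative).
CRITERIA = [
    "open_license",
    "machine_readable",
    "open_format",
    "has_uris",
    "linked_data",
]


def star_rating(properties: Iterable[str]) -> int:
    """Count stars; each star also requires every earlier criterion."""
    present = set(properties)
    stars = 0
    for criterion in CRITERIA:
        if criterion not in present:
            break  # a missing criterion caps the rating here
        stars += 1
    return stars


# A CSV file published under an open licence meets the first three criteria.
print(star_rating({"open_license", "machine_readable", "open_format"}))
```

Because each star builds on the previous ones, a dataset that has URIs but no open licence still scores zero in this sketch.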


Open data producers can use Wikidata IDs as identifiers in datasets to make their data 5 star linked open data. As of June 2018, Wikidata featured in the latest Linked Open Data cloud diagram on lod-cloud.net as a dataset published in the linked data format containing over 5,100,000,000 triples.

Over a series of workshops, the Wikidata assignment also afforded the students the opportunity to develop their understanding of, and engagement with, issues such as: data completeness; data ethics; digital provenance; data analysis; data processing; as well as making practical use of a raft of tools and data visualisations. It also motivated student volunteers to surface a much-loved repository of information as linked open data to enable further insights and research: a project that the students felt proud to take part in and found “very meaningful”. (The students even took the opportunity to consult with professors of History at the university in order to gain even more of an understanding of the period in which these witch trials took place, such was their interest in the subject.)

Feedback from students at the conclusion of the project included:

  • “After we analysed the data, we found we learned the stories of the witches and we learned about European culture especially in the witchhunts.”
  • “We had wanted to do a happy project but finally we learned much more about these cultures so it was very meaningful for us.”
  • “In my opinion, it’s quite useful to put learning practice into the real world so that we can see the outcome and feel proud of ourselves… we learned a lot.”
  • “Thank you for inviting us and appreciating our video. It’s an unforgettable experience in my life. Thank you so much.”

As a result of the students’ efforts, we now have 3219 items of data on the accused witches in Wikidata (spanning 1563 to 1736). We also now have data on 2356 individuals involved in trying these accused witches. Finally, we have items for the 3210 witch trials themselves. This means we can link and enrich the data further by adding location data, dates, occupations, places of residence, social class, marriages, and penalties arising from the trial.
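
Counts like these can be checked against the Wikidata Query Service. Below is a minimal sketch, assuming that items for accused witches carry an external-identifier property linking back to the Survey database; I believe this is `P4478`, but treat that ID as an assumption and confirm it on Wikidata before relying on it. The function names and the User-Agent string are my own.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_count_query(property_id: str) -> str:
    """SPARQL that counts distinct items holding any value for a property."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        f"?item wdt:{property_id} ?externalId . "
        "}"
    )


def run_query(query: str) -> dict:
    """Send the query to the endpoint and return parsed JSON (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "witchcraft-data-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_count(results: dict) -> int:
    """Pull the single ?count value out of standard SPARQL JSON results."""
    return int(results["results"]["bindings"][0]["count"]["value"])


# Assumed property ID for the Survey's accused-witch identifier.
query = build_count_query("P4478")
print(query)
# To run it for real: print(extract_count(run_query(query)))
```

The live call is left commented out so the sketch runs offline; the number returned today may differ from the figures quoted in this talk as the data continues to be edited.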

The fact that Wikidata is also linked open data means that students can help connect to and leverage from a variety of other datasets in multiple languages; helping to fuel discovery through exploring the direct and indirect relationships at play in this semantic web of knowledge.

 

Descendants of King James VI and I, king during union of English and Scottish crowns

And we can see an example of this semantic web of related entities, or historical individuals in this case, here in this visualisation of the descendants of King James I of England and James VI of Scotland (as shown in the pic above but do click on the link for a live rendering).

We can also see the semantic web at play in the class-level overview below of gene ontologies (505,000 objects) loaded into Wikidata, linking these genes to items of data on related proteins and items of data on related diseases, which, in turn, have related chemical compounds and pharmaceutical products used to treat these diseases. Many of these datasets have been loaded into Wikidata by, or are maintained by, the GeneWiki initiative – around a million Wikidata items of biomedical data – but, importantly, they are also leveraging from other datasets imported from the Centers for Disease Control and Prevention (CDC) among other sources. This allows researchers to add to and explore the direct and, perhaps more importantly, the indirect relationships at play in this semantic web of knowledge to help identify areas for future research.

 

Using Wikidata as an open, community-maintained database of biomedical knowledge – CC-BY: Andrew Su, Professor at The Scripps Research Institute.

Which brings me onto…

Contention #2 – Building a bibliographical repository: the sum of all citations

Sharing your data to Wikidata, as a linking hub for the internet, is also the most cost-effective way to surface your repository’s data and make it 5 star linked open data. As a centralised hub for linked open data on the internet, it enables you to leverage from many other datasets while you can still have your own read/write applications on top of Wikidata. (This is exactly what the GeneWiki project did to encourage domain experts to contribute to knowledge gaps on Wikidata, providing a user-friendly read/write interface to enable the “consumption and curation” of gene annotation data using the Wiki Genome web application.)

Within Wikidata, we have biographical data, geographical data, biomedical data, taxonomic data and, importantly, bibliographic data.

The WikiCite project are building a bibliographic repository of sources within Wikidata.

“How does the Wikimedia movement empower individuals to assess reliable sources and arm them with quality information so they can make decisions based in facts? This question is relevant not only to Wikipedia users but to consumers of media around the globe. Over the past decade, the Wikimedia movement has come together to answer that question. Efforts to design better ways to support sourcing have begun to coalesce around Wikidata – the free knowledgebase that anyone can edit. With the creation of a rich, human-curated, and machine-readable knowledgebase of sources, the WikiCite initiative is crowdsourcing the process of vetting information and its provenance.” – WikiCite Report 2017

Wikidata tools can be used to create Wikidata items on scholarly papers automatically from scraping source metadata from:

  • DOIs,
  • PMIDs,
  • PMCIDs
  • ORCIDs (NB: Multiple items of data can be created simultaneously to represent multiple scholarly papers using one ORCID identifier input in the Orcidator tool).
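
To give a feel for what such a tool produces, here is a minimal sketch that turns a paper's metadata into draft Wikidata statements as (property, value) pairs. It is not how any particular tool works internally. The property and item IDs are the ones I understand Wikidata to use for these concepts, so check each one before use, and the example paper, DOI and author name are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Wikidata IDs as I understand them; verify on wikidata.org before use.
SCHOLARLY_ARTICLE = "Q13442814"   # item: scholarly article
P_INSTANCE_OF = "P31"
P_TITLE = "P1476"
P_DOI = "P356"
P_PUBMED_ID = "P698"
P_PMCID = "P932"
P_AUTHOR_NAME_STRING = "P2093"


def draft_statements(
    title: str,
    doi: Optional[str] = None,
    pmid: Optional[str] = None,
    pmcid: Optional[str] = None,
    authors: Sequence[str] = (),
) -> List[Tuple[str, str]]:
    """Build (property, value) pairs describing one scholarly paper."""
    statements = [(P_INSTANCE_OF, SCHOLARLY_ARTICLE), (P_TITLE, title)]
    if doi:
        # Wikidata's convention is to store DOIs in upper case.
        statements.append((P_DOI, doi.upper()))
    if pmid:
        statements.append((P_PUBMED_ID, pmid))
    if pmcid:
        statements.append((P_PMCID, pmcid))
    for name in authors:
        # Plain-text author names; linking to author items would come later.
        statements.append((P_AUTHOR_NAME_STRING, name))
    return statements


# Hypothetical paper, for illustration only.
for prop, value in draft_statements(
    "An example paper", doi="10.1234/example", authors=["A. Author"]
):
    print(prop, value)
```

Pairs like these would still need to be submitted through a tool or the Wikidata API; the sketch only covers the mapping step.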

Indeed, 1 out of 4 items of data in Wikidata represents a creative work. Wikidata currently includes 10 million entries about citable sources, such as books, scholarly papers and news articles, along with over 75 million author string statements and 84 million citation links in Wikidata between these authors and sources. There are also 17 million items with a PubMed ID and 12.4 million items with a DOI.

Mike Bennett, our Digital Scholarship Developer at the University of Edinburgh, is working to develop a tool to translate the Edinburgh Research Archives’ thesis collection data from ALMA into a format that Wikidata can accept. There are, however, ready-made tools that Wikidatans have developed that will automatically create a Wikidata item of data for a scholarly paper by scraping the source metadata from DOIs, PubMed IDs and ORCID identifiers. This allows a bibliographic record of scholarly papers and their authors to be generated as structured, machine-readable, multilingual linked open data.

Why does this matter?

Well… the Initiative for Open Citations (I4OC) is a new collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data. Over 150 publishers have now chosen to deposit and open up citation data. As a result, the fraction of publications with open references has grown from 1% to more than 50% out of 38 million articles with references deposited with Crossref.

“Citations are the links that knit together our scientific and cultural knowledge. They are primary data that provide both provenance and an explanation for how we know facts. They allow us to attribute and credit scientific contributions, and they enable the evaluation of research and its impacts. In sum, citations are the most important vehicle for the discovery, dissemination, and evaluation of all scholarly knowledge.”

Once made open, the references for individual scholarly publications may be accessed within a few days through the Crossref REST API. Open citations are also available from the OpenCitations Corpus, which is progressively and systematically harvesting citation data from Crossref and other sources. An advantage of accessing citation data from the OpenCitations Corpus is that they are available in machine-readable RDF format, which is systematically being added to Wikidata.
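
As a rough sketch of what accessing open references looks like, the code below reads the `reference` list from a Crossref work record and keeps the entries that carry a DOI. It assumes the public `https://api.crossref.org/works/{doi}` route and its `message` wrapper; the sample record and its DOIs are made up so the example runs offline.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Fetch the Crossref record for one DOI (needs network access)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as response:
        return json.load(response)["message"]


def cited_dois(work: dict) -> list:
    """Return the DOIs in a work's reference list, if the list is open."""
    # Records without open references simply have no "reference" key.
    return [ref["DOI"] for ref in work.get("reference", []) if "DOI" in ref]


# Hypothetical record shaped like a Crossref "message", for illustration only.
sample_work = {
    "DOI": "10.1234/example",
    "reference": [
        {"key": "ref1", "DOI": "10.1234/cited-one"},
        {"key": "ref2", "unstructured": "A reference with no DOI."},
        {"key": "ref3", "DOI": "10.1234/cited-two"},
    ],
}
print(cited_dois(sample_work))
# To try a real paper: print(cited_dois(fetch_work("<a DOI of your choice>")))
```

Each DOI-to-DOI pair found this way is one citation link of the kind that can then be expressed as linked data.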

Because this data on scholars, scholarly papers and citations is stored as linked data on Wikidata, the citation data can be linked to, and leverage from, other complementary datasets, enabling the direct and indirect relationships to be explored in this semantic web of knowledge.

This means we can parse the data to answer a range of queries such as:

  • Show me all works which cite a New York Times article/Washington Post article/Daily Telegraph article etc. (delete as appropriate).
  • Show me the most popular journals cited by statements of any item that is a subclass of economics/archaeology/mathematics etc. (delete as appropriate).
  • Show me all statements citing the works of Joseph Stiglitz/Melissa Terras/James Loxley/Karen Gregory etc. (delete as appropriate).
  • Show me all statements citing journal articles by physicists at Oxford University in 1960s/1970s/1980s etc. (delete as appropriate).
  • Show me all statements citing a journal article that was retracted.
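
One of the questions above, works citing the works of a given author, might be expressed for the Wikidata Query Service roughly as follows. This is a sketch under assumptions: it reads "statements citing" as "works that cite", and it relies on `P50` (author) and `P2860` (cites work) being the relevant properties. The function name is my own, and `Q42` is only a placeholder author ID to be swapped for the researcher you are interested in.

```python
import re


def works_citing_author_query(author_qid: str, limit: int = 100) -> str:
    """Build SPARQL listing works that cite any work by the given author."""
    # Reject anything that is not a plain Wikidata item ID such as "Q42".
    if not re.fullmatch(r"Q[1-9]\d*", author_qid):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""SELECT ?citingWork ?citingWorkLabel ?citedWork WHERE {{
  ?citedWork wdt:P50 wd:{author_qid} .
  ?citingWork wdt:P2860 ?citedWork .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}"""


# Placeholder author ID; replace with the item ID of the author in question.
print(works_citing_author_query("Q42", limit=10))
```

The printed query can be pasted into the query service's web interface. The other questions in the list follow the same pattern with different properties and filters.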

And much more besides.

Screengrab of the Scholia profile for the developmental psychologist, Uta Frith, generated from the structured linked data in Wikidata.

 

Like the WikiGenome web application already mentioned, other third party applications can be built with user-friendly UIs to read/write from Wikidata. For instance, the Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and research topics. Leveraging from information in Wikidata, Scholia displays information on total number of publications, co-authors and citation statistics in a variety of visualisations. It is another way of helping to demonstrate the impact and reach of your research.

Citation statistics for developmental psychologist Uta Frith, visualised on the Scholia web service and generated from the citation data in Wikidata.
Co-author graph for Polly Arnold, Crum Brown Chair of Chemistry in the School of Chemistry at the University of Edinburgh, visualised in the Scholia Web service and generated from bibliographic data in Wikidata.

To conclude, the power of linked open data to aid the teaching of data literacy and to help share knowledge between different institutions and different repositories, between geographically and culturally separated societies, and between languages is a beautiful, empowering thing. Here’s to more of it and entering a brave new world of linked open data. Thank you.

By way of closing I’d like to show you the video presentations the students on the Data Science for Design MSc came up with as the final outcome of their project to import the Survey of Scottish Witchcraft database into Wikidata.

Here are two data visualisation videos they produced:

Further reading

 3 steps to better demonstrate your institution’s commitment to Open Knowledge and Open Science.

  1. Allocate time/buy out time for academics & postdoctoral researchers to add university research (backed up with citations) to Wikipedia in existing/new pages. Establishing relevance is the most important aspect of adding university research so an understanding of the subject matter is important along with ensuring the balance of edits meets the ethos of Wikipedia so that any possible suggestion of promotion/academic boosterism is outweighed by the benefit of subject experts paying knowledge forward for the common good. At least three references are required for a new article on Wikipedia so citing the work of fellow professionals goes some way to ensuring the article has a wider notability and helps pay it forward. Train contributors prior to editing to ensure they are aware of Wikipedia’s policies & guidelines and monitor their contributions to ensure edits are not reverted.
  2. Identify the most cited works by your university’s researchers which are already on Wikipedia using Altmetric software. Once identified, systematically add in the Open Access links to any existing (paywalled) citations on Wikipedia and complete the edit by adding in the OA symbol (the orange padlock) using the {{open access}} template. Also join WikiProject Open Access.
  3. Help build up a bibliographic repository of structured machine-readable (and multilingual) linked open data on both university researchers AND research papers in Wikidata using the easy-to-use suite of tools available.
Wikipedia's front page 11 May 2017

Did you know – Mary Susan McIntosh

Did you know that sociologist, feminist, and campaigner for lesbian and gay rights Mary Susan McIntosh was deported from the U.S. in 1960 for speaking out against the House Un-American Activities Committee?

Mary Susan McIntosh (1936–2013) sociologist, feminist, political activist and campaigner for lesbian and gay rights in the UK. A 1974 colour photograph from her time as a Research Fellow at Nuffield College, Oxford. CC-BY-SA

Yesterday this ‘Did You Know’ fact was on Wikipedia’s front page. The front page is viewed, on average, 25 million times a day.

Mary’s page was only written in March during our International Women’s Day event here at the University of Edinburgh by one of our attendees, Lorna Campbell (read Lorna’s blog article on Mary here).

While her page has only been live on Wikipedia for two months, Mary’s page has now been viewed in excess of 7000 times because a) editors were motivated to address Wikipedia’s gender gap problem where less than 15% of editors are female and less than 17% of biographies are of notable women and b) we felt Mary’s story was important enough that it should be shared on Wikipedia’s front page and introduced to an audience of up to 25 million.

Did you know you could do that? Nominate a page newly created in the last seven days, or significantly expanded on, to be included on Wikipedia’s front page in this way?

View the guidelines for Did You Know here.

The Wikimedia residency at the University of Edinburgh has been as much about demystifying the largest reference work on the internet as anything else, so here are some other things I feel are worth knowing in the spirit of ‘did you know?’:

 

  • Did you know that Wikipedia works with Turnitin to address issues of plagiarism and copyright violation using the Copyvio tool and that the Dashboard for managing assignments now offers Authorship Highlighting of students’ edits, thereby making it easier to visualise and evaluate student work?
  • Did you know that Wikipedia does not want you to cite it? It is a tertiary source; an aggregator of articles with facts backed up from reliable published secondary sources. You can’t cite Wikipedia but you can cite the references it uses. In this way it is reframed as the digital gateway to further research sources.
  • Did you know that Wikipedia editing teaches source evaluation as a core skill hence Wikipedia education assignments help students combat fake news?
  • Did you know that Dr. Alex Chow at the University of Edinburgh’s School of Divinity has developed a script to help assess the word count of Wikipedia articles for use with student assignments?
  • Did you know that only 7% of edits to Wikipedia are considered vandalism and that research has found that, unlike other parts of the internet, Wikipedia editing actually de-radicalises its editors of partisan political leanings?
  • Did you know you can learn:
  • Did you know that you can upload openly-licensed longer texts to Wikisource (the free content library) which are transcribed into 100% searchable HTML so that works such as Thomas Jehu’s digitised PhD thesis can be linked to, one click away, from his Wikipedia article or out-of-copyright texts such as Robert Louis Stevenson’s book on ‘Edinburgh’ (1914) can be enjoyed by new audiences?
  • Did you know that Wikidata, Wikimedia’s repository of structured open data, now has 3 million linked citations added to it which can be queried using the new Scholia tool – a tool to handle scientific bibliographic information? (The Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and for research topics. To collect the data, it queries the SPARQL-based Wikidata Query Service).
  • Did you know that you can now add automatically generated citations to millions of books on Wikipedia? Wikipedia editors can now draw on WorldCat, the world’s largest database of books, to generate citations on Wikipedia thanks to a collaboration between OCLC (Online Computer Library Center) and the Wikimedia Foundation’s Wikipedia Library program.
  • Did you know that the latest estimates by Crossref show that Wikipedia has risen from the 8th most prolific referrer to DOIs to the 5th, and that this is thought to be a gross underestimate of its actual position?
  • Did you know that Altmetric include Wikipedia citations in their impact metrics and that Altmetric automatically picks up on citations through Wikipedia’s citation generator?
  • Did you know that Wikimedia has received a $3 million grant from the Alfred P. Sloan Foundation to make a ‘Structured Commons’ to make freely-licensed images accessible and reusable across the web?
  • Did you know that releasing images through Wikimedia Commons can result in a huge increase in views with detailed metrics about the number of views these images are accruing? E.g. Images released by the Bodleian Library have accrued 218,460,571 views to date.
  • Did you know about the WikiCite initiative? Tidying up the citations on Wikipedia to make a consistent, queryable bibliographic repository enhancing the visibility and impact of research.
  • Did you know that thanks to the new I4OC initiative (April 2017) there exists a collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data? Before I4OC started, publishers releasing references in the open accounted for just 1% of citation metadata collected annually by Crossref. Following discussions over the past months, several subscription-access and open-access publishers have recently made the decision to release reference list metadata publicly. These include: American Geophysical Union, Association for Computing Machinery, BMJ, Cambridge University Press, Cold Spring Harbor Laboratory Press, EMBO Press, Royal Society of Chemistry, SAGE Publishing, Springer Nature, Taylor & Francis, and Wiley. These publishers join other publishers who have been opening their references through Crossref for some time.
  • Did you know that thanks to Wikidata you can now query, analyse & visualise the largest reference work on the internet? You can also add your research data to combine datasets on Wikidata.
  • Did you know that the University of Portsmouth have been running a Wikipedia assignment called Human Geography for the last five years where each student is assigned a different short stub article for a village in England and Wales, and asked to expand it to provide a rounded description of the place and, in particular, an account of its historical development?
  • Did you know that, so far, they have left Scotland untouched and so there will be many villages and towns in Scotland ripe to have articles created and improved?
  • Did you know that Wikivoyage is Wikipedia’s sister project and a Lonely Planet-esque travel guide? Students can write articles about their hometown area with bullet-pointed sections on ‘Things to do’, ‘Things to See’, ‘Things to Buy’, ‘Places to stay’ with Open Street Maps included and images added from Wikimedia Commons.
  • Did you know how students and staff at the University of Edinburgh have reacted to the Wikipedia in the Classroom assignments we have run this year? You can view a compilation of their feedback in this video.
  • Did you know that students can create entire textbooks, or chapters of textbooks, on Wikipedia’s sister project, Wikibooks?
  • Did you know that every September the world’s largest photography competition takes place, Wiki Loves Monuments? Participants are encouraged to photograph and upload images of listed buildings and monuments to document our cultural heritage.
  • Did you know that the WikiShootme tool helps identify notable buildings in your area that require an image uploading?
  • Did you know that taking part in Wikimedia activities does not always require a heavy time component and that short, fun activities can also help: adding a citation through the Citation Hunt tool (“Whack-a-mole for citations”), playing the Wikidata game, adding images through WikiShootMe and FIST, or taking part in fun Wiki Races (6 degrees of separation for Wiki links between articles)?
  • Did you know that you can become a Wikipedia trainer with our new lesson plan and slide deck (available on Tes.com)?
  • Did you know that you can learn how to edit at our 90 minute training sessions and how to become a trainer at our 3 hour Train the Trainer events?
  • Did you know that I can deliver presentations and training as you require, be it on Wikisource (the free content library), Wikidata (the free and open repository of structured data), Wikimedia Commons (the free media repository), the WikiCite initiative, Wikivoyage (the free travel guide), writing articles for Wikipedia, adding your research to Wikipedia or something else entirely?

If you would like to find out more then feel free to contact me at ewan.mcandrew@ed.ac.uk

 

  • Want to become a Wikipedia editor?
  • Want to become a Wikipedia trainer?
  • Want to run a Wikipedia course assignment?
  • Want to contribute images to Wikimedia Commons?
  • Want to learn more about Wikisource?
  • Want to contribute your research to Wikipedia?
  • Want to contribute your research data to Wikidata?