data science for design – Wikimedian in Residence

Our Digital Humanities award-winning interactive map (witches.is.ed.ac.uk) caught the public’s attention when it launched in September 2019 and has helped to change the way the stories of these women and men are told. A campaign group, Witches of Scotland, successfully lobbied the Scottish Government into issuing a formal apology, delivered by the then First Minister, Nicola Sturgeon, for the grave wrong done to these persecuted women (BBC News, 2022).

The Survey of Scottish Witchcraft

The map is built upon the landmark Survey of Scottish Witchcraft Database project. Led by Professor Julian Goodare, the database collates historical records about Scotland’s accused witches (1563-1736) in one place. This fabulous resource began life in the 1990s before being realised in 2001-2003. It’s a dataset that has the power to fascinate.

However, since 2003 the Survey data has remained static in an MS Access database. So, at the course’s annual “Data Fair” in October 2017, I invited groups of students on the University of Edinburgh’s Design Informatics MFA/MA to consider what could be done if the data were exported into Wikipedia’s sister project, Wikidata, as machine-readable linked open data. Beyond this, what new insights & visualisations could be achieved if groups of students worked with this real-world dataset, with me as their mentor, over a 6-7 week project?

Design Informatics students at the Suffer the Witch symposium at the Patrick Geddes centre displaying the laser-cut 3d map of accused witches in Scotland. CC-BY-SA, Ewan McAndrew

The implementation of Wikidata in the curriculum presents a huge opportunity for students, educators, researchers and data scientists alike, especially when there is a pressing need for universities to meet the demands of our digital economy for a data literate workforce.

“A common critique of data science classes is that examples are static and student group work is embedded in an ‘artificial’ and ‘academic’ context. We look at how we can make teaching data science classes more relevant to real-world problems. Student engagement with real problems…has the potential to stimulate learning, exchange, and serendipity on all sides.” (Corneli, Murray-Rust and Bach, 2018)

The success of the ‘Data Fair’ model, year on year, prompted questions as to what more could be done over an even more extended project. So, as the next logical step, I lobbied senior managers for a new internship dedicated to geographically locating the places recorded in the database as linked open data.

Recruiting the ‘Witchfinder General’

Geography student Emma Carroll worked closely under my mentorship and supervision for three months in Summer 2019. Her detective work geolocating historic placenames involved colleagues from the National Library of Scotland, the Scottish Studies Archive and the Scottish Place-Name Society. The website creation itself involved my working with the creativity and expertise of the university’s e-learning developers.

Geography undergraduate student, Emma Carroll, our first ‘Witchfinder General’ intern in Summer 2019.

Since the map’s launch, this project has gained media coverage across Scotland and the world by allowing users to explore, for the first time, where these accused women resided, local to them, and to learn all about their stories in a tremendously powerful way. It also shows the potential of engaging with linked open data to help the teaching of data science and to fuel discovery through exploring the direct and indirect relationships at play in this semantic web of knowledge, enabling new insights. There is always more to do, and we have worked with another four student interns on this project since 2022.

Our latest intern, Ruby Imrie, will return on 15th July, following her exams and a Summer break, to continue her work quality-assuring the vast amount of Scottish witchcraft data in Wikidata, creating new features and visualisations, fixing any bugs and generally making our Map of Accused Witches in Scotland website as useful, as engaging and as user-friendly as possible. When it is ready for relaunch in Autumn/Winter 2024, we want to have something that truly does justice to all the work that has gone before and to all the individual women and men persecuted during the Scottish witch trials.

Ruby Imrie and Professor Julian Goodare, Project Director of the Survey of Scottish Witchcraft at University of Edinburgh Library 23 August 2023

Almost five years on – the legacy of the project

The legacy of the project is that our students, year-on-year, are highly engaged and motivated to learn important histories from Scotland’s dark past AND the important data skills required for Scotland’s future digital economy. Many of our colleagues at the University (and beyond) also seek our advice on how to meet research grant stipulations that they make their research outcomes open, both by producing open access papers and by releasing their data as open data. Lukas Engelmann, History of Medicine, is using Wikidata to document the history of 20th century epidemiology. Dr. Chris Langley and Asst. Prof. Mikki Brock have worked with me to create a similar website, Mapping the Scottish Reformation (as a proof-of-concept Project B to our Project A), and have shared their experiences with other similar projects such as the Argyll and Sutherland Highlanders Military Museum in Stirling, Faversham Local History Group, the Places of Worship in Scotland database team and more.

References

1. “Nicola Sturgeon apologises to people accused of witchcraft”. BBC News. 2022-03-08.
2. Corneli, J, Murray-Rust, D & Bach, B 2018, Towards Open-World Scenarios: Teaching the Social Side of Data Science.

Media Responses

Balance for Better – Teaching Matters

By Ewan McAndrew

On June 27, 2019

Wikimedian in Residence @emcandre highlights how staff & students are engaging with Wikipedia to address the diversity of editors & content shared online.

“The information that is on Wikipedia spreads across the internet. What is right or wrong or missing on Wikipedia affects the entire internet.” (Wadewitz, 2014)

Wikipedia, the free online encyclopaedia, is building the largest open knowledge resource in human history. Now aged eighteen, Wikipedia ranks among the world’s top ten sites for scholarly resource lookups and is drawn on extensively by virtually every platform people use on a daily basis, receiving over 500 million views per month, from 1.5 billion unique devices. As topics on Wikipedia become more visible on Google, they receive more press coverage and become better known amongst the public.

“Wikipedia is today the gateway through which millions of people now seek access to knowledge.” (Cronon, 2012)

At the University of Edinburgh, we have quickly generated real examples of technology-enhanced learning activities appropriate to the curriculum and transformed our students, staff and members of the public from being passive readers and consumers to being active, engaged contributors. The result is that our community is more engaged with knowledge creation online and readers all over the world benefit from our teaching, research and collections.

While Wikipedia has significant reach and influence, it also has significant gaps in its coverage of topics, in articles in other languages and in the diversity of its editors. Most editors are white men, and the topics covered reflect this, with fewer than 18 percent of biographies on English Wikipedia being about women. The Wikimedia community is committed to diversity and inclusivity and has developed, and worked with, a number of initiatives to ensure knowledge equity, such as Whose Knowledge? and WikiProject Women in Red. Wikimedia’s campaign for 200 more biographies of sportswomen (Levine, 2019) is just one recent example of looking at ways to address this systemic bias.

Our Wikimedia in the Curriculum activities bring benefits to the students who learn new skills and have immediate impact in addressing both the diversity of editors and diversity of content shared online:

Global Health MSc students add 180-200 words to Global Health related articles e.g. their edits to the page on obesity are viewed 3,000 times per day on average.
Digital Sociology MSc students engage each year in workshops on how sociology is communicated and how knowledge is created and curated online, as a response to the recent ASA article (Mathewson & McGrady, 2018).
Reproductive Biology Honours – a student’s article on high-grade serous carcinoma, one of the most common forms of ovarian cancer, includes 60 references and diagrams she created, and has been viewed over 67,000 times since 2016.
Translation Studies MSc students gain meaningful published practice by translating 2,000 words to share knowledge between two different language Wikipedias on a topic of their own choosing.
World Christianity MSc students undertake a literature review assignment to make the subject much less about White Northern hemisphere perspectives; creating new articles on Asian Feminist Theology, Sub-Saharan Political Theology and more.
Data Science for Design MSc – Wikipedia’s sister project, Wikidata, affords students the opportunity to work practically with research datasets, like the Survey of Scottish Witchcraft Database, and surface data to the Linked Open Data Cloud and explore the direct and indirect relationships at play in this semantic web of knowledge to help further discovery.

We also work with student societies (Law & Technology, History, Translation, Women in STEM, Wellcomm Kings) and have held events for Ada Lovelace Day, LGBT History Month, Black History Month and celebrated Edinburgh’s Global Alumni; working with the UncoverEd project and the Commonwealth Scholarship Commission.

Students are addressing serious knowledge gaps and are intrinsically motivated to do so because their scholarship is published and does something lasting for the common good, for an audience of not one but millions.

Representation matters. Gender inequality in science and technology is all too real. Gaps in our shared knowledge exclude the vitally important contributions of many within our community, and you can’t be what you can’t see. To date, 65% of our participating editors at the University of Edinburgh have been women. The choices being made in creating new pages and increasing the visibility of topics and of inspirational role models online can not only shape public understanding around the world for the better but also help inform and shape our physical environments to inspire the next generation.

“It’s an emotional connection… Within, I’d say, less than 2 hours of me putting her page in place it was the top hit that came back in Google when I Googled it and I just thought that’s it, that’s impact right there!” (Hood & Littlejohn, 2018)

Rosie Taylor and Isobel Cordrey from the student support group, Wellcomm Kings, co-hosted the Wikipedia Diversithon event for LGBT History Month at the Festival of Creative Learning 2019.

Bibliography

Wadewitz, A. (2014). 04. Teaching with Wikipedia: the Why, What, and How. Retrieved from https://www.hastac.org/blogs/wadewitz/2014/02/21/04-teaching-wikipedia-why-what-and-how
Cronon, W. (2012). Scholarly Authority in a Wikified World | Perspectives on History | AHA. Retrieved from https://www.historians.org/publications-and-directories/perspectives-on-history/february-2012/scholarly-authority-in-a-wikified-world
Levine, N. (2019). A Ridiculous Gender Bias On Wikipedia Is Finally Being Corrected. Retrieved from https://www.refinery29.com/en-gb/2019/06/234873/womens-world-cup-football-wikipedia
Mathewson, J., & McGrady, R. (2018). Experts Improve Public Understanding of Sociology Through Wikipedia. Retrieved from https://www.asanet.org/news-events/footnotes/apr-may-2018/features/experts-improve-public-understanding-sociology-through-wikipedia
Hood, N., & Littlejohn, A. (2018). Becoming an online editor: perceived roles and responsibilities of Wikipedia editors. Retrieved from http://www.informationr.net/ir/23-1/paper784.html
McAndrew, E., O’Connor, S., Thomas, S., & White, A. (2019). Women scientists being whitewashed from Wikipedia. Retrieved from https://www.scotsman.com/news/opinion/women-scientists-being-whitewashed-from-wikipedia-ewan-mcandrew-siobhan-o-connor-dr-sara-thomas-and-dr-alice-white-1-4887048
McMahon, C.; Johnson, I.; and Hecht, B. (2017). The Substantial Interdependence of Wikipedia and Google: A Case Study on the Relationship Between Peer Production Communities and Information Technologies.

The Wikimedia residency is a free resource available to all staff and students interested in exploring how to benefit from and contribute to the free and open Wikimedia projects.

If you would like to find out more, contact ewan.mcandrew@ed.ac.uk.

In the news

The University of Edinburgh recently won at the Herald Higher Education Awards 2019 for its Wikimedia in the Curriculum work.
Physicist and Wikipedia editor Jess Wade creates a Wikipedia entry every day to address the encyclopedia’s under-representation of women and people of colour in science. She was named in Nature’s “Ten people who mattered in 2018” and was recently awarded the British Empire Medal in the Queen’s Birthday Honours for her campaigning for diversity in science.

Wikidata in the Classroom and the WikiCite project

By Ewan McAndrew

On August 9, 2018

The following post was presented by Wikimedian in Residence, Ewan McAndrew, at the Repository Fringe Conference 2018 held on 2nd & 3rd July 2018 at the Royal Society of Edinburgh.

Hi, my name’s Ewan McAndrew and I work at the University of Edinburgh as the Wikimedian in Residence.

My talk’s in two parts.

The first part is on teaching data literacy with the Survey of Scottish Witchcraft database and Wikidata.

Contention #1: since the City Region Deal, there is a pressing need to implement data literacy in the curriculum in order to produce a workforce equipped with the data skills necessary to meet the needs of Scotland’s growing digital economy. This therefore presents a massive opportunity for educators, researchers, data scientists and repository managers alike.

Wikidata is the sister project of Wikipedia and is the backbone to all the Wikimedia projects: a centralised hub of structured, machine-readable, multilingual linked open data. An introduction to Wikidata can be found here.

I was invited, along with 13 other ‘problem holders’, to a ‘Data Fair’ on 26 October 2017 hosted by course leaders on the Data Science for Design MSc. We were each afforded just five minutes to pitch a dataset for the 45 students on the course to work on in groups as a five-week project.

The ‘Data Fair’ held on 26 October 2017 for Data Science for Design MSc students. CC-BY-SA, own work.

Two groups of students volunteered to help surface the data from the Survey of Scottish Witchcraft database, a fabulous piece of work carried out at the University of Edinburgh from 2001-2003. It chronicles information about accused witches in Scotland from the period 1563-1736, their trials and the individuals involved in those trials (lairds, sheriffs, prosecutors etc.), but it had remained somewhat static and unloved in a Microsoft Access database since the project concluded in 2003. So students at the 2017 Data Fair were invited to consider what could be done if the data were exported into Wikidata with attribution, linked back to the source database to provide verifiable provenance, given multilingual labels and linked to other complementary datasets. Beyond this, what new insights & visualisations of the data could be achieved?

There were several areas of interest: course leaders on the Data Science for Design MSc were keen for the students to work with ‘real world’ datasets in order to give them practical experience ahead of their dissertation projects.

“A common critique of data science classes is that examples are static and student group work is embedded in an ‘artificial’ and ‘academic’ context. We look at how we can make teaching data science classes more relevant to real-world problems. Student engagement with real problems – and not just ‘real-world data sets’ – has the potential to stimulate learning, exchange, and serendipity on all sides, and on different levels: noticing unexpected things in the data, developing surprising skills, finding new ways to communicate, and, lastly, in the development of new strategies for teaching, learning and practice.”

– Towards Open-World Scenarios: Teaching the Social Side of Data Science by Dave Murray-Rust, Joe Corneli and Benjamin Bach.

Beyond this, there were other benefits to the exercise. Tim Berners-Lee, the inventor of the Web, has suggested a 5-star deployment scheme for Open Data (illustrated in the picture and table below). Importing data into Wikidata makes it 5 star data!

By Michael Hausenblas, James G. Kim, five-star Linked Open Data rating system developed by Tim Berners-Lee. (http://5stardata.info/en/) [CC0], via Wikimedia Commons

Number of stars	Description	Properties	Example format
★	make your data available on the Web (whatever format) under an open license	Open license	PDF
★★	make it available as structured data (e.g., Excel instead of image scan of a table)	Open license, machine readable	XLS
★★★	make it available in a non-proprietary open format (e.g., CSV instead of Excel)	Open license, machine readable, open format	CSV
★★★★	use URIs to denote things, so that people can point at your stuff	Open license, machine readable, open format, data has URIs	RDF
★★★★★	link your data to other data to provide context	Open license, machine readable, open format, data has URIs, linked data	LOD
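The cumulative nature of the scheme (each star requires everything the previous star required) can be made concrete with a small sketch. The `Dataset` class, its field names and the two example datasets are illustrative assumptions, not part of the 5-star scheme itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """The five properties from the table above (field names are illustrative)."""
    open_license: bool = False
    machine_readable: bool = False
    open_format: bool = False
    has_uris: bool = False
    linked: bool = False


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a star only counts if every earlier one is met."""
    stars = 0
    for met in (ds.open_license, ds.machine_readable, ds.open_format,
                ds.has_uris, ds.linked):
        if not met:
            break
        stars += 1
    return stars


# A hypothetical spreadsheet published under an open licence:
# structured, but still in a proprietary format.
spreadsheet = Dataset(open_license=True, machine_readable=True)

# A hypothetical dataset whose records have URIs and link out to other data.
linked_dataset = Dataset(True, True, True, True, True)

print(star_rating(spreadsheet))     # 2
print(star_rating(linked_dataset))  # 5
```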

Open data producers can use Wikidata IDs as identifiers in datasets to make their data 5 star linked open data. As of June 2018, Wikidata featured in the latest Linked Open Data cloud diagram on lod-cloud.net as a dataset published in the linked data format containing over 5,100,000,000 triples.

Over a series of workshops, the Wikidata assignment also afforded the students the opportunity to develop their understanding of, and engagement with, issues such as: data completeness; data ethics; digital provenance; data analysis; data processing; as well as making practical use of a raft of tools and data visualisations. It also motivated student volunteers to surface a much-loved repository of information as linked open data to enable further insights and research. It was a project that the students felt proud to take part in and found “very meaningful”. (The students even took the opportunity to consult with professors of History at the university in order to gain even more of an understanding of the period in which these witch trials took place, such was their interest in the subject.)

Feedback from students at the conclusion of the project included:

“After we analysed the data, we found we learned the stories of the witches and we learned about European culture especially in the witchhunts.”
“We had wanted to do a happy project but finally we learned much more about these cultures so it was very meaningful for us.”
“In my opinion, it’s quite useful to put learning practice into the real world so that we can see the outcome and feel proud of ourselves… we learned a lot.”
“Thank you for inviting us and appreciating our video. It’s an unforgettable experience in my life. Thank you so much.”

As a result of the students’ efforts, we now have 3219 items of data on the accused witches in Wikidata (spanning 1563 to 1736). We also now have data on 2356 individuals involved in trying these accused witches. Finally we have 3210 witch trials themselves. This means we can link and enrich the data further by adding location data, dates, occupations, places of residence, social class, marriages, and penalties arising from the trial.

The fact that Wikidata is also linked open data means that students can help connect to and leverage from a variety of other datasets in multiple languages; helping to fuel discovery through exploring the direct and indirect relationships at play in this semantic web of knowledge.
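As a rough illustration of what following these relationships looks like in practice, here is a minimal Python sketch that builds a SPARQL query for the Wikidata Query Service. P551 (residence) and P625 (coordinate location) are standard Wikidata properties. The identifier property used to pick out accused witches, P4478, is an assumption that should be confirmed on Wikidata before use, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# ASSUMPTION: P4478 is taken here to be the Survey of Scottish Witchcraft
# "accused witch ID" property. Check the property ID on Wikidata first.
ACCUSED_ID_PROPERTY = "P4478"


def build_residence_query(id_property: str = ACCUSED_ID_PROPERTY, limit: int = 50) -> str:
    """Return SPARQL listing people with a Survey ID and their mapped residence."""
    return f"""
SELECT ?person ?personLabel ?placeLabel ?coords WHERE {{
  ?person wdt:{id_property} ?surveyId .   # has a Survey identifier
  ?person wdt:P551 ?place .               # residence
  ?place wdt:P625 ?coords .               # coordinate location of that place
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def run_query(query: str) -> list[dict]:
    """Send the query to the Wikidata Query Service (requires network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "witches-sketch/0.1 (example)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    print(build_residence_query(limit=5))
```

The person-to-place-to-coordinates chain is the kind of indirect relationship that a map needs: no single statement on a person's item holds a coordinate, but two hops through the linked data reach one.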

Descendants of King James VI and I, king during the union of the English and Scottish crowns

And we can see an example of this semantic web of related entities, or historical individuals in this case, in this visualisation of the descendants of King James VI of Scotland and I of England (as shown in the picture above, but do click on the link for a live rendering).

We can also see the semantic web at play in the below class level overview of gene ontologies (505,000 objects) loaded into Wikidata, linking these genes to items of data on related proteins and items of data on related diseases, which, in turn, have related chemical compounds and pharmaceutical products used to treat these diseases. Many of these datasets have been loaded into Wikidata by, or are maintained by, the GeneWiki initiative – around a million Wikidata items of biomedical data – but, importantly, they are also leveraging from other datasets imported from the Centers for Disease Control and Prevention (CDC) among other sources. This allows researchers to add to and explore the direct and, perhaps more importantly, the indirect relationships at play in this semantic web of knowledge to help identify areas for future research.

Using Wikidata as an open, community-maintained database of biomedical knowledge – CC-BY: Andrew Su, Professor at The Scripps Research Institute.

Which brings me onto…

Contention #2 – Building a bibliographical repository: the sum of all citations

Sharing your data to Wikidata, as a linking hub for the internet, is also the most cost-effective way to surface your repository’s data and make it 5 star linked open data. As a centralised hub for linked open data on the internet, it enables you to leverage from many other datasets while you can still have your own read/write applications on top of Wikidata. (Which is exactly what the GeneWiki project did to encourage domain experts to contribute to knowledge gaps on Wikidata through providing a user-friendly read/write interface to enable the “consumption and curation” of gene annotation data using the Wiki Genome web application).

Within Wikidata, we have biographical data, geographical data, biomedical data, taxonomic data and, importantly, bibliographic data.

The WikiCite project is building a bibliographic repository of sources within Wikidata.

“How does the Wikimedia movement empower individuals to assess reliable sources and arm them with quality information so they can make decisions based in facts? This question is relevant not only to Wikipedia users but to consumers of media around the globe. Over the past decade, the Wikimedia movement has come together to answer that question. Efforts to design better ways to support sourcing have begun to coalesce around Wikidata – the free knowledgebase that anyone can edit. With the creation of a rich, human-curated, and machine-readable knowledgebase of sources, the WikiCite initiative is crowdsourcing the process of vetting information and its provenance.” – WikiCite Report 2017

Wikidata tools can be used to create Wikidata items on scholarly papers automatically from scraping source metadata from:

DOIs
PMIDs
PMCIDs
ORCIDs (NB: Multiple items of data can be created simultaneously to represent multiple scholarly papers using one ORCID identifier input in the Orcidator tool).
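To give a feel for what such a tool ends up writing, here is a minimal Python sketch that turns a few metadata fields into QuickStatements (version 1) commands, a tab-separated batch format for creating Wikidata items. It is an illustration only and not how Orcidator or any other named tool works internally. The function name and the example paper are hypothetical. The property and item IDs are standard Wikidata ones: P31 (instance of), Q13442814 (scholarly article), P1476 (title), P356 (DOI) and P577 (publication date).

```python
def quickstatements_for_paper(title: str, doi: str, pub_date: str) -> str:
    """Return QuickStatements V1 commands describing one scholarly article.

    pub_date is an ISO date such as "2018-07-02"; day precision (/11) is assumed.
    """
    safe_title = title.replace('"', "'")  # double quotes delimit strings in this format
    rows = [
        ["CREATE"],                                     # start a new item
        ["LAST", "Len", f'"{safe_title}"'],             # English label
        ["LAST", "P31", "Q13442814"],                   # instance of: scholarly article
        ["LAST", "P1476", f'en:"{safe_title}"'],        # title, as monolingual text
        ["LAST", "P356", f'"{doi.upper()}"'],           # DOI, upper-cased
        ["LAST", "P577", f"+{pub_date}T00:00:00Z/11"],  # publication date
    ]
    return "\n".join("\t".join(row) for row in rows)


# A made-up paper: the title and DOI are placeholders, not a real publication.
print(quickstatements_for_paper("An example paper", "10.1234/example.5678", "2018-07-02"))
```

A real tool would also check whether an item with the same DOI already exists before creating a new one, which this sketch leaves out.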

Indeed, 1 out of 4 items of data in Wikidata represents a creative work. Wikidata currently includes 10 million entries about citable sources, such as books, scholarly papers and news articles, along with over 75 million author string statements and 84 million citation links in Wikidata between these authors and sources. There are 17 million items with a PubMed ID and 12.4 million items with a DOI.

17120945	Items with a PubMed ID
13416958	Items with a DOI

Mike Bennett, our Digital Scholarship Developer at the University of Edinburgh, is working to develop a tool to translate the Edinburgh Research Archive’s thesis collection data from ALMA into a format that Wikidata can accept. There are, however, ready-made tools that Wikidatans have developed that will automatically create a Wikidata item for a scholarly paper by scraping the source metadata from DOIs, PubMed IDs and ORCID identifiers, allowing a bibliographic record of scholarly papers and their authors to be generated as structured, machine-readable, multilingual linked open data.

Why does this matter?

Well…the Initiative for Open Citations (I4OC) is a new collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data. Over 150 publishers have now chosen to deposit and open up citation data. As a result, the fraction of publications with open references has grown from 1% to more than 50% out of 38 million articles with references deposited with Crossref.

“Citations are the links that knit together our scientific and cultural knowledge. They are primary data that provide both provenance and an explanation for how we know facts. They allow us to attribute and credit scientific contributions, and they enable the evaluation of research and its impacts. In sum, citations are the most important vehicle for the discovery, dissemination, and evaluation of all scholarly knowledge.”

Once made open, the references for individual scholarly publications may be accessed within a few days through the Crossref REST API. Open citations are also available from the OpenCitations Corpus, which is progressively and systematically harvesting citation data from Crossref and other sources. An advantage of accessing citation data from the OpenCitations Corpus is that they are available in machine-readable RDF format, which is systematically being added to Wikidata.

Because this data on scholars, scholarly papers and citations is stored as linked data on Wikidata, the citation data can be linked to, and leverage from, other complementary datasets, enabling the direct and indirect relationships to be explored in this semantic web of knowledge.
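As a minimal sketch of reading open references from Crossref, the Python below builds the URL for a work record and pulls out the DOIs of the works it cites. It assumes the public `https://api.crossref.org/works/{doi}` route, whose JSON response carries a `reference` list under `message` when a publisher has opened its references. The helper names and the sample record are hypothetical, and the sample is hand-written rather than real API output.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a single work."""
    return CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")


def cited_dois(work_record: dict) -> list[str]:
    """Extract the DOIs of cited works from a Crossref work record.

    References deposited without a DOI are skipped.
    """
    references = work_record.get("message", {}).get("reference", [])
    return [ref["DOI"] for ref in references if "DOI" in ref]


def fetch_work(doi: str) -> dict:
    """Fetch a work record from Crossref (requires network access)."""
    with urllib.request.urlopen(crossref_url(doi)) as response:
        return json.load(response)


# A trimmed, made-up record in the same shape as a Crossref response.
sample = {"message": {"DOI": "10.1234/example", "reference": [
    {"key": "ref1", "DOI": "10.1234/cited-one"},
    {"key": "ref2", "unstructured": "A reference deposited without a DOI."},
    {"key": "ref3", "DOI": "10.1234/cited-two"},
]}}

print(cited_dois(sample))  # ['10.1234/cited-one', '10.1234/cited-two']
```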

This means we can parse the data to answer a range of queries such as:

Show me all works which cite a New York Times article/Washington Post article/Daily Telegraph article etc. (delete as appropriate).
Show me the most popular journals cited by statements of any item that is a subclass of economics/archaeology/mathematics etc. (delete as appropriate).
Show me all statements citing the works of Joseph Stiglitz/Melissa Terras/James Loxley/Karen Gregory etc. (delete as appropriate).
Show me all statements citing journal articles by physicists at Oxford University in 1960s/1970s/1980s etc. (delete as appropriate).
Show me all statements citing a journal article that was retracted.

And much more besides.
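One of the simpler variants of the queries listed above can be sketched as a SPARQL query built in Python. It finds works that cite any work by a given author, using the standard Wikidata properties P50 (author) and P2860 (cites work). Note that this looks at work-to-work citations rather than at the references attached to individual Wikidata statements, which would need a different query. The function name is hypothetical, and the item ID in the example is only a well-formed placeholder to be replaced with the author's real Wikidata ID.

```python
def build_citing_works_query(author_qid: str, limit: int = 100) -> str:
    """Return SPARQL for works that cite any work by the given author."""
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return f"""
SELECT ?citing ?citingLabel ?cited ?citedLabel WHERE {{
  ?cited wdt:P50 wd:{author_qid} .   # works whose author is the given person
  ?citing wdt:P2860 ?cited .         # works that cite one of those works
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


# "Q42" is used purely as a well-formed placeholder ID.
print(build_citing_works_query("Q42", limit=10))
```

The resulting text can be pasted into the Wikidata Query Service to run it.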

Screengrab of the Scholia profile for the developmental psychologist, Uta Frith, generated from the structured linked data in Wikidata.

Like the WikiGenome web application already mentioned, other third party applications can be built with user-friendly UIs to read from and write to Wikidata. For instance, the Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and research topics. Leveraging from information in Wikidata, Scholia displays information on total number of publications, co-authors and citation statistics in a variety of visualisations. This is another way of helping to demonstrate the impact and reach of your research.

Citation statistics for developmental psychologist Uta Frith, visualised on the Scholia web service and generated from the citation data in Wikidata.

Co-author graph for Polly Arnold, Crum Brown Chair of Chemistry in the School of Chemistry at the University of Edinburgh, visualised in the Scholia Web Service and generated from bibliographic data in Wikidata.

To conclude, linked open data has many benefits, and the power to aid the teaching of data literacy and to help share knowledge between different institutions and different repositories, between geographically and culturally separated societies, and between languages. That is a beautiful, empowering thing. Here’s to more of it and to entering a brave new world of linked open data. Thank you.

By way of closing, I’d like to show you the video presentations that the students on the Data Science for Design MSc came up with as the final outcome of their project to import the Survey of Scottish Witchcraft database into Wikidata.

Here are two data visualisation videos they produced:

Teaching data literacy with real world (witchy) datasets