data science – Wikimedian in Residence

This post was written by LLB student Dervla Craig on her first month as Information and Data Literacy intern this Summer.

My name is Dervla and I am going into my second year of the Graduate LLB at the University of Edinburgh. This summer I am doing a 12-week internship with the University’s Information Services Group (ISG) on one of the most fascinating projects I have ever had the opportunity to be a part of: the Accused Witches of Scotland project.

I am one of a long line of interns who have been involved in this project each year since 2019. The project aims to commemorate and spread awareness about those who were persecuted as witches in Scotland during the 16th to 18th centuries. While previous interns have primarily focused on processing and importing data from the University’s landmark Survey of Scottish Witchcraft database (2003) into Wikidata, and on creating our witches website with new map and timeline visualisations, this year my role looks a bit different.

My remit for the 12 weeks is to prepare a bid to the National Lottery Heritage Fund to secure funding for what we hope can be the next phase of the project. Our goal is to preserve the accused witches’ data in the long-term and ensure that people can connect with and participate in this heritage now and in the future. As it has come to the end of my first month, I wanted to join the tradition of blogging about my internship experience so far.

Getting to know the individual stories of the accused

My first week was spent diving down the rabbit hole to explore all there is to learn about the Scottish witch trials. If you had asked me to describe an accused witch before this week, I would’ve told you they wore a pointy hat and flew around on a broomstick. I couldn’t have been more wrong! Now when I picture these women, they look like my mother, or my grandmother, or me. They were ordinary people who suffered an egregious injustice, and I was extremely moved when learning about their stories.

Josep Garcia-Reyero (2022)

Maggie Lin (2022)

Claire Panella (2023)

Ellie Whitehead (2024)

Prof Julian Goodare and Ruby Imrie (2023-2024)

Geography undergraduate student, Emma Carroll, our first ‘Witchfinder General’ intern in Summer 2019.

Some of the most valuable tools for my research were The Survey of Scottish Witchcraft, created in the late 1990s by Julian Goodare, a history professor at the University of Edinburgh, and his team, and the interactive witches map created in September 2019 by Emma Carroll, a geography student and our former Data Visualisation intern, and our late developer colleague Richard Lawson. The Survey drew on the historic records of all the accused witches in Scotland between 1563 and 1736 and organised the details into an MS Access 1997 database. Our 2019 map brings this data to life in a new and engaging way by importing it into Wikipedia’s sister project, Wikidata, as linked open, machine-readable data.

Learning about the great work that has been done previously definitely made me realise I have a lot to live up to but also motivated me to give it my all in the next twelve weeks and hopefully produce an end result that meets the standards of my predecessors.

Writing a National Lottery Heritage Fund bid

The next thing to familiarise myself with before I could dive into my writing was the bid itself. I spent a few days combing through the NLHF website to understand what it is they are looking for and how our project fits those needs. By getting to know my audience, I could ensure that my writing was intriguing, evidence-based and persuasive. I quickly found out that before submitting the 10,000-word application, I must submit a 1000-word ‘Expression of Interest’. This EoI asks questions about the heritage of the project, what our project aims to achieve, and why it is needed now. At this point, I felt I could write a dissertation for each of these questions, so the tight word count was my biggest enemy. I had lots of help from some lovely colleagues who offered their feedback and advice, including the Project Director of the Survey of Scottish Witchcraft, Professor Julian Goodare. After many tweaks and a few redrafts, I am happy to say at the end of week four that the EoI is pretty much ready to go.

Exploring avenues for community engagement

Alongside writing, writing and more writing, I have also been brainstorming ideas for the community engagement side of our project. There have been so many great ideas by the team which have led to interesting and helpful discussions with different people and organisations, including the National Museum of Scotland, Reforesting Scotland, and some really talented artists. Excitingly, most of the responses we have received have been positive and enthusiastic. In the upcoming weeks, I hope to visit some of these places and see firsthand the primary sources from the witch trials.

Workshops and all-staff events

Another educational and fun aspect of my experience so far, outside of my bid writing role, has been the plethora of events hosted by ISG. In the past month I have taken part in ‘intern welcome’ socials, Wikipedia writing workshops and even a workshop on an introduction to blogging! Alongside these, I have also attended two all-staff events, one for all Information Services Group (ISG) staff and one for the Learning, Teaching and Web (LTW) division. Not only have I learned so much about the behind-the-scenes and the people who have made my studies possible for the past five years, but these events have also been an opportunity to get to know more of my colleagues and socialise with other interns. At ISG there is a strong emphasis on having a healthy work-life balance and making sure that you and those around you have what you need to produce your best work.

View of Edinburgh Castle from Floor K, Argyle House. CC-BY-SA by Dervla Craig.

In conclusion, I have had an amazing first month as an intern with the University of Edinburgh. I have learned a lot, met new people, and pushed myself outside of my comfort zone. Plus the amazing view of Edinburgh Castle from Floor K has been a real motivator to work from the office and not from home! I am nothing but hopeful that the next eight weeks will be even more exciting and productive, and that I can blog again soon with positive updates!

P.S. If you haven’t already, definitely visit The Survey of Scottish Witchcraft and the Map of Accused Witches in Scotland websites! They are both amazing (and important) educational resources that I could browse for hours (and have).

Teaching data literacy with real world (witchy) datasets

By Ewan McAndrew

On June 5, 2024


Our Digital Humanities award-winning interactive map (witches.is.ed.ac.uk) caught the public’s attention when it launched in September 2019 and has helped to change the way the stories of these women and men are told. In the years since, a campaign group, Witches of Scotland, successfully lobbied the Scottish Government into issuing a formal apology from the former First Minister, Nicola Sturgeon, for the grave wrong done to these persecuted women (BBC News, 2022).

The Survey of Scottish Witchcraft

The map is built upon the landmark Survey of Scottish Witchcraft Database project. Led by Professor Julian Goodare, the database collates historical records about Scotland’s accused witches (1563-1736) in one place. This fabulous resource began life in the 1990s before being realised in 2001-2003. It’s a dataset that has the power to fascinate.

However, since 2003 the Survey data has remained static in an MS Access database. So, at the annual “Data Fair” for the University of Edinburgh’s Design Informatics MFA/MA in October 2017, I invited groups of students to consider what could be done if the data were exported into Wikipedia’s sister project, Wikidata, as machine-readable linked open data. Beyond this, what new insights and visualisations could be achieved if groups of students worked with this real-world dataset, with me as their mentor, over a 6-7 week project?

Design Informatics students at the Suffer the Witch symposium at the Patrick Geddes centre displaying the laser-cut 3d map of accused witches in Scotland. CC-BY-SA, Ewan McAndrew

The implementation of Wikidata in the curriculum presents a huge opportunity for students, educators, researchers and data scientists alike, especially when there is a pressing need for universities to meet the demands of our digital economy for a data-literate workforce.

“A common critique of data science classes is that examples are static and student group work is embedded in an ‘artificial’ and ‘academic’ context. We look at how we can make teaching data science classes more relevant to real-world problems. Student engagement with real problems…has the potential to stimulate learning, exchange, and serendipity on all sides.” (Corneli, Murray-Rust and Bach, 2018)

The success of the ‘Data Fair’ model, year on year, prompted questions as to what more could be done over an even more extended project. So I lobbied senior managers for a new internship dedicated to geographically locating the places recorded in the database as linked open data, as the next logical step.

Recruiting the ‘Witchfinder General’

Geography student Emma Carroll worked closely under my mentorship and supervision for three months in Summer 2019. Her detective work geolocating historic placenames involved colleagues from the National Library of Scotland, the Scottish Studies Archive and the Scottish Place-Name Society. The website creation itself involved my working with the creativity and expertise of the university’s e-learning developers.

Geography undergraduate student, Emma Carroll, our first ‘Witchfinder General’ intern in Summer 2019.

Since the map’s launch, this project has gained media coverage across Scotland and the world by allowing users to explore, for the first time, where these accused women resided, local to them, and to learn all about their stories in a tremendously powerful way. It also shows the potential of engaging with linked open data to help the teaching of data science and to fuel discovery through exploring the direct and indirect relationships at play in this semantic web of knowledge, enabling new insights. There is always more to do, and we have worked with another four student interns on this project since 2022.

Our latest, Ruby Imrie, will be returning on 15th July, following her exams and a Summer break, to continue her work quality-assuring the vast amount of Scottish witchcraft data in Wikidata, creating new features and visualisations, fixing any bugs and generally making our Map of Accused Witches in Scotland website as useful, engaging and user-friendly as possible. When it is ready for relaunch in Autumn/Winter 2024, we want something that truly does justice to all the work that has gone before and to all the individual women and men persecuted during the Scottish witch trials.

Ruby Imrie and Professor Julian Goodare, Project Director of the Survey of Scottish Witchcraft at University of Edinburgh Library 23 August 2023

Almost five years on – the legacy of the project

The legacy of the project is that our students, year-on-year, are highly engaged and motivated to learn important histories from Scotland’s dark past AND the important data skills required for Scotland’s future digital economy. Many of our colleagues at the University (and beyond) also seek our advice on how to meet research grant stipulations that they make their research outcomes open, both by producing open access papers and by releasing their data as open data. Lukas Engelmann, History of Medicine, is using Wikidata to document the history of 20th century epidemiology. Dr. Chris Langley and Asst. Prof. Mikki Brock have worked with me to create a similar website, Mapping the Scottish Reformation (as a proof-of-concept Project B to our Project A), and have shared their experiences with other similar projects such as the Argyll and Sutherland Highlanders Military Museum in Stirling, Faversham Local History Group, the Places of Worship in Scotland database team and more.

References

1. “Nicola Sturgeon apologises to people accused of witchcraft”. BBC News. 2022-03-08.
2. Corneli, J, Murray-Rust, D & Bach, B 2018, Towards Open-World Scenarios: Teaching the Social Side of Data Science.


Wikidata in the Classroom and the WikiCite project

By Ewan McAndrew

On August 9, 2018


The following post was presented by Wikimedian in Residence, Ewan McAndrew, at the Repository Fringe Conference 2018 held on 2nd & 3rd July 2018 at the Royal Society of Edinburgh.

Hi, my name’s Ewan McAndrew and I work at the University of Edinburgh as the Wikimedian in Residence.

My talk’s in two parts.

The first part is on teaching data literacy with the Survey of Scottish Witchcraft database and Wikidata.

Contention #1: since the City Region Deal, there is a pressing need to implement data literacy in the curriculum to produce a workforce equipped with the data skills necessary to meet the needs of Scotland’s growing digital economy, and this presents a massive opportunity for educators, researchers, data scientists and repository managers alike.

Wikidata is the sister project of Wikipedia and is the backbone to all the Wikimedia projects: a centralised hub of structured, machine-readable, multilingual linked open data. An introduction to Wikidata can be found here.

I was invited, along with 13 other ‘problem holders’, to a ‘Data Fair’ on 26 October 2017 hosted by course leaders on the Data Science for Design MSc. We were each afforded just five minutes to pitch a dataset for the 45 students on the course to work on in groups as a five-week project.

The ‘Data Fair’ held on 26 October 2017 for Data Science for Design MSc students. CC-BY-SA, own work.

Two groups of students were enthused to volunteer to help surface the data from the Survey of Scottish Witchcraft database, a fabulous piece of work at the University of Edinburgh from 2001-2003 chronicling information about accused witches in Scotland from the period 1563-1736, their trials and the individuals involved in those trials (lairds, sheriffs, prosecutors etc.). The data had remained somewhat static and unloved in a Microsoft Access database since the project concluded in 2003. So students at the 2017 Data Fair were invited to consider what could be done if the data were exported into Wikidata with attribution, linking back to the source database to provide verifiable provenance, given multilingual labels and linked to other complementary datasets. Beyond this, what new insights & visualisations of the data could be achieved?

There were several areas of interest: course leaders on the Data Science for Design MSc were keen for the students to work with ‘real world’ datasets in order to give them practical experience ahead of their dissertation projects.

“A common critique of data science classes is that examples are static and student group work is embedded in an ‘artificial’ and ‘academic’ context. We look at how we can make teaching data science classes more relevant to real-world problems. Student engagement with real problems – and not just ‘real-world data sets’ – has the potential to stimulate learning, exchange, and serendipity on all sides, and on different levels: noticing unexpected things in the data, developing surprising skills, finding new ways to communicate, and, lastly, in the development of new strategies for teaching, learning and practice.”

– Towards Open-World Scenarios: Teaching the Social Side of Data Science by Dave Murray-Rust, Joe Corneli and Benjamin Bach.

Beyond this, there were other benefits to the exercise. Tim Berners-Lee, the inventor of the Web, has suggested a 5-star deployment scheme for Open Data (illustrated in the picture and table below). Importing data into Wikidata makes it 5 star data!

By Michael Hausenblas, James G. Kim, five-star Linked Open Data rating system developed by Tim Berners-Lee. (http://5stardata.info/en/) [CC0], via Wikimedia Commons

Number of stars	Description	Properties	Example format
★	make your data available on the Web (whatever format) under an open license	Open license	PDF
★★	make it available as structured data (e.g., Excel instead of image scan of a table)	Open license; machine readable	XLS
★★★	make it available in a non-proprietary open format (e.g., CSV instead of Excel)	Open license; machine readable; open format	CSV
★★★★	use URIs to denote things, so that people can point at your stuff	Open license; machine readable; open format; data has URIs	RDF
★★★★★	link your data to other data to provide context	Open license; machine readable; open format; data has URIs; linked data	LOD
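The cumulative nature of the scheme (each star presupposes all the ones before it) can be sketched in a few lines of Python. This is only an illustration of the table above: the function name and its boolean parameters are hypothetical and are not part of any official 5-star tooling.

```python
def star_rating(open_license: bool, machine_readable: bool, open_format: bool,
                has_uris: bool, linked: bool) -> int:
    """Return the number of stars (0-5) a dataset earns under the 5-star scheme.

    The criteria are cumulative: counting stops at the first criterion that
    is not met, because a later star cannot be earned without the earlier ones.
    """
    stars = 0
    for criterion in (open_license, machine_readable, open_format, has_uris, linked):
        if not criterion:
            break
        stars += 1
    return stars


# A CSV file published under an open licence, but with no URIs or links.
csv_stars = star_rating(True, True, True, False, False)
```

Under this reading, an openly licensed Excel sheet would stop at two stars even if it happened to contain links, because it is not in an open format.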


Open data producers can use Wikidata IDs as identifiers in datasets to make their data 5 star linked open data. As of June 2018, Wikidata featured in the latest Linked Open Data cloud diagram on lod-cloud.net as a dataset published in the linked data format containing over 5,100,000,000 triples.
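As a minimal sketch of what using a Wikidata ID as an identifier might look like, the Python below writes one N-Triples statement linking a record in a local dataset to its Wikidata item. The `http://www.wikidata.org/entity/` URI form is Wikidata's own; the choice of `owl:sameAs` as the linking predicate, the function name and the `example.org` URI are assumptions made for illustration, and a looser predicate may suit some datasets better.

```python
import re

WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed linking predicate


def same_as_triple(local_uri: str, qid: str) -> str:
    """Return an N-Triples line stating that local_uri denotes Wikidata item qid."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"<{local_uri}> <{OWL_SAME_AS}> <{WIKIDATA_ENTITY}{qid}> ."


# Hypothetical local record linked to the Wikidata item Q42.
triple = same_as_triple("https://example.org/person/1", "Q42")
```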

Over a series of workshops, the Wikidata assignment also afforded the students the opportunity to develop their understanding of, and engagement with, issues such as: data completeness; data ethics; digital provenance; data analysis; data processing; as well as making practical use of a raft of tools and data visualisations. It also motivated student volunteers to surface a much-loved repository of information as linked open data to enable further insights and research. It was a project that the students felt proud to take part in and found “very meaningful”. (The students even took the opportunity to consult with professors of History at the university in order to gain even more of an understanding of the period in which these witch trials took place, such was their interest in the subject.)

Feedback from students at the conclusion of the project included:

“After we analysed the data, we found we learned the stories of the witches and we learned about European culture especially in the witchhunts.”
“We had wanted to do a happy project but finally we learned much more about these cultures so it was very meaningful for us.”
“In my opinion, it’s quite useful to put learning practice into the real world so that we can see the outcome and feel proud of ourselves… we learned a lot.”
“Thank you for inviting us and appreciating our video. It’s an unforgettable experience in my life. Thank you so much.”

As a result of the students’ efforts, we now have 3219 items of data on the accused witches in Wikidata (spanning 1563 to 1736). We also now have data on 2356 individuals involved in trying these accused witches. Finally we have 3210 witch trials themselves. This means we can link and enrich the data further by adding location data, dates, occupations, places of residence, social class, marriages, and penalties arising from the trial.
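To give a flavour of how such enriched data might be queried, here is a rough Python sketch that builds a SPARQL query for the Wikidata Query Service and parses the coordinate literals it returns. It assumes that P4478 is the Wikidata property for the Survey of Scottish Witchcraft accused witch ID (worth checking on wikidata.org before use); P551 (residence) and P625 (coordinate location) are standard properties. The function names are hypothetical, and nothing here sends a request.

```python
import re
from typing import Optional, Tuple

# Assumption: P4478 is the "Survey of Scottish Witchcraft - Accused witch ID"
# property. Verify on wikidata.org before relying on it.
ACCUSED_ID_PROPERTY = "P4478"


def build_residence_query(limit: int = 100) -> str:
    """Return SPARQL listing accused witches whose place of residence is located."""
    return f"""
SELECT ?accused ?accusedLabel ?place ?placeLabel ?coords WHERE {{
  ?accused wdt:{ACCUSED_ID_PROPERTY} ?surveyId ;  # has a Survey identifier
           wdt:P551 ?place .                      # residence
  ?place wdt:P625 ?coords .                       # coordinate location
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


_POINT = re.compile(r"^Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$")


def parse_wkt_point(literal: str) -> Optional[Tuple[float, float]]:
    """Convert a 'Point(lon lat)' literal into (lat, lon); None if it does not match."""
    match = _POINT.match(literal.strip())
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


query = build_residence_query(limit=5)
```

The query text can be pasted into query.wikidata.org, where the `wdt:`, `wikibase:` and `bd:` prefixes are predefined. Note that the service returns coordinates with longitude first, which is why the parser swaps the order for mapping libraries that expect latitude first.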

The fact that Wikidata is also linked open data means that students can help connect to and leverage from a variety of other datasets in multiple languages; helping to fuel discovery through exploring the direct and indirect relationships at play in this semantic web of knowledge.

Descendants of King James VI and I, king during union of English and Scottish crowns

And we can see an example of this semantic web of related entities, or historical individuals in this case, here in this visualisation of the descendants of King James I of England and James VI of Scotland (as shown in the pic above but do click on the link for a live rendering).

We can also see the semantic web at play in the below class level overview of gene ontologies (505,000 objects) loaded into Wikidata, and linking these genes to items of data on related proteins and items of data on related diseases, which, in turn, have related chemical compounds and pharmaceutical products used to treat these diseases. Many of these datasets have been loaded into Wikidata by, or are maintained by, the GeneWiki initiative – around a million Wikidata items of biomedical data – but, importantly, they are also leveraging from other datasets imported from the Centers for Disease Control and Prevention (CDC) among other sources. This allows researchers to add to and explore the direct and, perhaps more importantly, the indirect relationships at play in this semantic web of knowledge to help identify areas for future research.

Using Wikidata as an open, community-maintained database of biomedical knowledge – CC-BY: Andrew Su, Professor at The Scripps Research Institute.

Which brings me onto…

Contention #2 – Building a bibliographical repository: the sum of all citations

Sharing your data to Wikidata, as a linking hub for the internet, is also the most cost-effective way to surface your repository’s data and make it 5 star linked open data. As a centralised hub for linked open data on the internet, it enables you to leverage from many other datasets while you can still have your own read/write applications on top of Wikidata. (Which is exactly what the GeneWiki project did to encourage domain experts to contribute to knowledge gaps on Wikidata through providing a user-friendly read/write interface to enable the “consumption and curation” of gene annotation data using the Wiki Genome web application).

Within Wikidata, we have biographical data, geographical data, biomedical data, taxonomic data and, importantly, bibliographic data.

The WikiCite project is building a bibliographic repository of sources within Wikidata.

“How does the Wikimedia movement empower individuals to assess reliable sources and arm them with quality information so they can make decisions based in facts? This question is relevant not only to Wikipedia users but to consumers of media around the globe. Over the past decade, the Wikimedia movement has come together to answer that question. Efforts to design better ways to support sourcing have begun to coalesce around Wikidata – the free knowledgebase that anyone can edit. With the creation of a rich, human-curated, and machine-readable knowledgebase of sources, the WikiCite initiative is crowdsourcing the process of vetting information and its provenance.” – WikiCite Report 2017

Wikidata tools can be used to create Wikidata items on scholarly papers automatically by scraping source metadata from:

DOIs,
PMIDs,
PMCIDs,
ORCIDs (NB: Multiple items of data can be created simultaneously to represent multiple scholarly papers using one ORCID identifier input in the Orcidator tool).
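To make the idea concrete, the sketch below turns a paper's metadata into tab-separated commands in the QuickStatements V1 batch format. QuickStatements is used here only as one familiar example of a batch-editing tool; the talk does not say which tools were meant. The property and item IDs (P31 instance of, Q13442814 scholarly article, P1476 title, P356 DOI, P698 PubMed ID, P932 PMCID) are believed correct but should be checked on wikidata.org, and the function name and sample DOI are hypothetical.

```python
from typing import Dict, List

# IDs believed correct at the time of writing; verify before running a batch.
IDENTIFIER_PROPERTIES = {"doi": "P356", "pmid": "P698", "pmcid": "P932"}
SCHOLARLY_ARTICLE = "Q13442814"


def quickstatements_rows(paper: Dict[str, str]) -> List[str]:
    """Build QuickStatements V1 commands that create one item for a paper."""
    # Double quotes would break the quoted-string syntax, so swap them out.
    title = paper["title"].replace('"', "'")
    rows = [
        "CREATE",
        f'LAST\tLen\t"{title}"',               # English label
        f"LAST\tP31\t{SCHOLARLY_ARTICLE}",     # instance of: scholarly article
        f'LAST\tP1476\ten:"{title}"',          # title as monolingual text
    ]
    for key, prop in IDENTIFIER_PROPERTIES.items():
        value = paper.get(key)
        if value:
            if key == "doi":
                value = value.upper()  # DOIs are conventionally stored upper-case
            rows.append(f'LAST\t{prop}\t"{value}"')
    return rows


# Hypothetical metadata; the DOI is a placeholder, not a real paper.
rows = quickstatements_rows({"title": "An example paper", "doi": "10.1234/example"})
```

A real workflow would first check whether an item with the same DOI already exists, to avoid creating duplicates.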

Indeed, 1 out of 4 items of data in Wikidata represents a creative work. Wikidata currently includes 10 million entries about citable sources, such as books, scholarly papers and news articles, along with over 75 million author string statements and 84 million citation links in Wikidata between these authors and sources. More than 17 million items have a PubMed ID and more than 12.4 million items have a DOI.

17120945	Items with a PubMed ID
13416958	Items with a DOI

Mike Bennett, our Digital Scholarship Developer at the University of Edinburgh, is working to develop a tool to translate the Edinburgh Research Archives’ thesis collection data from ALMA into a format that Wikidata can accept. There are also ready-made tools, developed by Wikidatans, that will automatically create a Wikidata item for a scholarly paper by scraping the source metadata from DOIs, PubMed IDs and ORCID identifiers, allowing a bibliographic record of scholarly papers and their authors to be generated as structured, machine-readable, multilingual linked open data.

Why does this matter?

Well…the Initiative for Open Citations (I4OC) is a new collaboration between scholarly publishers, researchers, and other interested parties to promote the unrestricted availability of scholarly citation data. Over 150 publishers have now chosen to deposit and open up citation data. As a result, the fraction of publications with open references has grown from 1% to more than 50% out of 38 million articles with references deposited with Crossref.

“Citations are the links that knit together our scientific and cultural knowledge. They are primary data that provide both provenance and an explanation for how we know facts. They allow us to attribute and credit scientific contributions, and they enable the evaluation of research and its impacts. In sum, citations are the most important vehicle for the discovery, dissemination, and evaluation of all scholarly knowledge.”

Once made open, the references for individual scholarly publications may be accessed within a few days through the Crossref REST API. Open citations are also available from the OpenCitations Corpus, which is progressively and systematically harvesting citation data from Crossref and other sources. An advantage of accessing citation data from the OpenCitations Corpus is that they are available in machine-readable RDF format, which is systematically being added to Wikidata.
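As a hedged sketch of what accessing those references might look like, the Python below builds the Crossref REST API URL for a work and pulls the cited DOIs out of the JSON response. It assumes the response carries a `message.reference` list whose entries may include a `DOI` key, which matches the shape of Crossref's works records as far as I know; the function names are hypothetical, and the network call is defined but not run here.

```python
import json
from typing import Any, Dict, List
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Return the Crossref REST API URL for a single work."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def cited_dois(record: Dict[str, Any]) -> List[str]:
    """Extract the DOIs of cited works from a Crossref works response."""
    references = record.get("message", {}).get("reference", [])
    # Not every reference has been matched to a DOI, so skip those without one.
    return [ref["DOI"].lower() for ref in references if "DOI" in ref]


def fetch_cited_dois(doi: str) -> List[str]:
    """Fetch a work from Crossref and return the DOIs it cites (needs network)."""
    with urlopen(works_url(doi), timeout=30) as response:
        return cited_dois(json.load(response))


# Made-up response fragment used to exercise the parser offline.
sample = {"message": {"reference": [
    {"key": "ref1", "DOI": "10.1234/ABC"},
    {"key": "ref2", "unstructured": "A reference with no DOI"},
]}}
sample_dois = cited_dois(sample)
```

Works whose publishers have not opened their references will simply come back without a `reference` list, in which case the function returns an empty list.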

Because this data on scholars, scholarly papers and citations is stored as linked data on Wikidata, the citation data can be linked to, and leverage from, other complementary datasets, enabling the direct and indirect relationships to be explored in this semantic web of knowledge.

This means we can parse the data to answer a range of queries such as:

Show me all works which cite a New York Times article/Washington Post article/Daily Telegraph article etc. (delete as appropriate).
Show me the most popular journals cited by statements of any item that is a subclass of economics/archaeology/mathematics etc. (delete as appropriate).
Show me all statements citing the works of Joseph Stiglitz/Melissa Terras/James Loxley/Karen Gregory etc. (delete as appropriate).
Show me all statements citing journal articles by physicists at Oxford University in 1960s/1970s/1980s etc. (delete as appropriate).
Show me all statements citing a journal article that was retracted.

And much more besides.
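The first of these queries could be sketched roughly as follows. The Python function only builds SPARQL text for the Wikidata Query Service; it uses P2860 (cites work) and P1433 (published in), which are standard Wikidata properties, and assumes Q9684 is the item for The New York Times, which should be checked before use. The function name is hypothetical.

```python
import re


def works_citing_publication(publication_qid: str, limit: int = 50) -> str:
    """Return SPARQL for works that cite anything published in the given outlet."""
    if not re.fullmatch(r"Q[1-9]\d*", publication_qid):
        raise ValueError(f"not a Wikidata item ID: {publication_qid!r}")
    return f"""
SELECT ?work ?workLabel ?cited ?citedLabel WHERE {{
  ?work wdt:P2860 ?cited .                 # cites work
  ?cited wdt:P1433 wd:{publication_qid} .  # published in
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


# Assumption: Q9684 is The New York Times.
nyt_query = works_citing_publication("Q9684", limit=10)
```

Swapping in the item ID of another newspaper or journal gives the "delete as appropriate" variants. The other questions in the list would need further properties, such as author or retraction status, that this sketch does not attempt.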

Screengrab of the Scholia profile for the developmental psychologist, Uta Frith, generated from the structured linked data in Wikidata.

Like the WikiGenome web application already mentioned, other third party applications can be built with user-friendly UIs to read/write from Wikidata. For instance, the Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and research topics. Leveraging from information in Wikidata, Scholia displays information on total number of publications, co-authors and citation statistics in a variety of visualisations. This is another way of helping to demonstrate the impact and reach of your research.

Citation statistics for developmental psychologist Uta Frith, visualised on the Scholia web service and generated from the citation data in Wikidata.

Co-author graph for Polly Arnold, Crum Brown Chair of Chemistry at the University of Edinburgh, visualised in the Scholia Web Service and generated from bibliographic data in Wikidata.

To conclude, the power of linked open data to aid the teaching of data literacy and to help share knowledge between different institutions and different repositories, between geographically and culturally separated societies, and between languages is a beautiful, empowering thing with many benefits. Here’s to more of it and entering a brave new world of linked open data. Thank you.

By way of closing, I’d like to show you the video presentations the students on the Data Science for Design MSc came up with as the final outcome of their project to import the Survey of Scottish Witchcraft database into Wikidata.

Here are two data visualisation videos they produced:

Supporting the University of Edinburgh's commitments to digital skills, information literacy, and sharing knowledge openly


Preserving Scottish Heritage: The Accused Witches of Scotland