Supporting the University of Edinburgh's commitments to digital skills, information literacy, and sharing knowledge openly

Month: July 2021

Wikimedia and the Diversity of Languages online – Guest post by Clea Strathmann

Globally, over 7,000 languages are spoken – only around 4% of people are native English speakers. Despite this, English holds the title of the “Language of the internet”. Together with Chinese, it accounts for almost 50% of global web traffic, while the top ten languages account for 76.9 percent of global internet users. The majority of African and Indigenous languages are not recognised by Google’s search engine.

When an English speaker searches for something on Google, a Wikipedia article typically appears as a top hit, often as a convenient infobox at the side of the browser. This is because English Wikipedia has over 6 million articles. Wikipedias in other languages are more limited – only two other Wikipedias (Cebuano and Swedish) have over 3 million articles, and the 20 largest Wikipedias have around 1 million entries each. Many of these articles are also shorter than their counterparts on English Wikipedia.

Percentage of Wikipedia articles in each language group – Western European language groups dominate Wikipedia.

This lack of diversity restricts a significant portion of the world from accessing knowledge that is readily available to English speakers, and disproportionately affects people in less-developed regions who may not speak any of the internet’s other dominant languages. Access to knowledge is vital for building understanding between languages and cultures.

Knowledge creates understanding – understanding is sorely lacking in today’s world. – Katherine Maher, Executive Director Wikimedia Foundation. 

The United Nations has, as part of its Sustainable Development Goals, emphasised the need for equitable education and lifelong learning. To enable this, knowledge resources must be available in all languages. Beyond access to knowledge, the lack of linguistic diversity is a pressing issue for smaller languages themselves, including indigenous languages, which are dying out at a rate of two languages per month. For speakers of these languages, the extinction of a language may also mean the extinction of their culture and identity.

Watch Dr. Sara Thomas speak about Scots Wikipedia at the Arctic Knot.

The role of Wikimedia in improving linguistic diversity 

Wikipedia is attempting to increase global access to knowledge, and it is one of the aims of The Wikimedia Foundation to ensure that knowledge is diverse, inclusive, and accessible to all. When considering linguistic diversity, the aim is for the number of Wikipedia articles to be evenly distributed across languages. Theoretically, this could be done by simply translating articles from one language Wikipedia into another. 

However, translating Wikipedia would not be enough to create linguistic diversity. Take the Game of Thrones article on Welsh “Wicipedia”, for instance, which highlights the similarity of the fictional languages in the series to Welsh and emphasises its Welsh actors. This demonstrates the impact of culture on what is important, or not, to the readers of Wikipedia. Language and culture are heavily entangled, which makes it even more important that both are represented and preserved online.

Watch the opening speeches by Aili Keskitalo, President of the Sámi Parliament of Norway, and Guri Melby, Minister of Education, Norway, at the Arctic Knot 2021

One of the best ways that we can support linguistic diversity is through collaborative efforts with Wikipedia projects. In 2017, The University of Edinburgh and Wikimedia UK started the ‘Celtic Knot’ Wikipedia Language conference, which aims to bring together smaller language communities to share ideas for improving the Wikipedia content in these languages and for increasing their linguistic presence across other language Wikipedias. The Celtic Knot also developed into the Arctic Knot conference, hosted by Wikimedia Norway this year, which aims to improve the visibility of indigenous arctic languages. These conferences allow speakers to address the importance of engaging with their language, and provide practical resources for encouraging contributions to Wikipedia. The Toolkit for language activism, for instance, supports the development of digital skills and written language skills, which can help people who speak minority languages to contribute to Wikipedia. Through such projects, people are encouraged to contribute to Wikipedia to improve both the representation and the usability of their languages.

Using Wikidata to build linguistic diversity online 

From the collaborative efforts of dedicated Wikimedians, communities are already seeing successes in increasing the presence of their languages. But for smaller languages, including many indigenous languages, writing entire Wikipedia articles is challenging and time-consuming. This is where Wikipedia’s sister project – Wikidata – has proven to be an important contributor to improving language diversity online.

This chart, made using Wikidata, shows the number of Wikipedia articles about Greek citizens that are available on English Wikipedia but not on Greek Wikipedia. The majority are sports players, but it also includes a number of artists and academics.

Wikidata is a free and open knowledge base of machine-readable facts. Each data item has a unique identifier (a ‘Q’ number). The label, description and all of the statements within each data item can be labelled in any language and, because of this, the data can be instantly rendered in any language that has labels. This means that a search can make this knowledge both discoverable and understandable in any of those languages. Items from Wikidata are important for modern technologies such as Amazon’s Alexa and Apple’s Siri, which use Wikidata’s machine-readable entries to answer questions – but, importantly, these can only provide responses in the languages in which an item is labelled, and Wikidata labels in languages beyond European ones are scarce.
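
To make the idea of language labels concrete, here is a minimal Python sketch. It models a single item in a deliberately simplified form: the dictionary layout, the `label_for` and `answer` helpers and the handful of sample labels are illustrative assumptions, not Wikidata's actual data format or API. Q12136 is used here as the identifier for "disease".

```python
from typing import Optional

# Simplified, hypothetical model of one Wikidata item.
# The labels are a small illustrative sample, not a copy of the live item.
item = {
    "id": "Q12136",
    "labels": {
        "en": "disease",
        "fr": "maladie",
        "de": "Krankheit",
        "cy": "clefyd",
    },
}


def label_for(entity: dict, language: str) -> Optional[str]:
    """Return the entity's label in the given language, or None if unlabelled."""
    return entity["labels"].get(language)


def answer(entity: dict, language: str) -> str:
    """Mimic a tool that can only respond in languages the item is labelled in."""
    label = label_for(entity, language)
    if label is None:
        return f"No label available for {entity['id']} in '{language}'"
    return label


print(answer(item, "cy"))  # Welsh: labelled in this sample
print(answer(item, "gd"))  # Scottish Gaelic: not labelled in this sample
```

The same identifier serves every language; only the labels differ, which is why adding a single label can make an item usable for a whole language community.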

As an example, take disease and health data, which constitute vital information that needs to be easily accessible. A search of Wikidata reveals that over 13,000 diseases have been uploaded to the database, but around 5,000 of these entries are labelled in only one language. So whilst Wikidata is a useful tool to aid knowledge discovery, it will take the work of native language speakers from around the world to develop it into the linguistically diverse database that it has the potential to become. By growing both the number of items in Wikidata and its language labels, technologies can become more accessible for different languages. Ultimately, this is crucial in enabling smaller languages to thrive, rather than just to survive.
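
A count like the one above could be approached with a query along the following lines. This is a sketch under stated assumptions, not the query used for the figures quoted here: it assumes that diseases are items with P31 ("instance of") set to Q12136 ("disease"), and the `summarise` helper and the sample results are hypothetical. The sample follows the standard SPARQL JSON results shape but is hand-made, not real Wikidata output.

```python
from typing import Tuple

# SPARQL intended for the Wikidata Query Service (assumed identifiers:
# P31 = "instance of", Q12136 = "disease"). It counts labels per item.
LABELS_PER_DISEASE = """
SELECT ?item (COUNT(?label) AS ?labelCount) WHERE {
  ?item wdt:P31 wd:Q12136 ;
        rdfs:label ?label .
}
GROUP BY ?item
"""


def summarise(results: dict) -> Tuple[int, int]:
    """Return (total items, items labelled in exactly one language)."""
    bindings = results["results"]["bindings"]
    counts = [int(row["labelCount"]["value"]) for row in bindings]
    return len(counts), sum(1 for count in counts if count == 1)


# Hand-made sample in the SPARQL JSON results shape; the item URIs are
# placeholders, not real Wikidata items.
sample = {
    "results": {
        "bindings": [
            {"item": {"value": "http://example.org/item-a"}, "labelCount": {"value": "1"}},
            {"item": {"value": "http://example.org/item-b"}, "labelCount": {"value": "12"}},
            {"item": {"value": "http://example.org/item-c"}, "labelCount": {"value": "1"}},
        ]
    }
}

total, single_language = summarise(sample)
print(f"{single_language} of {total} items are labelled in only one language")
```

Run against live data, the numbers would depend on how "disease" is modelled in the query (for example, whether subclasses are included), so they may not match the figures above.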

What can we do to promote linguistic diversity?

Governments have highlighted the importance of actively increasing linguistic diversity. UNESCO has produced a 10-year plan for the preservation of indigenous languages, referred to as the Decade of Indigenous Languages, which is a call to action on the human rights of Indigenous Peoples. A key part of the plan centres on the use of technology to support access to Indigenous languages – this can involve the use of Wikipedia and Wikidata as impactful open platforms for building global understanding about different languages and, alongside this, different cultures. Encouraging people to contribute to Wikipedia may seem difficult, but events including the Celtic and Arctic Knot conferences, and outreach projects such as Indigenizing Wikipedia, have demonstrated how successfully Wikipedia can be used as a platform for language activism.

By contributing to both Wikipedia and Wikidata, we can increase the use and representation of smaller languages, contributing to the preservation of the important cultures that are intertwined with them. 

Clea Strathmann, Open Data and Knowledge Equity intern

Watch the whole Arctic Knot conference on YouTube here.

Welsh Wikipedia Thinking Big – Keynote address by Jason Evans at the Celtic Knot

A state of the question – the Catalan language project – Àlex Hinojo, Executive Director, Amical Wikimedia

The Scottish Gaelic Uicipeid project – Susan Ross at the Celtic Knot

Celtic Knot – Panel discussion & closing plenary: The Politics of Language Online

Supporting Open Collections – Guest post by Wikisourceror intern, Erin Boyle

Figure 1: ‘Main Library Rainbow’, Stewart Lamb Cromar 2021 CC BY-SA, File:’Main Library Rainbow’ (2 3) (51239066072).jpg – Wikimedia Commons

I am now at the end of week four of my role as a Wikisourceror – Open Collections intern, and the learning process has continued, although I am now a bit more familiar with the world of Wiki! I have now created two new articles on Wikipedia (for Hannah Shields and Iona McGregor), and this week I uploaded some of Stewart Lamb Cromar’s (@stubot) Lego Library images to Wikimedia Commons. You can now find one of the Lego Library pictures on the Wikipedia page for the University of Edinburgh Main Library!

I also had the pleasure of attending the Arctic Knot – Wikipedia Language Conference last week, during which I listened to many incredibly interesting and insightful talks. These included several talks about Arctic languages, indigenous languages and digital language activism. I also participated in an Intro to Wikisource workshop led by Nicolas Vigneron, during which we proofread pages from a book in French Breton – it was a bit of a challenge! However, I am getting the hang of Wikisourcing a little more now.

I have also been playing around with the Wikidata query service, especially looking at interesting queries made by others, such as Martin Poulter. Some queries that I found really interesting were those that return items in particular galleries/libraries/museums, organisations founded by people born in Edinburgh, people who invented scientific instruments, and places of education of Members of Parliament of the United Kingdom. I really enjoyed looking at the different ways of visualising the results of a query, such as plotting geocoordinates associated with the items on a map or making an interactive graph of connections between the items returned.
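
For readers who have not seen one, a query of this kind might look like the sketch below. It is an illustration only, not one of the queries mentioned above: it assumes that P112 means "founded by", P19 means "place of birth" and Q23436 is the item for Edinburgh, and the `build_query_url` helper is a hypothetical convenience for turning the query into a link to the public Wikidata Query Service endpoint.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Organisations founded by people born in Edinburgh (assumed identifiers:
# P112 = "founded by", P19 = "place of birth", Q23436 = Edinburgh).
FOUNDED_BY_EDINBURGH_BORN = """
SELECT ?org ?orgLabel ?founder ?founderLabel WHERE {
  ?org wdt:P112 ?founder .
  ?founder wdt:P19 wd:Q23436 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""


def build_query_url(query: str) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query.strip(), "format": "json"})


print(build_query_url(FOUNDED_BY_EDINBURGH_BORN))
```

Pasting the query text itself into the query service's web interface gives access to the map and graph views described above; the URL form is more useful for scripts.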

Over the past week or two I have been turning my attention to drafting content for the University’s Wikimedia website and corresponding resources (PDFs and videos), and to laying the foundations for designing the workflows for library staff. This has involved looking at the website as it currently stands and deciding where the content gaps are that need to be filled.

As I am planning content for the website, I am also investigating good examples of GLAM WikiProjects, especially those which involve working with Wikisource. Examples of best practice and advice will both help inform the resources that I am going to create, and demonstrate to people who are thinking about getting involved in Wikimedia (especially those working in Library & University Collections) that their contributions can have a positive impact. In the case of the Library, they can use Wikimedia to significantly raise awareness of and engagement with their collections within the University, as well as nationally and globally. Institutions that I am investigating include the Rijksmuseum, Europeana, Wellcome Library, National Library of Wales, The Smithsonian and more.

Knowledge doesn’t belong in silos. The interlinking of the Wikimedia projects exemplified through Robert Louis Stevenson. Media files on Commons, his written works on Wikisource, machine-readable linked open data on Wikidata. All linked to from Wikipedia.

Whilst planning my content and resources, I needed to think a lot about what the needs and prior knowledge of the users would be. Many of the barriers that people face when contributing content to the Wikimedia projects stem from a lack of accessible, easy-to-use documentation, and from users not knowing how to find resources easily or get started with the various platforms. Over the next couple of weeks, I will be having conversations with some members of staff at the University to find out about their previous experiences with the Wikimedia projects, what their feelings towards them are and how they could be better supported to feel able to contribute.

I am creating resources with a Library focus, meaning that, for example, my Wikisource resources will focus more on guiding users on how to upload digitised texts to Wikimedia Commons, add structured data for the texts, and then set up the text for proofreading, validating and transcluding on Wikisource. I will also be creating a guide for making an author page on Wikisource and for showing users how they can link content across the Wikimedia projects, such as adding a template to an author’s Wikipedia page that will show an associated works box, so that users who are interested in an author can quickly and easily access their works.

The goal for this week is to begin creating the content that I have planned in my draft last week. This will involve preparing scripts for how-to videos and beginning to carve out some rough drafts for supporting PDF guides.

Updates to come soon!

Erin Boyle – Wikisourceror Open Collections intern
