Supporting the University of Edinburgh's commitments to digital skills, information literacy, and sharing knowledge openly

Tag: indigenous languages

Wikimedia and the Diversity of Languages online – Guest post by Clea Strathmann

Globally, over 7,000 languages are spoken, and only around 4% of people are native English speakers. Despite this, English holds the title of the “language of the internet”. Together with Chinese, it accounts for almost 50% of global web traffic, and the top ten languages account for 76.9 percent of global internet users. The majority of African and Indigenous languages are not recognised by Google’s search engine.

When an English speaker searches for something on Google, a Wikipedia article typically appears as a top hit, often as a convenient infobox at the side of the browser. This is because English Wikipedia has over 6 million articles. Wikipedias in other languages are more limited: only two other Wikipedias (Cebuano and Swedish) have over 3 million articles, and the 20 largest Wikipedias have around 1 million entries each. Many of these articles are also shorter than their counterparts in English Wikipedia.

Percentage of Wikipedia articles in each language group – Western European language groups dominate Wikipedia.

This lack of diversity cuts a significant portion of the world off from knowledge that is readily available to English speakers, and disproportionately affects people in less-developed regions who may not speak any of the internet’s other dominant languages. Access to knowledge is vital for building understanding between languages and cultures.

Knowledge creates understanding – understanding is sorely lacking in today’s world. – Katherine Maher, Executive Director Wikimedia Foundation. 

The United Nations has, as part of its Sustainable Development Goals, emphasised the need for equitable education and lifelong learning. To enable this, knowledge resources must be available in all languages. But alongside access to knowledge, the lack of linguistic diversity is a pressing issue for smaller languages, including indigenous languages, which are dying out at a rate of two languages per month. For speakers of these languages, their extinction may also mean the extinction of their culture and identity.

Watch Dr. Sara Thomas speak about Scots Wikipedia at the Arctic Knot.

The role of Wikimedia in improving linguistic diversity 

Wikipedia is attempting to increase global access to knowledge, and it is one of the aims of The Wikimedia Foundation to ensure that knowledge is diverse, inclusive, and accessible to all. When considering linguistic diversity, the aim is for the number of Wikipedia articles to be evenly distributed across languages. Theoretically, this could be done by simply translating articles from one language Wikipedia into another. 

However, translating Wikipedia would not be enough to create linguistic diversity. Take the Game of Thrones article on Welsh “Wicipedia”, for instance, which highlights the similarity of the fictional languages in the series to Welsh and emphasises its Welsh actors. This demonstrates the impact of culture on what is important, or not, to the readers of Wikipedia. The relationship between language and culture is heavily entangled, which makes it even more important that both are represented and preserved online.

Watch the opening speeches by Aili Keskitalo, President of the Sámi Parliament of Norway and Guri Melby, Minister of Education Norway at the Arctic Knot 2021

One of the best ways that we can support linguistic diversity is through collaborative efforts with Wikipedia projects. In 2017, the University of Edinburgh and Wikimedia UK started the ‘Celtic Knot’ Wikipedia Language Conference, which aims to bring together smaller language communities to collaborate on ideas for how to improve the Wikipedia content in these languages and to increase their linguistic presence across other language Wikipedias. The Celtic Knot also developed into the Arctic Knot conference, hosted by Wikimedia Norway this year, which aims to improve the visibility of indigenous Arctic languages. These conferences allow speakers to address the importance of engaging with their language, and provide practical resources for encouraging contributions to Wikipedia. The Toolkit for language activism, for instance, supports the development of digital skills and written language skills, which can help people who speak minority languages to contribute to Wikipedia. Through such projects, people are encouraged to contribute to Wikipedia to improve both the representation and the usability of languages.

Using Wikidata to build linguistic diversity online 

From the collaborative efforts of dedicated Wikimedians, communities are already seeing successes in increasing the presence of their languages. But for smaller languages, including many indigenous languages, writing entire Wikipedia articles is challenging and time-consuming. This is where Wikipedia’s sister project – Wikidata – has proven to be an important contributor to improving language diversity online.

This chart, made using Wikidata, shows the number of Wikipedia articles about Greek citizens that are available on English Wikipedia but not on Greek Wikipedia. The majority are sports players, but it also includes a number of artists and academics.

Wikidata is a free and open knowledge base of machine-readable facts. Each data item has a unique identifier (a ‘Q’ number). The label, description and statements within each data item can be given in any language, so the same data can be displayed in any language for which labels exist. This means that a search can make this knowledge both discoverable and understandable in those languages. Items from Wikidata are important for modern technologies such as Amazon’s Alexa and Apple’s Siri, which use Wikidata’s machine-readable entries to answer questions. Importantly, though, these can only provide responses in the languages an item is labelled in, and beyond European languages, Wikidata language labels are scarce.
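To make the idea of a language-labelled item more concrete, here is a minimal Python sketch. It is an illustration only: the `Item` class and the `label_for` function are hypothetical names invented for this example, not part of any Wikidata API, and the labels were typed in by hand rather than fetched from Wikidata.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Item:
    """Hypothetical, much-simplified stand-in for a Wikidata item."""
    qid: str  # the unique 'Q' identifier
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label


def label_for(item: Item, lang: str) -> Optional[str]:
    """Return the item's label in `lang`, or None if nobody has added one."""
    return item.labels.get(lang)


# Q183 is the identifier for Germany; the labels below are hand-typed samples.
germany = Item(
    qid="Q183",
    labels={"en": "Germany", "de": "Deutschland", "cy": "yr Almaen"},
)

print(label_for(germany, "cy"))  # a Welsh label exists in this sample
print(label_for(germany, "se"))  # None: no Northern Sámi label in this sample
```

The second lookup is the point of the sketch: a voice assistant or search tool relying on this item would have nothing to say in a language that has no label yet.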

As an example, take disease and health data, which constitute vital information that needs to be easily accessible. A search of Wikidata reveals that over 13,000 diseases have been added to the database, but around 5,000 of these entries are labelled in only one language. So whilst Wikidata is a useful tool to aid knowledge discovery, it will take the work of native language speakers from around the world to develop it into the linguistically diverse database that it has the potential to become. By growing both the number of items in Wikidata and its language labels, technologies can become more accessible for different languages. Ultimately, this is crucial in enabling smaller languages to thrive, rather than just to survive.
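The kind of check described above (how many entries are labelled in only one language) can be sketched in a few lines of Python. This is not the query that produced the figures in this post: the identifiers and labels below are made-up placeholders, and `single_language_items` is a hypothetical helper written for this example.

```python
from typing import Dict, List

# Placeholder sample data: item identifier -> {language code: label}.
# These are NOT real Wikidata entries; they only illustrate the shape of the data.
sample_items: Dict[str, Dict[str, str]] = {
    "Q_EXAMPLE_1": {"en": "example disease A", "fr": "maladie exemple A"},
    "Q_EXAMPLE_2": {"en": "example disease B"},
    "Q_EXAMPLE_3": {"en": "example disease C", "de": "Beispielkrankheit C"},
    "Q_EXAMPLE_4": {"de": "Beispielkrankheit D"},
}


def single_language_items(items: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the identifiers of items that have a label in exactly one language."""
    return sorted(qid for qid, labels in items.items() if len(labels) == 1)


only_one = single_language_items(sample_items)
print(f"{len(only_one)} of {len(sample_items)} sample items are labelled in only one language")
```

A list like `only_one` is a natural to-do list for a language community: each identifier on it is an item that would become usable in one more language as soon as a speaker adds a label.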

What can we do to promote linguistic diversity?

Governments have highlighted the importance of actively increasing linguistic diversity. UNESCO has produced a 10-year plan for the preservation of indigenous languages, referred to as the Decade of Indigenous Languages, which calls into action the human rights of Indigenous Peoples. A key part of the plan concerns the use of technology to support access to Indigenous languages. This can involve the use of Wikipedia and Wikidata as impactful open platforms for building global understanding about different languages and, alongside this, different cultures. Encouraging people to contribute to Wikipedia may seem difficult, but events such as the Celtic and Arctic Knot conferences, and outreach projects such as Indigenizing Wikipedia, have demonstrated how successfully Wikipedia can be used as a platform for language activism.

By contributing to both Wikipedia and Wikidata, we can increase the use and representation of smaller languages, contributing to the preservation of the important cultures that are intertwined with them. 

Clea Strathmann, Open Data and Knowledge Equity intern

Watch the whole Arctic Knot conference on YouTube here.

Welsh Wikipedia Thinking Big – Keynote address by Jason Evans at the Celtic Knot

A state of the question – the Catalan language project – Àlex Hinojo, Executive Director, Amical Wikimedia

The Scottish Gaelic Uicipeid project – Susan Ross at the Celtic Knot

Celtic Knot – Panel discussion & closing plenary: The Politics of Language Online

Seeing the links at the ‘Celtic Knot’ – Wikipedia Language Conference


Tying the Celtic Knot. Pic by David J. Fred [CC BY-SA 2.5 (http://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons

The first ‘Celtic Knot’ – Wikipedia Language Conference will take place on Thursday 6 July 2017 at the University of Edinburgh in collaboration with Wikimedia UK. This Wikimedia event will focus on Celtic Languages and Indigenous Languages, showcasing innovative approaches to open education, open knowledge and open data that support and grow language communities.

CC-BY-SA (Own work)


To help you see the connections and common ground between your work and the Celtic Knot conference, please read the guide to the Wikimedia projects below.

The Celtic Knot conference is jointly supported by the University of Edinburgh and Wikimedia UK.

Wikimedia UK logo


Wikimedia UK is the registered charity that supports and promotes Wikipedia and the other Wikimedia projects, and the volunteers who write, edit and curate the content of the projects.

Our mission is to help people and organisations create and preserve open knowledge and to provide easy access for all. We support the widest possible public access to, use of and contribution to open content of an encyclopaedic or educational nature.

  • Culture: We work closely with cultural institutions, including galleries, libraries, archives and museums (GLAMs) to help them realise the potential of openly-licensed content for public benefit.
  • Education: Wikipedia is more than a reference work. All over the world people and institutions are exploring the ways that Wikipedia can be used as a formal education tool. It belongs in education.
  • Volunteers: The Wikimedia projects are written, edited and curated by volunteers who are just like you. There are many ways to get involved – there are activities to suit the interests of everybody. You can also become a member of the charity.

Wikimedia's family of Open Knowledge projects


Wikimedia’s family of Open Knowledge projects includes:

  • Wikipedia: the free online encyclopaedia, which has editions in Celtic and Indigenous languages. Wikipedia’s new Content Translation tool allows articles to be translated easily between different language Wikipedias.
  • Wikimedia Commons: a media file repository making available public domain and freely-licensed educational media content (images, sound and video clips) to everyone, in their own language.
  • Wikidata: a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects and many other sites and services beyond. Wikidata can connect other databases and collections of information, allowing computers and software to see connections between hundreds of data sources. GLAM institutions (galleries, libraries, archives and museums) realise that their collections become more useful and reusable when they are deeply interlinked with other collections around the world. Creating open structured data for their collections increases their impact on the public.
  • Wikisource: the Free Library, a multilingual project to create a growing free content library of OCR-ed source texts, as well as translations of source texts in any language, including constitutional documents, court rulings, plays, poems, songs, novels, short stories, letters, travel writing, speeches, obituaries, news articles and more.
  • Wiktionary: a collaborative project to produce a free-content multilingual dictionary.
  • Wikibooks: a multilingual project for collaboratively writing open-content textbooks, annotated texts, instructional guides and manuals that anyone can edit. These materials can be used in a traditional classroom, an accredited or respected institution, a home-school environment or for self-learning.
  • Wikivoyage: a multilingual, web-based project to create a free, complete, up-to-date, and reliable worldwide travel guide.

In addition, the Wiki Education Foundation connects secondary and higher education to the publishing power of Wikipedia. Bridging Wikipedia and academia creates opportunities for any learner to contribute to, and access, open knowledge. The Foundation cultivates deeper learning for students as they expand Wikipedia articles for course assignments, works with libraries to expand the public’s access to their resources, and supports academic associations as they expand and improve Wikipedia’s coverage of their field.

If you can see a clear commonality between your work and the projects above, we would like to hear from you. We welcome diverse attendees and presenters working in Celtic and Indigenous languages, including Wikimedians, educators, researchers, information professionals, media professionals, linguists, translators, learning technologists and more. The aim of the event is to come together to share good practice and find fruitful new collaborations that support language communities.

Conference Themes

  • Building language confidence: participation, public engagement & social equality.
  • Putting our language on the map: preserving & opening up our cultural heritage.
  • Languages on the road to open: ongoing or new projects and initiatives in open knowledge, open education and open data.
  • The politics of language: local, national, and international policy and practice; advocacy for funding, institutional and community support and investment.
  • Hacking; making; sharing

The official call for session proposals has now closed, but please email ewan.mcandrew@ed.ac.uk if you would like to attend or have a session you would like to showcase.

NB: Abstracts have now been reviewed as of April 2017 and notifications sent out to speakers.
