Supporting the University of Edinburgh's commitments to digital skills, information literacy, and sharing knowledge openly


Celtic Knot swag

Celtic Knot Wikipedia Language Conference – “Strength in Unity”

“We are not minority languages, we are minoritised. And we are the global majority” – Tura Arutura, Social Justice activist, creative artist and dancer.

At the end of September, I had the great good fortune to be invited to the Celtic Knot conference in Waterford, Ireland, hosted by Wikimedia Ireland and Wikimedia UK. This conference focuses on the minority language Wikipedias (not all of the 345 language Wikipedias are as well supported or well developed as English Wikipedia, see the list of Wikipedias here). It brings together participants of all backgrounds, ages and experiences as a community of ‘language activists’ to showcase, discuss and advocate for how best to support minoritised languages around the world.

We held the first ever Celtic Knot conference at the University of Edinburgh back in July 2017 as a way to demonstrate our support for the Scots Gaelic Wikipedia residency at the National Library of Scotland (watch the video presentation here). We also wanted to see where we could add significant value by shining a light on some incredibly worthwhile language projects that could do with the space and time to outline the particular challenges (and opportunities) that regional and minority languages face, whether technical, socio-economic or political. Initially, the conference focused on bringing the Celtic languages together (Scots Gaelic, Irish Gaelic, Welsh, Breton, Cornish, Manx) to help form a strong bond or ‘Celtic knot’ through working together and sharing experiences, but we quickly realised that there was much to be gained from expanding to include Basque, Catalan, Saami, the Romance languages and more. I hosted the first event at the University as a one-day experiment with 50-60 attendees to see if there was value in such knowledge and cultural exchange, and I was ecstatic to see in Waterford that the need and desire for the conference had not diminished. Indeed, despite the upheaval of the years since I was last able to attend the conference in 2018 (Brexit, Covid, the Ukraine War, war in the Middle East, the cost of living crisis and more), the Celtic Knot under the auspices of Wikimedia UK had expanded to a three-day event. It included more minoritised languages from around the world than ever before, including Dagbani, Indonesian, and Amazigh and Tashelhit from Morocco. Many more had wanted to attend and present but were unfortunately denied visas owing to bureaucratic red tape.
It was heartening to see that the ‘strength in unity’ between the Celtic language participants and our original conference participants was still there and stronger than ever, and that both the conference organisers and, importantly, the Wikimedia Foundation were open to exploring a larger, more inclusive conference to support minoritised languages across the globe. It was also heartening to see members of the Wikimedia Research team attend and present on efforts to make creating a new language Wikipedia much easier, so that moving from incubation to graduation takes much less than the c. 9-18 years it has historically taken.

It was fitting also that the conference was held in Waterford, Ireland’s oldest city. The place was described to me as somewhere that had perhaps lost its way or fallen upon hard times in the late 20th and early 21st centuries: a rather depressed port area ignored by industry, retail and tourism and needing some love and support. In recent times, however, its city officials have successfully rebranded and rejuvenated the city by embracing its rich Viking and medieval history and Waterford’s treasures. It also was not lost on me that Scotland-based Irish artist Aoife Cawley had created a special linoprint design depicting the marriage of Aoife MacMurrough (c. 1145 – 1188), a Princess of Leinster, who was forced (against Irish law and tradition) to marry the English lord ‘Strongbow’, earl of Pembroke, in Christchurch Cathedral in Waterford. The marriage was part of a pact between Strongbow (also known as Richard fitz Gilbert and Richard de Clare) and the King of Leinster, Dermot MacMurrough (c. 1110 – 1171), to help Dermot reclaim his lands. This marriage on 25 August 1170 marked the first significant involvement of the English people (and the English language) in Irish politics, history and culture, with all that has ensued since.

Jason Evans, Wikimedian and Open Data Manager at the National Library of Wales on using AI summaries to help write Welsh Wicipedia articles, CC-BY-SA by Ewan McAndrew


Jason Evans, National Library of Wales Wikimedian and Open Data Manager, gave the conference’s opening keynote address and expounded on generating Welsh Wicipedia articles using AI-generated summaries (checked by two humans for grammar and factual accuracy) to help share more knowledge in the Welsh language online. He outlined his work in public outreach at the National Library of Wales, and his work with schools and universities in particular, where he found translation tasks were exceedingly popular with students: they felt very motivated to share knowledge and address knowledge gaps online. Maristella Gatto further reinforced this point about students’ motivation for translation work in a presentation on a University of Bari translation project, which used computational analysis of the vocabulary to show how carefully students chose their words when translating articles about Irish historical events, such as Bloody Sunday, into Italian. They implicitly realised that AI tools make use of Wikipedia, so poorly chosen language and vocabulary in articles can replicate problems in how topics are represented. Words have meaning and they matter. Representation matters.

“Aithníonn ciaróg ciaróg eile” translates as “One beetle recognises another”.

This Irish saying (above) is a nod to the notion of comradeship, community and solidarity between people(s). I believe this is certainly true of conference participants who recognised, despite their different languages, that there was true commonality in their shared language activism. That activism can sometimes come at the cost of becoming a political prisoner, as in the case of Martial Menard, namechecked in the talk by Dr. Tristan Loarer, “Opening Sources in the Breton language: Offering the ‘Minoritised’ Language to the Majority”.

“We must take what we are entitled to, not hold out our hands.” – Breton activist and political prisoner Martial Menard (1951-2016)

Dr. Tristan Loarer at the Celtic Knot, CC-BY-SA by Ewan McAndrew


Loarer discussed the availability (or lack thereof) of pragmatic tools for the Breton language and the need to feed the A.I. ‘beast’ with quality-assured Breton text, whether from Breton transcriptions in Wikisource, the free and open wiki hyper library, or from the new DEVRI tool, which offers free access to a diachronic dictionary of the Breton language.

Two particularly affecting sessions, for me, were on the Irish language. Nóirín Ní Bhraoin, a psychologist from Dublin, noted that when she walked the streets of Dublin she hardly ever heard Gaeilge, which she thought was astounding for the Republic of Ireland’s capital city. She wanted to see if the problem was down to “one Irish speaker not being able to recognise another”, so she wore a badge that said “Speak Irish to me” and invited staff at ten Dublin shops to wear these badges and record how often customers spoke to them in Irish each day. The results showed that on average 3.6 people spoke Gaeilge to the staff each day across the ten stores. This encouraged Nóirín Ní Bhraoin to work with a developer to create a mobile app called “GaelGoer” (Gael as in Irish Gaeilge speaker and ‘go-er’ as in the English for someone to get up and go!). The app would allow users to (1) view upcoming Irish events happening near them or all around the world, (2) find businesses with staff happy to speak Irish to them, and (3) even geolocate Irish speakers on the map in order to start an online/SMS chat with them, if both were happy to do so. NB: an extra ‘Tinder’ style dating function was considered and requested by surveyed Irish speakers, but Nóirín Ní Bhraoin and her developer shelved that idea for now.

Nóirín Ní Bhraoin (GaelGoer app), CC-BY-SA by Ewan McAndrew


“This project underscores a powerful truth: knowledge belongs to everyone” – Joe Kelly, Mayor of Waterford, speaking on the Wiki Women Erasmus+ Project.

The key event of the Conference was the Wiki Women Erasmus+ panel, introduced by the Mayor of Waterford, Joe Kelly, who spoke of how genuinely impressed he had been by the initiative and its potential for expansion. He was followed by four impressive Irish high school students who took turns to present (both in Irish and with an English translation) on their experiences of the Wiki Women Erasmus+ project. This EU-funded scheme allowed the students to travel to the Basque Country as part of a cross-cultural language exchange with students and teachers from the Basque Country and Friesland. The ultimate goal is to highlight the gender gap in content online and empower students in minority language communities (Gaeltacht regions, Basque, Friesland) to write Wikipedia articles about underrepresented women in their languages. Another goal of the project has been to produce a ‘teacher’s toolkit’ that could be translated and used in any language to support further work in other regional and minoritised languages.

Mayor of Waterford, Joe Kelly, introducing the Wiki Women Erasmus+ project, CC-BY-SA by Ewan McAndrew


“While working on this project, we also learned a lot about the history of women from our own country […] by the end we had a wealth of information […] we improved lots of skills during this project” – a student who participated in the Wiki Women Erasmus+ Project

Keynote speaker and Irish Gaeilge Wikipedia editor Dr. Kevin Scannell is a leading mind in tech for under-resourced languages and has revolutionised how Gaelic languages interact with modern tech. Scannell outlined some of the very real problems in the use of AI and the difference in distribution of knowledge (and power) between hegemonic languages like English and minoritised languages like Irish. If every word in Irish was committed to paper or computer and fed into a large language model, this would equate to 1 billion words or less: a knowledge base 30,000 times smaller than that of the Llama 3.1 LLM. Further, the Irish data included in standard LLMs is of low quality. Wikipedias are used as standard to train LLMs, but minoritised language Wikipedias vary wildly in quality, and other sources, such as CommonCrawl, are heavily polluted with machine translation. The problem, Scannell asserted, was that big tech companies with non-Irish-speaking researchers don’t care about the training data being ‘garbage in’ and thereby don’t care that this produces ‘garbage out’. In response, Scannell has started an Irish language corpus building project called Fiontar at Dublin City University, where the 150 million words in it are being quality assured.

Further talks by Dresden University student researchers, Hannah Yule Heetmann and Joanna Dieckmann, on Unpacking Power Dynamics in Language Policy showed again how words and intentions matter, through the analysis they had conducted of the language used in the Irish Government’s 20-Year Strategy for the Irish language. Their fascinating findings highlighted how the words “going to” were entirely absent from the policy document, that timescales were almost never included, and that there was also a lack of specific actions and of specific labelling of which government or non-government actors were actually to undertake those actions. They concluded with a series of recommendations to combat this in future policy documents so that any future Irish language strategy is truly fit for purpose, actionable and accountable, with specific tasks and timescales detailed.

When a language stops having the vocabulary to speak about the modern politics, socio-economics and technologies that affect and influence our daily lives, that language ceases to be useful and risks dying out. So watching talks showing a range of initiatives, open education resources and toolkits, new ways of thinking about language activism (combining your passions to write about forensic science in Scots Gaelic, for instance) and even ensuring that the word for a Wikipedia ‘edit-a-thon’ is now in Irish Gaeilge gave me great hope that breathing new life into languages is possible and that new safe, open spaces (following the demise of Twitter) can be made to work to support language communities.

This pragmatic and inspiring ‘can do’ spirit, and the strength of feeling behind it coupled with the sheer pride being taken in every speaker’s linguistic heritage and its potential for the future in a global digital world, was the thing that impressed me most during the conference. So too was the recognition that government policies can be advocated for and shaped, and that A.I. and other digital tools and initiatives can be harnessed to help and massively support languages, cultures and histories being shared, for the betterment of knowledge and cultural exchange and understanding across the world. As Nóirín Ní Bhraoin concluded (and I’m paraphrasing here), it’s about caring, and getting up off your backside to actually do something if you do care about your language, to say “Here we are”.

And if I may add, in a nod to the future of the Celtic Knot, “and here we remain.”

Onwards and upwards… and outwards! And here’s to a bigger, more inclusive Wikipedia language conference next time!

Thanks and Sláinte to Amy and Sophie, our wonderful Wikimedia Ireland hosts, and conference co-organisers, Lea, Richard and Daria, from Wikimedia UK. Thanks also to Tura for a wonderful display of traditional Irish dance. 

Wikimedia and the Diversity of Languages online – Guest post by Clea Strathmann

Globally, over 7,000 languages are spoken, and only around 4% of people are native English speakers. Despite this, English holds the title of the “language of the internet”. Together with Chinese, it accounts for almost 50% of global web traffic, with the top ten languages accounting for 76.9 percent of global internet users. The majority of African and Indigenous languages are not recognised by Google’s search engine.

When an English speaker searches for something on Google, a Wikipedia article typically appears as a top hit, often as a convenient infobox at the side of the browser. This is because English Wikipedia has over 6 million articles. Wikipedias in other languages are more limited – only two other Wikipedias (Cebuano and Swedish) have over 3 million articles, and the 20 largest Wikipedias have around 1 million entries each. Many of these articles are comparatively shorter than those in English Wikipedia. 

Percentage of Wikipedia articles in each language group – Western European language groups dominate Wikipedia.

This lack of diversity restricts a significant portion of the world from access to knowledge that is readily available to English speakers, and disproportionately affects those who live in less-developed regions and who may not speak any of the internet’s other dominant languages. Access to knowledge is vital for bridging the understanding between languages and cultures.

Knowledge creates understanding – understanding is sorely lacking in today’s world. – Katherine Maher, Executive Director Wikimedia Foundation. 

The United Nations has, as part of their sustainable development goals, emphasised a need for equitable education and lifelong learning. To enable this, resources of knowledge must be available in all languages. But alongside access to knowledge, the lack of linguistic diversity is a pressing issue for smaller languages, including indigenous languages which are dying out at a rate of two languages per month. For speakers of these languages, their extinction may also reflect the extinction of their culture and identity. 

Watch Dr. Sara Thomas speak about Scots Wikipedia at the Arctic Knot.

The role of Wikimedia in improving linguistic diversity 

Wikipedia is attempting to increase global access to knowledge, and it is one of the aims of The Wikimedia Foundation to ensure that knowledge is diverse, inclusive, and accessible to all. When considering linguistic diversity, the aim is for the number of Wikipedia articles to be evenly distributed across languages. Theoretically, this could be done by simply translating articles from one language Wikipedia into another. 

However, translating Wikipedia would not be enough to create linguistic diversity. Take the Game of Thrones article on Welsh “Wicipedia”, for instance, which highlights the similarity of the fictional languages in the series to Welsh and emphasises its Welsh actors. This demonstrates the impact of culture on what is important, or not, to the readers of Wikipedia. The relationship between language and culture is heavily entangled, which makes it even more important that both are represented and preserved online.

Watch the opening speeches by Aili Keskitalo, President of the Sámi Parliament of Norway and Guri Melby, Minister of Education Norway at the Arctic Knot 2021

One of the best ways that we can support linguistic diversity is through collaborative efforts with Wikipedia projects. In 2017, The University of Edinburgh and Wikimedia UK started the ‘Celtic Knot’ Wikipedia Language conference, which aims to bring together smaller language communities to collaborate on ideas for how to improve the Wikipedia content in these languages and to increase their linguistic presence across other language Wikipedias. The Celtic Knot also developed into the Arctic Knot conference, hosted by Wikimedia Norway this year, which aims to improve the visibility of indigenous arctic languages. These conferences allow speakers to address the importance of engaging with their language, and provide practical resources for encouraging contributions to Wikipedia. The Toolkit for language activism, for instance, supports the creation of digital skills and written language skills which can help people who speak minority languages to contribute to Wikipedia. Through such projects, people are encouraged to contribute to Wikipedia to improve both representation and usability of languages. 

Using Wikidata to build linguistic diversity online 

From the collaborative efforts of dedicated Wikimedians, communities are already seeing successes in increasing the presence of their languages. But for smaller languages, including many indigenous languages, writing entire Wikipedia articles is challenging and time-consuming. This is where Wikipedia’s sister project – Wikidata – has proven to be an important contributor to improving language diversity online.

This chart, made using Wikidata, shows the number of Wikipedia articles about Greek citizens that are available on English Wikipedia but not on Greek Wikipedia. The majority are sports players, but it also includes a number of artists and academics.

Wikidata is a free and open knowledge base of machine-readable facts. Each data item has a unique identifier (a ‘Q’ number). The label, description and all of the statements within each data item can be labelled in any language and, because of this, the data can be instantly presented in any language. This means that any search can make this knowledge both discoverable and understandable in any language. Items from Wikidata are important for modern technologies such as Amazon’s Alexa and Apple’s Siri, which use Wikidata’s machine-readable entries to answer questions. Importantly, though, these can only provide responses in the languages an item is labelled in, and Wikidata language labels beyond European languages are scarce.
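To make the idea of language labels more concrete, here is a minimal sketch in Python of an item that carries one identifier and a label per language. It is not Wikidata's actual data model or API: the `Item` class, the `label_for` helper and the `Q_EXAMPLE` identifier are all made up for illustration, and the labels are toy values rather than data pulled from Wikidata.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Item:
    """A toy stand-in for a Wikidata item: one identifier, many labels."""
    qid: str
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label


def label_for(item: Item, lang: str) -> Optional[str]:
    """Return the item's label in `lang`, or None if nobody has added one yet."""
    return item.labels.get(lang)


# Hypothetical item: the identifier and labels are illustrative, not live Wikidata data.
example = Item(qid="Q_EXAMPLE", labels={"en": "disease", "cy": "clefyd"})

for lang in ("en", "cy", "gd"):
    label = label_for(example, lang)
    print(lang, "->", label if label is not None else "(no label yet)")
```

In this sketch the same item can be shown in English or Welsh straight away, but a request in Scottish Gaelic ("gd") comes back empty until a speaker adds that label, which is the gap described above.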

As an example, take disease and health data, which constitute vital information that needs to be easily accessible. A search of diseases uploaded to Wikidata reveals that over 13,000 diseases have been uploaded to the database, but around 5,000 of these entries are only labelled in one language. So whilst Wikidata is a useful tool to aid knowledge discovery, it will take the work of native language speakers from around the world to develop it into the linguistically diverse database that it has the potential to become. In growing both the number of items in Wikidata, and its language labels, technologies can become more accessible for different languages. Ultimately, this is crucial in enabling smaller languages to thrive, rather than just to survive.
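The kind of count described above can be sketched in a few lines of Python. This is only an illustration: the `count_single_language` function and the three toy entries are hypothetical and do not come from Wikidata. The only real figures used are the approximate ones quoted in this post (around 5,000 of over 13,000 disease items), which are treated here as rough round numbers.

```python
from typing import Dict


def count_single_language(items: Dict[str, Dict[str, str]]) -> int:
    """Count the items that have a label in exactly one language."""
    return sum(1 for labels in items.values() if len(labels) == 1)


# Hypothetical toy entries (item id -> {language code: label}); not real Wikidata items.
toy_items = {
    "Q_EXAMPLE_1": {"en": "label A (en)"},
    "Q_EXAMPLE_2": {"en": "label B (en)", "cy": "label B (cy)"},
    "Q_EXAMPLE_3": {"fr": "label C (fr)"},
}

print(count_single_language(toy_items), "of", len(toy_items), "toy items have one label")

# Rough share implied by the approximate figures quoted in this post.
reported_share = 5000 / 13000
print(f"Roughly {reported_share:.0%} of disease items are labelled in only one language")
```

On the post's own rounded figures, that works out at somewhere between a third and two-fifths of disease items being readable in a single language only.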

What can we do to promote linguistic diversity?

Governments have highlighted the importance of actively increasing linguistic diversity. UNESCO has produced a 10-year plan for the preservation of indigenous languages, referred to as the Decade of Indigenous Languages, which calls into action the human rights of Indigenous Peoples. A key part of the plan surrounds the use of technology to support access to Indigenous languages – this can involve the use of Wikipedia and Wikidata as impactful open platforms for building global understanding about different languages and, alongside this, different cultures. Encouraging people to contribute to Wikipedia may seem difficult, but events including the Celtic and Arctic Knot conferences, and outreach projects such as Indigenizing Wikipedia, have demonstrated how successfully Wikipedia can be used as a platform for language activism. 

By contributing to both Wikipedia and Wikidata, we can increase the use and representation of smaller languages, contributing to the preservation of the important cultures that are intertwined with them. 

Clea Strathmann, Open Data and Knowledge Equity intern

Watch the whole Arctic Knot conference on YouTube here.

Welsh Wikipedia Thinking Big – Keynote address by Jason Evans at the Celtic Knot

A state of the question – the Catalan language project – Àlex Hinojo, Executive Director, Amical Wikimedia

The Scottish Gaelic Uicipeid project – Susan Ross at the Celtic Knot

Celtic Knot – Panel discussion & closing plenary: The Politics of Language Online
