#MultilingualDH

2 weeks ago

I am absolutely losing it as I fill out the form to get this added to our library catalogue, and my closest options for topic from their controlled vocabulary are "American Literary Studies" and "British and Commonwealth Literary Studies". #MultilingualDH

1 0 0 0

From Global to Local? | Melusina Press

The edited volume “From Global to Local? Digitale Methoden in den Geist...

3 weeks ago

Next week is #DHd2026 in Vienna, and the #AGMultilingualDH is bringing new #Sticker along. If you would like some, just talk to @jomla, @cosima_wagner, @jakmende or me!

#multilingualDH

2 1 0 0

Markus Trapp

@textundblog.bsky.social

3 weeks ago

From Global to Local?
Digitale Methoden in den Geisteswissenschaften im deutschsprachigen Raum: Ein Triptychon
by
Ulrike Wuttke
Christopher Nunn
Christian Schröter (né Vater)
Melanie Seltmann
Christian Wachter

doi.org/10.26298/198...

#Mehrsprachigkeit #multilingualDH #DigitalHumanities #OA

5 3 0 0

Frederik Elwert

@felwert.fedihum.org.ap.brid.gy

3 weeks ago

Original post on fedihum.org

RE: https://fedihum.org/@melusinapress/116090883119296136

I am very pleased that, with this volume, our joint contribution „Digitale Glokalisierung: Sogenannte Kleine Fächer, Digital Humanities und die Herausforderungen der Mehrsprachigkeit“ has now been published as well (which is itself a small […]

2 2 0 0

@literaturegeek.bsky.social

1 month ago

An excellent thread if you're ready to break out of the monolingual world of English. You can also learn some great tricks we know in area studies, like how to get something out of a talk that's about something in a language you don't speak. Come join us! #MultilingualDH

21 4 0 0

Programming Historian

@proghist.bsky.social

1 month ago

Infographic reading: Have you read and valued a Programming Historian lesson? Ever thought about translating one?, https://tinyurl.com/open-call-2025-blog, digital methods for the humanities, programminghistorian.org

🗣️ Have you read and valued a Programming Historian lesson?
💡 Ever thought about translating one?

Our English journal welcomes proposals for new translations from our ES, FR and PT editions.

🔗 tinyurl.com/open-call-2025-blog

Send us your proposal by 15 February 2026

#CallForPapers #MultilingualDH

2 0 0 0

Amanda Wyatt! Visconti

1 month ago

Whoa, that's amazing! I'll need to read more, and look into the couple current folks I know who've designed and/or sell new Hebrew wood type. (Am remembering how I've enjoyed your livetweeting from various #multilingualDH events, thank you for that!)

1 0 1 0

2 months ago

A half wheel of camembert cheese with the other half being Bert's head. The bite out of the cheese parallels Bert's mouth

French-language CamemBERT (and its logo) is one of my favorite whimsical bits of naming in this space. #MultilingualDH

5 1 1 1

2 months ago

Generative AI + Perso-Arabic Calligraphy = ? YouTube video by SILICON

If "books" includes Perso-Arabic manuscripts, I've got a great talk for you from 2023! #MultilingualDH

3 0 1 0

2 months ago

@felwert On that topic, Nathan Gibson and Ronny Vollant have secured a twenty-year academy project with large #DigitalHumanities and, above all, #multilingualDH components

1 0 1 0

2 months ago

Great #MultilingualDH food for thought, reflecting on global Englishes, the afterlives of British colonial education, and how that intersects with what writing is judged to be "human".

13 6 0 0

Programming Historian

@proghist.bsky.social

3 months ago

Infographic reading: Searching for an opportunity to hone your technical translation skills?, tinyurl.com/open-call-2025-blog, digital methods for the humanities, programminghistorian.org

🔎 Searching for an opportunity to hone your technical translation skills?

Our English journal welcomes proposals for new translations from our ES, FR and PT editions.

🔗 tinyurl.com/open-call-20...

📩 Send us your proposal by 15 February 2026

#CallForPapers #MultilingualDH

4 0 0 0

3 months ago

Today, I am getting a lot of "503 Service Unavailable" from @internetarchive while linking items to #Wikidata. Probably a good moment to think about #CriticalInfrastructure for #CulturalHeritage.

#DigitalHumanities #multilingualDH

0 2 0 0

Original post on digitalcourage.social

4 months ago

I also wrote a #SPARQL query to see the linguistic composition of the periodical press until 1930 at all locations with titles published in languages of the Eastern Mediterranean: #Arabic, #Ottoman, #Armenian, #Coptic, #Greek, #Farsi, #Ladino, #Azerbaijani […]

1 0 0 0

5 months ago

It's looking like I've got several projects that need language detection as part of the workflow. It's been a few years since I've used that and I assume there's been some (possibly vast?) improvements. Anyone have a favorite library / model / etc they'd recommend? #MultilingualDH

5 6 1 0

5 months ago

TFW a silly idea that you jotted down on the internet turns into something delightful through someone else's imagination. 🥰 #MultilingualDH

1 0 0 0

5 months ago

Filing this good news away for the next #MultilingualDH class. 🥳

2 0 0 0

@felwert.fedihum.org.ap.brid.gy

5 months ago

Leaving aside the nature and details of the lawsuit, I'm filing this one away as another example of real-world effects of people's attitudes about English vs. every other language for next time I teach #MultilingualDH.

5 0 0 0

Frederik Elwert

5 months ago

Original post on fedihum.org

Very sad not to be at #FORGE25, but glad that Philipp Tögel is representing our @sfb1475 so well. If you want a first taste of our approach to TEI modelling of heterogeneous multilingual text corpora: https://doi.org/10.5281/zenodo.17178219 #multilingualDH […]

0 4 0 0

5 months ago

and there are certainly more, just search on huggingface...

You can find models like https://huggingface.co/Davlan/afro-xlmr-large-114L or even Apertus that boasts about "1811 natively supported languages" https://huggingface.co/swiss-ai/Apertus-70B-2509 ...

Some remarks and outlook:

- only […]

0 0 0 0

5 months ago

Shared Tasks at https://semeval.github.io/

- SemEval 2023 Task 12: AfriSenti https://afrisenti-semeval.github.io/
- SemEval 2024 Task 1: SemRel https://semantic-textual-relatedness.github.io/
- SemEval 2025 Task 11: Bridging the Gap github.com/emotion-analysis-project... […]

0 0 1 0

The State of Large Language Models for African Languages: Progress and Challenges

Large Language Models (LLMs) are transforming Natural Language Processing (NLP), but their benefits are largely absent for Africa's 2,000 low-resource languages. This paper comparatively analyzes African language coverage across six LLMs, eight Small Language Models (SLMs), and six Specialized SLMs (SSLMs). The evaluation covers language coverage, training sets, technical limitations, script problems, and language modelling roadmaps. The work identifies 42 supported African languages and 23 available public data sets, and it shows a big gap where four languages (Amharic, Swahili, Afrikaans, and Malagasy) are always treated while there is over 98% of unsupported African languages. Moreover, the review shows that just Latin, Arabic, and Ge'ez scripts are identified while 20 active scripts are neglected. Some of the primary challenges are lack of data, tokenization biases, computational costs being very high, and evaluation issues. These issues demand language standardization, corpus development by the community, and effective adaptation methods for African languages.

5 months ago

Hussen et al. (2025): The State of Large Language Models for African Languages: Progress and Challenges. https://doi.org/10.48550/arXiv.2506.02280

8/x

#MultilingualDH

0 0 1 0

AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text

Language models built from various sources are the foundation of today's NLP progress. However, for many low-resource languages, the diversity of domains is often limited, more biased to a religious domain, which impacts their performance when evaluated on distant and rapidly evolving domains such as social media. Domain adaptive pre-training (DAPT) and task-adaptive pre-training (TAPT) are popular techniques to reduce this bias through continual pre-training for BERT-based models, but they have not been explored for African multilingual encoders. In this paper, we explore DAPT and TAPT continual pre-training approaches for African languages social media domain. We introduce AfriSocial, a large-scale social media and news domain corpus for continual pre-training on several African languages. Leveraging AfriSocial, we show that DAPT consistently improves performance (from 1% to 30% F1 score) on three subjective tasks: sentiment analysis, multi-label emotion, and hate speech classification, covering 19 languages. Similarly, leveraging TAPT on the data from one task enhances performance on other related tasks. For example, training with unlabeled sentiment data (source) for a fine-grained emotion classification task (target) improves the baseline results by an F1 score ranging from 0.55% to 15.11%. Combining these two methods (i.e. DAPT + TAPT) further improves the overall performance. The data and model resources are available at HuggingFace.

5 months ago

Belay et al. (2025): AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text. https://doi.org/10.48550/arXiv.2503.18247

- Model: https://huggingface.co/Tadesse/AfroXLMR-Social
- Dataset: https://huggingface.co/datasets/Tadesse/AfriSocial

7/x

#MultilingualDH

0 0 1 0

5 months ago

Belay et al. (2025): AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text. https://doi.org/10.48550/arXiv.2503.18247
- covering 19 languages
- Model https://huggingface.co/Tadesse/AfroXLMR-Social
- AfriSocial dataset […]

0 1 0 0

5 months ago

Yimam et al. (2021): Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets. https://doi.org/10.3390/fi13110275

6/x

#MultilingualDH

0 0 1 0

A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script

Homophone normalization, where characters that have the same sound in a writing script are mapped to one character, is a pre-processing step applied in Amharic Natural Language Processing (NLP) literature. While this may improve performance reported by automatic metrics, it also results in models that are not able to understand different forms of writing in a single language. Further, there might be impacts in transfer learning, where models trained on normalized data do not generalize well to other languages. In this paper, we experiment with monolingual training and cross-lingual transfer to understand the impacts of normalization on languages that use the Ge'ez script. We then propose a post-inference intervention in which normalization is applied to model predictions instead of training data. With our simple scheme of post-inference normalization, we show that we can achieve an increase in BLEU score of up to 1.03 while preserving language features in training. Our work contributes to the broader discussion on technology-facilitated language change and calls for more language-aware interventions.
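
For readers new to the idea, here is a minimal sketch of what homophone normalization and the post-inference variant described in the abstract might look like. The mapping table is a small illustrative subset of commonly cited Amharic homophone pairs (base characters only) and is an assumption, not the table used by Nigatu et al.; `normalize_homophones`, `translate_then_normalize`, and the `translate` callable are hypothetical names introduced here.

```python
# Illustrative subset of Amharic homophone groups (base characters only).
# ASSUMPTION: this table is for demonstration; it is not the paper's mapping.
HOMOPHONE_MAP = {
    "ሐ": "ሀ",  # two further "ha" characters fold into ሀ
    "ኀ": "ሀ",
    "ሠ": "ሰ",  # second "se" character folds into ሰ
    "ዐ": "አ",  # second "a" character folds into አ
    "ፀ": "ጸ",  # second "tse" character folds into ጸ
}
_TABLE = str.maketrans(HOMOPHONE_MAP)


def normalize_homophones(text: str) -> str:
    """Map each homophone character in the table to its chosen representative."""
    return text.translate(_TABLE)


def translate_then_normalize(translate, source: str) -> str:
    """Post-inference variant: the model sees un-normalized data,
    and normalization is applied only to its prediction.

    `translate` is a hypothetical callable standing in for any MT model.
    """
    return normalize_homophones(translate(source))
```

The design point is where the mapping is applied: normalizing training data removes spelling variants before the model ever sees them, whereas the post-inference variant leaves the training data untouched and only folds variants in the output, for example before scoring.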

5 months ago

Nigatu et al. (2025): A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script. https://doi.org/10.48550/arXiv.2507.15142

5/x

#MultilingualDH

0 0 1 0

5 months ago