
Thibault Clérice

@ponteineptique

Digital humanist, loves Python, making data, talking to data, reusing data. Researcher @ ALMAnaCh, Inria Paris.

889 Followers
248 Following
256 Posts
Joined 17.10.2023

Latest posts by Thibault Clérice @ponteineptique

Poster advertising lectures on "Raisonnement Philologique et Modèles Informatiques" starting at 4pm, Thursday, March 12, at 54 Boulevard Raspail, Paris.

Paris friends! Amis parisiens ! This Thursday is the first of four public lectures I'm giving on AI and philology, broadly defined: "Philological Reasoning and Computational Models." The advertisement is in French, but the lectures are in English. I'd also love to meet while I'm here in March! 1/

09.03.2026 08:34 👍 21 🔁 15 💬 1 📌 1

@transkribus.bsky.social Is there a public API to access our documents and document exports, to streamline their publication? I cannot find any information on such an option that is less than 2 or 3 years old.

06.03.2026 14:30 👍 0 🔁 0 💬 0 📌 0

Excellent venue for computational humanities work, colocated with ACL in San Diego on July 6. Please share!

04.03.2026 20:13 👍 15 🔁 8 💬 0 📌 0
Research engineer (Ingénieur/Ingénieure d'études) in charge of digital corpus editing (Biblissima+) - M/F - Ecole Normale Supérieure de Lyon. The Digital Document unit (Pôle Document numérique, PDN) of the MRSH (Université de Caen Normandie – CNRS) is a research support unit. It provides solutions for...

📣 As part of the EquipEx+ Biblissima+, @ensdelyon.bsky.social is recruiting a research engineer in charge of editing digital corpora.

🗓️ Apply for this position before April 1, 2026 👇️
ens-lyon.softy.pro/offers/196330

@irht-cnrs.bsky.social
@mrshcaen.bsky.social

04.03.2026 10:07 👍 1 🔁 3 💬 0 📌 0
Original post on fedihum.org

We’re looking for a Research Data Engineer (m/f/x) (3.5 years). If you have a #DigitalHumanities profile with experience in #TEI encoding, #OCR / #HTR, and #IIIF (or any of those and are willing to learn the rest), get in touch! The full time position can be split, so if (for whatever reason) […]

03.03.2026 16:18 👍 2 🔁 19 💬 1 📌 0
Workshop (Call for proposals) – Analysing Cultural Heritage Documents: HTR/OCR, Information Extraction, and Textual Variation. AI for Cultural Heritage Documents: HTR/OCR, Information Extraction, and ...

Workshop (Call for proposals) – Analysing Cultural Heritage Documents: HTR/OCR, Information Extraction, and Textual Variation prima.hypotheses.org/3737

02.03.2026 15:06 👍 4 🔁 3 💬 0 📌 0

We have had this experience as well; a paper is coming out later this week with details about a specific domain...

23.02.2026 14:09 👍 3 🔁 1 💬 2 📌 0

Hold my non-alcoholic beverage 😁
It's not on my to-do list, but if a team wants to look into it with us...

21.02.2026 17:11 👍 3 🔁 0 💬 0 📌 1

The Hôtel de Région, in a region that has leaned firmly to the right for years. Not surprising, is it?
Disgusting, but not surprising.

21.02.2026 14:36 👍 1 🔁 0 💬 0 📌 0
Extrême droite à Lyon (Far right in Lyon) - Wikipédia

There is a Wikipedia page on the far right in Lyon. That's where we are. fr.wikipedia.org/wiki/Extr%C3...

21.02.2026 14:13 👍 0 🔁 0 💬 1 📌 0
Post image
21.02.2026 14:12 👍 0 🔁 0 💬 0 📌 0

That's the Rhône-Alpes region for you...

21.02.2026 14:08 👍 0 🔁 0 💬 1 📌 0

Setting aside the original issue, on which we agree 😉, you are right. This particular manuscript and one other apparently have no open data in this paper...

19.02.2026 18:43 👍 1 🔁 0 💬 0 📌 0

Or it is an error in the paper's presentation (not unheard of in the humanities), which leads us to expect consecutive lines by design when this is not the case. I would expect this rather than bad curation, given that they actually spent time looking at errors.

19.02.2026 15:40 👍 0 🔁 0 💬 1 📌 0

Thank you for the catch! I should have been more careful here...

19.02.2026 07:11 👍 0 🔁 0 💬 1 📌 0
Manuscripts of the Médiathèque du Grand Troyes. Manuscripts from the library of Clairvaux. Ms. 1600

You are completely right. I was so baffled by the bad interpretation of "dya9" that I just did not check the rest...
The original paper by Aguilar showed stitched lines most of the time, and I trusted it...
Paper (p. 17): hal.science/hal-04716654/
Manuscript: gallica.bnf.fr/ark:/12148/b...

19.02.2026 06:41 👍 2 🔁 0 💬 1 📌 0
Digital Classicist London 2026 call for papers - The Stoa: a Review for Digital Classics The Digital Classicist London seminar invites proposals for the Summer 2026 series. We are looking for papers on any aspect of the ancient or pre-colonial worlds, including archaeology, cultural herit...

Digital Classicist London 2026 Call for Papers #cfp

blog.stoa.org/archives/4370

18.02.2026 16:22 👍 3 🔁 5 💬 0 📌 0

And for those who want the answer:

❌ dyaconus -> diabolus
❌ in futurum -> infusum

18.02.2026 09:18 👍 1 🔁 0 💬 1 📌 0

There are plans :)

17.02.2026 19:36 👍 0 🔁 0 💬 0 📌 0

We also show that we are far from done, especially for a complex language like Old French.

But we
(1) define the issue,
(2) propose a first solution that enables pre-annotation of larger datasets, and
(3) offer an alternative to less trustworthy models that go beyond ATR.

17.02.2026 18:11 👍 3 🔁 1 💬 0 📌 0
Pre Editorial Normalization - a Hugging Face Space by comma-project Latin and Old French normalization of CATMuS output

We release:

📚 4.66M silver training samples
🧪 1.8k gold evaluation set huggingface.co/datasets/com...
🤖 ByT5-based model → 6.7% CER huggingface.co/comma-projec...

Try it here 👇
huggingface.co/spaces/comma...
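The 6.7% figure is a character error rate (CER). As a rough illustration of what that number measures, here is a minimal Python sketch using the common definition: Levenshtein edit distance divided by the length of the reference. The function names are hypothetical and the paper's own evaluation script may normalize text differently. Scored on the quiz answer above, the ATR reading "dyaconus" for "diabolus" costs 3 substitutions out of 8 characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


# ATR output "dyaconus" scored against the correct reading "diabolus".
print(cer("diabolus", "dyaconus"))  # 0.375 (3 edits / 8 characters)
```

A corpus-level CER is usually computed by summing edit distances and reference lengths over all lines before dividing, rather than averaging per-line rates.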

17.02.2026 18:11 👍 4 🔁 1 💬 1 📌 0

👉 We propose Pre-Editorial Normalization (PEN):

An intermediate layer between:
📝 graphemic ATR output
📖 fully edited text

Goal: preserve palaeographic fidelity + enable usability.
Keep two layers, the ATR output and the normalization, with aligned tokens so one can always go back to the source.
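To make the "two aligned layers" idea concrete, here is a minimal Python sketch assuming a strict one-token-to-one-token alignment over space-separated tokens. The class, the function and the sample line are hypothetical and not taken from the released dataset: each normalized token keeps the character offsets of its ATR source, so a reader can jump back from the normalized layer to what the model actually read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedToken:
    source: str      # graphemic ATR token, as read in the manuscript
    normalized: str  # pre-editorial normalization of that token
    start: int       # character offsets of `source` in the ATR line
    end: int


def align_layers(atr_line: str, normalized_tokens: list[str]) -> list[AlignedToken]:
    """Pair each space-separated ATR token with its normalized form.

    Assumes a strict one-to-one token alignment and single spaces
    between ATR tokens (a simplification for illustration)."""
    source_tokens = atr_line.split(" ")
    if len(source_tokens) != len(normalized_tokens):
        raise ValueError("both layers must have the same number of tokens")
    aligned, offset = [], 0
    for src, norm in zip(source_tokens, normalized_tokens):
        aligned.append(AlignedToken(src, norm, offset, offset + len(src)))
        offset += len(src) + 1  # skip the separating space
    return aligned


atr_line = "et uos dites"  # invented example, u/v not yet normalized
layers = align_layers(atr_line, ["et", "vos", "dites"])
tok = layers[1]
print(tok.normalized, "<-", atr_line[tok.start:tok.end])  # vos <- uos
```

Real normalization can merge or split tokens, so a production format would need many-to-many alignments; the one-to-one case is kept here only to show why the source offsets are worth storing.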

17.02.2026 18:11 👍 2 🔁 1 💬 1 📌 0

Recent ATR progress—especially with palaeographic datasets like CATMuS—has improved access to medieval sources.

But:
❌ Raw outputs are hard to use
❌ Fully normalized models over-normalize & hallucinate

There’s a methodological gap.

17.02.2026 18:11 👍 2 🔁 1 💬 1 📌 0

If I give you the text
📚 omnium peccatorum quia ex quo dyaconus quando esset in futurum, stultus esset

Can you find the ATR error without the manuscript?

Probably not.

ATR models that predict text and normalize it in one go produce text that looks trustworthy, but they prevent us from detecting errors.

17.02.2026 18:11 👍 1 🔁 1 💬 2 📌 0
Post image

📄 New paper:
Pre-Editorial Normalization for Automatically Transcribed Medieval Manuscripts in Old French and Latin

Thibault Clérice, @rachelbawden.bsky.social , Anthony Glaise, Ariane Pinche, @dasmiq.bsky.social (2026) arxiv.org/abs/2602.13905

We introduce Pre-Editorial Normalization (PEN).

🧵⬇️

17.02.2026 18:11 👍 23 🔁 9 💬 1 📌 2

I got something up, thank you :) huggingface.co/spaces/comma...

17.02.2026 12:12 👍 2 🔁 0 💬 0 📌 0
Participation in the community call for DTS Please provide your email address below to receive a link for the community call. All other fields are optional, and just help us know a little more about the community.

To mark this milestone, we are organizing a community call on April 23rd, 15:00 CET / 9:00 EST.
Register here: docs.google.com/forms/d/e/1F...

16.02.2026 13:08 👍 4 🔁 1 💬 0 📌 0
Specifications The Distributed Text Services (DTS) Specification defines a Hypermedia-Driven Web API for working with collections of text as machine-actionable data.

We are pleased to announce the official release of Distributed Text Services (DTS) v1.0 — a stable specification, ready for broad adoption.

This release is the result of years of collaborative development and community feedback.

The specification is available at: dtsapi.org/specificatio...
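DTS is described as a hypermedia-driven API: rather than hard-coding URLs, a client reads the entry point and fills in the URI templates it advertises. A minimal Python sketch of that step, assuming an entry point that exposes `collection`, `navigation` and `document` templates written with RFC 6570 form-style query expressions. The sample entry point, the `urn:example:text1` identifier and the helper names are illustrative, so check the specification for the normative fields. Only the `{?...}` operator is implemented.

```python
import re
from urllib.parse import quote

# Illustrative entry point modeled on the DTS 1.0 specification; the
# paths and variable lists are examples, not a real server's response.
ENTRY_POINT = {
    "@type": "EntryPoint",
    "dtsVersion": "1.0",
    "collection": "/api/dts/collection{?id,page,nav}",
    "navigation": "/api/dts/navigation{?resource,ref,start,end,down,tree,page}",
    "document": "/api/dts/document{?resource,ref,start,end,tree,mediaType}",
}

# Matches RFC 6570 form-style query expressions such as {?id,page,nav}.
_FORM_QUERY = re.compile(r"\{\?([^}]+)\}")


def expand(template: str, variables: dict) -> str:
    """Expand {?a,b} expressions only; undefined variables are omitted."""
    def replace(match: re.Match) -> str:
        pairs = [
            f"{name}={quote(str(variables[name]), safe='')}"
            for name in match.group(1).split(",")
            if variables.get(name) is not None
        ]
        return "?" + "&".join(pairs) if pairs else ""
    return _FORM_QUERY.sub(replace, template)


def endpoint(entry_point: dict, name: str, **variables) -> str:
    """Build a request path from a template advertised by the entry point."""
    return expand(entry_point[name], variables)


print(endpoint(ENTRY_POINT, "document", resource="urn:example:text1", ref="1.2"))
# /api/dts/document?resource=urn%3Aexample%3Atext1&ref=1.2
```

A full client would fetch the entry point over HTTP and use a complete RFC 6570 implementation; the point of the sketch is that only the entry point URL needs to be known in advance.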

16.02.2026 13:08 👍 15 🔁 11 💬 1 📌 0

Thanks

16.02.2026 10:34 👍 0 🔁 0 💬 1 📌 0

Oh! Sorry, I meant @danielvanstrien.bsky.social and did not notice I had clicked on the wrong username...

16.02.2026 09:57 👍 0 🔁 0 💬 1 📌 0