Kirill Maslinsky

@maslinych

Computational literary studies with a modicum of pure linguistics | research design, infrastructure and methods guy | open data enthusiast and curator | doing theory+engineering @ ERC Advanced project “Theory of tone” @ INALCO, Paris

185
Followers
141
Following
35
Posts
01.10.2024
Joined

Latest posts by Kirill Maslinsky @maslinych

How to model learning of Mandarin tones with no supervision, from raw sound, with a realistic model of human language learning?
The model learns the four tones (male only near perfectly) and also replicates stages of tone learning in language acquisition. What is more difficult to learn for children

23.09.2025 15:54 👍 20 🔁 8 💬 1 📌 2

It may be helpful to think about LLMs as fiction generation machines (which they basically are), and treat all their output as fictional text, however realistic, rather than "hallucinations".

16.06.2025 11:09 👍 0 🔁 0 💬 0 📌 0

I often show students this figure and ask, how different is the green distribution (p < 0.05) from the blue distribution (p = 0.10)? Just to raise some awareness that the difference between "statistically significant" and "not significant" is not always that significant...

03.06.2025 06:57 👍 28 🔁 10 💬 3 📌 0
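
A minimal sketch of the point above, under assumptions the post does not state: two independent estimates with equal standard errors, each tested with a two-sided z-test, one landing exactly at p = 0.05 and the other at p = 0.10. The function names are illustrative only. The sketch asks whether the two estimates differ significantly from each other.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution (mean 0, sd 1)

def z_from_two_sided_p(p: float) -> float:
    """Return the |z| score that corresponds to a two-sided p-value."""
    return std_normal.inv_cdf(1 - p / 2)

def two_sided_p(z: float) -> float:
    """Return the two-sided p-value for a z score."""
    return 2 * (1 - std_normal.cdf(abs(z)))

z_green = z_from_two_sided_p(0.05)  # roughly 1.96
z_blue = z_from_two_sided_p(0.10)   # roughly 1.64

# Assumption: independent estimates with equal standard errors (SE = 1 each),
# so the standard error of their difference is sqrt(1**2 + 1**2) = sqrt(2).
z_diff = (z_green - z_blue) / sqrt(2)
p_diff = two_sided_p(z_diff)

print(f"z for the difference: {z_diff:.2f}, p-value: {p_diff:.2f}")
```

Under these assumptions the p-value for the difference comes out around 0.8, so the "significant" and the "not significant" result are themselves nowhere near significantly different.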
Photo of Boris Yarkho (1889-1942), one of the few ones that are discoverable on the web; he is sitting on a chair, leaning sideways and smiling.

A scholar possessing a singular vision and a frightening working resilience, he was hounded by Stalin's repression machine, exiled, denied academic positions, firewood and food; his work was largely forgotten until the 2000s.

So we, who were influenced by him at the rise of DH, keep remembering.

26.03.2025 12:43 👍 10 🔁 4 💬 1 📌 0
Speech Distribution in Five-Act Tragedies (A Question of Classicism and Romanticism). Article published on March 1, 2019 in the Journal of Literary Theory (volume 13, issue 1).

Boris Yarkho, a Moscow formalist, was born today in 1889; his work of the 1920s-1930s fully anticipates computational literary studies: statistical methods used not for stylistics or attribution, but for questions of literary history and theory

Read him, if you haven't www.degruyter.com/document/doi...

26.03.2025 12:43 👍 20 🔁 7 💬 1 📌 0

As a gift for a patient reader, a graph showing the cohorts in terms of total print runs. *Graphs are better news

26.03.2025 05:48 👍 0 🔁 0 💬 0 📌 0

I don't have pre-revolutionary data, so “cohorts” do not necessarily correspond to the true date of the translation of the author into Russian, esp. for “classics”. NA stands for books with no author indicated on a cover/title (folklore, collections). Data source: my dataset bsky.app/profile/masl...

26.03.2025 05:48 👍 0 🔁 0 💬 1 📌 0

WWII was an obvious bottleneck for printing, including translations. Also to note: the rise in the number of translations during the Thaw, an effect that persisted until around 1976. And the Thaw indeed left its trace on further circulation of translations.

26.03.2025 05:48 👍 1 🔁 0 💬 1 📌 0

no context graph: the number of translated books for children printed in Soviet Russia and USSR 1918-1984, split into “cohorts” by the moment a translated author first appears in the data. In red are mostly those “classics” who stay with us: Grimms, Andersen, Jules Verne etc.

26.03.2025 05:48 👍 0 🔁 0 💬 1 📌 0

Along these lines, I recommend Carys Craig's "The AI-Copyright Trap," which argues (in my view convincingly) that copyright law is not actually academics' friend in a context in which big tech has more money than God:

papers.ssrn.com/sol3/papers....

22.03.2025 14:05 👍 27 🔁 10 💬 0 📌 1

“Newness exists only in the minds of new up and coming researchers who didn’t live through it last time. To be really blunt, newness is just ignorance of the past.”

11.03.2025 08:31 👍 0 🔁 0 💬 0 📌 0

The data is part of Daria's ongoing research, and she does wonderful things with it. As a teaser, here's Daria's graph showing cosine similarity between journals based on the poets who published there. Huge shoutout to Daria for sharing these data!

03.03.2025 16:09 👍 3 🔁 0 💬 0 📌 0
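
A minimal sketch of how cosine similarity between journals can be computed from the poets they published. This is not Daria's actual code or data: the records below are hypothetical toy data, and representing each journal as a vector of per-poet publication counts is an assumption.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: one (journal, poet) pair per published poem.
records = [
    ("Journal A", "Poet 1"),
    ("Journal A", "Poet 2"),
    ("Journal A", "Poet 2"),
    ("Journal B", "Poet 2"),
    ("Journal B", "Poet 3"),
    ("Journal C", "Poet 4"),
]

def journal_vectors(records):
    """Map each journal to a Counter of publication counts per poet."""
    vectors = {}
    for journal, poet in records:
        vectors.setdefault(journal, Counter())[poet] += 1
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[poet] * b[poet] for poet in a.keys() & b.keys())
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = journal_vectors(records)
sim_ab = cosine_similarity(vectors["Journal A"], vectors["Journal B"])
sim_ac = cosine_similarity(vectors["Journal A"], vectors["Journal C"])

print(f"A vs B: {sim_ab:.3f}, A vs C: {sim_ac:.3f}")
```

Journals that share poets score above zero, and journals with no poets in common score exactly zero.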
Table of contents of Soviet thick journals, 1955-1990 (Novy Mir, Oktyabr, Nash Sovremennik, Zvezda, Znamya, Yunost) The database lists the authors and titles of works published in the literary journals «Novy Mir», «Oktyabr», «Znamya», «Zvezda», «Nash S...

The data is published in the Repository of open data on Russian literature and folklore, doi.org/10.31860/ope.... The main table has an entry for every work published, and some info on authors, including party membership. Additional tables list editorial teams and the recipients of literary awards ↓

03.03.2025 16:09 👍 1 🔁 1 💬 1 📌 0

While the world is on fire, and datasets disappear here and there, we continue our modest effort to publish open data on Russian literature. This time, the contents of the Soviet “thick journals” 1955—1990, a dataset by Daria Franklin www.dariafranklin.com. See ↓ for the data

03.03.2025 16:09 👍 6 🔁 4 💬 1 📌 0

a superficial similarity is also that both result in tables with asterisks

06.02.2025 12:34 👍 1 🔁 0 💬 0 📌 0

thinking how optimality theory in phonology is like the linear modeling in social science. A model you can use when you don't have any specific theory of language, really. Epicycles all the way down

06.02.2025 12:34 👍 1 🔁 0 💬 1 📌 0
Toneme as a basic unit of tonology and criteria for its identification This poster is a concise view of the theoretical framework for identifying phonological tonal inventories for the typological study of the tonal systems. We define basic comparative categories, of whi...

The database is accompanied by the theoretical framework that provides us with the toneme — a comparative concept that allows us to consistently analyze typologically diverse tonal systems. A sister poster at the same conf with concise presentation of the idea: zenodo.org/records/1481...

05.02.2025 20:43 👍 0 🔁 0 💬 0 📌 0

thot.huma-num.fr/db/ Interactive maps of languages colored by tonal status, sources for tonal status info, structured descriptions of tonal systems of a few sampled languages, accompanied with texts with detailed tonal markup.

05.02.2025 20:37 👍 0 🔁 0 💬 1 📌 0
This is a presentation of a typological database of tonal languages: ThoTDB, available online at https://thot.huma-num.fr/db/. The database contains the most comprehensive data on which languages in the world are tonal, detailed structured descriptions of tonal systems for a typologically diverse sample of languages, accompanied by short texts with detailed tonal markup that allows computing tonal density indices.

How many tonal languages are out there in the world? If you need an estimate based on the most comprehensive database to date, here it is: 42.7%. Concisely on a poster presented today at the #OCP22 conference in Amsterdam: zenodo.org/records/1481.... The database itself is online and has more ↓

05.02.2025 20:37 👍 3 🔁 0 💬 1 📌 0
Call for input: finding a new publication venue for our conference proceedings Dear Computational Humanities Research Community, As many of you know, we have been publishing our conference proceedings with CEUR Workshop Proceedings since the first edition of CHR back in 2020. C...

Hi people! We need a new publisher for the proceedings of the #CHR conference. Any input is greatly appreciated!

discourse.computational-humanities-research.org/t/call-for-i...

03.02.2025 14:31 👍 4 🔁 2 💬 0 📌 0

Did I mention these data are very special? The print runs of the editions were well documented throughout the Soviet period, and kept as part of bibliographic records. We have a good basis here to estimate total print runs, print run by author, by gender etc.

27.12.2024 19:51 👍 0 🔁 0 💬 0 📌 0

Of 14367 unique authors 82% have a known gender, 26% have info on birth/death year, and 24.5% have a wikidata person ID. It may seem like not much, but authors with a known wikidata ID comprise more than 65% of total print runs of the whole period. →

27.12.2024 19:51 👍 0 🔁 0 💬 1 📌 0
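
A minimal sketch of why a small share of authors can cover a large share of print runs. The records, field layout, and function name below are hypothetical and are not taken from the dataset; the point is only the difference between counting authors and weighting them by print run.

```python
# Hypothetical toy records: (author, has_wikidata_id, total_print_run).
authors = [
    ("Author A", True, 900_000),
    ("Author B", False, 50_000),
    ("Author C", False, 30_000),
    ("Author D", False, 20_000),
]

def coverage(authors):
    """Return (share of authors with an ID, share of print runs they account for)."""
    n_with_id = sum(1 for _, has_id, _ in authors if has_id)
    run_with_id = sum(run for _, has_id, run in authors if has_id)
    run_total = sum(run for _, _, run in authors)
    return n_with_id / len(authors), run_with_id / run_total

share_of_authors, share_of_print_runs = coverage(authors)

print(f"authors with ID: {share_of_authors:.0%}, "
      f"their share of print runs: {share_of_print_runs:.0%}")
```

In this toy case a quarter of the authors accounts for most of the copies printed, which is the same kind of skew the post describes.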

transformed into structured table data. The bibliography is the most comprehensive source on all books for children (fic and non-fic) printed in Soviet Russia and USSR. This year's edition includes a separate table of unique authors. Author data has undergone massive cleanup and disambiguation. →

27.12.2024 19:51 👍 0 🔁 0 💬 1 📌 0
Bibliography of Children's Books 1918–1984 A machine-readable bibliographic database of twentieth-century Russian children's books. The database is based on the 18-volume bibliographic index «Детская лите...

To all bibliographic data lovers (myself included) — a yearly Christmas update of the “Bibliography of Russian children's book 1918-1984” dataset: doi.org/10.31860/ope.... For those new to the show this dataset is based on the digitized 18-volume printed bibliography by Ivan Startsev →

27.12.2024 19:51 👍 1 🔁 0 💬 1 📌 1
graph shows the yearly percentage of print runs of children's books by gender.

No context graph: a yearly proportion of total print run of all books for children printed in Soviet Russia/USSR split by gender of the author. Note the fluctuations of the share of the female authors. 1931 marks the governmental ban on private publishers, 1941 the Nazi invasion. →

27.12.2024 19:51 👍 1 🔁 1 💬 1 📌 0
Dateno - datasets search engine Search engine for datasets

Totally agree with your point on the decline of institutions. Still, there might be a very different kind of international institution arising: aggregators and search engines for open data with international scope, e.g. dateno.io. They still rely on governmental and corporate data disclosure

18.12.2024 07:32 👍 1 🔁 0 💬 1 📌 0

“...it is important that everyone interested in data about culture is aware of the extent to which commercial interest prevents us from accessing data about the world in which we live, and uses the same data to shape the world for us.”
bsky.app/profile/andr...

18.12.2024 07:26 👍 1 🔁 0 💬 0 📌 0

I would

09.12.2024 20:58 👍 0 🔁 0 💬 0 📌 0

There was no presentation, unfortunately, but here's the link to the paper: ceur-ws.org/Vol-3834/pap...
Bonus — it's a short paper!

07.12.2024 22:32 👍 1 🔁 0 💬 1 📌 0

In fact, it was a ban by Aarhus University, not by CHR organizers. The responsibility of CHR might be that they preferred to hush it up when speaking about inclusion for everyone. And the responsibility of us all as an academic community is that too often we let universities define policies for us.

07.12.2024 22:10 👍 2 🔁 0 💬 0 📌 0