📚 Revised ECCO+EVANS TCP corpora for #DH
🔹 Platform variance: long ſ kept or replaced with s (theſe/thefe/these); some artefacts
🔹 long ſ: 13% | 10%
🔹 #NLP norm: 98.8% | 99.1% + OCR fixes → more accurate results
🔹 #AVOBMAT: orig+norm, harmonised #metadata, doc-compare: avobmat.hu
🔹 tinyurl.com/smzpmn6x
#history