For anybody in the Mid-Atlantic region, the annual conference MASC is looking for a host next year. It's a great chance for your university to meet other researchers (and potential collaborators) in our region!
Could you give an example of the input/output you're looking for, and on which function call (encode, tokenize, etc.)? And maybe which tokenizer it's inheriting from (looks like maybe the OPT models inherit from a GPT2Tokenizer?)
A compilation of adorable dog photos referencing a Simpsons meme ("Do it for her")
Happy to talk about any of these topics and more!
I will also likely end up talking a lot about my pride and joy (my dog).
And if you think sentence-level machine translation is good enough, I encourage you to run your systems on our evaluation data (ctxpro, an extension to ContraPro and other similar evaluation datasets).
github.com/rewicks/ctxpro
Most recently I've released the ParaDocs dataset which reconstructs document annotations on large, parallel machine translation datasets. Contextual information is integral to machine translation, but often overlooked!
Data: huggingface.co/datasets/jhu...
Since we're all new here, an introduction:
I'm a final-year PhD student at Johns Hopkins University (in @jhuclsp.bsky.social), working with Philipp Koehn and Matt Post.
I'm largely interested in the creation and processing of high-quality, multilingual datasets for both training and evaluation.
Putting together a JHU Center for Language and Speech Processing starter pack!
Please reply or DM me if you're doing research at CLSP and would like to be added - I'm still trying to find out which of us are on here so far.
go.bsky.app/JtWKca2
Cool work by @jhuclsp colleagues Rafael Rivera Soto and Nick Andrews on how AI-generated text carries unique stylistic fingerprints, enabling the detection and identification of specific language models.
Based on ICLR paper: arxiv.org/pdf/2401.06712
hub.jhu.edu/2024/11/18/a...