
Daniel Khashabi

@danielkhashabi

I play with intuitions and data. Now: @jhuclsp @jhucompsci Past: @allen_ai @uwnlp @Penn @cogcomp @Illinois_Alma @MSFTResearch

24 Followers · 25 Following · 97 Posts · Joined 03.04.2025

Latest posts by Daniel Khashabi @danielkhashabi

How should multiple agents communicate to coordinate in tight spaces? This remains challenging!

See @suyu_ye's solution 👇
x.com/suyu_ye/sta...

09.03.2026 21:18 👍 0 🔁 0 💬 0 📌 0

We show that Gold-Panning (GP) provably identifies a target among N documents in O(log N) rounds, ensuring scalability to many-document settings.

More in the paper: arxiv.org/pdf/2510.09770

11.02.2026 22:48 👍 0 🔁 0 💬 0 📌 0
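
As a rough gloss on where the O(log N) rate in the post above comes from (our sketch of the standard information-theoretic argument, not the paper's proof): identifying one target among N documents requires log2(N) bits, so if each shuffle-and-query round reliably extracts at least a constant number of bits about the target's identity, logarithmically many rounds suffice.

```latex
% Informal sketch (our gloss, not the paper's proof): if each round
% yields at least c > 0 bits of information about the target, then
\[
  T \cdot c \;\ge\; \log_2 N
  \quad\Longrightarrow\quad
  T \;=\; O(\log N) \ \text{rounds suffice.}
\]
```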

Specifically, it searches over long contexts by (i) reordering documents to concentrate high-belief items in highly "diagnostic" positions, and (ii) updating beliefs about document relevance from model outputs.

11.02.2026 22:48 👍 0 🔁 0 💬 1 📌 0
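
For readers who want the mechanics of (i) and (ii) in code, here is a minimal sketch of the shuffle-and-update loop described in the post above. Everything here is illustrative: the `query_model` callable, the front-of-context "diagnostic" slots, and the fixed-accuracy likelihood model are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def gold_panning_sketch(docs, query_model, n_rounds=10, p_correct=0.8):
    """Minimal sketch of an iterative shuffle-and-update loop.

    docs: list of document strings.
    query_model: black-box callable taking an ordered list of docs and
        returning the position the LLM cites as most relevant
        (a hypothetical stand-in for API access to the model).
    """
    n = len(docs)
    beliefs = np.full(n, 1.0 / n)   # uniform prior over "which doc is the target"
    for _ in range(n_rounds):
        # (i) Reorder: send high-belief docs to the most diagnostic
        # positions (here, simply the front of the context; an assumption).
        order = np.argsort(-beliefs)
        picked = query_model([docs[i] for i in order])
        # (ii) Bayes update under a crude likelihood model: the model
        # names the true target with prob. p_correct, else errs uniformly.
        likelihood = np.full(n, (1.0 - p_correct) / max(n - 1, 1))
        likelihood[order[picked]] = p_correct
        beliefs *= likelihood
        beliefs /= beliefs.sum()    # renormalize the posterior
    return int(np.argmax(beliefs))  # highest-belief document index
```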

We introduce ⭐Gold-Panning⭐, a black-box Bayesian framework that, at inference time, strategically and iteratively shuffles documents to overcome positional bias.

11.02.2026 22:48 👍 1 🔁 0 💬 1 📌 0

LLMs continue to struggle with long-context tasks (such as needle-in-a-haystack problems) because of "positional bias." What can we do if we only have black-box access to the model? (i.e., we can't modify the model weights or attention patterns, as is often the case with API models.)

11.02.2026 22:48 👍 0 🔁 0 💬 1 📌 0
BiomedArena.AI - Transparent AI Model Evaluation Platform. Compare and evaluate leading AI models side-by-side through community voting.

* Outperforms top frontier models on CARDBiomedBench
* Lives inside a fully open platform, ready for experimentation, benchmarking, and real-world science

🧪 Read the full blog: lnkd.in/emJjTAue
🔍 Try it today at biomedarena.ai

22.01.2026 05:00 👍 0 🔁 0 💬 0 📌 0

We have been busy at @DataTecnica building our science co-pilot, a Genomics AI Agent specialized in Alzheimer's and neurodegenerative disease research.

This system:
* Synthesizes complex biomedical data across literature and genomics databases

22.01.2026 05:00 👍 1 🔁 0 💬 1 📌 0
Postdoctoral Fellowship Program - Johns Hopkins Data Science and AI Institute. The Johns Hopkins Data Science and AI (DSAI) Institute welcomes applications for its postdoctoral fellowship program, seeking disciplinarily diverse scholars to advance foundational methods of data science and artificial intelligence, …

Postdoc positions:
ai.jhu.edu/careers/pos...

Applications are due January 23, 2026.

Positions are for 2 years with the possibility of an extension.

21.01.2026 05:00 👍 0 🔁 0 💬 0 📌 0
CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research.

Background: Biomedical research requires sophisticated understanding and reasoning across multiple specializations. While large language models (LLMs) show promise in scientific applications, their capability to safely and accurately support complex biomedical research remains uncertain.

Methods: We present CARDBiomedBench, a novel question-and-answer benchmark for evaluating LLMs in biomedical research. For our pilot implementation, we focus on neurodegenerative diseases (NDDs), a domain requiring integration of genetic, molecular, and clinical knowledge. The benchmark combines expert-annotated question-answer (Q/A) pairs with semi-automated data augmentation, drawing from authoritative public resources including drug development data, genome-wide association studies (GWAS), and Summary-data-based Mendelian Randomization (SMR) analyses. We evaluated seven private and open-source LLMs across ten biological categories and nine reasoning skills, using novel metrics to assess both respon…

Overdue update: CARDBiomedBench will be featured in @LancetDigitalH! 🎉

If you're looking for a high-quality and challenging science benchmark for your AI model, this could be it!

🤗 Dataset: huggingface.co/datasets/NI...
📄 Paper: biorxiv.org/content/10....
x.com/DanielKhash...

20.01.2026 21:45 👍 0 🔁 0 💬 0 📌 0
Postdoctoral Fellowship Program - Johns Hopkins Data Science and AI Institute. The Johns Hopkins Data Science and AI (DSAI) Institute welcomes applications for its postdoctoral fellowship program, seeking disciplinarily diverse scholars to advance foundational methods of data science and artificial intelligence, …

Postdoc positions:
ai.jhu.edu/careers/pos...

Applications are due January 23, 2026.

Positions are for 2 years with the possibility of an extension.

16.01.2026 12:15 👍 0 🔁 0 💬 0 📌 0
Postdoctoral Fellowship Program - Johns Hopkins Data Science and AI Institute. The Johns Hopkins Data Science and AI (DSAI) Institute welcomes applications for its postdoctoral fellowship program, seeking disciplinarily diverse scholars to advance foundational methods of data science and artificial intelligence, …

Postdoc positions:
ai.jhu.edu/careers/pos...

Applications are due January 23, 2026.

Positions are for 2 years with the possibility of an extension.

07.01.2026 21:15 👍 0 🔁 0 💬 0 📌 0
Postdoctoral Fellowship Program - Johns Hopkins Data Science and AI Institute. The Johns Hopkins Data Science and AI (DSAI) Institute welcomes applications for its postdoctoral fellowship program, seeking disciplinarily diverse scholars to advance foundational methods of data science and artificial intelligence, …

Postdoc positions:
ai.jhu.edu/careers/pos...

Applications are due January 23, 2026.

Positions are for 2 years with the possibility of an extension.

02.01.2026 14:00 👍 0 🔁 0 💬 0 📌 0

We're extremely thankful to the Evo2 team (@BrianHie @pdhsu @garykbrixi @mgdurrant @MichaelPoli6, etc.). Not only do these models help advance biomedical research; we now see that they can also help the AI community better understand the fundamentals of pre-training.

18.11.2025 17:27 👍 0 🔁 0 💬 0 📌 0
Paper page - Genomic Next-Token Predictors are In-Context Learners

Draft: huggingface.co/papers/2511...

Huge thanks to @N8Programs for leading the work, and to collaborators @anqi_liu33 @aamixsh @mrevsine @mike_schatz.

18.11.2025 17:27 👍 0 🔁 0 💬 1 📌 0

๐——๐—ผ๐—ฒ๐˜€ ๐˜๐—ต๐—ถ๐˜€ ๐—บ๐—ฒ๐—ฎ๐—ป ๐—ต๐˜‚๐—บ๐—ฎ๐—ป ๐—น๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜๐˜‚๐—ฟ๐—ฒ ๐—ถ๐˜€ ๐—ถ๐—ฟ๐—ฟ๐—ฒ๐—น๐—ฒ๐˜ƒ๐—ฎ๐—ป๐˜? No! But it suggests there may be universal distributional properties across different languages (human, DNA, etc.) that yield ICL. It remains an open question what these properties are.

18.11.2025 17:27 👍 1 🔁 0 💬 1 📌 0

๐——๐—ผ๐—ฒ๐˜€ ๐—œ๐—–๐—Ÿ ๐—ถ๐—ป ๐—ด๐—ฒ๐—ป๐—ผ๐—บ๐—ถ๐—ฐ ๐˜ƒ๐˜€ ๐—น๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐—ฎ๐—ฐ๐˜ ๐—ถ๐—ฑ๐—ฒ๐—ป๐˜๐—ถ๐—ฐ๐—ฎ๐—น๐—น๐˜†? No! While share macro-level ICL trends, each shows domain-specific inductive biases traceable to properties of DNA vs human language.

18.11.2025 17:27 👍 2 🔁 0 💬 1 📌 0

๐—ช๐—ต๐˜† ๐—ถ๐˜ ๐—บ๐—ฎ๐˜๐˜๐—ฒ๐—ฟ๐˜€: To our knowledge, this is the first evidence of emergent ICL in non-[human]language symbolic sequences. It suggests that ICL is modality-agnostic, and a general consequence of large-scale autoregressive training on rich data distributions.

18.11.2025 17:27 👍 0 🔁 0 💬 1 📌 0

This lets us compare Evo2 (genomic) vs Qwen3 (language) under matched few-shot prompts.

18.11.2025 17:27 👍 0 🔁 0 💬 1 📌 0

๐—›๐—ผ๐˜„ ๐—ฑ๐—ถ๐—ฑ ๐˜„๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—ฎ๐—ฟ๐—ฒ ๐—ด๐—ฒ๐—ป๐—ผ๐—บ๐—ถ๐—ฐ ๐˜ƒ๐˜€ ๐—น๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€? We built a suite of symbolic bitstring-reasoning tasks and encoded them two ways: (1) genomic alphabet (A/T/C/G) and (2) linguistic alphabet (digits).

18.11.2025 17:27 👍 0 🔁 0 💬 1 📌 0
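
To make the two encodings in the post above concrete, here is a hypothetical sketch of a matched pair of prompts. The bit-to-nucleotide mapping, the "->" separator, and the task format are our illustrative assumptions, not the paper's exact specification.

```python
# Hypothetical encodings of the same bitstring task in two alphabets;
# the mapping and prompt format are illustrative assumptions.
BIT_TO_DNA = {"0": "A", "1": "T"}   # assumed mapping; C/G unused here

def encode_genomic(bits: str) -> str:
    """Render a bitstring in the nucleotide alphabet (for Evo2)."""
    return "".join(BIT_TO_DNA[b] for b in bits)

def encode_linguistic(bits: str) -> str:
    """Render the same bitstring as digit tokens (for Qwen3)."""
    return " ".join(bits)

def few_shot_prompt(pairs, query, encode):
    """Matched few-shot prompt: same underlying task, different alphabet."""
    shots = "\n".join(f"{encode(x)} -> {encode(y)}" for x, y in pairs)
    return f"{shots}\n{encode(query)} ->"

# Example: a bit-flip task, rendered for both model families.
pairs = [("0110", "1001"), ("1010", "0101")]
print(few_shot_prompt(pairs, "1100", encode_genomic))    # ATTA -> TAAT ...
print(few_shot_prompt(pairs, "1100", encode_linguistic)) # 0 1 1 0 -> 1 0 0 1 ...
```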

→ similar log-linear gains with more shots
→ similar improvement with model scale
... all learned purely from DNA (nucleotide) sequences.

18.11.2025 17:27 👍 0 🔁 0 💬 1 📌 0
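
To pin down what "log-linear gains" means here (an illustrative functional form; the constants are task- and model-dependent and not taken from the paper): accuracy improves roughly linearly in the logarithm of the shot count.

```latex
\[
  \mathrm{accuracy}(k) \;\approx\; \alpha + \beta \log k,
  \qquad k = \text{number of in-context examples},\;\; \beta > 0.
\]
```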

Thrilled to share our latest result: Genomic 🧬 models trained only on 'next-nucleotide prediction' exhibit ICL!

What's remarkable is that their overall pattern closely mirrors LLMs:
→ similar few-shot pattern induction

18.11.2025 17:27 👍 1 🔁 0 💬 1 📌 0

For years since the GPT-2 paper, emergent in-context learning (ICL) from 'next-token' training has been treated as something deeply tied to human language. But … is it?

18.11.2025 17:27 👍 2 🔁 2 💬 1 📌 1

Big congrats to @jackjingyuzhang for being named an Amazon AI PhD Fellow! 🎉 Grateful for @AmazonScience @RohitPrasadAI's support as we work together to advance AI research at JHU.
x.com/jackjingyuz...

24.10.2025 16:08 👍 2 🔁 0 💬 0 📌 0
Paper page - IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning

๐—ฆ๐—ฒ๐—ฒ ๐˜๐—ต๐—ฒ ๐—ฑ๐—ฒ๐˜๐—ฎ๐—ถ๐—น๐˜€ ๐—ผ๐—ณ ๐˜๐—ต๐—ฒ ๐—ณ๐—ถ๐—ป๐—ฑ๐—ถ๐—ป๐—ด๐˜€: huggingface.co/papers/2509...

Work led by @aamixsh, in collaboration with @anqi_liu33.
@HopkinsEngineer @JHUCompSci

x.com/aamixsh/sta...

03.10.2025 14:23 👍 0 🔁 0 💬 0 📌 0

For 2๏ธโƒฃ, we introduce ๐‘จ๐’„๐’•๐’Š๐’—๐’‚๐’•๐’Š๐’๐’ ๐‘จ๐’๐’Š๐’ˆ๐’๐’Ž๐’†๐’๐’• (๐‘ฐ๐‘จ๐Ÿ) -- a method that ๐˜ฅ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ญ๐˜ญ๐˜ด ๐˜๐˜Š๐˜“ ๐˜ข๐˜ค๐˜ต๐˜ช๐˜ท๐˜ข๐˜ต๐˜ช๐˜ฐ๐˜ฏ๐˜ด ๐˜ช๐˜ฏ๐˜ต๐˜ฐ ๐˜ต๐˜ฉ๐˜ฆ ๐˜ฑ๐˜ข๐˜ณ๐˜ข๐˜ฎ๐˜ฆ๐˜ต๐˜ฆ๐˜ณ๐˜ด ๐˜ฐ๐˜ง ๐˜ข ๐˜ฑ๐˜ณ๐˜ฆ-๐˜ต๐˜ณ๐˜ข๐˜ช๐˜ฏ๐˜ฆ๐˜ฅ ๐˜ฎ๐˜ฐ๐˜ฅ๐˜ฆ๐˜ญ. Then, running SFT on top of this "primed" model leads to consistent gains over vanilla SFT.

03.10.2025 14:23 👍 1 🔁 0 💬 1 📌 0
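
A minimal sketch of what "distilling ICL activations into the parameters" could look like, assuming a Hugging Face-style causal LM whose forward pass returns hidden states. The layer choice, last-token alignment, and MSE objective are our assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def activation_alignment_loss(model, icl_batch, zero_shot_batch, layers):
    """Hypothetical IA2-style "priming" objective: pull the model's
    activations on zero-shot inputs toward its own activations under
    ICL prompts for the same queries (a sketch, not the paper's code)."""
    with torch.no_grad():  # ICL activations serve as fixed targets
        icl_states = model(**icl_batch, output_hidden_states=True).hidden_states
    zs_states = model(**zero_shot_batch, output_hidden_states=True).hidden_states
    loss = 0.0
    for layer in layers:
        # Align the final-token representation at each chosen layer
        # (assumption: the last position summarizes the answer).
        loss = loss + F.mse_loss(zs_states[layer][:, -1], icl_states[layer][:, -1])
    return loss / len(layers)

# Usage (sketch): optimize this loss to "prime" the model, then run
# standard SFT on the primed parameters.
```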

On 1๏ธโƒฃ, building on prior findings, we find that ICL and SFT trigger distinct โšกactivationโšก patterns -- an additional signal that ICL and SFT operate differently. We also find that ICL is generally more calibrated than SFT, though sometimes at the cost of accuracy.

03.10.2025 14:23 👍 0 🔁 0 💬 1 📌 0
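
One standard way to quantify "more calibrated" is expected calibration error (ECE), sketched below; this is the generic metric, not necessarily the exact protocol used in the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence and average the gap between
    mean confidence and empirical accuracy, weighted by bin mass."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# A well-calibrated model (70% confidence -> ~70% accuracy) gets ECE near 0.
```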

Our latest work asks two questions:
1๏ธโƒฃ Do ICL and SFT operate differently?
2๏ธโƒฃ And if so, can one ๐—น๐—ฒ๐˜ƒ๐—ฒ๐—ฟ๐—ฎ๐—ด๐—ฒ ๐˜๐—ต๐—ฒ๐—ถ๐—ฟ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—น๐—ฒ๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐—ฟ๐—ถ๐˜๐˜† ๐—ณ๐—ผ๐—ฟ ๐—ฏ๐—ฒ๐˜๐˜๐—ฒ๐—ฟ ๐—ฎ๐—ฑ๐—ฎ๐—ฝ๐˜๐—ฎ๐˜๐—ถ๐—ผ๐—ป?

03.10.2025 14:23 👍 0 🔁 0 💬 1 📌 0

ICL and SFT are the two most studied ways to adapt LMs. We understand each in isolation, but know far less about how they might complement one another.

03.10.2025 14:23 👍 0 🔁 0 💬 1 📌 0
GitHub - JHU-CLSP/hell-or-high-water: Code and data for the paper "Hell or High Water: Evaluating Agentic Recovery from External Failures"

Paper: arxiv.org/abs/2508.11027 (to appear in @COLM_conf)
Code: github.com/JHU-CLSP/he...

With @andrewwnlp (lead), Sophia Hager, Adi Asija, Nicholas Andrews @HopkinsEngineer @JohnsHopkins

19.09.2025 14:29 👍 0 🔁 0 💬 0 📌 0

👉 The overall takeaway: LLM agents today are brittle in open-world environments. For real-world deployment, we need robust strategies for fallback planning and recovery.

19.09.2025 14:29 👍 0 🔁 0 💬 1 📌 0