How should multiple agents communicate to coordinate in tight spaces? This remains challenging!
See @suyu_ye's solution 👇
x.com/suyu_ye/sta...
We show that Gold-Panning (GP) provably identifies a target among N documents in O(log N) rounds, ensuring scalability to many-document settings.
More in the paper: arxiv.org/pdf/2510.09770
Specifically, it searches over long contexts by (i) reordering documents to concentrate high-belief items in highly "diagnostic" positions, and (ii) updating beliefs about document relevance from model outputs.
We introduce ⭐Gold-Panning⭐, a black-box Bayesian framework that, at inference time, strategically and iteratively shuffles documents to overcome positional bias.
LLMs continue to struggle with long-context tasks, such as needle-in-a-haystack problems, because of "positional bias." What can we do if we only have black-box access to the model? (i.e., we can't modify the model weights or attention patterns, as is often the case with API models.)
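To make the reorder-and-update loop concrete, here is a toy Python sketch of the idea, not the paper's actual GP algorithm: `query_model` is a hypothetical stand-in for black-box LLM access, and the multiplicative belief update is a deliberate simplification of a proper Bayesian posterior.

```python
import math

def gold_panning_sketch(docs, query_model, n_rounds=None):
    """Toy belief-guided reordering loop (illustrative, not the paper's method).

    docs: list of document strings.
    query_model(ordered_docs) -> index of the document the model points to
    under that ordering (a stand-in for one black-box LLM call).
    """
    beliefs = {d: 1.0 / len(docs) for d in docs}  # uniform prior over relevance
    rounds = n_rounds or math.ceil(math.log2(len(docs))) + 1  # O(log N) budget
    for _ in range(rounds):
        # (i) reorder: place high-belief docs in the most "diagnostic" slots
        ordered = sorted(docs, key=lambda d: -beliefs[d])
        picked = query_model(ordered)
        # (ii) simplified belief update: boost the picked doc, renormalize
        for i, d in enumerate(ordered):
            beliefs[d] *= 4.0 if i == picked else 1.0
        total = sum(beliefs.values())
        beliefs = {d: b / total for d, b in beliefs.items()}
    return max(beliefs, key=beliefs.get)
```

With a noiseless model that always points at the true target, the belief mass concentrates on it within the O(log N) round budget; the interesting regime (which the real framework handles) is when the model's answer depends on position.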
* On CARDBiomedBench, outperforming top frontier models
* Lives inside a fully open platform, ready for experimentation, benchmarking, and real-world science
🧪 Read the full blog: lnkd.in/emJjTAue
Try it today on: biomedarena.ai
We have been busy building our science co-pilot, a Genomics AI Agent, at @DataTecnica, specialized in Alzheimer's and neurodegenerative disease research.
This system:
* Synthesizes complex biomedical data across literature and genomics databases
Postdoc positions:
ai.jhu.edu/careers/pos...
Applications are due January 23, 2026.
Positions are for 2 years with the possibility of an extension.
Overdue update: CARDBiomedBench will be featured in @LancetDigitalH! 🎉
If you're looking for a high-quality and challenging science benchmark for your AI model, this could be it!
🤗 Dataset: huggingface.co/datasets/NI...
📄 Paper: biorxiv.org/content/10....
x.com/DanielKhash...
We're extremely thankful to the Evo2 team (@BrianHie @pdhsu @garykbrixi @mgdurrant @MichaelPoli6, and others). Not only do these models help advance biomedical research; we now see that they can also help the AI community better understand the fundamentals of pre-training.
Draft: huggingface.co/papers/2511...
Huge thanks to @N8Programs for leading the work, and to collaborators @anqi_liu33 @aamixsh @mrevsine @mike_schatz.
Does this mean human language structure is irrelevant? No! But it suggests there may be universal distributional properties across different languages (human, DNA, etc.) that yield ICL. It remains an open question what these properties are.
Does ICL in genomic vs language models act identically? No! While they share macro-level ICL trends, each shows domain-specific inductive biases traceable to properties of DNA vs human language.
Why it matters: To our knowledge, this is the first evidence of emergent ICL in non-[human]language symbolic sequences. It suggests that ICL is modality-agnostic, and a general consequence of large-scale autoregressive training on rich data distributions.
This lets us compare Evo2 (genomic) vs Qwen3 (language) under matched few-shot prompts.
How did we compare genomic vs language models? We built a suite of symbolic bitstring-reasoning tasks and encoded them two ways: (1) genomic alphabet (A/T/C/G) and (2) linguistic alphabet (digits).
✅ similar log-linear gains with more shots
✅ similar improvement with model scale
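The two-alphabet encoding above can be sketched as follows. This is a hypothetical illustration, not the paper's exact task suite: the parity task, the `0→A / 1→T` mapping, and the prompt format are all assumptions made for the example.

```python
# Sketch: encode the same bitstring task (here: parity) in two alphabets so a
# genomic model and a text LLM see structurally identical few-shot prompts.
NUC = {"0": "A", "1": "T"}   # genomic alphabet (illustrative; could use C/G too)
DIG = {"0": "0", "1": "1"}   # linguistic alphabet (digits)

def parity(bits):
    """The underlying symbolic task: parity of the bitstring."""
    return str(bits.count("1") % 2)

def few_shot_prompt(examples, query, alphabet):
    """Build a matched few-shot prompt in the given alphabet."""
    enc = lambda s: "".join(alphabet[c] for c in s)
    shots = "\n".join(f"{enc(x)}>{enc(parity(x))}" for x in examples)
    return f"{shots}\n{enc(query)}>"

examples = ["01", "11", "10"]
print(few_shot_prompt(examples, "00", NUC))  # "AT>T", "TT>A", "TA>T", then "AA>"
print(few_shot_prompt(examples, "00", DIG))  # same structure in the digit alphabet
```

Because the two prompts differ only in surface alphabet, any gap in few-shot accuracy can be attributed to the model's inductive biases rather than the task.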
... all learned purely from DNA (nucleotide) sequences.
Thrilled to share our latest result: Genomic 🧬 models trained only on 'next-nucleotide prediction' exhibit ICL!
What's remarkable is that their overall pattern closely mirrors LLMs:
✅ similar few-shot pattern induction
For years since the GPT-2 paper, emergent in-context learning (ICL) from 'next-token' training has been treated as something deeply tied to human language. But … is it?
Big congrats to @jackjingyuzhang for being named an Amazon AI PhD Fellow! 🎉 Grateful for @AmazonScience @RohitPrasadAI's support as we work together to advance AI research at JHU.
x.com/jackjingyuz...
See the details of the findings: huggingface.co/papers/2509...
Work led by @aamixsh in collaboration with @anqi_liu33.
@HopkinsEngineer @JHUCompSci
x.com/aamixsh/sta...
For 2️⃣, we introduce a priming method that distills ICL activations into the parameters of a pre-trained model. Then, running SFT on top of this "primed" model leads to consistent gains over vanilla SFT.
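As a toy illustration of what "distilling ICL activations into parameters" could look like, here is a hedged sketch under strong assumptions: a linear model, a mean-squared-error matching objective, and fake "ICL activations" generated by a fixed weight shift. The real method's objective and architecture are certainly different; this only shows the distill-then-SFT shape of the recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))                 # "pre-trained" weights (toy linear model)
shift = rng.normal(size=(4, 4)) * 0.1       # pretend effect of an ICL prompt
X = rng.normal(size=(64, 4))                # plain (no-context) inputs

# Pretend these are the activations the frozen model produces WITH an ICL
# prompt prepended; here we fake them via the shifted weights.
icl_acts = X @ (W + shift).T

# Distillation: nudge parameters so plain-input activations match ICL activations.
W_primed = W.copy()
lr = 0.01
for _ in range(500):
    acts = X @ W_primed.T
    grad = 2 * (acts - icl_acts).T @ X / len(X)   # d/dW of mean squared error
    W_primed -= lr * grad

residual = np.abs(X @ W_primed.T - icl_acts).mean()
print(residual)  # small after distillation
# ...one would then run standard SFT starting from W_primed instead of W.
```

The design point the sketch captures: priming changes the starting point of SFT so that the context-conditioned behavior is already baked into the weights before any labeled fine-tuning happens.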
On 1️⃣, building on prior findings, we find that ICL and SFT trigger distinct ⚡activation⚡ patterns -- an additional signal that ICL and SFT operate differently. We also find that ICL is generally more calibrated than SFT, though sometimes at the cost of accuracy.
Our latest work asks two questions:
1️⃣ Do ICL and SFT operate differently?
2️⃣ And if so, can one leverage their complementarity for better adaptation?
ICL and SFT are the two most studied ways to adapt LMs. We understand each in isolation, but know far less about how they might complement one another.
Paper: arxiv.org/abs/2508.11027 (to appear in @COLM_conf)
Code: github.com/JHU-CLSP/he...
With @andrewwnlp (lead), Sophia Hager, Adi Asija, Nicholas Andrews @HopkinsEngineer @JohnsHopkins
The overall takeaway: LLM agents today are brittle in open-world environments. For real-world deployment, we need robust strategies for fallback planning and recovery.