15.06.2025 22:27
This tool is especially useful when evaluations are expensive (e.g. an LM harness eval) and you want to track model performance during training.
New side project!
assayer: a simple Python-RQ-based tool to automatically monitor and evaluate ML model checkpoints offline during training.
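The core idea can be sketched in a few lines: poll the checkpoint directory, and hand each newly appearing checkpoint to an evaluation job queue (with Python-RQ, that would be `Queue.enqueue` bound to your eval function). This is a minimal illustration of the pattern, not assayer's actual API; all names here are my own.

```python
import glob
import os
import time

def find_new_checkpoints(ckpt_dir, seen):
    """Return checkpoint files in ckpt_dir not yet recorded in `seen`."""
    paths = sorted(glob.glob(os.path.join(ckpt_dir, "*.pt")))
    fresh = [p for p in paths if p not in seen]
    seen.update(fresh)
    return fresh

def watch(ckpt_dir, enqueue, poll_seconds=60, max_polls=None):
    """Poll ckpt_dir and pass each new checkpoint path to `enqueue`.

    With python-rq, `enqueue` could be:
        q = rq.Queue(connection=redis.Redis())
        watch(ckpt_dir, lambda p: q.enqueue(run_eval, p))
    where run_eval is your (expensive) offline evaluation function.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for ckpt in find_new_checkpoints(ckpt_dir, seen):
            enqueue(ckpt)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

Because the eval jobs run in separate RQ worker processes, the training loop only pays the cost of writing checkpoints; the expensive evaluation happens asynchronously.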
Excited to share our recent work, AuPair, an inference-time technique that builds on the premise of in-context learning to improve LLM coding performance!
arxiv.org/abs/2502.18487
🧵