Check out our work here:
arxiv.org/abs/2602.06566
@niccoloav.bsky.social @matrig.net
This separation unlocks powerful capabilities:
✨ Scale "looking" independently of "thinking"
✨ Keep contexts lean: only process relevant crops
✨ Train the "eyes" without retraining the "brain" on a single GPU
Our method SPARC explicitly decouples the "Where" (perception) from the "Why" (reasoning), mimicking how the brain separates early visual processing from executive function.
🔍 First: aggressive visual search to find the right pixels
🧠 Then: focused reasoning on only the relevant crops
Even the most brilliant detective can't solve a case without finding the clues first.
Yet most "thinking" VLMs make this exact mistake: they entangle visual search and complex logic into one giant, expensive chain of thought.
Stop burning through tokens in the dark. Ignite a SPARC. ⚡
Thanks! Love to be on the list!
🙏🏻