
#ModelScaling

Latest posts tagged with #ModelScaling on Bluesky


Boomerang Distillation Enables Zero‑Shot Interpolation of Model Sizes

Boomerang distillation creates a continuum of model sizes from a single teacher–student pair, with no extra training needed after distillation. The team released code on GitHub. Read more: getnews.me/boomerang-distillation-e... #boomerangdistillation #modelscaling

How scaling post-training is offering gains in AI intelligence

Investing.com -- The AI race is entering a new phase, with a sharp pivot from the brute-force scaling of training data to the strategic amplification of models after training. Recent disclosures around xAI's Grok 4 model show a structural shift in how intelligence gains are being unlocked: not by growing model size or feeding in more data, but by investing heavily in post-training compute.

Until late 2024, most advances in AI were driven by "Chinchilla" scaling laws: training ever-larger models on ever-larger datasets. That changed with OpenAI's o1 model and is now being accelerated by Grok 4, according to Barclays. The Grok 4 model uses roughly the same pre-training compute as its predecessor but achieves markedly higher intelligence thanks to a tenfold increase in reinforcement learning applied after the initial training phase.

Unlike pre-training, which relies on unsupervised learning over vast text corpora, post-training through reinforcement learning lets models improve through trial and error on curated tasks. In Grok 4's case, this shift not only improved reasoning and problem-solving but did so without increasing the number of model parameters, keeping inference costs lower while boosting output quality.

The implications are significant. "There shouldn't be as much of a reliance on scaling up raw pre-training data tokens to achieve higher performance and intelligence," the analyst at Barclays said. Instead, the focus shifts to agentic models that can plan, reason, and interact with tools in complex environments. One benchmark shows Grok 4 outperforming other leading models, and humans, at managing a simulated vending machine business, a test of economic reasoning and adaptability.

This evolution has clear ramifications for compute demand and capex. Whereas earlier models generated responses in single steps, agentic models now reason in chains, issuing 15 times more compute-intensive tokens per query.
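To make the "Chinchilla" scaling regime the article refers to concrete, here is a minimal back-of-the-envelope sketch. It assumes the widely used approximations (not stated in the post itself): training compute of roughly 6 FLOPs per parameter per token, and the Chinchilla-optimal data budget of roughly 20 training tokens per parameter. The 70B parameter count is purely illustrative.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Chinchilla heuristic: ~20 training tokens per parameter."""
    return 20.0 * n_params

# Hypothetical 70B-parameter model, purely for illustration.
n = 70e9
d = chinchilla_optimal_tokens(n)  # compute-optimal token budget
c = training_flops(n, d)          # pre-training FLOPs estimate
print(f"tokens: {d:.2e}, compute: {c:.2e} FLOPs")
```

Under this heuristic, making a model "smarter" by pre-training alone means growing both N and D, which is exactly the brute-force scaling the article says is being displaced by post-training compute.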
As such, post-training may not just be the route to smarter models; it may also justify the staggering infrastructure investments being made by hyperscalers. In a field long defined by pre-training scale, post-training may be where the real intelligence lies.
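The article's claim that agentic models issue 15 times more tokens per query implies a roughly proportional jump in per-query inference cost, assuming cost is linear in generated tokens. A hedged sketch of that arithmetic (the token counts and per-token FLOPs below are illustrative, not from the article):

```python
def inference_cost(tokens: int, flops_per_token: float) -> float:
    """Per-query compute, assuming cost scales linearly with tokens generated."""
    return tokens * flops_per_token

# Illustrative numbers: a single-step answer vs. an agentic chain
# emitting 15x as many tokens at the same per-token cost.
single_step = inference_cost(500, 2e12)
agentic = inference_cost(500 * 15, 2e12)
print(agentic / single_step)  # -> 15.0
```

This linear-in-tokens view is why chains of reasoning translate directly into the compute-demand and capex ramifications the article describes.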

#AI #ArtificialIntelligence #MachineLearning #PostTraining #ModelScaling


GPT-4.5: Largest public AI model yet, but improvements subtle. Future value in integration, not chat.
www.interconnects.ai/p/gpt-45-not-a-frontier-...
#aidevelopment #modelscaling #openai #technicalanalysis #businessstrategy

Is AI progress slowing down? Making sense of recent technology trends and claims

Got this book from @aisnakeoil.com recently, and this article is a great read (beware the rabbit hole: it is chock full of links 🙂). Is #InferenceScaling #ArtificialIntelligence metaprogramming? Hard to see how that is the answer, even with the best #ModelScaling!
