
@erogol.com

Doing ML erogol.com erogol.substack.com github.com/erogol

104 Followers · 44 Following · 84 Posts · Joined 15.11.2024

Latest posts by @erogol.com

Agentic tools like OpenClaw grow with every PR — but bigger codebases are harder for AI to understand and extend.

What if we kept the core tiny and let agents adapt themselves to user needs? No PR, just evolve.

02.02.2026 08:53 👍 0 🔁 0 💬 0 📌 0
Model check - DeepSeek-V3.2-Exp - Fine-Grained Sparse Attention for Efficient Long-Context LLMs Going over the recently released DeepSeek-V3.2-Exp technical paper, source code and innovations.

Here is my take on the new DeepSeek-V3.2-Exp:

erogol.substack.com/p/model-chec...

01.10.2025 16:11 👍 1 🔁 0 💬 0 📌 0
Model Check - MiMo-Audio: Scaling Speech Pre-Training to 100M Hours Going over the code and the technical report of the new Speech LM model from Xiaomi that rivals GPT4o-audio and Gemini

My post on MiMo-Audio

open.substack.com/pub/erogol/p...

🔥 Trained on 100M+ hours and shows emergent few-shot learning:
• Voice conversion
• Emotion transfer
• Speech translation
• Cross-modal reasoning

⚡ Key finding: speech follows the same scaling laws as text LLMs

22.09.2025 17:18 👍 1 🔁 0 💬 0 📌 0
Machine Learns #55 Voice + reasoning releases (Ling-flash-2.0, VoxCPM, Kimi K2, ultraVAD) and 2 papers: long-horizon execution & decay-free LR schedules.

Machine Learns #55 is out!

Full of new models… check it out

open.substack.com/pub/erogol/p...

18.09.2025 13:01 👍 0 🔁 0 💬 0 📌 0
Machine Learns #54 🤖 Voice models, long-context tricks, and a token-order loss worth trying Flashy audio releases + 5 papers (MoC, TOP, FELLE, M2N2, Motif TR)

Machine Learns #54 is out
open.substack.com/pub/erogol/p...

04.09.2025 11:07 👍 1 🔁 0 💬 0 📌 0
Model Check - VibeVoice: Next-Token Diffusion Meets Long-Form Speech Generation Going over the code and the technical report of the new TTS model from Microsoft Research.

My breakdown of VibeVoice - new open-weight TTS model from Microsoft.

open.substack.com/pub/erogol/p...

26.08.2025 11:54 👍 1 🔁 0 💬 0 📌 0
microsoft/VibeVoice-1.5B · Hugging Face We're on a journey to advance and democratize artificial intelligence through open source and open science.

Microsoft released a TTS model… nice…

You can create long-form convos and podcasts with 4 distinct voices

huggingface.co/microsoft/Vi...

25.08.2025 17:10 👍 1 🔁 0 💬 0 📌 0
Model check - KyutaiTTS: Streaming Text-to-Speech with Delayed Streams Modeling Going over Kyutai's new TTS model and its delayed streams modeling.

KyutaiTTS solved streaming text-to-speech with a state machine that generates audio word-by-word as text arrives.

220ms latency, 10-second voice cloning, 32 concurrent users on single GPU.

No more waiting for complete sentences.

Full analysis: erogol.substack.com/p/model-chec...

02.08.2025 19:46 👍 1 🔁 1 💬 0 📌 0

This is such a great idea

12.06.2025 13:59 👍 1 🔁 0 💬 0 📌 0

Claude is the best coding model.

Gemini causes frequent syntax errors.

OpenAI does not even understand the task at hand.

10.06.2025 13:38 👍 0 🔁 0 💬 0 📌 0
BlaGPT/bla_gpt/llada.py at main · erogol/BlaGPT Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.

Lately I've been spending some time with diffusion LMs and working on a NanoGPT-style LLaDA model.

So far I haven't achieved results comparable to AR models, but it's a good start.

github.com/erogol/BlaGP...
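For reference, the LLaDA-style masked-diffusion objective fits in a few lines. This is a minimal sketch based on my reading of the paper, not the actual BlaGPT code; `model` and `mask_id` are placeholders:

```python
import torch
import torch.nn.functional as F

def llada_loss(model, tokens, mask_id):
    """One masked-diffusion LM training step (LLaDA-style sketch): sample a
    masking ratio t ~ U(0, 1], mask each token independently with probability
    t, predict the originals at the masked positions, and weight the
    cross-entropy by 1/t."""
    t = torch.rand(tokens.size(0), 1).clamp_min(1e-3)        # per-sequence ratio
    masked = torch.rand_like(tokens, dtype=torch.float) < t  # which tokens to hide
    noisy = torch.where(masked, torch.full_like(tokens, mask_id), tokens)
    logits = model(noisy)                                    # (batch, seq, vocab)
    loss = F.cross_entropy(logits[masked], tokens[masked], reduction="none")
    return (loss * (1.0 / t).expand_as(tokens)[masked]).mean()
```

Unlike AR training, there is no causal mask here; the model sees the whole noisy sequence and only the masked positions contribute to the loss.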

01.06.2025 14:12 👍 0 🔁 0 💬 0 📌 0

This work was done in collaboration with Jeff Clune's lab at UBC, and led by his PhD students Jenny Zhang and Shengran Hu, together with Cong Lu and Robert Lange.

Paper: arxiv.org/abs/2505.22954
Code: github.com/jennyzzt/dgm

30.05.2025 02:33 👍 12 🔁 3 💬 0 📌 0
Machine Learns #48 OpenAI's 'Sign in with ChatGPT', Meta's AGI ambitions, new models like Gemma 3 & MAGI-1, research breakthroughs in KV caching for diffusion & PaTH Attention, and fresh open-source releases.

⚡ Machine Learns issue 48 is out

🚀 dKV-Cache accelerates diffusion models by up to 10x
🔐 OpenAI's authentication play (think OAuth for AI)
🎯 PaTH Attention beats RoPE on long-context tasks
🤖 Humanoid robot fights became real

open.substack.com/pub/erogol/p...

28.05.2025 12:25 👍 2 🔁 0 💬 0 📌 0
GitHub - erogol/BlaGPT: Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.

Following the breadcrumbs, implemented PLE from Gemma3n.

It gave a significant performance boost and resulted in a new best model with almost no compute overhead.

github.com/erogol/BlaGPT
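For the curious, here is a minimal sketch of the per-layer embedding (PLE) idea as I understand it from the Gemma3n materials: each transformer layer gets its own small embedding slice, looked up by token id, projected to the model width, and added to that layer's input. Names and dimensions are illustrative, not Gemma's or BlaGPT's actual config:

```python
import torch
import torch.nn as nn

class PerLayerEmbedding(nn.Module):
    """Per-layer embeddings (PLE) sketch: one small embedding slice per
    transformer layer, projected up to the model width."""
    def __init__(self, vocab_size, n_layers, ple_dim, model_dim):
        super().__init__()
        self.ple_dim = ple_dim
        self.emb = nn.Embedding(vocab_size, n_layers * ple_dim)  # one slice per layer
        self.proj = nn.ModuleList(
            nn.Linear(ple_dim, model_dim, bias=False) for _ in range(n_layers)
        )

    def forward(self, token_ids, layer_idx):
        e = self.emb(token_ids)                      # (batch, seq, n_layers * ple_dim)
        lo = layer_idx * self.ple_dim
        chunk = e[..., lo:lo + self.ple_dim]         # this layer's slice
        return self.proj[layer_idx](chunk)           # add to layer layer_idx's input
```

The appeal is that the tables live in cheap memory and only the small projections touch compute, which matches the "almost no compute overhead" observation above.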

27.05.2025 09:36 👍 2 🔁 0 💬 0 📌 0
Paper check: Merging LLMs at Pre-training, Considering Token Probabilities at RL 🔬 Two papers in scope: "Model Merging in Pre-training for LLMs" and "Do Not Let Low-Probability Tokens Over-Dominate in RL"

My paper notes on 2 new papers:

- Model Merging in Pre-training of Large Language Models
- Do Not Let Low-Probability Tokens Over-Dominate in RL

open.substack.com/pub/erogol/p...

21.05.2025 12:10 👍 1 🔁 0 💬 0 📌 0
GitHub - erogol/BlaGPT: Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.

Muon really works. Got the best results in BlaGPT.

```
torchrun --standalone --nproc_per_node=8 train.py --run_name best_model --model_name best
```

github.com/erogol/BlaGPT
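Muon's core trick, per the public reference implementation, is orthogonalizing each 2D gradient (after momentum) with a few Newton-Schulz iterations before applying the update. A simplified sketch of just that step (coefficients taken from the reference code; the full optimizer also handles momentum, learning-rate scaling, and non-matrix params):

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately replace a 2D gradient with an orthogonal(ized) matrix:
    a quintic Newton-Schulz iteration pushes all singular values toward 1.
    Sketch only, not the full Muon optimizer."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)           # scale so singular values are <= 1
    transposed = g.size(0) > g.size(1)
    if transposed:                       # iterate in the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    return x.T if transposed else x
```

Intuitively, this equalizes the update's singular values, so no single direction in the weight matrix dominates the step.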

08.05.2025 13:13 👍 1 🔁 0 💬 0 📌 0
GitHub - erogol/BlaGPT: Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible experimentation and exploration.

All code is available in BlaGPT if you want to check it out yourself!

github.com/erogol/BlaGPT

06.05.2025 12:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

My results:

• Canon Layers definitely improved performance when placed before Attention/MLP blocks
• Softpick had worse validation loss but completely removed attention sinks
• Parallel blocks matched baseline performance but trained 15% faster

06.05.2025 12:11 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Parallel Transformer blocks run MLP and Attention in parallel instead of one after another.

So you get: z = x + MLP(x) + Attention(x)

PaLM models use this approach, which improves memory usage and speed without hurting performance.
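A minimal PyTorch sketch of a parallel block with a shared pre-norm (as in PaLM); illustrative only, not the BlaGPT implementation:

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    """PaLM-style parallel transformer block:
    z = x + Attention(LN(x)) + MLP(LN(x)),
    with one shared pre-norm feeding both branches, instead of the usual
    sequential Attention-then-MLP arrangement."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)   # both branches added to the residual
```

The speedup comes from fusing the attention and MLP input projections into one matmul and sharing the layernorm, since the two branches no longer depend on each other.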

06.05.2025 12:11 👍 1 🔁 0 💬 1 📌 0

The Canon Layers paper shows they boost performance when added to transformer blocks.

They also help models without positional encoding work just as well as RoPE models.

❗ Worth noting that RWKV used a similar idea years ago.

06.05.2025 12:11 👍 1 🔁 0 💬 1 📌 0

Canon Layers are basically causal 1D convolutions that mix the current hidden state with previous states (how many depends on the kernel size).
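In PyTorch terms, that is a depthwise `Conv1d` with left-only padding plus a residual connection. A hedged sketch (kernel size and naming are illustrative, not the paper's exact module):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CanonLayer(nn.Module):
    """Causal depthwise 1D convolution: each position mixes its hidden state
    with the previous (kernel_size - 1) states. Left-only padding keeps it
    causal, so no position sees the future."""
    def __init__(self, dim: int, kernel_size: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x.transpose(1, 2)                    # (batch, dim, seq) for Conv1d
        h = F.pad(h, (self.kernel_size - 1, 0))  # pad on the left only
        return x + self.conv(h).transpose(1, 2)  # residual connection
```

Because the conv is depthwise (`groups=dim`), the mixing happens only along the sequence axis, which keeps the parameter and compute cost tiny.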

06.05.2025 12:11 👍 1 🔁 0 💬 1 📌 0

Softpick replaces regular softmax in attention blocks.

It allows zero values in the numerator and lets negative values contribute to the denominator.

This prevents attention sinks while keeping math properties similar to regular softmax.
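A small sketch of the formula as I read the paper: relu(e^x - 1) in the numerator and sum(|e^x - 1|) in the denominator, computed here in a max-shifted form for stability. Illustrative, not the authors' code:

```python
import torch

def softpick(x: torch.Tensor, dim: int = -1, eps: float = 1e-6) -> torch.Tensor:
    """Softpick sketch: scores <= 0 get exactly zero weight, so the weights
    can sum to less than 1 and a head can attend to nothing (no forced
    attention sink). Multiplying numerator and denominator by e^-max keeps
    the exponentials bounded."""
    m = x.amax(dim=dim, keepdim=True)
    shifted = torch.exp(x - m) - torch.exp(-m)   # = (e^x - 1) * e^-max
    num = torch.relu(shifted)
    den = shifted.abs().sum(dim=dim, keepdim=True)
    return num / (den + eps)

w = softpick(torch.tensor([2.0, -1.0, 0.5]))  # the -1.0 entry gets exactly zero weight
```

Contrast with softmax, where every score gets a strictly positive weight and the row always sums to 1, which is what forces the sink behavior.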

06.05.2025 12:11 👍 1 🔁 0 💬 1 📌 0

🧵 Here is a small thread with my notes about some of the recent Transformer papers.

- Softpick: an alternative to softmax in Attention
- Canon Layers: mixing states with conv1d
- Parallel Transformer blocks

06.05.2025 12:11 👍 1 🔁 0 💬 1 📌 0
Machine Learns #45 OpenAI's social network & GPT-4.1, China launches $8.2B AI fund, NVIDIA's US manufacturing push, new GLM-4 & MineWorld models, C3PO expert pathways optimization, GigaTok's 3B visual tokenizer...

Machine Learns #45 - the no-fluff AI newsletter - is out!

I normally share bi-weekly, but last week was full enough, so here we go

open.substack.com/pub/erogol/p...

16.04.2025 13:54 👍 1 🔁 0 💬 0 📌 0

Updated my LLM usage and cancelled my ChatGPT sub for now

Coding - Claude, Gemini 2.5
Reading papers - Claude
Research - Gemini 2.5
Daily - Gemini 2.5
Search - Gemini 2.5

11.04.2025 21:06 👍 0 🔁 0 💬 0 📌 0

Thanks :)

11.04.2025 21:05 👍 1 🔁 0 💬 0 📌 0
Machine Learns #44 Praxis, Sam Altman's tech utopia, Amazon launches Nova Sonic voice AI, Midjourney returns with V7, Llama 4 models debut amid controversy, new brain-to-voice model, NoProp learning ...

Machine Learns #44 is out!!

Click for the no-fluff AI newsletter

erogol.substack.com/p/machine-le...

09.04.2025 14:18 👍 1 🔁 0 💬 0 📌 0

The next big thing is Brain-LLMs.

Imagine an LLM compressing all world knowledge, attached to your brain and ready to serve your thoughts and questions.

You also update it over the internet and pay for a sub. I don't want to think about the ad business :)

01.04.2025 13:26 👍 0 🔁 0 💬 0 📌 0
Measuring AI Ability to Complete Long Tasks Despite rapid progress on AI benchmarks, the real-world meaning of benchmark performance remains unclear. To quantify the capabilities of AI systems in terms of human capabilities, we propose a new me...

“If these results generalize to real-world software tasks, extrapolation of this trend predicts that within 5 years, AI systems will be capable of automating many software tasks that currently take humans a month.”

arxiv.org/abs/2503.14499

21.03.2025 10:05 👍 0 🔁 0 💬 0 📌 0

It's crazy that Gemma3 held up for only about three days

18.03.2025 14:10 👍 1 🔁 0 💬 0 📌 0