
rev. howard arson

@theophite

god created him and demanded that he die

25,664
Followers
750
Following
25,120
Posts
11.04.2023
Joined

Latest posts by rev. howard arson @theophite

that isn't totally deranged, right? basically i'm not trying to find anything new, it's just that there is a single eigenvector which, if you can estimate it, you can throw noise at for free, because it causes a uniform perturbation to softmax.

11.03.2026 14:41 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

like, essentially all advances in the field are coming out of better understanding of the things you are claiming do not exist.

11.03.2026 14:23 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

i'm actually doing something completely different, which is "estimating the preimage of the 1s vector at logits and routing all the quantization noise there, then stapling on a small 32-bit correction to send it there very precisely."

11.03.2026 14:20 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
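a toy version of that preimage trick (my illustration, not the author's code; the sizes are chosen so the pseudoinverse preimage is exact — real LMs have vocab ≫ hidden dim, so it's only approximate, which is where the small float32 correction comes in):

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 64, 16                      # toy sizes: d > vocab makes the preimage exact
W = rng.normal(size=(vocab, d))        # output projection: logits = W @ h

# least-squares preimage of the all-ones logit vector
v = np.linalg.pinv(W) @ np.ones(vocab)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = rng.normal(size=d)
base = softmax(W @ h)
noisy = softmax(W @ (h + 0.5 * v))     # quantization "noise" routed along v

assert np.allclose(W @ v, np.ones(vocab))  # v really maps to uniform logits
assert np.allclose(base, noisy)            # so softmax cannot see it
```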

[turning around chair and sitting on it backwards] let me tell you about the one in three persons who invented Gender

11.03.2026 14:18 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

frustratingly, the 6-bit lora thing is still using a dumb code path which doesn't work.

11.03.2026 14:14 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

none of these things are true in any meaningful sense

11.03.2026 14:11 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

we of course fully understand how we built it. we cannot read the artifact which results from the initial process, and the initial process is followed by a series of very eccentric, ad hoc state-space optimization steps which have been developed empirically and atheoretically.

11.03.2026 05:29 πŸ‘ 6 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

it very much is not. it is not programmed by anyone, no one knows how they work to any degree of accuracy, and making them do anything at all is essentially trial and error

11.03.2026 05:24 πŸ‘ 9 πŸ” 0 πŸ’¬ 3 πŸ“Œ 0
The Scale of Billionaires’ Campaign Donations is Overwhelming U.S. Politics

How the Roberts Court destroyed our Democracy

"Five presidential elections ago, before the Supreme Court’s 2010 ruling that lifted many remaining campaign finance restrictions, the share of billionaire spending was almost zero β€” 0.3 percent, to be precise."

www.nytimes.com/2026/03/09/u...

10.03.2026 02:38 πŸ‘ 510 πŸ” 207 πŸ’¬ 16 πŸ“Œ 7

personal seeds aren't empirically validated whereas 42 has seen millions of GPU-years of training time

11.03.2026 04:50 πŸ‘ 59 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

[blowing on my GPU and shaking my workstation] c'mon, c'mon, daddy needs a new pair of shoes

11.03.2026 04:47 πŸ‘ 52 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

42 is 3e-4

11.03.2026 04:46 πŸ‘ 8 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

okay, I was doing different calibration sets per model.

11.03.2026 04:39 πŸ‘ 9 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0

i do not believe Harris was a warmonger because the vice president has no causal power over war

11.03.2026 04:37 πŸ‘ 163 πŸ” 4 πŸ’¬ 4 πŸ“Œ 0

alzabo blood

11.03.2026 04:15 πŸ‘ 28 πŸ” 0 πŸ’¬ 2 πŸ“Œ 0

this is on CPU with T5-SMALL. i am doing it on CPU precisely because i am trying to prove a thesis and do not want to second-guess the number of matmuls i am doing, because if it is O(the number of matmuls which would be reasonable) the CPU will freeze until next month.

11.03.2026 04:12 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I think about this post every day and get progressively more angry

11.03.2026 03:53 πŸ‘ 166 πŸ” 21 πŸ’¬ 5 πŸ“Œ 0

everyone who moved to london during the industrial revolution to sleep draped over a rope in a poorhouse and shit in the street and get their hands torn off by the satanic mills moved there to get away from agricultural labor

11.03.2026 03:53 πŸ‘ 304 πŸ” 35 πŸ’¬ 7 πŸ“Œ 1

i VERY STRONGLY disbelieve this conclusion but i am not yet ruling it out. it is most likely that i used different calibration sets on every model, which is exactly the dumb thing i would have done if i put the iterator in the wrong place.

11.03.2026 03:50 πŸ‘ 6 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

there is one way in which this could be correct, which is if the float32 LoRA correction in the Fisher eigenbasis is fixing the primary points of sensitivity whereas the null-routing of quantization noise means that the most sensitive directions no longer see any noise at all.

11.03.2026 03:48 πŸ‘ 6 πŸ” 0 πŸ’¬ 3 πŸ“Œ 0
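a hedged sketch of that mechanism (my construction, not the author's pipeline): confine a low-rank correction to the top eigenvectors of a Fisher-style sensitivity matrix, and the quantization error vanishes exactly along those most-sensitive directions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
W = rng.normal(size=(d, d))
Wq = W + 0.1 * rng.normal(size=(d, d))  # quantized weights = original + noise

G = rng.normal(size=(100, d))           # stand-in gradient samples
F = G.T @ G / 100                       # empirical Fisher-style sensitivity matrix
eigvals, eigvecs = np.linalg.eigh(F)    # ascending eigenvalues
Uk = eigvecs[:, -4:]                    # top-4 most sensitive directions

# rank-4 correction acting only in the sensitive input subspace
delta = (W - Wq) @ Uk @ Uk.T
Wc = Wq + delta

x = Uk @ rng.normal(size=4)             # an input lying in that subspace
assert np.allclose(Wc @ x, W @ x)       # error removed exactly along it
```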

"i made the model 8x smaller using a novel method and my metrics say it got better, which i do not believe."

11.03.2026 03:43 πŸ‘ 10 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

well uh i don't believe this

11.03.2026 03:26 πŸ‘ 15 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

yeah, self-prediction perplexity

11.03.2026 03:26 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

okay uh

11.03.2026 03:23 πŸ‘ 18 πŸ” 0 πŸ’¬ 4 πŸ“Œ 1

that KL divergence number is absolutely wild. honestly i care more about KL-div than PPL, because "confidently wrong" is still low-PPL.

11.03.2026 01:44 πŸ‘ 13 πŸ” 0 πŸ’¬ 2 πŸ“Œ 1
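the "confidently wrong is still low-PPL" point, in made-up numbers (a toy three-token distribution, not real measurements): a quantized model can be more peaked than the original — low entropy, hence low perplexity on its own predictions — while its KL divergence from the original blows up.

```python
import numpy as np

def entropy(p):
    return float(-np.sum(p * np.log(p)))

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([0.70, 0.20, 0.10])  # original model's next-token distribution
student = np.array([0.05, 0.90, 0.05])  # quantized model: confident, and wrong

assert entropy(student) < entropy(teacher)  # "confident" -> low self-perplexity
assert kl(teacher, student) > 1.0           # but badly diverged from the original
```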
[post image]

well, it's doing _something_

11.03.2026 01:35 πŸ‘ 12 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

42 is the 3e-4 of seeds

11.03.2026 00:53 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

the Rosicrucian Egyptian Museum is a fantastic collection of Egyptian artifacts curated by an unreliable narrator. they're subtle about their bias in the main collection, but in the Alchemy exhibit they strongly imply that they made certain scientific discoveries centuries before anyone else.

10.03.2026 20:47 πŸ‘ 40 πŸ” 2 πŸ’¬ 3 πŸ“Œ 0

the intuition is that the softmax operation zeroes out any uniform change in the distribution, so if you propagate stuff down through the model so that it produces a uniform change in the output distribution, you don't have to pay any loss to dump your error term there.

10.03.2026 22:26 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
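that shift-invariance can be checked in a few lines (my sketch, numpy only):

```python
import numpy as np

def softmax(z):
    # subtracting the max is itself a uniform shift -- the same
    # invariance being exploited, used here for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5, 3.0])
# dump an arbitrary amount of "error" along the all-ones direction
shifted = logits + 10.0 * np.ones_like(logits)

assert np.allclose(softmax(logits), softmax(shifted))
```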

(i do not know that this is a widely known result but i am presently using it as a wastebasket to dump quantization error into.)

10.03.2026 19:08 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0