If you're designing incentives, regulations, or evaluation systems, this post offers a new framework for thinking about accountability.
Read the full piece here: medium.com/beyond-incen...
🧵 7/7
The core lesson: when quality can't be fully specified in advance, letting context and consequences flow over time outperforms even the best-designed metrics.
Stop measuring the moment; start weighing the relationship. 🧵 6/7
Imagine: 🔹 Software vendors as long-term service providers with a reputational incentive to patch flaws instantly. 🔹 Academic authors who value sustained ownership of their claims, begging for replications as much as restaurants beg for 5-star reviews. 🧵 5/7
Think about hotel reviews. If a hotel "100 meters from the beach" is actually separated from the beach by a wall, past customers can warn future customers.
What happens if we apply that same logic to software security or academic publishing? 🧵 4/7
In my blog post, I explore a different approach: Reputation Systems.
Reputation transforms one-off, easily gamed transactions into ongoing relationships with accountability. It creates a bridge between today's behavior and tomorrow's consequences. 🧵 3/7
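That bridge can be sketched in a few lines of code. Everything below (the class, the neutral 0.5 prior, the hotel example) is my illustrative toy, not something from the post:

```python
# Toy sketch: a reputation score aggregates outcome reports that arrive
# only after each transaction, so gaming today's sale lowers tomorrow's
# score for future customers to see.
class Reputation:
    def __init__(self):
        self.reports = []

    def report(self, satisfied: bool) -> None:
        self.reports.append(satisfied)

    def score(self) -> float:
        # Fraction of past customers who report being satisfied;
        # 0.5 is an arbitrary neutral prior before any reports exist.
        if not self.reports:
            return 0.5
        return sum(self.reports) / len(self.reports)

hotel = Reputation()
for outcome in [True, True, False, False, False]:  # the wall blocks the beach
    hotel.report(outcome)
print(hotel.score())  # 0.4: future customers are warned before booking
```

The one-off transaction is unchanged; what changes is that its outcome feeds a number the next customer sees.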
When quality is elusive, our instinct is to add more metrics.
But Goodhart's Law always wins: "When a measure becomes a target, it ceases to be a good measure."
Metrics offer temporary relief, but they eventually invite new, creative ways to fail (or be gamed). 🧵 2/7
Do you work in a domain where low quality is revealed only after a decision is made?
A hotel looks perfect online, software ships as "secure," or a paper passes peer review, but the real issues only surface months later.
Here is why metrics fail us and why reputation is the answer. 🧵 1/7
7/
If you're in a domain where the cheaters move faster than the rulebook, "Hindsight Accountability" can help. Read more:
https://medium.com/@jeanimal/hindsight-accountability-deterring-the-gaming-of-regulations-2ccdc800db09
#Cybersecurity #AIRegulation #Incentives #Governance #PolicyDesign
6/
This isn't just a technical fix.
It's a philosophical shift in regulation:
We don't need to anticipate every trick; we just need to track evidence well enough to figure out the tricks later. Gamers will be deterred.
5/
Cybersecurity already uses these ideas.
Firms track malware reports, identify new patterns over time, and retroactively patch their defenses.
Some regulations now require these tracking systems.
It's hindsight, made actionable.
4/
🏦 In banking, clawbacks let firms reclaim bonuses for deals that later go bad.
Even if deal-makers sneak a bad deal past scrutiny at the time, long-term performance still matters.
That changes how people play the game.
3/
🏅 Sports agencies now freeze athletes' biological samples for 10 years.
When new drug tests emerge, they re-analyze.
And sometimes strip medals retroactively.
It's not just punishment; it's deterrence.
Cheaters know the past can catch up.
2/
Regulators leverage hindsight accountability when they:
1. Store evidence
2. Reanalyze the evidence with better tools and context later, to catch people gaming the rules
3. Apply retroactive consequences
It's no silver bullet. But it can deter people from gaming in the first place.
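A minimal sketch of those three steps, with every name and the detector invented purely for illustration:

```python
# Toy sketch of hindsight accountability (all names illustrative):
# step 1 stores evidence, step 2 re-analyzes it later with a better
# detector, step 3 is where retroactive consequences would attach.
class EvidenceStore:
    def __init__(self):
        self.records = []

    def store(self, actor: str, evidence: dict) -> None:
        # Step 1: keep the raw evidence, even if today's tools
        # cannot yet tell whether it shows gaming.
        self.records.append((actor, evidence))

    def reanalyze(self, detector) -> list:
        # Step 2: run a detector that did not exist at storage time.
        return [actor for actor, ev in self.records if detector(ev)]

store = EvidenceStore()
store.store("vendor_a", {"sample": "obfuscated_payload"})
store.store("vendor_b", {"sample": "benign_update"})

# Years later, a new signature recognizes the old sample as malicious.
flagged = store.reanalyze(lambda ev: "payload" in ev["sample"])
print(flagged)  # ['vendor_a'] -> step 3: apply retroactive consequences
```

The deterrence comes from the store itself: an actor who knows the evidence persists cannot count on today's weak detectors.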
🧵 1/
Fast-moving domains like cybersecurity evolve too quickly for static rules.
Adaptive regulation schedules reviews and updates, but hackers evolve faster still.
An approach I call "hindsight accountability" can help:
medium.com/@jeanimal/hi...
LLM-Lasso keeps the theory of the Lasso while using an LLM to analyze domain-specific metadata and improve the weights of the regularizer. Result: better performance on biomedical case studies.
Plus, since the Lasso reduces the number of features, the fitted model is more interpretable!
arxiv.org/abs/2502.10648
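If I understand the setup, the LLM's job is to supply per-feature penalty weights. A standard way to implement a weighted Lasso with sklearn is to rescale each column by its weight; the data, alpha, and weights below are synthetic stand-ins, not values from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 2.0                     # only the first 3 features matter
y = X @ beta + 0.1 * rng.standard_normal(n)

# Hypothetical per-feature penalty weights, standing in for LLM-derived
# scores: low weight = judged domain-relevant, so penalized less.
w = np.ones(p)
w[:3] = 0.1

# Weighted Lasso via reparameterization: penalizing sum_j w_j * |b_j|
# is equivalent to running plain Lasso on the rescaled columns X_j / w_j.
model = Lasso(alpha=0.1).fit(X / w, y)
coef = model.coef_ / w             # map coefficients back to original scale
```

With these weights, the lightly penalized relevant features survive selection while the irrelevant ones are shrunk to zero, which is the interpretability win the post mentions.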
The slides for my lectures on (Bayesian) Active Learning, Information Theory, and Uncertainty are online now π₯³ They cover quite a bit from basic information theory to some recent papers:
blackhc.github.io/balitu/
and I'll try to add proper course notes over time π€
Just 10 days after o1's public debut, we're thrilled to unveil the open-source version of the technique behind its success: scaling test-time compute.
By giving models more "time to think," Llama 1B outperforms Llama 8B in math, beating a model 8x its size. The full recipe is open-source!
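One toy way to see why spending more compute at test time helps is majority voting over repeated answers. The stub "model" below is just a biased coin, not the actual open-source recipe:

```python
import random
from collections import Counter

random.seed(0)

def sample_answer(correct=1, p_correct=0.6):
    # Stub model: right 60% of the time, otherwise a random wrong answer.
    return correct if random.random() < p_correct else random.choice([0, 2, 3])

def majority_vote(n_samples):
    # More test-time compute = more samples to vote over.
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def accuracy(n_samples, trials=2000):
    return sum(majority_vote(n_samples) == 1 for _ in range(trials)) / trials

acc_1, acc_16 = accuracy(1), accuracy(16)
print(acc_1, acc_16)  # voting over 16 samples beats a single sample
```

The released recipe uses more sophisticated search than plain voting, but the principle is the same: a small model sampled many times can beat a big model sampled once.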
Solving N equations in N unknowns is analogous to the interpolation threshold. Since there is exactly one solution, it must fit any noise in the data; those are the shackles. Having fewer or more unknown parameters gives the freedom to avoid overfitting.
4/4
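A quick NumPy check of the shackles: with a square system, the unique solution reproduces the noisy targets exactly (sizes and seed here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                        # 10 equations, 10 unknowns
X = rng.standard_normal((n, n))
y = X @ np.ones(n) + rng.standard_normal(n)   # targets include noise

w = np.linalg.solve(X, y)                     # the one and only solution
residual = np.max(np.abs(X @ w - y))          # training error: ~0, noise and all
```

The fit has no freedom to ignore the noise, which is exactly what produces the error spike at the interpolation threshold.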
The spike in error happens at the interpolation threshold when the number of parameters in the model (same as number of columns for my regression) equals the number of examples (rows). Double descent follows.
3/4
I create double descent with a few lines of sklearn code. I fit linear regression on data sampled with different "parameterization ratios" (# examples / # parameters), allowing me to control exactly where the interpolation threshold causes the error spike before descent.
2/4
Double descent enables a chatbot with a billion parameters to perform well and not overfit. But how does double descent work? I build intuition with simulations that fit linear regressions, along with plots and tables for solving systems of equations.
medium.com/@jeanimal/ho...