
Dylan

@dylancastillo.co

About: dylancastillo.co Projects: dylancastillo.co/projects

35 Followers · 103 Following · 33 Posts · Joined 13.03.2024

Latest posts by Dylan @dylancastillo.co


AI-powered team collaboration

15.05.2025 10:39 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

All the latest models keep breaking SOTA coding benchmarks but I’m not sure if they’re that much better.

The only thing I’m sure about is that I’m rarely able to ask something without getting an overengineered solution and a random new README in my codebase.

24.04.2025 07:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I was wrong to trash React and other frontend frameworks for the past few years.

Once a project is big enough, they definitely make you more productive vs. vanilla JS, htmx, etc.

But I'm happy that I didn't switch earlier, writing frontend code without AI tools must be horrible.

22.04.2025 07:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Somehow, my most random side project made it to the FT πŸ˜…

It's a quick test designed to assess your estimation skills: estimator.dylancastillo.co/

This is inspired by @codinghorror's great posts: blog.codinghorror.com/how-good-an...

archive.is/qDc0v

18.04.2025 08:53 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

The biggest life hack is having a job that feels like a hobby

13.03.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Thank you, I'll update the article!

11.03.2025 13:27 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
2024: Personal Snapshot – Dylan Castillo

It's late but I finally finished my 2024 review.

Last year:

πŸ’΅ I worked on 9 projects with 7 clients. Doubled revenue, costs are up by 155%.
πŸ’» Coded 322 days. Wrote 14 blog posts.
🧠 Struggled with focus. Nearly burned out.
πŸ“Έ I should have taken more photos.

dylancastillo.co/posts/2024-...

11.03.2025 08:30 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I got an email from Google saying that one of my side projects, deepsheet, got 1,000% more clicks.

After a bit of digging, I realized that it was just due to people misspelling "DeepSeek."

There are now people out there who think that China's top AI is a πŸ’© that makes charts.

04.02.2025 08:30 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

New pinned tab

21.01.2025 08:30 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Always remember that using a response schema for an LLM is not the same as using one for your API.

Sounds easy, but it happens to everyone.

Here's OpenAI breaking the CoT reasoning of an LLM judge.

14.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
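A minimal sketch of the difference (the field names are illustrative, not from any specific API): for an API, JSON key order is irrelevant once the payload is parsed, but for an LLM, the schema's field order is the order the tokens get generated in, so it shapes the reasoning.

```python
import json

# For an LLM, fields are emitted left to right, so "reasoning" must precede
# "answer" for the model to actually reason before committing to an answer.
cot_schema = {"reasoning": "string", "answer": "string"}      # CoT-friendly
broken_schema = {"answer": "string", "reasoning": "string"}   # answer first, no CoT

print(list(cot_schema))     # ['reasoning', 'answer']
print(list(broken_schema))  # ['answer', 'reasoning']

# For an API, both key orders deserialize to equal objects -- order is irrelevant:
print(json.loads('{"a": 1, "b": 2}') == json.loads('{"b": 2, "a": 1}'))  # True
```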
Post image Post image

Note to self: your only job is not to break the chain.

09.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Here's the full post: dylancastillo.co/posts/gemin...

and the github code: github.com/dylanjcasti...

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

In any case, for me, the key takeaway is that SO can decrease (or increase!) performance on some tasks. Be conscious of that.

For now, there are no clear guidelines on where each method works better.

Your best bet is to test your LLM by running your own evals.

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

So, if you only consider constrained decoding (JSON-Schema), performance decreases across the board vs. NL.

Given this result and the key-sorting issue, I'd suggest avoiding JSON-Schema unless you really need it. JSON-Prompt seems like a better alternative.

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Still, I was able to work around the issue and re-run the benchmarks. NL and JSON-Prompt are tied.

But JSON-Schema performed worse than NL in 5 out of 6 tasks in my tests. Plus, in Shuffled Objects, it did so with a huge delta: 97.15% for NL vs. 86.18% for JSON-Schema.

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

There's a propertyOrdering param documented in Vertex AI that should solve this: cloud.google.com/vertex-ai/g...

But it doesn't work in the Generative AI SDK. Other users have already reported this issue.

For the benchmarks, I excluded FC and used already sorted keys for JSON-Schema.

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Before generation, they reorder the schema keys: SO-Schema sorts them alphabetically, and FC reorders them seemingly at random. This can break your CoT.

You can fix SO-Schema by choosing your keys carefully. Instead of "reasoning" and "answer", use something like "reasoning" and "solution", so the reasoning field still sorts first.

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
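The failure mode can be sketched in plain Python (field names are illustrative): whichever key sorts first alphabetically is the one the model generates first.

```python
# SO-Schema sorts the schema keys alphabetically before generation,
# so whichever field sorts first gets emitted first.
print(sorted(["reasoning", "answer"]))    # ['answer', 'reasoning'] -> answer first, CoT broken
print(sorted(["reasoning", "solution"]))  # ['reasoning', 'solution'] -> reasoning first, CoT intact
```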

Gemini has 3 ways of generating SO:

1. Forced function calling (FC): ai.google.dev/gemini-api/...
2. Schema in prompt (SO-Prompt): ai.google.dev/gemini-api/...
3. Schema in model config (SO-Schema): ai.google.dev/gemini-api/...

SO-Prompt works well. But FC and SO-Schema have a major flaw.

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Found 2 big issues with Gemini's structured outputs (SO):

1. Using constrained decoding seems to lower performance in reasoning tasks.
2. The Generative AI SDK can break your model's reasoning.

Just re-ran the Let Me Speak Freely benchmarks with Gemini and found some interesting results.

07.01.2025 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Pi by Darren Aronofsky

02.01.2025 08:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Here's the post with all the code required to replicate the results: dylancastillo.co/posts/say-w...

Once or twice per month I write a technical article about AI here: subscribe.dylancastillo.co/

12.12.2024 10:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I’m not saying you should default to unstructured outputs. In fact, I usually go with structured.

But it’s clear to me that neither structured nor unstructured outputs are always better, and choosing one or the other can often make a difference.

Test things yourself. Run your own evals and decide.

12.12.2024 10:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
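A minimal sketch of what "run your own evals" can look like. Here `predict_structured` and `predict_unstructured` are hypothetical stand-ins for real model calls in each output mode, and the dataset is a toy:

```python
def accuracy(predict, dataset):
    """Fraction of (question, gold) pairs the predictor answers correctly."""
    return sum(predict(q) == gold for q, gold in dataset) / len(dataset)

dataset = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]

# Stub predictors -- replace these with calls to your LLM in each output mode.
predict_structured = lambda q: {"2+2": "4", "3*3": "9", "10-7": "3"}[q]
predict_unstructured = lambda q: {"2+2": "4"}.get(q, "?")

print(accuracy(predict_structured, dataset))    # 1.0
print(accuracy(predict_unstructured, dataset))  # 0.3333333333333333
```

Swap in your real tasks and both calling styles, and let the numbers decide.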

Then I switched to GPT-4o-mini, using LMSF's results as a reference.

Tweaked the prompts and improved all LMSF metrics except for NL in GSM8k.

GSM8k and Last Letter looked as expected (no diff).

But in Shuffled Obj. unstructured outputs clearly surpassed structured ones.

12.12.2024 10:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I began by replicating .txt's results using LLaMA-3-8B-Instruct (the model considered in the rebuttal).

I was able to reproduce the results and, after tweaking a few minor prompt issues, achieved a slight improvement in most metrics.

12.12.2024 10:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Structured outputs can decrease an LLM's performance on some tasks

I replicated @willkurt.bsky.social / @dottxtai.bsky.social rebuttal of Let Me Speak Freely? (LMSF) using gpt-4o-mini

The rebuttal correctly highlights many flaws with the original study, but ironically, LMSF's conclusion still holds

12.12.2024 10:30 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Me after using ChatGPT to reproduce and patch a security vulnerability in a package downloaded 1 million times per month.

10.12.2024 08:30 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Good stuff! Will be useful soon. I'm about to jump ship from Poetry but old habits die hard.

04.12.2024 10:18 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

ML is a subset of AI

30.11.2024 14:34 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I believe you're the one creating the strawman.

People are lynching a researcher for publishing a dataset of publicly available data that, if anything, will be used to improve this same social network where they're doing the lynching.

I'm trying to make clear that AI has tons of positive use cases.

30.11.2024 14:33 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0