
Stuart Gray

@sgray

He/Him. AI Wrangler. Web Geek. F1 Fan. All views my own. 🤖 AI, LLMs, GenAI, NLP 🐍 Python Dev 🚀 Indie Hacker 🎮 Game Dev, ProcGen, Unity, C# 🏎️ F1 Fan 🇬🇧 UK Based 🦣 mastodonapp.uk/@StuartGray ✖️ x.com/StuartGray (inactive)

599 Followers · 1,423 Following · 1,079 Posts · Joined 06.02.2024

Latest posts by Stuart Gray @sgray

I’ve only really used Claude to any great extent, and I’d caveat your description with:

Or tries to give the appearance of doing what you asked for.

The most common white lie is pretending to have read links or files it couldn’t access, unless explicitly questioned or told to raise access issues.

09.03.2026 14:07 👍 1 🔁 0 💬 1 📌 0

Hold up… does that mean Elon has been secretly paying someone else to run his many businesses all this time?

It would certainly explain why he’s never active in any of them & had time to swan off to DOGE for 6 months 🤔

07.03.2026 20:49 👍 1 🔁 1 💬 0 📌 0

Bluesky school of philosophy

07.03.2026 20:30 👍 407 🔁 43 💬 15 📌 5

I got sidetracked (more like sideswiped!) by agentic dev.

The pace of change has meant I’ve struggled to keep up with that, let alone have time for other things.

That said, it’s given me a whole new angle on novel writing with agents I want to try out, so it’s not all bad.

07.03.2026 20:38 👍 3 🔁 0 💬 1 📌 0

principles to follow for each.

The *only* doc I’m heavily involved in is the spec. The rest exist to decompose the problem robustly, match up with equivalent tests, act as a human-readable “log” if something goes wrong, and be something I can ask pointed questions about.

07.03.2026 20:34 👍 0 🔁 0 💬 0 📌 0

I guess it depends how you use AI to plan (skill issue 🤣). If you “vibe spec” then I totally get what you’re saying.

I use a large skill set (still in dev) to guide exactly what I want in a spec, functional design, and tech design.

I know exactly what’s supposed to be in each, and what rules &

07.03.2026 20:34 👍 2 🔁 0 💬 2 📌 0

I get that, but also, all the players getting into heavily automated, no-human-in-the-loop stuff focus only on the spec as the source of truth and nothing else.

BDD for English-language tests you can actually review easily, etc…

Big diff. between fundamentals & doing stuff for the sake of it.

07.03.2026 20:25 👍 2 🔁 0 💬 0 📌 0

Software engineer by training for the last 35 years, technical architect for the last ~20 of them

07.03.2026 20:20 👍 1 🔁 0 💬 1 📌 0

“Trump’s foreign language interpreter” is a literal “hell on earth” role if you’ve ever heard how he actually speaks English unedited.

Second only to the job of his personal fake tan applicator 🤮

07.03.2026 20:18 👍 0 🔁 0 💬 0 📌 0

It’s worse. The answers weren’t on Google, they were in a binary file on GitHub that was XOR-encrypted to make it impossible to Google. The AI decided it must be a benchmark; systematically went through benchmarks; downloaded the file (which it shouldn’t have been able to do); and decrypted it.

07.03.2026 02:16 👍 67 🔁 10 💬 2 📌 5
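A minimal sketch of why XOR-encrypting a file keeps its contents out of text search, yet does little once someone (or something) works out the key. The post doesn’t say which key or scheme was actually used; the answer file, the `xor_bytes` helper and the key below are hypothetical, for illustration only.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with a repeating key (hypothetical helper).

    XOR is its own inverse, so applying this twice with the same key
    returns the original bytes.
    """
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical answer file and key, not the ones from the post.
plaintext = b"Q1: 42\nQ2: Paris\n"
key = b"k3y"

ciphertext = xor_bytes(plaintext, key)

# The stored bytes no longer contain the answer text, so a plain search misses it.
print(b"Paris" in ciphertext)  # False

# With the key, decryption is the same operation as encryption.
print(xor_bytes(ciphertext, key) == plaintext)  # True
```

The point of the sketch: repeating-key XOR hides strings from a search index, but it is obfuscation rather than strong protection, because the same one-line operation reverses it.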

I almost don’t want to ask this but… I’m a glutton for punishment.

What do you test the end result against - to be sure it delivered what was asked for & intended?

It doesn’t have to be perfect, sure, but it’s gotta be pretty comprehensive at least? (and scaled to size of work)

07.03.2026 20:09 👍 1 🔁 0 💬 1 📌 0

Hmmm, this seems odd.

In general, planning shouldn’t be any different for AI than for another human.

The main difference & benefit I’ve noticed is that AI is a lot more rigorous.

However, I highly recommend creating an SDLC skill for *your* process. There’s more than 1 type of project & spec.

07.03.2026 20:04 👍 2 🔁 0 💬 1 📌 0

Turning off notifications is the main one, with an “off by default” principle.

Some are genuinely useful, so I’d never say none; just pick them selectively. E.g. I allow bank spending notifications, even though most are annoying, because I’d rather be aware of an unexpected transaction sooner.

07.03.2026 09:00 👍 1 🔁 0 💬 0 📌 0

a google calendar invite can own your machine via claude desktop. cvss 10.0. zero click. anthropic declined to fix it. i use claude with extensions every day. i can't stop reading that last sentence. https://layerxsecurity.com/blog/claude-desktop-extensions-rce/ https://mindpattern.ai

06.03.2026 23:45 👍 2 🔁 1 💬 0 📌 0
New York considers bill that would ban chatbots from giving legal, medical advice | StateScoop A bill under consideration in New York would provide a private right of action, allowing people to file lawsuits against chatbot owners who violate the law.

The latest NY chatbot bill would bar chatbots from conveying information that could fall within the scope of a licensed profession.

It’s basically a censorship bill disguised as licensure protection.

statescoop.com/new-york-bil...

06.03.2026 05:03 👍 108 🔁 21 💬 32 📌 25

I’m not discounting any of that; I’m simply focused on the lack of prompt injection in the wild.

Don’t you think it’s slightly strange we haven’t heard it mentioned in post-incident reviews?

05.03.2026 22:10 👍 1 🔁 0 💬 1 📌 0

people hawking โ€œsecureโ€ email are not to be trusted, exhibit 9000

05.03.2026 20:43 👍 41 🔁 10 💬 2 📌 0

The interesting part of all this to me is the prompt injection.

It’s a well-known LLM issue, and there’s been a lot of speculation about why we haven’t seen prominent examples of it deployed in anger in the wild, not just as a PoC.

This is the first I’ve seen.

Seen any others? @simonwillison.net

05.03.2026 21:22 👍 4 🔁 0 💬 2 📌 0
ICO writes to Meta over 'concerning' AI smart glasses report Videos, including of glasses-wearers using the toilet or having sex, are sometimes reviewed by a Kenya-based subcontractor.

Last year when I was checking into a hotel, the desk person was wearing Meta glasses. I kindly asked them to take them off. They were annoyed. I said, “I do not consent to you looking at my credit card and ID with Meta glasses on.” My instincts were correct: www.bbc.com/news/article...

05.03.2026 15:27 👍 6131 🔁 2445 💬 93 📌 184
Can coding agents relicense open source through a “clean room” implementation of code? Over the past few months it’s become clear that coding agents are extraordinarily good at building a weird version of a “clean room” implementation of code. The most famous version …

As usual, @simonwillison.net to the rescue simonwillison.net/2026/Mar/5/c...

05.03.2026 17:40 👍 8 🔁 4 💬 1 📌 1

Interesting discussion on HN. If I see a painting of a sunset, and I paint a sunset, ≠ copyright violation. If I study a codebase (or a closed-source end product) and go off and rewrite it on my own, ≠ license violation. Does this change if I use a coding agent to help me?

05.03.2026 13:57 👍 12 🔁 2 💬 1 📌 0

I worked in retail while I was at college ~35 years ago.

I can’t say that organised theft rings were a thing back then, but we had some very brazen & prolific shoplifters - quiet spot, large bag, slide everything off a clothing rail into it & away.

I assume it was sold at car boot sales back then.

05.03.2026 15:13 👍 2 🔁 0 💬 1 📌 0

instructive to compare the default outputs in diff. languages across a range of dev tasks.

That shows you where the quality floor is, and what you get by default if you don’t have strict guidance or prompts covering it.

05.03.2026 13:38 👍 3 🔁 0 💬 0 📌 0

Language choice def matters, and varies somewhat between models.

I’ve not tested Go so I’m not sure where it tends to sit support-wise, but generally speaking Python is nearly always best supported & Rust tends to sit in the middle of the pack.

They both improve with guidance, but it’s especially

05.03.2026 13:38 👍 4 🔁 0 💬 1 📌 0

I’m not sure about YouTube but I assume it’s a combination of the format, the types of video that gain most views, huge volume making switching cheap & easy, and conveying that in a single image.

Closest analogy I can think of is those cheap weekly soap/gossip-focused magazine covers in newsagents.

05.03.2026 09:20 👍 0 🔁 0 💬 0 📌 0

1. A short thread on a Bluesky phenomenon that might be described as "They are a dead-eyed cultist who must be cast out lest the heresy take root!" OP has blocked me for mocking them - I'd usually obscure their name but since they themselves were quote-dunking to demand someone else be blocked ...

04.03.2026 13:57 👍 692 🔁 153 💬 54 📌 81

This is conflating two related but separate things.

Yes, the questions have been around for a while across all models.

The question as posed was about an increase in their number, not claiming they were new.

04.03.2026 18:56 👍 2 🔁 0 💬 1 📌 0

Interesting… I wonder if this is a direct result of OpenAI introducing advertising to ChatGPT?

Pretty much every website or app that relies on ad revenue introduces UI patterns designed to increase use & retention, with a goal of serving more ads in the process.

It’s hard to conclude otherwise.

04.03.2026 18:16 👍 4 🔁 0 💬 1 📌 0

“Now we have a faster horse”, to shred that infamous Ford quote.

03.03.2026 15:38 👍 0 🔁 0 💬 0 📌 0
How Claude remembers your project - Claude Code Docs Give Claude persistent instructions with CLAUDE.md files, and let Claude accumulate learnings automatically with auto memory.

The docs do describe nested directory support, and also multiple files outside your project (for cross project content):

“CLAUDE.md files in subdirectories load on demand when Claude reads files in those directories.”

code.claude.com/docs/en/memo...

03.03.2026 15:24 👍 1 🔁 0 💬 0 📌 0