"Give me 3 recommendations."
If the model only has enough info for 2, it'll hallucinate the third. Confidently.
Confidently wrong is the worst kind of wrong.
"Give me your top recommendations (typically 2-4)" β that one change fixed it.
@jenny-ouyang
Software Engineer | Technical Writer | Solo AI Maker 2nd Product @ https://quickviralnotes.xyz/ 3rd Product @ https://www.substackexplorer.com/ Personal Website: https://www.jennyouyang.dev/ Newsletter: https://jennyouyang.substack.com/
"Give me 3 recommendations."
If the model only has enough info for 2, it'll hallucinate the third. Confidently.
Confidently wrong is the worst kind of wrong.
"Give me your top recommendations (typically 2-4)" β that one change fixed it.
AI doesn't get smarter with more input. It gets dumber when you overload it.
Someone gave their AI agent admin access to the production database "just in case."
That agent can now DROP tables.
One wrong query away from a resume-generating event.
Least-privilege isn't optional just because the operator is an LLM.
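One way to apply least privilege here is a read-only role instead of admin access. A sketch that just builds the SQL (role and table names are made up for illustration):

```python
# Sketch: generate least-privilege GRANTs for an agent role instead of admin.
# The agent gets SELECT only -- no INSERT, UPDATE, or DROP.
def readonly_grants(role: str, tables: list[str]) -> list[str]:
    stmts = [f"CREATE ROLE {role} NOSUPERUSER NOCREATEDB;"]
    for table in tables:
        stmts.append(f"GRANT SELECT ON {table} TO {role};")
    return stmts

for stmt in readonly_grants("agent_ro", ["users", "orders"]):
    print(stmt)
```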
Everyone's connecting Opik or Langfuse.
Nobody's figuring out what to actually evaluate.
That's not observability. That's logging with extra steps.
Binary criteria that can't be misinterpreted. That's the real game.
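What a binary criterion can look like in practice, as a toy sketch (the checks are examples I made up, not a recommended eval suite):

```python
# Sketch: binary, unambiguous eval criteria instead of raw logging.
# Each check returns True/False only -- no 1-10 "vibes" scores.
def evaluate(output: str) -> dict[str, bool]:
    return {
        "non_empty": bool(output.strip()),
        "under_100_words": len(output.split()) <= 100,
        "no_filler": "as an ai" not in output.lower(),
    }

print(evaluate("Here are two solid options."))
```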
My first AI-built app broke in users' hands.
So I created a production-ready checklist. Then a whole playbook.
Now I hand AI the system and it catches what it used to miss: security holes, edge cases, real-world scenarios.
The most powerful thing about SKILLs is not prompt reusability.
It's that skills tell agents HOW and WHEN to use tools.
MCP gives agents tools. SKILLs give agents judgment.
I'm offloading my jobs to my AI agents.
One by one.
Today, my agent autonomously updated my personal website.
What it did:
- Retrieved my newest Substack posts
- Indexed them for semantic search
- Decided products to showcase
Stop collecting certifications. Start building Proof of Work.
Pick one table. Rewrite its description so an AI agent can understand it. Purpose, grain, common queries, edge cases.
More valuable than any LinkedIn badge.
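A hypothetical example of what that rewritten description might look like, with the four fields from the post (table name, queries, and edge cases are all invented):

```python
# Hypothetical agent-readable table description:
# purpose, grain, common queries, edge cases.
orders_table_doc = {
    "table": "orders",
    "purpose": "One row per customer checkout; source of truth for revenue.",
    "grain": "order_id (unique per checkout, never reused)",
    "common_queries": [
        "daily revenue: SUM(total) GROUP BY order_date",
        "repeat buyers: COUNT(*) > 1 per customer_id",
    ],
    "edge_cases": "total is negative for refunds; test orders use customer_id = 0",
}
print(orders_table_doc["grain"])
```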
Today I built an agent that uses MCP tools AND becomes an MCP tool itself.
Agent → calls 6 MCP tools
AgentOS → wraps it as MCP server
Other agents → call YOUR agent as one tool
Composable agents. Wild.
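The composability idea in miniature, as a toy registry in plain Python (this is not the real MCP or AgentOS API, just the shape of the pattern: an agent that calls tools is itself registered as a tool):

```python
# Toy sketch: a tool registry where an agent becomes just another tool.
tools = {}

def tool(name):
    def register(fn):
        tools[name] = fn  # anything registered here is callable by agents
        return fn
    return register

@tool("add")
def add(a, b):
    return a + b

@tool("sum_agent")  # the "agent": it calls a tool, and is itself a tool
def sum_agent(x, y):
    return f"sum={tools['add'](x, y)}"

print(tools["sum_agent"](2, 3))
```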
My go-to prompt for every new AI conversation:
"Before doing anything, ask me questions until you're 95% confident you understand what I need."
Flips the burden. AI asks for exactly what it needs. Output quality is night and day.
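One way to wire that in, assuming a chat API that takes role-tagged messages (the function name is illustrative):

```python
# Sketch: prepend the clarify-first instruction as a system message.
CLARIFY_FIRST = (
    "Before doing anything, ask me questions until you're 95% confident "
    "you understand what I need."
)

def start_conversation(user_request: str) -> list[dict]:
    return [
        {"role": "system", "content": CLARIFY_FIRST},
        {"role": "user", "content": user_request},
    ]

print(start_conversation("Help me design my database schema.")[0]["content"])
```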
There is no best AI model.
Only the right teammate for the task.
Most people won't automate their actual workflow with AI. They'll automate a demo.
3-question filter for automation worth building:
1. Boredom Test – what bored you this week?
2. Pain Test – what breaks if you stop?
3. Sentence Test – "I take X, do Y, produce Z"
Start with the boring one.
Testing scheduled posting from CrossPost MCP. If you're seeing this, the cron job works.
Hi, this is a test post from CrossPost MCP via Claude Code
Hi, this is a test post from CrossPost MCP via my Claude Desktop
Hi, this is a test post from crosspost mcp via my Cursor chatbox
hi, this is a test posting from crosspost :)
Ever built something with AI that worked perfectly… until it didn't?
@jenny-ouyang.bsky.social breaks down why that happens and how a few old-school software engineering habits can prevent 80% of AI build failures.
Smart, practical, and grounded in real experience.
open.substack.com/pub/codelike...
Sorry, I really should check bsky more often... To your question: for some I use the API, for others I just use my sessions.
I ask ChatGPT to guide me through the settings of n8n :)
Still hard: messy data, security, cost, culture.
Future = ultra-long context + RAG cache + MCP agents, powered by AI-native knowledge graphs.
Which solution would help your workflow first?
Breakthrough 2 – Agent workflows: frameworks wire LLMs to plan steps, call APIs, and loop until done.
Breakthrough 3 – MCP: a lightweight JSON/HTTP "socket" any tool can expose, so agents plug in without bespoke wrappers.
Early LLMs wowed demos but choked on real-world jargon.
* Prompt-stuffing hit context limits
* Fine-tuning hit GPU bills + data-privacy walls.
Then came quick breakthroughs:
Breakthrough 1 – RAG: let models go "open-book" – retrieve company docs on demand, then answer.
Breakthrough 2 –
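The "open-book" retrieval step from Breakthrough 1 can be sketched in a few lines. This toy uses keyword overlap; real systems use embeddings, and the docs here are invented:

```python
# Toy RAG retrieval: score docs by word overlap with the question,
# then stuff the best match into the prompt before answering.
def retrieve(question: str, docs: list[str]) -> str:
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

docs = [
    "Refund policy: refunds are issued within 14 days.",
    "Shipping: orders ship within 2 business days.",
]
context = retrieve("How long do refunds take?", docs)
prompt = f"Answer using this doc:\n{context}\n\nQ: How long do refunds take?"
print(context)
```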
What makes you say it's getting more stable just by looking at the graph?
I'm trying to set up a one-click scheduling system using n8n, so I can plan all my social posts across platforms, including Bluesky, to go out at the same time on the selected dates.
This is playing with n8n automation last try
This is my automation test 4
This is playing with n8n test 11
Testing from n8n, posting to Bluesky!
Why do you say so?
You probably heard about the US stock market plummeting due to DeepSeek.
Then I found out… DeepSeek is under serious attack!
How do I know? Bug reports started rolling in for quickviralnotes.xyz, where I've implemented the DeepSeek API.
Wild times. Stay with fallbacks.
#indiedev #buildinpublic