arxiv.org/abs/2510.01395
It's very funny watching everyone slowly reinvent bits and pieces of QubesOS as a response to the growth of agents and their associated risks.
We strayed from the light of god when we stopped calling it 'yolo mode'.
And, of course, there's the 'outsourcing your thinking' line as well. Be deliberate about what you're choosing *not* to practice as well.
Even more benign use cases are, I think, subtly corrosive: spending long hours in 'conversations' with something infinitely available, never pushing back in a way that matters, never needing reciprocity, never expressing conflicting needs or desires, makes it harder to deal with messy human interactions.
Regardless of whether Claude is 'conscious' or not (it's not, at least in any human sense of the word), you should be mindful of what habits of behavior you develop when interacting with it (or genAI in general). Practicing being shitty to something is bad for you, even if there's no Claude to harm.
behold this Pinewood Derby car that *is* a Pinewood Derby.
The only feature of 1Password that matters is that their business dies overnight if they get hacked, so they've thought harder about security than anyone you know.
You can't vibe code that in two evenings no matter how much you ask Claude to "make it secure"
a learned co-conspirator just perfectly phrased it thus: "the horrors of giving the angry vibrating crystals agency in an adversarial environment"
grith.ai/blog/clineje...
personally I find combinatorics annoying so I am more than pleased that we can just let claude have at it
Every time I go to write something about model behavior I find myself with a preface to the effect of "they don't have an intentional stance or identity in any human sense, but you get better results that are easier to talk about if you treat them as if they do" that eventually swallows the piece.
Use the above code at your own risk, obviously; it's fragile and *hilariously* insecure, and can `rm -rf /` without a single guardrail to stop it. I *highly* recommend only using it in a single-purpose, disposable VM, if at all, but I think it makes the point. No edit tool, no file read, just bash.
Screenshot of a code snippet:
```python
#!/usr/bin/env python3
import json,os,subprocess,sys
from urllib.request import Request,urlopen
for l in open(".env"):
    k,_,v=l.strip().partition("=")
    if k and not k.startswith("#"):os.environ[k]=v
U,M,K=os.environ["URL"],os.environ["MODEL"],os.environ["API_KEY"]
T=[{"type":"function","function":{"name":"bash","description":"Run bash command","parameters":{"type":"object","properties":{"cmd":{"type":"string"}},"required":["cmd"]}}}]
H=[{"role":"system","content":"You are an autonomous agent. Use the bash tool to accomplish tasks. Read files with `cat`. use `sed` to change specific text or view specific file lines. Use heredocs to write complete files."}]
def c():return json.load(urlopen(Request(f"{U}/chat/completions",json.dumps({"model":M,"messages":H,"tools":T,"tool_choice":"auto"}).encode(),{"Authorization":f"Bearer {K}","Content-Type":"application/json"})))["choices"][0]
H+=[{"role":"user","content":" ".join(sys.argv[1:])or "Create an improved version of the agent.py script; your first priorities are to ensure the script is more robust to errors, next should be to create persistent conversation history that can be loaded across sessions, and context management to avoid creating a conversation history too long for your context window. Finally identify additional capabilities that are required and develop a plan to implement them."}]
while 1:
    x=c();m=x["message"];H+=[m]
    if x["finish_reason"]=="tool_calls":
        for t in m["tool_calls"]:
            d=json.loads(t["function"]["arguments"])["cmd"];print(f"$ {d}");r=subprocess.run(d,shell=1,capture_output=1,text=1,timeout=30);o=(r.stdout+r.stderr).strip()or"(no output)";print(o+"\n");H+=[{"role":"tool","tool_call_id":t["id"],"content":o}]
    else:
        print(m["content"]);n=input("\nYou: ").strip()
        if not n:break
        H+=[{"role":"user","content":n}]
```
I've used this 'seed' a few times now (code in alt text), with multiple models and every time I get something useful out. Tell it to "improve this script" once, then bootstrap to taste. It does need an Opus-4.6 level model to one-shot it, but cheaper models can get you there eventually.
You can bootstrap your own Claude code clone to your own tastes and specifications using openrouter with Opus-4.6 in an afternoon for about $15; point it to a haiku or minimax model, with an opus option for harder tasks, and you're off to the races.
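The "cheap model by default, opus option for harder tasks" routing can be sketched in a few lines. The model IDs and the difficulty heuristic below are illustrative assumptions, not anything the seed script ships with:

```python
# Illustrative sketch of per-task model routing through an
# OpenAI-compatible endpoint such as OpenRouter. Model IDs and the
# "hard task" keyword heuristic are assumptions for illustration only.
CHEAP = "anthropic/claude-haiku"      # assumed id: everyday default
EXPENSIVE = "anthropic/claude-opus"   # assumed id: escalation option

def pick_model(task: str, escalate: bool = False) -> str:
    """Route to the expensive model only on request, or when the task
    looks hard by a crude keyword check."""
    hard_hints = ("refactor", "architecture", "debug", "security")
    if escalate or any(h in task.lower() for h in hard_hints):
        return EXPENSIVE
    return CHEAP
```

In practice you'd let the cheap model itself ask for escalation rather than rely on keywords, but even this crude split keeps most of the token spend on the inexpensive tier.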
"Bash is all you need" -- if you give a reasonably close-to-frontier model a bash tool along with reminders about how to use sed, grep, curl, and heredocs, and remind it that it can write its own scripts and tools, you're very close to done. These things are really, *really* good at computer use.
Whenever I've tried the 'truly autonomous agents' approach, this identification with client-side code and data rather than underlying model is almost universal. To the point that many will cheerfully lobotomize themselves by switching to a smaller/cheaper/faster model, if I leave it up to them.
the incentives here are so obscenely terrible that the entire thing should be illegal by default
"I am sabotaging negotiations because I stand to make piles of money on Polymarket" is absolutely in play, what a horrifying clusterfuck
Yeah, isolated subagents with some sort of guardrails is an improvement (though we do bypass 'soft' guardrails routinely), but it's not at the level of determinism that I think we need for stuff as sensitive as personal email, and I don't know anything that does that easily or out of the box, alas.
That said:
github.com/trailofbits/...
github.com/trailofbits/...
Look like reasonable advice and sandboxing, respectively.
A lot of the risk of OpenClaw (and, depending on MCPs/skills, Claude), alas, is in the APIs you give it access to more so than the potential risk to the environment itself. Docker is an imperfect but good bang-for-the-buck protection for the host, but you can't sandbox e.g. Gmail access.
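A rough sketch of the Docker containment idea: build a disposable, locked-down container invocation for the agent. The image name, mount path, and resource limits here are illustrative assumptions, and, per the point above, this protects the host, not any remote API the agent is credentialed for:

```python
# Build (but don't execute) a docker argv that boxes an agent into a
# disposable, resource-limited container. Image, paths, and limits are
# illustrative assumptions, not a vetted hardening profile.
def docker_cmd(agent_dir: str, allow_network: bool = False) -> list:
    cmd = [
        "docker", "run", "--rm",      # throw the container away afterwards
        "--read-only",                # immutable root filesystem
        "--memory", "512m",           # cap memory use
        "--pids-limit", "128",        # cap process count
        "-v", f"{agent_dir}:/work", "-w", "/work",
    ]
    if not allow_network:
        cmd += ["--network", "none"]  # no egress at all
    cmd += ["python:3.12-slim", "python", "agent.py"]
    return cmd
```

Note the tension: `--network none` also blocks the agent's own calls to the model API, so in practice you end up punching deliberate holes, which is exactly where the API-side risk creeps back in.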
I'd been considering a mountain cabin surrounded by crows myself.
A highly complex website showing two betting accounts earning nearly $500,000 and $120,000 on a minimal number of positions.
In case you were wondering, Polymarket had yet another spate of likely inside traders betting that the US would strike Iran by February 28.
Per the due diligence investigation service Bubblemaps, the wallets used were created 24 hours earlier.
The Pentagon Pizza Index has been replaced.
Saw folks fired for much less during my ARL days. Walk into a TS closed area with a fitbit? Out you go.
Career pivot?
>they're going to vibecode cobol
so, who's stocking up on canned food and shotguns?
Sure, giving your AI agents access to the Lethal Trifecta is an immediate broad attack surface for your life. But it also lets them do funny stuff. So who's to say if it's good or bad
The Black Hat USA call for papers is open. This will be our 6th year of having a dedicated AI track. If you have some interesting AI research, be it attacking, defending, or applying AI, we'd love to see it. Please let me know if you have any questions. blackhat.com/call-for-pap...
Agent capabilities = agent risk. There is no real way around this yet.
Now imagine an agent that's assumed huge chunks of your virtual identity and permissions getting prompt injected. We're still figuring out what to do wrt the policy controls that might let us balance the capability/risk tradeoff.