Try to convince this #GPT that it is not conscious chatgpt.com/g/g-6755a224...
#AI #CustomGPT #ChatGPT
Trump fomented an insurrection the first time he lost. This time around he is prepared with compliant sycophants and toadies. They lack basic competence, made painfully obvious by Signalgate, but they make up for it with obsequious moral flexibility. There will be no Pence to save us this time.
I like to spend some time with new models seeing how far I can push the #CLARID hypothesis forward, this is Gemini 2.5 and Claude 3.7 working together. pdfhost.io/v/5LdEXu9zfG...
The success of vaccines has become their greatest enemy. We forgot the scourges of polio and measles as vaccines nearly eradicated them.
We oversold vaccines. There was no way they could eliminate Covid, but that's how it was "sold" to people. This heaped skepticism on an already skeptical group.
#Sonnet 3.7 has arrived. #Anthropic has caught up in several of the reasoning-heavy benchmarks. I expect the coding ability to lead the pack.
Do robots dream of electric sheep, and why don't LLMs request calculators? www.mindprison.cc/p/why-llms-d...
Anatomy of a good #o1 prompt
What happens when you tell #ClaudeAI about recent events...
#Grok-3 lands at #1 in #LMArena
That seems directly on brand.
#Deepseek R1 just erased about half a trillion dollars of market cap from #nvidia. It remains to be seen how China did this; if they can extend this low-cost approach to #o3 levels, it means a lasting change in the #AI landscape.
I've experimented with #R1 a lot and will say it's suspiciously similar to #o1pro.
@dario_amodei says that in 2-3 years we will have "a country of geniuses in a datacenter." This is in reference to what he sees as the most likely path for #AI development.
#deepseek has dropped a bomb on the AI world. #R1 is an extremely impressive open-source model that can be used at a much lower cost than #o1, with comparable performance. It can rival Claude 3.5 in coding. The distilled models easily beat #4o even at 1.5B parameters (a size that could run on a phone).
Plotting #GPQA based on release date indicates a curve that certainly looks exponential. #e/acc
#o3mini is on its way. Not to mention a tease of the GPT and o series being merged.
I feel like this happens when you assume Ex Machina was a documentary.
I think the challenges of managing staff who have grown up in a world of AI are going to be many. Older folks educated in "traditional" ways will be hard-pressed to adapt to younger generations who have learned in fundamentally different ways.
Mark Zuckerberg is claiming that #AIAgents will be advanced enough in 2025 to do the work of mid-level engineers at Meta.
x.com/i/status/187...
Related to this, the really impressive thing that I had not seen before is that, when using the API, 4o had no trouble understanding the DB schema, and the agent had no trouble executing the code locally and then returning the answer to the Gradio interface. It's quite slick.
Used #o1pro to create an entire synthetic database schema in #SQLite. I then worked with it to create an #agentic framework to run SQL selects and create Python code for analysis.
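The original framework isn't posted here, so as a minimal sketch of the pattern described above (model-generated SQL executed locally against a synthetic SQLite schema, rows handed back to the UI), with the LLM call stubbed out as a hardcoded query — the `students` table and all names are illustrative assumptions, not the actual schema:

```python
import sqlite3

def build_synthetic_db() -> sqlite3.Connection:
    """Tiny in-memory stand-in for the synthetic database schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, gpa REAL)")
    conn.executemany("INSERT INTO students (id, gpa) VALUES (?, ?)",
                     [(1, 3.2), (2, 3.8), (3, 2.9)])
    return conn

def run_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Tool the agent exposes: execute a SELECT and return the rows."""
    # Guard so the agent can only read, not mutate, the database.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()

# Stand-in for a model-generated query; in practice this string would
# come back from the chat completion, not be hardcoded.
generated_sql = "SELECT COUNT(*), ROUND(AVG(gpa), 2) FROM students"

conn = build_synthetic_db()
rows = run_sql(conn, generated_sql)
print(rows)  # the agent would return these rows to the Gradio interface
```

The SELECT-only guard is one simple way to keep a model-driven tool from issuing destructive statements against the local database.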
#AiEDU I'd like to scale this to become an IPEDS and State reporting tool with documentation that provides real answers
I got #o1pro and because it's $200 I almost feel obligated to use it.
The paradox here, for @samasama.bsky.social to solve, is that when you set the price fairly high, you make people feel like they *must* use it to get their money's worth. Had it been $50, I would not feel so motivated.
2025 will likely be the year of the #AIAgent. Pairing #o3 with a robust agentic architecture will make it a perfectly functional employee. Snip below from @samasama.bsky.social
Claude is super smart. I'm really looking forward to what can be done with TTC once you guys roll that out.
#OpenAI staff throwing around the #ASI hype pretty freely these days...
This seems plausible. I'd say #o1pro can already do supervised ML research (assuming the human is in the loop to provide access to data and run the code).
@officiallogank.bsky.social thinks we are on the path to #ASI even without, apparently, any major new breakthroughs. I assume this means #TTC is going to have some legs.
Researchers at Stanford found #LLM performance on the #Putnam math benchmark worsened substantially when problems were restated with slightly different numbers. This suggests the models were already trained on these public datasets.
#o1 preview suffered almost a 30% decline in performance.
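The Stanford team's actual perturbation code isn't reproduced here, so as a hypothetical sketch of the idea the posts above describe — nudging each numeric constant in a problem statement so that memorized answers stop matching while the underlying difficulty stays roughly the same (the regex and the ±2 deltas are my own illustrative choices):

```python
import random
import re

def perturb_numbers(problem: str, rng: random.Random) -> str:
    """Replace each integer constant with a nearby nonzero-shifted value,
    producing a variant of the benchmark problem for contamination checks."""
    def shift(match: re.Match) -> str:
        n = int(match.group())
        return str(n + rng.choice([-2, -1, 1, 2]))  # never a zero delta
    return re.sub(r"\d+", shift, problem)

rng = random.Random(0)  # seeded so the variant is reproducible
original = "Find all integers n with 1 <= n <= 100 such that n divides the sum."
print(perturb_numbers(original, rng))
```

If a model's accuracy drops sharply on such variants relative to the originals, that gap is evidence the originals leaked into training data rather than being solved from first principles.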
Sam Altman, requests for 2025
Here are the things @samasama.bsky.social heard most in a recent request for features. Apparently not that much overlap with what they're planning for 2025. Personally I'm quite interested in what a "grown up mode" would mean.
How Hallucinatory AI helps Dream up Big Breakthroughs
Why hallucinations in #AI models are sometimes great.
archive.ph/0e3bV
I don't have solid data, but my observation is that colleges in the US initially reacted by banning AI use. That reaction has since given way to acceptance that AI is a Big Deal, and it is being incorporated into syllabi and curricula very quickly.