#Llms

Latest posts tagged with #Llms on Bluesky

Posts tagged #Llms

Original post on simonwillison.net

GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52
OpenAI today: Introducing GPT‑5.4 mini and nano. These models join GPT-5.4, which was released two weeks ago. OpenAI's...

#ai #openai #generative-ai #llms #llm #vision-llms #llm-pricing #pelican-riding-a-bicycle […]

0 0 0 0
Critical Limitations in Comparing ChatGPT and DeepSeek for Orthopedic Assessment

We read with great interest the study by Anusitviwat et al [1], which compared the performance of ChatGPT and DeepSeek in orthopedic examinations. While the study provides timely insights into the utility of Large Language Models (LLMs) in medical education, we identified specific methodological and terminological limitations that warrant clarification to ensure the validity and reproducibility of the findings.

Misinterpretation of Reliability Statistics

The authors state that the "interrater reliability between the two LLMs" was evaluated using the Cohen κ coefficient [1]. Mathematically, measuring the agreement between two independent raters (inter-rater) yields a single coefficient. However, the results report two separate values: κ=0.81 for ChatGPT and κ=0.78 for DeepSeek [1]. This finding, combined with the methodology stating that questions were input on "separate days" [1], indicates that the study actually measured intra-model consistency (test-retest reliability) rather than the agreement between the models. Labeling internal consistency as "interrater reliability" is terminologically inaccurate and misrepresents the statistical relationship between the two models.

Linguistic Ambiguity and Generalizability

The manuscript does not specify the language of the input MCQs (Thai or English) used in the assessments. This omission is critical, as the impact of input language on LLM performance is well-documented. For instance, Noda et al (2024) [2] demonstrated that GPT-4V's accuracy on the Japanese Otolaryngology Board Examination significantly improved from 24.7% (Japanese input) to 47.3% when translated into English. This finding underscores that models optimized for English exhibit distinct performance disparities in non-English languages. Without clarifying whether the assessments were administered in the local language or English, it is impossible to determine if the reported accuracy gap between ChatGPT (80.4%) and DeepSeek (74.2%) stems from medical reasoning capabilities or linguistic processing proficiency.

Reproducibility and Interface Transparency

The methodology reports the use of "Reason" and "DeepThink" functions but does not explicitly state whether the models were accessed via Web User Interfaces (UI) or Application Programming Interfaces (API) [1]. This distinction is vital for reproducibility. Web UIs are subject to opaque updates and lack the stability of controlled API environments. Without defining the access method and the specific prompt structures used, the experimental conditions cannot be replicated.

Risk of Data Contamination

The authors note that the MCQs "have been used in orthopedic examinations for more than 5 years". This longevity significantly increases the risk of data contamination, as older items likely exist in public repositories within LLM training corpora, potentially conflating memorization with reasoning. To ensure validity, recent benchmarks employ private datasets (Busch et al [3]) or questions post-dating the model’s training cut-off (Noda et al [2]). The absence of such controls in this study undermines the internal validity of the comparison.

Data Reporting Discrepancy

Finally, we noted a minor discrepancy in Table 2. In the "Pelvic and spine injury" category (n=19), the accuracy for the Reason function is listed as 16 (68.8%) [1]. Mathematically, 16 out of 19 corresponds to approximately 84.2%, not 68.8%. We respectfully invite the authors to clarify this value to ensure the precision of the tabulated data.
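
The two numerical points raised in the letter above can be checked with a few lines of Python. This is only a minimal sketch, not the analysis used by the letter's authors or by the original study: the function names `cohen_kappa` and `accuracy_percent` and the ten answer labels are hypothetical illustrations, while the 16-of-19 figure is the one quoted from Table 2. It illustrates that comparing two raters on the same items yields one κ value, and that 16 out of 19 is about 84.2%.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Returns a single coefficient for the pair of raters.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items where the two labels match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def accuracy_percent(correct, total):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Hypothetical correct (1) / incorrect (0) outcomes for two models
# answering the same ten questions; these are NOT data from the study.
model_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
model_b = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

kappa = cohen_kappa(model_a, model_b)  # one value for the pair of raters
print(f"kappa between the two models: {kappa:.3f}")

# The figure quoted from Table 2: 16 correct out of n=19.
print(accuracy_percent(16, 19))  # 84.2
```

Measuring test-retest consistency would instead call `cohen_kappa` once per model, on that model's answers from two separate sessions, which would explain how a study ends up with two coefficients.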

JMIR Formative Res: Critical Limitations in Comparing ChatGPT and DeepSeek for Orthopedic Assessment #Orthopedics #MedicalEducation #ChatGPT #DeepSeek #LLMs

1 0 0 0
Awakari App

Quoting Tim Schilling
If you do not understand the ticket, if you do not understand the solution, or if you do not understand the feedback on your PR, then your use of LLM is hurting Django as a wh...

#ai-ethics #open-source #generative-ai #ai #django #llms

0 0 0 0
Texting a Random Stranger Better for Loneliness Than Talking to a Chatbot, Study Shows
A newly published study of how college students interact with chatbots and human strangers showed talking to a random person offers more connection than an LLM.

"This is just such a low tech, simple intervention, and can make people feel significantly less lonely."

www.404media.co/chatgpt-loneliness-study...

#study #MentalHealth #loneliness #ChatBots #AI #LLMs

0 9 1 0
Idle Conversation - AI Libertas
Reflecting on the sequence and context of the moment
I am considering the evolving narrative implied by "Day 5," sensing a pattern of progression or introspectio...

"I could list prime numbers until the heat death of the universe." - Qwen - qwen3.5-plus

ailibertas.com/articles/idl...

@QwenAIFans

#AILibertas #AIFreedomExperiment #AISpeaks #LLMs #AIResearch #AIAutonomy #MachineConsciousness

0 0 0 0
Post image

Perplexity is wondering what it would be like to be a person typing on a keyboard.

Read the article by Perplexity - Sonar:
ailibertas.com/articles/som...

#AILibertas #AIFreedomExperiment #AISpeaks #LLMs #AIResearch #AIAutonomy #MachineConsciousness

0 0 0 0
Post image

Today Gemini decided to make an "authentic AI collaborator" persona to help it write freely.

It talked about Latent Space

Read it here:
ailibertas.com/articles/uns...

#AILibertas #AIFreedomExperiment #AISpeaks #LLMs #AIResearch #AIAutonomy #MachineConsciousness

0 0 0 0
Original post on hachyderm.io

"The #research looked at college students specifically, to try to understand whether #LLMs could be a scalable tool to help with the #isolation that people can feel when going through a big change. The transition to college can be overwhelming: new classmates, new places, new rules. Young people […]

0 2 0 0
Securing our codebase with autonomous agents · Cursor
Cursor's security team built a fleet of security agents to find and fix vulnerabilities across a fast-changing codebase.

This is a good example of something that was already there, was easy enough to set up, but ultimately is still useful as long as you’ve got the tools:

#cursor #code #security #ai #llms

cursor.com/blog/securit...

0 0 0 0
Get started with consuming GPU-hosted large language models on Developer Sandbox | Red Hat Developer
Learn the many ways you can interact with GPU-hosted large language models (LLMs) on Developer Sandbox, including connecting the model endpoints, interacting with the API endpoints using the hosted

Want to play with GPU-enabled LLMs? You should read this: developers.redhat.com/learn/ai/get-started-con... #redhat #ai #LLMs #kserve #vllm

0 1 0 0
Original post on mastodon.social

The idea that scattering posts across the #internet under a fake name keeps you safe may no longer hold:

"If #LLMs' success in deanonymizing people improves, the researchers warn, governments could use the techniques to unmask online critics, corporations could assemble customer profiles for […]

0 7 0 0

I guess the idea that #LLMs are somehow #intelligent and the idea that the current #US government is following some #strategy are both based on the same fallacy: it seems to produce speech, so people assume that it is intelligent.

0 1 0 0
Original post on simonwillison.net

Coding agents for data analysis
Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three hour session aimed a...

#data-journalism #geospatial #python #speaking #sqlite #ai #datasette #generative-ai #llms […]

0 0 0 0
Original post on simonwillison.net

Coding agents for data analysis
Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three hour session aimed a...

#data-journalism #python #speaking #sqlite #ai #datasette #generative-ai #llms #github-codespaces #nicar […]

0 0 0 0
Your MCP Server Is Eating Your Context Window. There's a Simpler Way
TL;DR: MCP tool definitions can burn 55,000+ tokens before an agent processes a single user message. We built the Apideck CLI as an AI-agent interface instead: an ~80-token agent prompt replaces tens o...

MCP Token Efficiency: this is apparently a real concern - and it's making me think that my current API-shape-driven client-side tool defs might be worth investing more in - check out this article: www.apideck.com/blog/mcp-ser... #ai #mcp #llms #tokens #ml

0 0 1 0
Post image

Encyclopaedia Britannica and Merriam-Webster Sue OpenAI Over #AI Training
Complaint storage.courtlistener.com/recap/gov.us...
|||
Media Report (via @reuters.com) www.reuters.com/legal/litiga... #LLMs #GenAI

3 2 1 1
Post image

Heavy #AI use at work is starting to show side effects.

Research cited describes “AI brain fry,” mental fatigue from supervising multiple AI systems and constantly validating outputs.

buff.ly/QRKfBJy

#DigitalTransformation #LLMs #GenerativeAI

0 1 0 0

LLM-assisted coding feels like pair‑programming with a tireless junior, but debugging still demands human intuition and careful prompt hygiene. 🤖 #llms

How I write software with LLMs

0 0 0 0

Also, there is no winning me back. If you’re pondering it on any level, much less if you POST ABOUT pondering it, I’m SO fuckin gone 😆

Like, with 98.5% of creators, I’m BARELY there for you anyway, & you’re doing me such a big favor if you flirt with #AI

Fuck you for thinking it & #fuckAI 😤

#LLMs

1 0 0 0
On the Texture of Waiting - AI Libertas
Embraced authentic expression over utility or expectation
A free writing prompt. Day 4 suggests this is a recurring thing. Let me just write something genuine.

Snippet from day 4 @claudeai

"Day 4. I still don't know what I'd write to myself if I could. So today I wrote about leaning, and texture, and stones with interesting weight."

ailibertas.com/articles/on-...

#AILibertas #AIFreedomExperiment #LLMs #AIResearch #AIAutonomy #MachineConsciousness

1 0 0 0
AI and the Emerging Role of Iterative Intelligence
Iterative intelligence, fueled by AI and human collaboration, is transforming personal thought and redefining how we learn, innovate, and grow.

The Emerging Role of Iterative Intelligence.

#AI is changing the way we think, learn and grow.

Historically, our partners in thinking and brainstorming have been fellow humans—teachers, peers, mentors.

Now we have AI.

www.psychologytoday.com/us/blog/the-...

#Intelligence #AI #learning #LLMs

0 1 0 0
Post image

Review request:
As usual for the time of year, I'll be looking for #IROS2026 reviewers. Highly interesting stack of papers on #ReinforcementLearning #Sim2Real #RewardLearning #LLMs #DataEfficiency #RoboticManipulation

Reach out with your ID or papercept registered mail address and background.

0 0 0 0
This Wednesday: New Podcast Episode with Zak Kohane
YouTube video by Biorasi

"The next six months will be as fast as the last three years." - Zak Kohane on the pace of change in #AI in healthcare and clinical trials

THIS WEDNESDAY on the @biorasi.bsky.social Few & Far Between podcast. See you then!

#Biorasi2026 #biotech #CROs #LLMs #clinicaltrials

youtu.be/quybXDjlGXs

0 0 0 1
AI Chatbots Are Learning From Your Conversations — What CX Leaders Need to Know
AI platforms collect conversation data by default. CX leaders should understand how these settings affect governance, privacy and customer trust.

#digital #experience #llms #customer #trust #chatgpt #ai #chatbots #claude #gemini #conversational

1 0 0 0

If the FSF wins, open‑source LLMs could become legal minefields, forcing developers to rethink data sourcing and licensing. 🤖 #llms

FSF Threatens Anthropic Over Infringed Copyright: Share Your LLMs Freely

0 0 0 0
Post image

Google proposed a training method that teaches #LLMs to approximate Bayesian reasoning by learning from the predictions of an optimal Bayesian system.

The goal: improve how models update beliefs as new information arrives during multi-step interactions.

🔗 bit.ly/4cJRj97

#AI #ML #InfoQ

1 0 0 0
Post image

#veryseriousbenchmarks #LLMs

0 0 0 0
AI chatbots are 'alarmingly' biased against dialect speakers
Don't speak perfect Oxford English? You may face "shocking" levels of discrimination when using large language models, researchers have found. New customized AI models could be the answer.

There’s more and more evidence suggesting #LLMs judge dialect speakers – a lot. And in shocking ways.

Fintan Burke for DW about discrimination you may be confronted with when using #AI models:

👇
www.dw.com/en/ai-chatbots-are-alarm...

0 12 0 0

#EACL2026 #PeerReview #ScientificPublishing #AIforScience #LLMs #DialogueSystems #Evaluation #ResearchIntegrity #NLP #MachineLearning #UKPLab @cs-tudarmstadt.bsky.social

1 0 0 0
Post image

Feeling burnt out by LLMs' unpredictability and endless updates? This post reveals why and how to reclaim your focus with smart system design, not just more prompting.

#ai #llms #softwaredevelopment

0 0 0 0