Gregory Marton

@gregory-marton

GenAI adjunct at Tufts, ft dad, cs tutor. https://www.seidellmarton.us/gremio https://www.linkedin.com/in/gregory-marton/

536 Followers · 1,264 Following · 63 Posts · Joined 21.10.2024

Latest posts by Gregory Marton @gregory-marton

🚨 New paper alert 🚨

Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️

1/n 🧵

27.05.2025 13:59 👍 28 🔁 17 💬 1 📌 1

Do LLMs Think Like Humans?

They find that,

While LLMs achieve broad categorical alignment with human judgment, they falter in capturing fine-grained semantic nuances such as typicality and, critically, exhibit vastly different representational efficiency profiles.

26.05.2025 14:19 👍 43 🔁 10 💬 4 📌 2
Are You Smarter Than A.I.? Some experts predict that A.I. will surpass human intelligence within the next few years. Play this puzzle to see how far the machines have to go.

Kudos to @nytimes.com for covering ARC-AGI in such an exquisite example of interactive data journalism. Amazing spot for @fchollet.bsky.social as well.
www.nytimes.com/interactive/...

01.04.2025 00:41 👍 40 🔁 6 💬 1 📌 1

The funny thing about multimodal image generation as released in the last week by Google and OpenAI is that now LLM image generation works like how most people using LLMs for the past two years always thought LLM image generation works.

26.03.2025 01:17 👍 77 🔁 6 💬 1 📌 0

It ought to be a property of a world model, but a language model models language. We use language capacities to describe counterfactuals, argue about facts, and imagine all the time. What assistants are lacking is any reasonable kind of world model. We need to marry the stochastic to the symbolic.

16.03.2025 10:13 👍 2 🔁 0 💬 1 📌 0

If you have used LLM image generators, you know they are hard to control: the LLM had to send a prompt to a separate image generation tool; it did not make the image itself.

Gemini is the first public release of a full multimodal LLM that can directly make images. This allows the systems to do detail work

13.03.2025 00:49 👍 121 🔁 19 💬 5 📌 1
A histogram showing the distribution of numeracy scores for students in the U.S. vs. other benchmark countries. The histogram uses an icon array made up of little figures of people, using the `weepeople` font.
The chart has the title "U.S. Numeracy Education has room for improvement"

📊 #dataviz Putting the people 👨‍👨‍👧‍👦 back into charts that talk about them.

An interesting histogram of numeracy scores for U.S. vs. some other countries,
using Alberto Cairo's [weepeople font](github.com/propublica/w...) to show the people involved in these distributions.
Src: bit.ly/3FrDq0v
👇

09.03.2025 17:48 👍 23 🔁 7 💬 3 📌 1
COLUMN: Science funding cuts threaten economy If you’ve ever been treated for a medical problem, used a cellphone or a computer, or been excited by robots exploring Mars or gene-editing therapy, then your life has been

As part of the Science Homecoming project I wrote an opinion piece for the Albuquerque Journal on the dangers of cuts to federal science funding:

www.abqjournal.com/opinion/arti...

@sciencehomecoming.bsky.social
@cantlonlab.bsky.social
@spiantado.bsky.social

09.03.2025 18:05 👍 128 🔁 43 💬 0 📌 2

The public really needs to understand this. Every university system in the world rests on public funding, there has never been an alternative model at any time in history.

We have universities for literally the same reason that we have roads and armies.

08.03.2025 20:02 👍 3985 🔁 1121 💬 57 📌 45

1/13 LLM circuits tell us where the computation happens inside the model, but the computation varies by token position, a key detail often ignored!
We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇

06.03.2025 22:15 👍 26 🔁 8 💬 1 📌 1

This is a crazy paper. Fine-tuning a big GPT-4o on a small amount of insecure code or even "bad numbers" (like 666) makes it misaligned in almost everything else. It is more likely to start offering misinformation, spouting anti-human values, and talking about admiring dictators. Why is unclear.

25.02.2025 21:01 👍 214 🔁 43 💬 7 📌 19

Removing the gears part results in better performance, and that's surprising because it feels different from how humans learn. Perhaps relatedly, though anecdotally, telling e.g. an image generator what you didn't like about the previous response results in more, not less, of what you didn't like.

14.02.2025 10:10 👍 4 🔁 0 💬 0 📌 0
Screenshot of a Twitter post showing that the latest OpenAI commercial model is better than previous models at doing arithmetic but still cannot reliably produce the correct answer to multiplication problems with values greater than 11 x 11. It's supposed to be impressive, I think.

you fucked up a perfectly good computer is what you did. look at it. it's got innumeracy

12.02.2025 19:36 👍 3175 🔁 625 💬 109 📌 153
“This was CS50”: Yale ends largest computer science course After a decade of partnership with Harvard, Yale’s CS50 course will no longer be offered starting in fall 2025 due to limited funding and an expanding computer science department.

«Kim pointed to newer introductory offerings such as “Python for Humanities and Social Sciences,” “AI for Future Presidents” and “C Programming Language and Linux.”» and it's still available free online www.edx.org/cs50
Love the homage to Richard Muller, too!

06.02.2025 09:47 👍 1 🔁 0 💬 0 📌 0

Are linguists paying a lot of attention to LLMs? Because this seems like a fascinating finding with large implications: LLMs share highly abstract grammatical concept representations, even across unrelated languages, so even models trained mostly on English do well in other languages.

06.02.2025 02:24 👍 165 🔁 13 💬 11 📌 3

Tech oligarchs made their fortunes thanks in large part to government funded research done by scientists based in universities. The tech industry’s complicity in dismantling these govt agencies and higher ed is not only immoral, it’s also shortsighted. Where will new science breakthroughs come from?

05.02.2025 22:46 👍 38 🔁 4 💬 5 📌 0

We launched a bunch of Gemini 2.0 models today. Compared to the 1.5 series models, each of the 2.0 models is generally better than the "one size up" model in the 1.5 series.

2.0 Flash & Flash-Lite set new standards in the quality/cost Pareto frontier.

More details:
blog.google/technology/g...

05.02.2025 23:45 👍 94 🔁 14 💬 4 📌 0

open-Deep-Research by huggingface
as posted by @aymeric-roucher.bsky.social

An entirely open agent that can: navigate the web autonomously, scroll and search through pages, download and manipulate files, run calculation on data...

04.02.2025 20:46 👍 13 🔁 4 💬 1 📌 0
man stepping on rake labelled 4o

man skateboarding down stairs on a rake before getting hit in face labelled o3-mini-high

03.02.2025 21:28 👍 237 🔁 18 💬 6 📌 0
GitHub - exa-labs/exa-deepseek-chat: A simple open-source chat app that uses Exa's API for web search and Deepseek R1 for reasoning A simple open-source chat app that uses Exa's API for web search and Deepseek R1 for reasoning - exa-labs/exa-deepseek-chat

Exa & Deepseek R1 Chat App

Exa & Deepseek Chat App is a free and open-source chat app that uses Exa's API for web search and Deepseek R1 LLM for reasoning.

github.com/exa-labs/exa...

01.02.2025 08:01 👍 12 🔁 2 💬 1 📌 0
End of Term Web Archive – Preserving the Transition of a Nation | Internet Archive Blogs

The Internet Archive has to date downloaded 500 terabytes of US government websites, which it crawls at the end of every presidential term. The whole archive is fully searchable. This effort's housed by a donation-funded nonprofit, not a branch of the US government. blog.archive.org/2024/05/08/e...

01.02.2025 00:58 👍 32971 🔁 12181 💬 484 📌 580
Text Shot: Professor Karsten emphasized the potential global impact of this development, noting that if major tech companies like Amazon, Google, and Meta choose to implement this method in their data centers, it could lead to savings of gigawatt-hours of energy worldwide.

Researchers claim Linux kernel tweak could reduce data center energy use by 30% https://www.techspot.com/news/106501-linux-kernel-upgrade-promises-up-30-energy-savings.html #AI #climate

30.01.2025 01:31 👍 3 🔁 1 💬 0 📌 0

As someone who has reported on AI for 7 years and covered China tech as well, I think the biggest lesson to be drawn from DeepSeek is the huge cracks it illustrates with the current dominant paradigm of AI development. A long thread. 1/

27.01.2025 14:12 👍 6161 🔁 2359 💬 211 📌 722
The image depicts a monumental statue of Buddha, emphasizing serenity and grandeur. The statue's intricate design captures traditional Buddhist features, including a meditative posture with hands placed in a symbolic gesture, flowing robes, and a calm facial expression exuding peace. The perspective highlights the statue's immense size against a minimalistic white sky background, underscoring its significance as a spiritual and cultural landmark.

Explainer: What's R1 and Everything Else

This is an attempt to consolidate the dizzying rate of AI developments since Christmas. If you're into AI but not deep enough, this should get you oriented again.

timkellogg.me/blog/2025/01...

26.01.2025 03:17 👍 116 🔁 27 💬 5 📌 13

I'm not sure if people realize how quickly the Trumpzis can do enormous damage to US science, from basic research to translation. Really fast. REALLY fast. Labs with decades of irreplaceable domain and technique knowledge can break apart with a surprisingly short funding gap. When they're gone...1/

23.01.2025 22:13 👍 942 🔁 319 💬 21 📌 50

Next big thing for brands: knowing what sites agents prefer.

If you ask for stock prices, Claude with Computer Use goes to Yahoo Finance while Operator does a Bing search

Operator loves buying from the top search result on Bing. Claude has direct preferences like 1-800-Flowers

We don't know why

24.01.2025 02:18 👍 75 🔁 7 💬 7 📌 6

Worth also pointing out that there are many "tests so easy no AI system can pass them".

Moravec's paradox remains.

E.g., arxiv.org/abs/2404.12390

23.01.2025 16:54 👍 115 🔁 35 💬 7 📌 2

The new ability of AI video creators to add real people and products to scenes with just an image is likely to increase the utility (& more worryingly, misuse) of AI video.

Here I made Shakespeare at a cafe and the Girl with a Pearl Earring piloting a mech (just as Vermeer intended)

22.01.2025 17:28 👍 73 🔁 8 💬 5 📌 0

In December, I posted about our new paper on mastering board games using internal + external planning. πŸ‘‡

Here's a talk now on YouTube about it, given by my awesome colleague John Schultz!

www.youtube.com/watch?v=JyxE...

17.01.2025 17:26 👍 35 🔁 11 💬 1 📌 0