Simon Willison's Weblog
Latest Posts
Quoting Artificial Analysis
gpt-oss-120b is the most intelligent American open weights model, comes behind DeepSeek R1 and Qwen3 235B in intelligence but offers efficiency benefits [...] We’re seeing the 120B beat o3-mini but...
No, AI is not Making Engineers 10x as Productive
No, AI is not Making Engineers 10x as Productive Colton Voege on "curing your AI 10x engineer imposter syndrome". There's a lot of rhetoric out there suggesting that if you...
OpenAI's new open weight (Apache 2) models are really good
The long promised OpenAI open weight models are here, and they are very impressive. They're available under proper open source licenses - Apache 2.0 - and come in two sizes,...
Claude Opus 4.1
Surprise new model from Anthropic today - Claude Opus 4.1, which they describe as "a drop-in replacement for Opus 4". My favorite thing about this model is...
Quoting greyduet on r/teachers
I teach HS Science in the south. I can only speak for my district, but a few teacher work days in the wave of enthusiasm I'm seeing for AI tools...
A Friendly Introduction to SVG
This SVG tutorial by Josh Comeau is fantastic. It's filled with neat interactive illustrations - with a pleasing subtle "click" audio effect as you adjust...
ChatGPT agent's user-agent
I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it...
ChatGPT agent triggers crawls from Bingbot and Yandex
ChatGPT agent is the recently released (and confusingly named) ChatGPT feature that combines browser automation with terminal access, replacing their previous Operator research...
Usage charts for my LLM tool against OpenRouter
OpenRouter proxies requests to a large number of different LLMs and provides high level statistics of which models are the most popular...
Qwen-Image: Crafting with Native Text Rendering
Not content with releasing six excellent open weights LLMs in July, Qwen are kicking off August with their first ever image generation model. Qwen-Image...
Quoting @himbodhisattva
for services that wrap GPT-3, is it possible to do the equivalent of sql injection? like, a prompt-injection attack? make it think it's completed the task and then get access...
I Saved a PNG Image To A Bird
Benn Jordan provides one of the all-time great YouTube video titles, and it's justified. He drew an image in an audio...
Quoting Nick Turley
This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year. — Nick Turley, Head...
The ChatGPT sharing dialog demonstrates how difficult it is to design privacy preferences
ChatGPT just removed their "make this chat discoverable" sharing feature, after it turned out a material volume of users had inadvertently made their private chats available via Google search. Dane...
XBai o4
Yet another open source (Apache 2.0) LLM from a Chinese AI lab. This model card claims: XBai o4 excels in complex reasoning capabilities and has now completely surpassed...
From Async/Await to Virtual Threads
Armin Ronacher has long been critical of async/await in Python, both for necessitating colored functions and because of the more subtle challenges they introduce like...
Re-label the "Save" button to be "Publish", to better indicate to users the outcomes of their action
Fascinating Wikipedia usability improvement issue from 2016: From feedback we get repeatedly as...
Faster inference
Two interesting examples of inference speed as a flagship feature of LLM services today. First, Cerebras announced two new monthly plans for their extremely high speed hosted model service: Cerebras...
Deep Think in the Gemini app
Google released Gemini 2.5 Deep Think this morning, exclusively to their Ultra ($250/month) subscribers: It is a variation of the model that recently achieved...
July newsletter for sponsors is out
This morning I sent out the third edition of my LLM digest newsletter for my $10/month and higher sponsors on GitHub. It included the following section headers: Claude Code Model...