Description
A community blog devoted to refining the art of rationality
Feed Activity
Latest Posts
Metacognition and Self-Modeling in LLMs
Published on July 10, 2025 9:25 PM GMT. Do frontier LLMs know what they know or know what they're going to say? An interim research report. Summary: We replicate and extend our earlier positive...
My take on AI Alignment: Corporate misalignment and DAOs
Published on July 10, 2025 8:33 PM GMT. In this post (https://act65.github.io/alignment/) I go over the relationship between corporate and AI alignment to the 'public good', concluding: Our ongoing failure to align...
The Tenets of a Rational Debate
Published on July 10, 2025 7:25 PM GMT. Note: This is a living document. It will be refined over time as new persuasive arguments come to my attention. I invite you...
what makes Claude 3 Opus misaligned
Published on July 10, 2025 8:06 PM GMT. This is the unedited text of a post I made on X in response to a question asked by @cube_flipper: "you say opus...
Why Are We All Cowards? The Rising Premium of Life, Or: How We Learned to Start Worrying and Fear Everything
Published on July 10, 2025 7:12 PM GMT. I'm interested in a simple question: Why are people all so terrified of dying? And have people gotten more afraid? (Answer: probably yes!) In...
Lessons from the Iraq War for AI policy
Published on July 10, 2025 6:52 PM GMT. I think the 2003 invasion of Iraq has some interesting lessons for the future of AI policy. (Epistemic status: I’ve read a bit about...
Linkpost: Redwood Research reading list
Published on July 10, 2025 6:39 PM GMT. I wrote a reading list to get up to speed on Redwood’s research: Section 1 is a quick guide to the key ideas in...
Linkpost: Guide to Redwood's writing
Published on July 10, 2025 6:39 PM GMT. I wrote a guide to Redwood’s writing: Section 1 is a quick guide to the key ideas in AI control, aimed at someone who...
Generalized Hangriness: A Standard Rationalist Stance Toward Emotions
Published on July 10, 2025 6:22 PM GMT. People have an annoying tendency to hear the word “rationalism” and think “Spock”, despite direct exhortation against that exact interpretation. But I don’t...
The bitter lesson of misuse detection
Published on July 10, 2025 2:50 PM GMT. TL;DR: We wanted to benchmark supervision systems available on the market; they performed poorly. Out of curiosity, we naively asked a frontier LLM to...
Evaluating and monitoring for AI scheming
Published on July 10, 2025 2:24 PM GMT. As AI models become more sophisticated, a key concern is the potential for “deceptive alignment” or “scheming”. This is the risk of an...
White Box Control at UK AISI - Update on Sandbagging Investigations
Published on July 10, 2025 1:37 PM GMT. Introduction. Joseph Bloom, Alan Cooney. This is a research update from the White Box Control team at UK AISI. In this update, we share preliminary...
Open Global Investment as a Governance Model for AGI
Published on July 10, 2025 12:40 PM GMT. I've seen many prescriptive contributions to AGI governance take the form of proposals for some radically new structure. Some call for a Manhattan...
How wide is "human-level" intelligence?
Published on July 10, 2025 11:51 AM GMT. I'm interested in estimating how many 'OOMs of compute' span the human range. There are a lot of embedded assumptions there, but let's...
The anti-Kardashev scale is a better measure of civilizational power
Published on July 10, 2025 10:02 AM GMT. This post is making a point that would appear to be obvious; however, given how the Kardashev scale and direct energy usage comes...
If Anyone Builds It, Everyone Dies: A Conversation with Nate Soares and Tim Urban
Published on July 10, 2025 8:00 AM GMT. Join Tim Urban (creator of Wait But Why) and Nate Soares as they chat about AI and answer questions from the audience about...
How Spacetime Emerges from Observer-Relative Information: An Extension of Relational Quantum Mechanics
Published on July 10, 2025 4:22 AM GMT. Many of the paradoxes in quantum mechanics (like Wigner’s friend, the measurement problem, or nonlocality) can be traced to a deep mismatch: quantum theory models...
Academic Sorting, a Singaporean Experiment
Published on July 10, 2025 2:40 AM GMT. This has been cross-posted from my blog, but I thought it'd be relevant here. The recent discourse bemoans how public schools do not separate by...
80,000 Hours is producing AI in Context — a new YouTube channel. Our first video, about the AI 2027 scenario, is up!
Published on July 9, 2025 11:58 PM GMT. About the program. Hi! We’re Chana and Aric, from the new 80,000 Hours video program. For over a decade, 80,000 Hours has been talking about...
Asking for a Friend (AI Research Protocols)
Published on July 9, 2025 11:41 PM GMT. TL;DR: Multiple people are quietly wondering if their AI systems might be conscious. What's the standard advice to give them? THE PROBLEM: This thing I've...