I'm absolutely thrilled to launch the first edition of "Run the Reads"!
I was planning to start next week, but honestly, so much incredible research dropped this week that I couldn't wait. When Gemini 2.5 pushes context windows to 2M tokens and real-world AI deployments are cutting rapid molecular testing by 43%, you don't sit on that intel.
Welcome to your weekly dose of signal, not noise.
Run the Reads is published every Wednesday as part of the “SO”cial series: a curated recap of research papers, articles, and technical content.
Last week zeroed in on agent design, long-context reasoning, and clinically ready bio-AI. Gemini 2.5 stretched context to 2M tokens, while fresh evaluations showed frontier models still miss "easy" logic. Real-world trials and multi-omics models pushed AI closer to bedside impact.
AI Reasoning & LLMs
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next-Generation Agentic Capabilities – arXiv – 2M-token window lets the model “read” full books and codebases, enabling deeper chain-of-thought and tool use in one pass #multimodal #longcontext
Frontier LLMs Still Struggle with Simple Reasoning Tasks – arXiv – Benchmarks of basic arithmetic and logic reveal that scale alone doesn’t fix brittle reasoning, highlighting the need for structured prompts and training #evaluation #reasoning
Chain-of-Thought Is Not Explainability – alphaXiv – Shows that readable scratch-pads don’t trace causal pathways; urges causal probing before trusting model “thoughts” #explainability #cot
Test-Time Scaling with Reflective Generative Model – arXiv – Self-reflection loop boosts answer quality up to 6% without new weights, pointing to cheap deployment gains #inference #scaling
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation – alphaXiv – Lets easy tokens exit early while hard ones recurse, shaving FLOPs and latency under heavy loads #efficiency #modeldesign
Meek Models Shall Inherit the Earth – arXiv – Argues diminishing returns make small, tuned models more cost-effective than giga-scale systems for many tasks #scalinglaws #policy
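For the tinkerers: the test-time scaling entry above boils down to a draft-critique-revise loop you can prototype in a few lines. A minimal sketch, assuming any prompt-in/completion-out callable as the model; the function name, stopping heuristic, and prompts here are illustrative, not the paper's actual method:

```python
def reflective_generate(model, prompt, max_rounds=3):
    """Hypothetical test-time reflection loop: draft, self-critique, revise.

    `model` is any callable mapping a prompt string to a completion string.
    No new weights are involved; all gains come from extra inference passes.
    """
    answer = model(prompt)
    for _ in range(max_rounds):
        critique = model(f"Critique this answer for errors:\n{answer}")
        if "no issues" in critique.lower():
            break  # the critique pass found nothing to fix; stop early
        answer = model(
            f"Revise the answer using this critique.\n"
            f"Answer: {answer}\nCritique: {critique}"
        )
    return answer
```

The appeal is exactly what the entry says: quality scales with inference budget (`max_rounds`), not with retraining.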
AI Agents & Architectures
Deep Research Agents: A Systematic Examination and Roadmap – arXiv – Maps current agent designs, tool-use patterns, and open benchmarks, then outlines a research agenda for trustworthy autonomy #agents #roadmap
A Survey of AI Agent Protocols – arXiv – Compares OpenAI Functions, LangChain, and open-standard proposals, arguing shared schemas unlock ecosystem composability #protocols #agents
AI Agent Behavioral Science – arXiv – Treats agents like lab animals, measuring bias and deception to surface safety gaps early #behavior #safety
MIRIX: Multi-Agent Memory System for LLM-Based Agents – arXiv – Six-layer memory stack stores task, social, and tool context, cutting hallucinations in 20-step plans #memory #agents
12-Factor Agents – blog – Pulls DevOps discipline into agent engineering: versioned prompts, stateless workers, and observability baked in #engineering #bestpractices
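The MIRIX entry is the one I keep coming back to: keeping task, social, and tool context in separate stores is simple to try yourself. A toy sketch of the layered idea only; MIRIX's actual six-layer design, retrieval, and consolidation are far richer, and every name below is made up for illustration:

```python
from collections import defaultdict, deque

class LayeredMemory:
    """Toy layered agent memory: one bounded store per context type.

    Loosely inspired by the layered idea in MIRIX, not its design.
    Separating layers keeps tool chatter from crowding out task state.
    """
    def __init__(self, max_per_layer=50):
        # each layer is a FIFO buffer that drops its oldest entries
        self.layers = defaultdict(lambda: deque(maxlen=max_per_layer))

    def write(self, layer, item):
        self.layers[layer].append(item)

    def recall(self, layer, keyword):
        # naive keyword retrieval, scoped to a single layer
        return [m for m in self.layers[layer] if keyword.lower() in m.lower()]
```

Scoping recall to one layer is the point: a 20-step plan can query task memory without dredging up unrelated tool output, which is plausibly where the hallucination reduction comes from.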
Open Source Tools & Optimization
Muon Is Scalable for LLM Training (Moonlight) – arXiv – New optimizer halves FLOPs and powers Moonlight, an open 16B MoE, showing that training tricks still beat parameter brute-force #optimization #opensource
Genomics & Biomed AI
Real-world Deployment of a Fine-Tuned Pathology Foundation Model for Lung-Cancer Biomarker Detection – Nature – Prospective trial cut rapid molecular tests by 43%, hinting at cheaper, faster diagnostics in busy labs #pathology #biomarkers
Pan-Cancer Analysis Uncovers Immunogenomic Drivers of Post-Immunotherapy Resistance – bioRxiv – Cross-tumor study links escape mutations to therapy failure, pointing to new combo-therapy targets #cancer #immunotherapy
Predicting Cellular Responses to Perturbation Across Diverse Contexts with STATE – bioRxiv – Multi-omics model predicts drug response across tissues, boosting hit-rate in in-silico screens #singlecell #perturbomics
Joint Probabilistic Modeling of Pseudobulk and Single-Cell Transcriptomics – bioRxiv – MixupVI blends bulk and single-cell data, giving cleaner deconvolution for rare-cell studies #transcriptomics #mlbio
Automation of Systematic Reviews with Large Language Models – medRxiv – otto-SR agent screens 4,000 abstracts in an hour, shaving weeks off evidence synthesis #systematicreview #agents
Industry Applications
Generative AI Is Finding Fertile Soil in Healthcare – FastCompany – Early adopters report 10-min note-taking and faster doc search but warn of workflow fit issues #healthcare #aiadoption
What Life Sciences Gets Right (And Misses) About AI – LifeScienceLeader – Culture and data silos, not models, are blocking ROI; piece outlines a three-step fix #lifesciences #strategy
Fusing LLM Capabilities with Routing Data – arXiv – FusionFactory routes tasks to specialised sub-models, beating single giants on 14 benchmarks with less compute #modelrouting #fusion
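The FusionFactory entry above is the "router beats monolith" pattern in miniature: classify the task, dispatch to a specialist, fall back to a generalist. FusionFactory learns its routing from routing data; this keyword-based sketch is purely illustrative, and the `experts`/`classify` names are mine:

```python
def route(task, experts, classify):
    """Hypothetical task router: pick a specialised model per task.

    `experts` maps a task label to a model callable; `classify` maps the
    task text to one of those labels. Unknown labels fall back to the
    "general" expert, so routing mistakes degrade gracefully.
    """
    label = classify(task)
    expert = experts.get(label, experts["general"])
    return expert(task)
```

The compute win is structural: most requests only ever touch one small specialist instead of one giant model.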
AI Safety & Alignment
Lessons from a Chimp: AI “Scheming” and the Quest for Ape Language – arXiv – Paper dissects sensational “deception” claims and offers a rigor checklist to avoid anthropomorphic traps #alignment #methodology
Research Methods
Dealing with Continuous Variables and Modelling Non-Linear Associations in Healthcare Data – BMJ – Practical guide to splines and fractional polynomials for cleaner clinical models #statistics #researchmethods
That's a wrap on this week's research highlights!
What caught your attention? Any papers you think I missed? Drop a comment below.
Next week: "Run AI Run" launches Friday with synthesis and analysis of these developments. Subscribe to stay ahead of the curve.
New here? This is part of the “SO”cial series—AI-powered intelligence curation for AI professionals. Read the announcement to learn more.