I'm absolutely thrilled to launch the first edition of "Run the Reads"!
I was planning to start next week, but honestly, so much incredible research dropped this week that I couldn't wait. When Gemini 2.5 pushes context windows to 2M tokens and real-world AI deployments are cutting rapid molecular testing by 43%, you don't sit on that intel.
Welcome to your weekly dose of signal, not noise.
Run the Reads is published every Wednesday as part of the “SO”cial series: a curated recap of research papers, articles, and technical content.
Last week zeroed in on agent design, long-context reasoning, and clinically ready bio-AI. Gemini 2.5 stretched context to 2M tokens, while fresh evaluations showed frontier models still miss "easy" logic. Real-world trials and multi-omics models pushed AI closer to bedside impact.
AI Reasoning & LLMs
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next-Generation Agentic Capabilities – arXiv – 2M-token window lets the model “read” full books and codebases, enabling deeper chain-of-thought and tool use in one pass #multimodal #longcontext
Frontier LLMs Still Struggle with Simple Reasoning Tasks – arXiv – Benchmarks of basic arithmetic and logic reveal that scale alone doesn’t fix brittle reasoning, highlighting the need for structured prompts and training #evaluation #reasoning
Chain-of-Thought Is Not Explainability – alphaXiv – Shows that readable scratch-pads don’t trace causal pathways; urges causal probing before trusting model “thoughts” #explainability #cot
Test-Time Scaling with Reflective Generative Model – arXiv – Self-reflection loop boosts answer quality up to 6% without new weights, pointing to cheap deployment gains #inference #scaling
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation – alphaXiv – Lets easy tokens exit early while hard ones recurse, shaving FLOPs and latency under heavy loads #efficiency #modeldesign
Meek Models Shall Inherit the Earth – arXiv – Argues diminishing returns make small, tuned models more cost-effective than giga-scale systems for many tasks #scalinglaws #policy
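For the tinkerers: the test-time scaling entry above boils down to a draft-critique-revise loop you can prototype in a few lines. A minimal sketch, assuming any prompt-in/completion-out callable as the model; the function name, stopping heuristic, and prompts here are illustrative, not the paper's actual method:

```python
def reflective_generate(model, prompt, max_rounds=3):
    """Hypothetical test-time reflection loop: draft, self-critique, revise.

    `model` is any callable mapping a prompt string to a completion string.
    No new weights are involved; all gains come from extra inference passes.
    """
    answer = model(prompt)
    for _ in range(max_rounds):
        critique = model(f"Critique this answer for errors:\n{answer}")
        if "no issues" in critique.lower():
            break  # the critique pass found nothing to fix; stop early
        answer = model(
            f"Revise the answer using this critique.\n"
            f"Answer: {answer}\nCritique: {critique}"
        )
    return answer
```

The appeal is exactly what the entry says: quality scales with inference budget (`max_rounds`), not with retraining.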
AI Agents & Architectures
Deep Research Agents: A Systematic Examination and Roadmap – arXiv – Maps current agent designs, tool-use patterns, and open benchmarks, then outlines a research agenda for trustworthy autonomy #agents #roadmap
A Survey of AI Agent Protocols – arXiv – Compares OpenAI Functions, LangChain, and open-standard proposals, arguing shared schemas unlock ecosystem composability #protocols #agents
AI Agent Behavioral Science – arXiv – Treats agents like lab animals, measuring bias and deception to surface safety gaps early #behavior #safety
MIRIX: Multi-Agent Memory System for LLM-Based Agents – arXiv – Six-layer memory stack stores task, social, and tool context, cutting hallucinations in 20-step plans #memory #agents
12-Factor Agents – blog – Pulls DevOps discipline into agent engineering: versioned prompts, stateless workers, and observability baked in #engineering #bestpractices
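The MIRIX entry is the one I keep coming back to: keeping task, social, and tool context in separate stores is simple to try yourself. A toy sketch of the layered idea only; MIRIX's actual six-layer design, retrieval, and consolidation are far richer, and every name below is made up for illustration:

```python
from collections import defaultdict, deque

class LayeredMemory:
    """Toy layered agent memory: one bounded store per context type.

    Loosely inspired by the layered idea in MIRIX, not its design.
    Separating layers keeps tool chatter from crowding out task state.
    """
    def __init__(self, max_per_layer=50):
        # each layer is a FIFO buffer that drops its oldest entries
        self.layers = defaultdict(lambda: deque(maxlen=max_per_layer))

    def write(self, layer, item):
        self.layers[layer].append(item)

    def recall(self, layer, keyword):
        # naive keyword retrieval, scoped to a single layer
        return [m for m in self.layers[layer] if keyword.lower() in m.lower()]
```

Scoping recall to one layer is the point: a 20-step plan can query task memory without dredging up unrelated tool output, which is plausibly where the hallucination reduction comes from.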
Open Source Tools & Optimization
Muon Is Scalable for LLM Training (Moonlight) – arXiv – New optimizer halves FLOPs and powers Moonlight, an open 16B MoE, showing that training tricks still beat parameter brute-force #optimization #opensource
Genomics & Biomed AI
Real-world Deployment of a Fine-Tuned Pathology Foundation Model for Lung-Cancer Biomarker Detection – Nature – Prospective trial cut rapid molecular tests by 43%, hinting at cheaper, faster diagnostics in busy labs #pathology #biomarkers
Pan-Cancer Analysis Uncovers Immunogenomic Drivers of Post-Immunotherapy Resistance – bioRxiv – Cross-tumor study links escape mutations to therapy failure, pointing to new combo-therapy targets #cancer #immunotherapy
Predicting Cellular Responses to Perturbation Across Diverse Contexts with STATE – bioRxiv – Multi-omics model predicts drug response across tissues, boosting hit-rate in in-silico screens #singlecell #perturbomics
Joint Probabilistic Modeling of Pseudobulk and Single-Cell Transcriptomics – bioRxiv – MixupVI blends bulk and single-cell data, giving cleaner deconvolution for rare-cell studies #transcriptomics #mlbio
Automation of Systematic Reviews with Large Language Models – medRxiv – otto-SR agent screens 4,000 abstracts in an hour, shaving weeks off evidence synthesis #systematicreview #agents
Industry Applications
Generative AI Is Finding Fertile Soil in Healthcare – FastCompany – Early adopters report 10-min note-taking and faster doc search but warn of workflow fit issues #healthcare #aiadoption
What Life Sciences Gets Right (And Misses) About AI – LifeScienceLeader – Culture and data silos, not models, are blocking ROI; piece outlines a three-step fix #lifesciences #strategy
Fusing LLM Capabilities with Routing Data – arXiv – FusionFactory routes tasks to specialised sub-models, beating single giants on 14 benchmarks with less compute #modelrouting #fusion
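The FusionFactory entry above is the "router beats monolith" pattern in miniature: classify the task, dispatch to a specialist, fall back to a generalist. FusionFactory learns its routing from routing data; this keyword-based sketch is purely illustrative, and the `experts`/`classify` names are mine:

```python
def route(task, experts, classify):
    """Hypothetical task router: pick a specialised model per task.

    `experts` maps a task label to a model callable; `classify` maps the
    task text to one of those labels. Unknown labels fall back to the
    "general" expert, so routing mistakes degrade gracefully.
    """
    label = classify(task)
    expert = experts.get(label, experts["general"])
    return expert(task)
```

The compute win is structural: most requests only ever touch one small specialist instead of one giant model.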
AI Safety & Alignment
Lessons from a Chimp: AI “Scheming” and the Quest for Ape Language – arXiv – Paper dissects sensational “deception” claims and offers a rigor checklist to avoid anthropomorphic traps #alignment #methodology
Research Methods
Dealing with Continuous Variables and Modelling Non-Linear Associations in Healthcare Data – BMJ – Practical guide to splines and fractional polynomials for cleaner clinical models #statistics #researchmethods
That's a wrap on this week's research highlights!
What caught your attention? Any papers you think I missed? Drop a comment below.
Next week: "Run AI Run" launches Friday with synthesis and analysis of these developments. Subscribe to stay ahead of the curve.
New here? This is part of the “SO”cial series—AI-powered intelligence curation for AI professionals. Read the announcement to learn more.