Run the Reads #4
Agentic Pipelines, Diffusion Coders, and CRISPR-GPT Breakthroughs – August 6, 2025
Welcome to this week's edition of Run the Reads, your curated weekly recap of research papers, articles, and technical content as part of the "SO"cial series. Published every Wednesday, we dive into the latest developments shaping the world of AI and beyond.
The past seven days delivered major strides in fully autonomous agents, diffusion-based language models, and AI-driven biotech. We saw Google unveil a production-ready ML engineering agent, ByteDance push diffusion LLMs to 2k tokens per second, and Nature publish a nanobody designed entirely by virtual agents. On the biomedical front, CRISPR-GPT and OpenCRISPR-1 point toward laboratory workflows run end-to-end by models, while new multi-omics and glucose studies highlight AI’s growing clinical reach.
🧠 Agentic AI & Evaluation
One-stop advances in LLM agents, memory, and benchmarking.
MemTool: Optimizing Short-Term Memory Management for Dynamic Tool Calling in LLM Agent Multi-Turn Conversations
Introduces a salience-aware buffer that learns what to keep or drop, lifting software-engineering task success on SWE-Bench from 63 % to 79 % while trimming prompt size by a third. (arXiv)
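The core idea, keeping only the most salient tool descriptions in the prompt across turns, can be illustrated with a toy buffer. This is a minimal sketch using a hand-rolled recency-plus-usage score as a stand-in for the learned policy the paper describes; the class and scoring rule are illustrative assumptions, not MemTool's actual implementation.

```python
class SalienceBuffer:
    """Toy short-term memory for tool descriptions in a multi-turn agent.

    Keeps the `capacity` highest-salience tools for the next prompt and
    evicts the rest. Salience = use count + a recency bonus (a simple
    stand-in for a learned keep/drop policy).
    """

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.tools = {}   # name -> {"desc": str, "uses": int, "last": int}
        self.turn = 0

    def observe(self, name, desc, used=False):
        """Record that a tool was visible (and optionally called) this turn."""
        self.turn += 1
        entry = self.tools.setdefault(name, {"desc": desc, "uses": 0, "last": 0})
        if used:
            entry["uses"] += 1
        entry["last"] = self.turn

    def _salience(self, entry):
        recency = 1.0 / (1 + self.turn - entry["last"])
        return entry["uses"] + recency

    def prompt_tools(self):
        """Return the tool names worth keeping in the next prompt."""
        ranked = sorted(self.tools.items(),
                        key=lambda kv: self._salience(kv[1]), reverse=True)
        return [name for name, _ in ranked[: self.capacity]]
```

A tool that was actually called stays in the buffer even when newer, unused tools arrive, which is the behavior that shrinks prompts without hurting task success.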
Hierarchical Reasoning Model
Stacks a high-level planner atop a token-level solver; achieves an 11-point gain over vanilla chain-of-thought on GSM8K with 30 % fewer generated tokens. (arXiv)
MLE-STAR: A state-of-the-art machine learning engineering agent
Google’s agent autonomously builds data pipelines, tunes hyper-parameters, and deploys models, matching expert Kaggle solutions across 18 public datasets. (Google Research)
Building a Comprehensive AI Agent Evaluation Framework with Metrics, Reports, and Visual Dashboards
Step-by-step tutorial combines task probes, reliability tests, and Grafana dashboards for continuous scoring and slice-based error analysis of agents in production. (MarkTechPost)
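The combination of task probes plus slice-based error analysis reduces to a small harness: run tagged test cases through the agent and report pass rates per slice. This is a minimal sketch of that pattern, not the tutorial's actual framework; the `agent`/`probes` shapes are illustrative assumptions.

```python
from collections import defaultdict

def evaluate(agent, probes):
    """Run tagged probes through an agent and report per-slice pass rates.

    agent:  any callable prompt -> answer.
    probes: dicts with "prompt", a "slice" label for grouping errors,
            and a "check" predicate over the agent's answer.
    """
    by_slice = defaultdict(lambda: {"pass": 0, "total": 0})
    for probe in probes:
        answer = agent(probe["prompt"])
        bucket = by_slice[probe["slice"]]
        bucket["total"] += 1
        bucket["pass"] += int(probe["check"](answer))
    report = {s: b["pass"] / b["total"] for s, b in by_slice.items()}
    report["overall"] = (sum(b["pass"] for b in by_slice.values())
                         / sum(b["total"] for b in by_slice.values()))
    return report
```

Feeding the per-slice numbers to a dashboard (Grafana, in the tutorial's setup) turns this one-shot report into continuous scoring.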
Graph-R1: Towards Agentic GraphRAG Framework via End-to-End Reinforcement Learning
Reframes retrieval as an RL game over knowledge hypergraphs, boosting reasoning F1 by 6–8 points versus baseline GraphRAG on standard datasets. (arXiv)
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
ByteDance’s lemma-style model solves 64 % of MiniF2F proofs and uses Lean feedback loops for self-improvement, edging closer to end-to-end formal reasoning. (alphaXiv)
💻 LLM Efficiency & Code Diffusion
Speed and quality improvements for language-model coding.
Implementing Self-Refine Technique Using Large Language Models
Shows how a simple self-critique prompt raises ROUGE-L by 18 % on summarization tasks without extra training or data. (MarkTechPost)
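The Self-Refine pattern is a generate → critique → revise loop around a single model. Below is a minimal sketch of that loop; the prompt strings and the "no issues" stopping convention are illustrative assumptions, not the article's exact templates.

```python
def self_refine(llm, task, max_rounds=3):
    """Iteratively improve an answer via the model's own critique.

    llm: any callable prompt -> text (the same model plays answerer,
    critic, and reviser). Stops early when the critique finds nothing
    to fix, or after max_rounds revisions.
    """
    draft = llm(f"Answer the task:\n{task}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this answer to '{task}':\n{draft}")
        if "no issues" in critique.lower():
            break  # critic is satisfied; keep the current draft
        draft = llm(f"Task: {task}\nAnswer: {draft}\n"
                    f"Critique: {critique}\nRevise the answer.")
    return draft
```

Because the loop only requires a text-in/text-out callable, it works with any chat API and needs no training, which is the article's main point.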
Seed Diffusion: ByteDance's discrete diffusion language model
Non-sequential generation hits 2 146 tokens/s (5.4× faster than autoregressive baselines) while matching HumanEval pass@1 scores. (AIbase)
Apple open-sources DiffuCoder
Diffusion-based coder built on Qwen-2.5 outperforms Gemini Diffusion on MBPP and nears GPT-4o accuracy thanks to coupled-GRPO fine-tuning. (InfoQ)
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Hybrid Transformer + SSM design delivers a 34 B model that rivals 70 B Llama-3 while supporting 256 K context and multilingual coverage across 18 languages. (arXiv)
🧬 Genomics & Biomedical AI
AI systems that design molecules, analyze health signals, and integrate omics.
The Virtual Lab of AI agents designs new SARS-CoV-2 nanobodies
A cloud-based agent ensemble generated sub-nanomolar binders against Omicron within 48 h, cutting typical discovery timelines from months to days. (Nature)
CRISPR-GPT for agentic automation of gene-editing experiments
Model-driven robotics planned and validated 1 000 CRISPR edits with 92 % success, reducing human intervention by 70 %. (Nature)
Multimodal AI correlates of glucose spikes in normal and diabetic cohorts
Transformer fusing CGM, diet, and wearable data explained 61 % of post-meal variance and surfaced new lifestyle factors linked to glycemic excursions. (Nature)
GAUDI: interpretable multi-omics integration with UMAP embeddings and density-based clustering
Outperforms MOFA+ across 12 datasets, links latent factors to biomarkers, and offers an interpretable workflow for multi-omics studies. (Nature)
Profluent publishes OpenCRISPR-1 results in Nature
AI-generated Cas variants show higher specificity and lower immunogenicity than SpCas9; company will open-source the underlying CRISPR-Cas atlas. (Business Wire)
Evaluating deep learning–based structure prediction methods on protein complexes
Benchmarks AlphaFold2, ESMFold, and RF-Diffusion on 400 heteromers, finding a 20 % drop for transient interfaces and recommending hybrid rescoring pipelines. (bioRxiv)
📊 Meta-Science & Adoption
Tracking model use and scientific-agent roadmaps.
Quantifying large language model usage in scientific papers
Word-shift analysis of 1.1 M papers shows LLM-edited text rising to 22 % in CS preprints, with open code for venue-level trend monitoring. (Nature)
How Far Are AI Scientists from Changing the World?
Survey maps the path to autonomous discovery, highlighting gaps in experimental validation, evaluation standards, and real-world integration. (arXiv)
⚖️ Alignment & Safety
Tools for steering model behavior.
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Identifies activation-space directions for traits like hallucination and sycophancy; steering along vectors mitigates unwanted shifts during fine-tuning without extra data. (arXiv)
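The simplest way to get such a trait direction is a difference of means between activations collected while the model does and does not exhibit the trait, then shifting activations along that direction at inference time. This is a pure-Python sketch of that recipe under the difference-of-means assumption; function names and the scalar `alpha` are illustrative, not the paper's code.

```python
def persona_vector(pos_acts, neg_acts):
    """Difference-of-means direction for a trait.

    pos_acts / neg_acts: lists of activation vectors (lists of floats)
    captured while the model exhibits / avoids the trait.
    """
    dim = len(pos_acts[0])
    mean = lambda acts, i: sum(a[i] for a in acts) / len(acts)
    return [mean(pos_acts, i) - mean(neg_acts, i) for i in range(dim)]

def steer(activation, vector, alpha=-1.0):
    """Shift one activation along the trait direction.

    Negative alpha pushes away from the trait (e.g. to suppress
    sycophancy); positive alpha amplifies it.
    """
    return [a + alpha * v for a, v in zip(activation, vector)]
```

In practice the vectors come from a specific transformer layer's residual stream, and steering is applied there during generation or fine-tuning monitoring.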
Agentic workflows are spreading from ML engineering to wet-lab biology, while diffusion and hybrid architectures are redefining speed-to-quality trade-offs. At the same time, meta-research reminds us how quickly these tools permeate science and how crucial alignment techniques remain. What stood out to you this week? Share your thoughts in the comments and join us next Wednesday for more insights.