DS-STAR: A state-of-the-art versatile data science agent
November 6, 2025
By Jinsung Yoon, Research Scientist, and Jaehyun Nam, Student Researcher, Google Cloud
—
DS-STAR is a state-of-the-art data science agent that automates a range of tasks, from statistical analysis to visualization and data wrangling, across various data types. It achieves top-ranking performance on the DABStep benchmark.
—
Data science transforms raw data into meaningful, actionable insights and plays an essential role in solving real-world challenges. Businesses often rely on data-driven insights to make pivotal strategic decisions. However, the data science process is frequently complex and demands expertise in areas such as computer science and statistics. It involves many time-consuming activities, including interpreting diverse documents, performing complex data processing, and running statistical analyses.
To streamline this process, recent research has used large language models (LLMs) to create autonomous data science agents that convert natural language queries into executable code. Despite this progress, these agents often depend heavily on well-structured data such as CSV files and overlook the heterogeneous formats that are prevalent in practice, such as JSON, unstructured text, and markdown files. Additionally, many data science problems are open-ended and lack clear ground-truth labels, which makes it difficult to verify that an answer is correct.
—
To address these challenges, we introduce DS-STAR, a new agent designed to solve complex data science problems. DS-STAR incorporates three key innovations:
1. A data file analysis module that automatically extracts context from various data formats, including unstructured ones.
2. A verification stage using an LLM-based judge to assess the adequacy of plans at every step.
3. A sequential planning process that refines its plan iteratively based on feedback.
This iterative refinement enables DS-STAR to perform complex analyses that draw verifiable insights from multiple data sources. DS-STAR achieves state-of-the-art results on challenging benchmarks such as DABStep, KramaBench, and DA-Code, and it performs especially well on tasks that involve heterogeneous data files.
—
### DS-STAR Framework
DS-STAR operates in two stages:
**Stage 1:** It automatically analyzes all files in a directory, generating a textual summary describing their structure and content. This summary provides vital context for the subsequent task.
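To make the idea of a per-directory textual summary concrete, here is a minimal sketch. It is not DS-STAR's implementation: hand-written rules stand in for the agent's analysis, and the function names `describe_file` and `summarize_directory` are hypothetical. It only illustrates what a context string covering structured and unstructured files might look like.

```python
import csv
import json
from pathlib import Path


def describe_file(path: Path, max_chars: int = 200) -> str:
    """Return a short textual description of one data file (hypothetical helper)."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        header = rows[0] if rows else []
        return f"{path.name}: CSV, {max(len(rows) - 1, 0)} data rows, columns={header}"
    if suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict):
            return f"{path.name}: JSON object, keys={sorted(data)}"
        if isinstance(data, list):
            return f"{path.name}: JSON array, {len(data)} items"
        return f"{path.name}: JSON scalar of type {type(data).__name__}"
    # Fallback for markdown and other unstructured text: show a short preview.
    text = path.read_text(encoding="utf-8", errors="replace")
    preview = " ".join(text.split())[:max_chars]
    return f"{path.name}: text, {len(text)} chars, preview={preview!r}"


def summarize_directory(directory: str) -> str:
    """Concatenate per-file descriptions into one context string."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return "\n".join(describe_file(p) for p in files)
```

Calling `summarize_directory("data/")` on a folder of mixed files returns one line per file, which could then be passed to later stages as context.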
**Stage 2:** DS-STAR runs a loop of planning, coding, and verification. The Planner agent drafts a high-level plan, which the Coder agent translates into a script. The Verifier agent, an LLM-based judge, then evaluates whether the current plan is sufficient to solve the problem. If the plan is judged inadequate, the Router agent decides whether to modify an existing step or add a new one, and the cycle repeats. This mirrors how an expert analyst uses a tool like Google Colab to build a solution iteratively, checking intermediate results along the way. The loop continues until the plan passes verification or the maximum of 10 rounds is reached, at which point the final code is returned.
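The control flow of this loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the four agents are passed in as plain callables with hypothetical signatures, the function name `run_loop` is invented, and how a "round" is counted is a choice made for this sketch. Only the plan, code, verify, route cycle and the limit of 10 rounds come from the description above.

```python
from typing import Callable, List, Tuple

MAX_ROUNDS = 10  # round limit stated in the text


def run_loop(
    planner: Callable[[str, List[str]], str],
    coder: Callable[[List[str]], str],
    verifier: Callable[[str, List[str], str], bool],
    router: Callable[[str, List[str]], List[str]],
    query: str,
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[str, int]:
    """Return the final script and the number of rounds used (hypothetical API)."""
    plan = [planner(query, [])]          # initial high-level plan
    code = coder(plan)                   # script implementing the plan
    for round_no in range(1, max_rounds + 1):
        if verifier(query, plan, code):  # judge says the plan is sufficient
            return code, round_no
        plan = router(query, plan)       # modify a step or add a new one
        code = coder(plan)               # regenerate the script
    return code, max_rounds              # limit reached: output the latest code
```

In a real system each callable would wrap an LLM call. With simple stubs (for example, a verifier that passes once the plan has three steps), the function can be exercised without any model.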
—
### Evaluation
We evaluated DS-STAR against state-of-the-art methods, including AutoGen and DA-Agent, on three benchmarks: DABStep, KramaBench, and DA-Code. These benchmarks include complex tasks such as data wrangling, machine learning, and visualization that require working with multiple data sources and formats.
Results show that DS-STAR outperforms the best alternative on all three benchmarks, raising accuracy:
– From 41.0% to 45.2% on DABStep
– From 39.8% to 44.7% on KramaBench
– From 37.0% to 38.5% on DA-Code
DS-STAR also holds the top rank on the public DABStep leaderboard as of September 18, 2025. It performs consistently well on both easy tasks (a single data file) and hard tasks (multiple files), which demonstrates its ability to integrate heterogeneous data sources.
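The accuracy figures reported for the three benchmarks can be restated as absolute and relative gains. The short calculation below uses only the numbers given in this section; the helper name `gains` is introduced here for illustration.

```python
# (baseline accuracy %, DS-STAR accuracy %) as reported in this section
results = {
    "DABStep": (41.0, 45.2),
    "KramaBench": (39.8, 44.7),
    "DA-Code": (37.0, 38.5),
}


def gains(before: float, after: float) -> tuple:
    """Return (absolute gain in points, relative gain in %), rounded to 1 decimal."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


for name, (before, after) in results.items():
    abs_pts, rel_pct = gains(before, after)
    print(f"{name}: +{abs_pts} points ({rel_pct}% relative)")
# DABStep: +4.2 points (10.2% relative)
# KramaBench: +4.9 points (12.3% relative)
# DA-Code: +1.5 points (4.1% relative)
```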
—
### In-depth Analysis
Ablation studies highlight the importance of DS-STAR’s components:
– **Data File Analyzer:** Essential for rich data context. Removing it caused accuracy on hard DABStep tasks to drop sharply to 26.98%.
– **Router:** Crucial for refining plans, because it decides whether to add a new step or correct an existing one. Without it, DS-STAR could only append steps sequentially and could not fix earlier mistakes, which led to worse performance.
– **Generalizability Across LLMs:** DS-STAR also showed promising results with GPT-5 as the base model. GPT-5 performed better on easy tasks, whereas Gemini-2.5-Pro performed better on hard tasks.
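The difference between adding a step and correcting one, as discussed for the Router above, can be illustrated with a small sketch. The `RouterDecision` type and `apply_decision` function are hypothetical. Dropping the steps that follow a corrected step is an assumption of this sketch (later steps may depend on the faulty one), not a detail stated in this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RouterDecision:
    """Hypothetical router output: a step to insert and, optionally, the step it replaces."""
    new_step: str
    wrong_index: Optional[int] = None  # None means "append a new step"


def apply_decision(plan: List[str], decision: RouterDecision) -> List[str]:
    """Return a new plan; the input plan is left unchanged."""
    if decision.wrong_index is None:
        # Add: keep every existing step and append the new one.
        return plan + [decision.new_step]
    if not 0 <= decision.wrong_index < len(plan):
        raise IndexError("wrong_index is outside the plan")
    # Correct: keep the steps before the faulty one, then continue from the fix.
    return plan[: decision.wrong_index] + [decision.new_step]


plan = ["load sales.csv", "sum all rows", "report total"]
added = apply_decision(plan, RouterDecision("plot totals"))
fixed = apply_decision(plan, RouterDecision("filter to 2024, then sum", wrong_index=1))
```

An agent without the correction branch can only take the first path, so an early mistake stays in the plan no matter how many steps are appended after it.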
—
### Refinement Process Analysis
Difficult tasks naturally require more iterations. On DABStep, hard tasks averaged 5.6 refinement rounds versus 3.0 for easy tasks, with over half of the easy tasks solved in just one round.
—
### Conclusion
DS-STAR is a novel agent that automates complex data science workflows by combining automatic analysis of diverse data files with an iterative planning process backed by LLM-based verification. DS-STAR achieves new state-of-the-art results on major benchmarks, paving the way for more accessible and effective data science tools that can drive innovation across fields.
—
### Acknowledgments
We thank Jiefeng Chen, Jinwoo Shin, Raj Sinha, Mihir Parmar, George Lee, Vishy Tirumalashetty, Tomas Pfister, and Burak Gokturk for their valuable contributions.
—
For further reading, the original paper is available [here](https://arxiv.org/pdf/2509.21825).
