A Modular, LLM-Based Text2SQL Pipeline

[Figure: Overview of our text2SQL pipeline.]

Setup

conda env create -f environment.yml
conda activate sql-icl

Access and Preprocess ScienceBenchmark

cd sciencebench_data/<dataset>
bash download.sh

Run sciencebench_data/<dataset>/extract_relevant_data.ipynb to transform the dataset into the target format.

Run Agent

To generate SQL with our LLM-based agent, run:

python src/run_agent.py

Execution accuracy (EX) is reported on the dev set. Parameters, arguments, and datasets are configured in ./config/run_config.yaml and ./config/agent/baseline.yaml.
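For readers unfamiliar with the metric, the following is a minimal sketch of how execution accuracy is commonly computed: a prediction counts as correct when it returns the same rows as the gold query. It is not taken from src/run_agent.py. The function names, the toy `projects` table, and the use of an in-memory SQLite database are assumptions made to keep the example self-contained, and the repository's own matching rules (row order, duplicates, timeouts) may differ.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result multisets match."""
    hits = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        pred = run_query(conn, predicted_sql)
        if gold is not None and pred == gold:
            hits += 1
    return hits / len(pairs) if pairs else 0.0


# Hypothetical toy database standing in for a benchmark database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id INTEGER, title TEXT, cost REAL);
    INSERT INTO projects VALUES (1, 'A', 10.0), (2, 'B', 20.0);
    """
)

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT title FROM projects WHERE cost > 15",
     "SELECT title FROM projects WHERE id = 2"),
    # Misspelled column: the query fails, so it is counted as wrong.
    ("SELECT titel FROM projects", "SELECT title FROM projects"),
]

ex = execution_accuracy(conn, pairs)
print(f"EX = {ex:.0%}")
```

Comparing result multisets ignores row order, so the sketch would also accept a prediction that omits an ORDER BY clause required by the question.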

Results

Ablation of our pipeline components:

Model	Cordis [EX]	OncoMx [EX]	SDSS [EX]
SOTA on ScienceBenchmark	35%	56%	21%
Llama 3.3 70B	19%	8%	18%
+ error correction	23%	7%	17%
+ error correction + schema	38%	35%	19%
+ error correction + schema flattened	54%	47%	25%
+ error correction + schema flattened + ICL (k=10)	51%	62%	28%
+ error correction + schema flattened + ICL (k=30)	57%	62%	30%
+ error correction + schema flattened + ICL (k=60)	58%	62%	25%
+ error correction + schema flattened + ICL (k=100)	56%	58%	25%
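The "error correction" component in the table is not described further here. One common reading is a retry loop that executes the generated SQL and, on a database error, re-prompts the model with the error message. The sketch below illustrates that idea only. `generate_with_correction`, `stub_llm`, the prompt wording, the retry limit, and the SQLite database are all hypothetical and do not reflect the actual agent in src/.

```python
import sqlite3


def generate_with_correction(question, llm, conn, max_retries=2):
    """Ask `llm` for SQL; on a database error, re-prompt with the error text."""
    sql = llm(f"Question: {question}\nSQL:")
    for _ in range(max_retries):
        try:
            conn.execute(sql).fetchall()
            return sql  # the query executed without error
        except sqlite3.Error as err:
            # Feed the failed query and the error message back to the model.
            sql = llm(
                f"Question: {question}\n"
                f"Failed SQL: {sql}\n"
                f"Error: {err}\n"
                "Corrected SQL:"
            )
    return sql  # last attempt, returned without further checking


def stub_llm(prompt):
    """Stand-in for a real model: fixes its typo once it sees an error."""
    if "Error:" in prompt:
        return "SELECT name FROM genes"
    return "SELECT nme FROM genes"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (name TEXT)")

final_sql = generate_with_correction("List all gene names.", stub_llm, conn)
print(final_sql)
```

A loop like this only catches queries that fail to execute. A query that runs but returns the wrong rows passes through unchanged.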

Experiments with different LLM backbones, each using the best configuration reported above:

Model	Cordis [EX]	OncoMx [EX]	SDSS [EX]
SOTA on ScienceBenchmark	35%	56%	21%
Llama 3.3 70B (best config)	58%	62%	30%
QWEN2.5 72B (best config)	57%	67% (+11pp)	31%
QWEN2.5-Coder 32B (best config)	52%	62%	37% (+16pp)
Mistral Nemo (best config)	47%	59%	19%
DeepSeek-R1 70B (best config)	62% (+27pp)	58%	25%
QWQ 32B (best config)	57%	63%	35%
Phi4 (best config)	55%	55%	26%
Starcoder2 15B (best config)	7%	8%	--
DeepSeek-Coder-V2 16B (best config)	53%	52%	21%
DeepSeek Coder 33B (best config)	46%	57%	25%
SQLCoder 15B (best config)	11%	8%	--
mannix/defog-llama3-sqlcoder 8B (best config)	9%	7%	--
Gemma3 27B (best config)	54%	60%	25%
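The "(+Npp)" annotations appear to mark, for each dataset, the best score in the table and its gain in percentage points over the SOTA row. The short sketch below reproduces that arithmetic for the three annotated models. The scores are copied from the table, while the dictionary layout and the `best_deltas` helper are illustrative only.

```python
# EX scores in percent, copied from the table above.
SOTA = {"Cordis": 35, "OncoMx": 56, "SDSS": 21}
RESULTS = {
    "QWEN2.5 72B": {"Cordis": 57, "OncoMx": 67, "SDSS": 31},
    "QWEN2.5-Coder 32B": {"Cordis": 52, "OncoMx": 62, "SDSS": 37},
    "DeepSeek-R1 70B": {"Cordis": 62, "OncoMx": 58, "SDSS": 25},
}


def best_deltas(results, sota):
    """For each dataset, return the top model and its gain over SOTA in pp."""
    best = {}
    for dataset, baseline in sota.items():
        model = max(results, key=lambda m: results[m][dataset])
        best[dataset] = (model, results[model][dataset] - baseline)
    return best


deltas = best_deltas(RESULTS, SOTA)
for dataset, (model, gain) in deltas.items():
    print(f"{dataset}: {model} ({gain:+d}pp)")
```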

ScienceBenchmark

@article{zhang2023sciencebenchmark,
  title={Sciencebenchmark: A complex real-world benchmark for evaluating natural language to sql systems},
  author={Zhang, Yi and Deriu, Jan and Katsogiannis-Meimarakis, George and Kosten, Catherine and Koutrika, Georgia and Stockinger, Kurt},
  journal={arXiv preprint arXiv:2306.04743},
  year={2023}
}
