bwittmann/SQL-ICL

 
 

A Modular, LLM-Based Text2SQL Pipeline

Overview of our text2SQL pipeline.

Setup

conda env create -f environment.yml
conda activate sql-icl

Access and Preprocess ScienceBenchmark

cd sciencebench_data/<dataset>
bash download.sh

Run sciencebench_data/<dataset>/extract_relevant_data.ipynb to transform the datasets into the target format.

Run Agent

To utilize our LLM-based agent for text2SQL generation, run:

python src/run_agent.py

Execution Accuracy (EX) is reported on the dev set. Parameters, arguments, and datasets can be configured in ./config/run_config.yaml and ./config/agent/baseline.yaml.
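Execution Accuracy counts a prediction as correct when the predicted and gold queries return the same result set on the target database. A minimal sketch of that check, using sqlite3 and a toy table (function and table names are illustrative, not this repo's API):

```python
import sqlite3


def execution_match(db: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Return True if both queries execute and yield the same result set."""
    try:
        pred = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as a miss
    gold = db.execute(gold_sql).fetchall()
    # Compare order-insensitively: row order is unspecified without ORDER BY.
    return sorted(pred) == sorted(gold)


# Toy database to demonstrate the metric.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE projects (id INTEGER, budget REAL)")
db.executemany("INSERT INTO projects VALUES (?, ?)", [(1, 100.0), (2, 250.0)])

print(execution_match(
    db,
    "SELECT id FROM projects WHERE budget > 150",
    "SELECT id FROM projects WHERE budget >= 250",
))  # True: both return the row with id 2
```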

Results

Ablation of our pipeline components:

| Model | Cordis [EX] | OncoMx [EX] | SDSS [EX] |
|---|---|---|---|
| current SOTA on ScienceBenchmark | 35% | 56% | 21% |
| Llama 3.3 70B | 19% | 8% | 18% |
| + error correction | 23% | 7% | 17% |
| + error correction + schema | 38% | 35% | 19% |
| + error correction + schema flattened | 54% | 47% | 25% |
| + error correction + schema flattened + ICL (k=10) | 51% | 62% | 28% |
| + error correction + schema flattened + ICL (k=30) | 57% | 62% | 30% |
| + error correction + schema flattened + ICL (k=60) | 58% | 62% | 25% |
| + error correction + schema flattened + ICL (k=100) | 56% | 58% | 25% |
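The "schema flattened" rows above refer to serializing the database schema into a compact textual form for the prompt. A minimal sketch of one plausible flattening (the exact format the pipeline uses may differ):

```python
def flatten_schema(schema: dict[str, list[str]]) -> str:
    """Serialize a {table: [columns]} mapping into one prompt-friendly line per table."""
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in schema.items())


# Hypothetical fragment of a Cordis-like schema.
schema = {
    "projects": ["id", "title", "budget"],
    "institutions": ["id", "name", "country"],
}
print(flatten_schema(schema))
# projects(id, title, budget)
# institutions(id, name, country)
```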

Experiments with different LLM backbones, using the best configuration reported above:

| Model | Cordis [EX] | OncoMx [EX] | SDSS [EX] |
|---|---|---|---|
| SOTA on ScienceBenchmark | 35% | 56% | 21% |
| Llama 3.3 70B (best config) | 58% | 62% | 30% |
| QWEN2.5 72B (best config) | 57% | 67% (+11pp) | 31% |
| QWEN2.5-Coder 32B (best config) | 52% | 62% | 37% (+16pp) |
| Mistral Nemo (best config) | 47% | 59% | 19% |
| DeepSeek-R1 70B (best config) | 62% (+27pp) | 58% | 25% |
| QWQ 32B (best config) | 57% | 63% | 35% |
| Phi4 (best config) | 55% | 55% | 26% |
| Starcoder2 15B (best config) | 7% | 8% | -- |
| DeepSeek-Coder-V2 16B (best config) | 53% | 52% | 21% |
| DeepSeek Coder 33B (best config) | 46% | 57% | 25% |
| SQLCoder 15B (best config) | 11% | 8% | -- |
| mannix/defog-llama3-sqlcoder 8B (best config) | 9% | 7% | -- |
| Gemma3 27B (best config) | 54% | 60% | 25% |

ScienceBenchmark

If you use ScienceBenchmark, please cite:

@article{zhang2023sciencebenchmark,
  title={ScienceBenchmark: A complex real-world benchmark for evaluating natural language to SQL systems},
  author={Zhang, Yi and Deriu, Jan and Katsogiannis-Meimarakis, George and Kosten, Catherine and Koutrika, Georgia and Stockinger, Kurt},
  journal={arXiv preprint arXiv:2306.04743},
  year={2023}
}
