
LLM Prompting for Text2SQL via Gradual SQL Refinement

Overview

[figure: GSR framework overview]

1. Clause Decomposition

We decompose the natural language question into logical clauses based on semantic units and include this decomposition in the prompt, letting the LLM generate the Pre-SQL. At this stage, the model draws on the Question and hint together with the DB schema information (without value details).
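The decomposition step can be sketched as follows. This is a simplified illustration, not the repository's actual code: `decompose_question` uses a naive rule-based split where the pipeline would use an LLM, and the prompt wording is hypothetical.

```python
import re

def decompose_question(question: str) -> list[str]:
    """Naively split a question into clause-like semantic units.
    A rule-based stand-in for the LLM-driven decomposition."""
    parts = re.split(r",\s*|\band\b|\bwhere\b|\bthat\b", question)
    return [p.strip() for p in parts if p.strip()]

def build_pre_sql_prompt(question: str, hint: str, schema: str) -> str:
    """Assemble a Pre-SQL prompt from the schema (no value details),
    the question, the hint, and the logical clauses."""
    clauses = decompose_question(question)
    clause_text = "\n".join(f"- {c}" for c in clauses)
    return (
        f"Schema (no values):\n{schema}\n\n"
        f"Question: {question}\nHint: {hint}\n\n"
        f"Logical clauses:\n{clause_text}\n\n"
        "Write a SQL query (Pre-SQL) answering the question."
    )

prompt = build_pre_sql_prompt(
    "List the names of schools in Alameda County that are charter schools",
    "charter schools refers to Charter = 1",
    "schools(CDSCode, School, County, Charter)",
)
```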

2. SQL-driven Schema Linking

We instruct the model to extract the tables and columns involved in the Pre-SQL and then construct the following information:

  1. With the Pre-SQL's tables and columns masked out of the DB schema, the model is tasked to explore other potentially relevant tables and columns based on the Question.

  2. For the tables and columns involved in the Pre-SQL, a value-condition checker identifies the columns that participate in value-condition judgments:

    2.1 For columns involved in value-condition judgments, similarity search is used to provide value examples that closely match keywords in the natural language question.

    2.2 For columns not involved in value-condition judgments, SQL queries are constructed directly to fetch value examples.

  3. For the tables and columns involved in the Pre-SQL, errors in table names, column names, or their relationships are captured using SQL.

In summary, three pieces of information will be obtained:

  1. Simplified DB schema information with the Pre-SQL tables and columns masked.
  2. Value example information based on the Pre-SQL.
  3. Potential error information in the Pre-SQL.

Then, the model is allowed to correct the Pre-SQL based on the information obtained above.
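The value-example step (2.1/2.2) can be sketched as follows. This is a simplified stand-in using SQLite and `difflib`, not the repository's actual tool; the function name and table are illustrative.

```python
import difflib
import sqlite3

def value_examples(conn, table, column, keyword=None, limit=3):
    """Fetch example values for a column. With a question keyword
    (the value-condition case, 2.1), rank stored values by string
    similarity; otherwise (2.2) fetch examples with plain SQL."""
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT 500'
    ).fetchall()
    values = [str(r[0]) for r in rows if r[0] is not None]
    if keyword is None:
        return values[:limit]                     # 2.2: direct SQL fetch
    return difflib.get_close_matches(             # 2.1: similarity search
        keyword, values, n=limit, cutoff=0.0
    )

# Demo on a toy in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools(County TEXT)")
conn.executemany("INSERT INTO schools VALUES (?)",
                 [("Alameda",), ("Alpine",), ("Amador",)])
```

With a misspelled question keyword such as "Alamedda", the similarity branch surfaces the stored value "Alameda" first, which is exactly the value example the model needs to write a correct condition.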

3. SQL Execution Refinement

Execute the Second-SQL on the database, then integrate the execution results with the prompt as input for the model. Instruct the model to analyze whether the execution results of the Second-SQL are reasonable and refine the Second-SQL accordingly to produce the Final-SQL.
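A minimal sketch of this refinement loop, assuming a SQLite database; the function names and prompt wording are illustrative, not the repository's actual implementation.

```python
import sqlite3

def execution_feedback(conn, sql, max_rows=5):
    """Run the Second-SQL and summarize the outcome for the model."""
    try:
        rows = conn.execute(sql).fetchmany(max_rows)
        if not rows:
            return "Execution returned no rows."
        return f"First rows: {rows}"
    except sqlite3.Error as e:
        return f"Execution error: {e}"

def refinement_prompt(question, second_sql, feedback):
    """Combine the Second-SQL and its execution result so the model
    can judge the result and produce the Final-SQL."""
    return (
        f"Question: {question}\n"
        f"Second-SQL: {second_sql}\n"
        f"Execution result: {feedback}\n"
        "Judge whether this result is reasonable and, if not, "
        "output a refined Final-SQL."
    )

# Demo: an empty result is surfaced to the model as feedback.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frpm(school TEXT)")
fb = execution_feedback(conn, "SELECT school FROM frpm")
```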

Project directory structure

GSR/
├── README.md
├── requirements.txt
│
├── data/
│   └── databases/
│       └── dev_20240627/
│
├── data_process/
│   └── sql_data_process_BIRD.py
│
├── run/
│   └── GSR.py
│
└── tools/

Environment

conda create -n GSR python=3.10
conda activate GSR
pip install -r requirements.txt

RUN

1. Data Preprocessing

Please place the test set files in the directory data/databases/. Then set the path parameters.

In data_process_config.py, set SQL_DATA_INFO and DATABASE_PATH. The fields to set in SQL_DATA_INFO are 'data_source', 'file', 'tables_file', and 'database_name'.
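An example of what these settings might look like for the BIRD dev set; the concrete paths and file names below are assumptions based on typical BIRD artifacts and may differ in your download.

```python
# data_process_config.py (example values, adjust to your setup)
DATABASE_PATH = "../data/databases/dev_20240627/dev_databases"

SQL_DATA_INFO = [
    {
        "data_source": "BIRD",            # benchmark name
        "file": "dev.json",               # questions + hints
        "tables_file": "dev_tables.json", # schema descriptions
        "database_name": "dev_databases", # folder of .sqlite files
    }
]
```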

cd data_process/
python sql_data_process_BIRD.py

Four files are generated after execution.

  1. all_mappings.json
  2. raw_format_data.json
  3. Pre_input.json
  4. Second_input.json

2. Generate SQL

Please set the generation parameters in GSR.py: mainly database_file_path, start_idx, and end_idx.

cd run/
python GSR.py

After execution, the generated SQL file is written to the output directory.

Results

Comparison of execution accuracy

[figure: execution accuracy comparison]

Ablation Study

[figure: ablation study results]
