
Commit 76df96e

fix: refine prompt (#987)

* refactor: rename failed_exp_and_feedback_list to include _after_sota suffix
* refactor: merge prompts_v3 into prompts_v2 and update references

1 parent: 41d0290

File tree

4 files changed: +24 −85 lines changed

rdagent/scenarios/data_science/proposal/exp_gen/base.py

Lines changed: 8 additions & 8 deletions

@@ -182,27 +182,27 @@ def experiment_and_feedback_list_after_init(
         final_component = self.COMPLETE_ORDER[-1]
         has_final_component = True if DS_RD_SETTING.coder_on_whole_pipeline else False
         SOTA_exp_and_feedback_list = []
-        failed_exp_and_feedback_list = []
+        failed_exp_and_feedback_list_after_sota = []
         for exp, fb in search_list:
             if has_final_component:
                 if fb.decision:
                     SOTA_exp_and_feedback_list.append((exp, fb))
-                    failed_exp_and_feedback_list = []
+                    failed_exp_and_feedback_list_after_sota = []
                 else:
-                    failed_exp_and_feedback_list.append((exp, fb))
+                    failed_exp_and_feedback_list_after_sota.append((exp, fb))
             if exp.hypothesis.component == final_component and fb:
                 has_final_component = True
-        if max_retrieve_num is not None and (SOTA_exp_and_feedback_list or failed_exp_and_feedback_list):
+        if max_retrieve_num is not None and (SOTA_exp_and_feedback_list or failed_exp_and_feedback_list_after_sota):
             SOTA_exp_and_feedback_list = SOTA_exp_and_feedback_list[
                 -min(max_retrieve_num, len(SOTA_exp_and_feedback_list)) :
             ]
-            failed_exp_and_feedback_list = failed_exp_and_feedback_list[
-                -min(max_retrieve_num, len(failed_exp_and_feedback_list)) :
+            failed_exp_and_feedback_list_after_sota = failed_exp_and_feedback_list_after_sota[
+                -min(max_retrieve_num, len(failed_exp_and_feedback_list_after_sota)) :
             ]
         if return_type == "all":
-            return SOTA_exp_and_feedback_list + failed_exp_and_feedback_list
+            return SOTA_exp_and_feedback_list + failed_exp_and_feedback_list_after_sota
         elif return_type == "failed":
-            return failed_exp_and_feedback_list
+            return failed_exp_and_feedback_list_after_sota
         elif return_type == "sota":
             return SOTA_exp_and_feedback_list
         else:
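The rename clarifies the list's semantics: it accumulates only the failures recorded since the most recent SOTA experiment, resetting whenever a new SOTA lands. A minimal standalone sketch of that reset behavior (toy data; the variable names mirror the diff, but this harness is a simplified illustration, not the actual RD-Agent code, and it omits the `max_retrieve_num` truncation):

```python
# Sketch of the reset-on-SOTA behavior behind the rename: the failed list
# only keeps failures that happened *after* the latest SOTA experiment.
def split_history(search_list):
    """search_list: iterable of (exp, decision) pairs, ordered oldest to newest."""
    sota_exp_and_feedback_list = []
    failed_exp_and_feedback_list_after_sota = []
    for exp, decision in search_list:
        if decision:  # this experiment surpassed the current SOTA
            sota_exp_and_feedback_list.append(exp)
            failed_exp_and_feedback_list_after_sota = []  # reset on new SOTA
        else:
            failed_exp_and_feedback_list_after_sota.append(exp)
    return sota_exp_and_feedback_list, failed_exp_and_feedback_list_after_sota

history = [("e1", True), ("e2", False), ("e3", True), ("e4", False), ("e5", False)]
sota, failed_after_sota = split_history(history)
print(sota)               # ['e1', 'e3']
print(failed_after_sota)  # ['e4', 'e5'] -- 'e2' was discarded when 'e3' became SOTA
```

Under the old name, `failed_exp_and_feedback_list` read as the full failure history; the `_after_sota` suffix makes the reset explicit.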

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 12 additions & 13 deletions

@@ -5,8 +5,7 @@ scenario_problem:
 
 You will be provided with:
 1. A detailed competition scenario description;
-2. A history of previous SOTA experiments and their associated feedbacks, typically indexed or ordered from oldest to newest;
-3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution;
+2. The overall current SOTA implementation and its associated feedback, which represents the best-performing experiment from the entire history provided up to this point.
 
 Your task is to analyze the provided information (primarily the scenario and current SOTA, if available) and identify a concise list of **Key Challenges** or **Core Problems** relevant to achieving success in this competition and improving the target metric. Aim for **FEWER BUT BETTER** challenges (e.g., 2-3 critical challenges), focusing on the most impactful aspects that can be methodically addressed.

@@ -46,8 +45,8 @@ feedback_problem:
 
 You will be provided with:
 1. A detailed competition scenario description;
-2. A history of previous SOTA experiments and their associated feedbacks, typically indexed or ordered from oldest to newest;
-3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution;
+2. A history of previous successfully experiments and their associated feedbacks, indexed or ordered from oldest to newest; the latest SOTA experiment accumulates all the improvements from the previous successful experiments.
+3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution. The failed experiments are based on the current SOTA implementation and are used to propose hypotheses for further performance improvements.
 4. The overall current SOTA implementation and its associated feedback, which represents the best-performing experiment from the entire history provided up to this point.
 
 Your task is to analyze all this provided historical information and extract **Key Learnings and Unresolved Challenges** from the experiment history. These should guide concrete improvements in subsequent iterations.

@@ -99,7 +98,7 @@ feedback_problem:
 user: |-
 # Scenario Description
 {{ scenario_desc }}
-
+
 # Previous Experiments and Feedbacks
 {{ exp_and_feedback_list_desc }}
 

@@ -155,8 +154,8 @@ hypothesis_gen:
 The user is iteratively improving a Kaggle competition implementation. Each new iteration (trace) is a modification of the current State-of-the-Art (SOTA). If a new trace surpasses the current SOTA, it becomes the new SOTA. Otherwise, it's a failed experiment.
 You will be provided with:
 1. A detailed competition scenario description.
-2. Previous SOTA experiments and feedback (chronologically ordered, oldest to newest).
-3. Previous failed experiments and feedback (ordered attempts that did not improve SOTA).
+2. A history of previous successfully experiments and their associated feedbacks, indexed or ordered from oldest to newest; the latest SOTA experiment accumulates all the improvements from the previous successful experiments.
+3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution. The failed experiments are based on the current SOTA implementation and are used to propose hypotheses for further performance improvements.
 4. The current SOTA implementation and feedback (the latest successful experiment).
 5. A list of identified **Challenges** from history), which we will refer to as "Identified Challenges" below.

@@ -275,10 +274,9 @@ task_gen:
 
 You will be provided with the following inputs:
 1. **Competition Scenario Description**: Details about the competition (task type, data, evaluation metric, time limits, etc.).
-2. **Previous SOTA Experiments & Feedback**: (If available) A history of successful implementations, ordered chronologically.
-3. **Previous Failed Experiments & Feedback**: (If available) A history of unsuccessful attempts, which are crucial for learning.
-4. **Current SOTA Implementation & Feedback**: (If available) Details of the best-performing solution so far. **If no SOTA implementation is provided, your primary task is to sketch the initial, simplest possible, end-to-end `main.py` workflow.**
-5. **Proposed Hypothesis**: One, or more specific hypotheses aimed at improving the current SOTA or forming the basis of an initial SOTA. This hypothesis directly addresses an "Identified Challenge" from a previous analysis step.
+2. **Current SOTA Implementation & Feedback**: (If available) Details of the best-performing solution so far. **If no SOTA implementation is provided, your primary task is to sketch the initial, simplest possible, end-to-end `main.py` workflow.**
+3. **Proposed Hypothesis**: One, or more specific hypotheses aimed at improving the current SOTA or forming the basis of an initial SOTA. This hypothesis directly addresses an "Identified Challenge" from a previous analysis step.
+4. **Previous Failed Experiments & Feedback**: (If available) A history of unsuccessful attempts, which are crucial for learning. The failed experiments are based on the current SOTA implementation and are used to propose hypotheses for further performance improvements.
 
 Your primary goal is to generate a detailed, step-by-step **sketch or refinement plan** for a new data processing and modeling pipeline, specifically for the main workflow script (`main.py`), that effectively implements the `Proposed Hypothesis`. This sketch will guide a developer to write the code correctly.

@@ -381,7 +379,7 @@ task_gen:
 # Data Folder Structure (All files are under {% include "scenarios.data_science.share:scen.input_path" %})
 {{ data_folder_info }}
 
-# Current SOTA Implementation
+# Current SOTA Implementation & Feedback
 {{ sota_exp_desc }}
 
 # Proposed Hypothesis

@@ -393,7 +391,8 @@ task_gen:
 **Hypothesis:** {{ hypothesis.hypothesis }}
 
 {% endfor %}
-# Feedback from Previous Failed Experiments (e.g., experiments that did not pass evaluation, encountered bugs, or failed to surpass SOTA performance):
+# Previous Failed Experiments & Feedback (e.g., experiments that did not pass evaluation, encountered bugs, or failed to surpass SOTA performance)
+
 {{ failed_exp_and_feedback_list_desc }}
 
 idea_sample:

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v3.yaml

Lines changed: 2 additions & 62 deletions

@@ -87,47 +87,7 @@ feedback_problem:
 {{ sota_exp_desc }}
 
 scenario_description: |-
-{% if use_raw_description -%}
-====== Background ======
-{{ raw_description }}
-
-{% else %}
-====== Background ======
-{{ background }}
-
-{% if eda_output is not none %}
-====== Data Overview (EDA) ======
-{{ eda_output }}
-{% endif %}
-
-====== Submission Format ======
-Please ensure your submission adheres to the following specifications:
-{{ submission_specifications }}
-
-====== Important Guidelines ======
-Before submitting your results, please note the following:
-- We have numerous tests in place to check your code.
-- Ensure your submission is genuine.
-- Do not manipulate data or return values solely to pass preliminary tests, as this will not lead to successful final evaluation.
-
-{% endif %}
-
-====== Evaluation ======
-{% if not use_raw_description and metric_name %}
-The primary evaluation metric for this task is: **{{ metric_name }}**.
-{% endif %}
-This metric is considered better when it is **{% if metric_direction %}larger{% else %}smaller{% endif %}**.
-
-{% if evaluation is not none %}
-Additional Evaluation Details:
-{{ evaluation }}
-{% endif %}
-
-{% if time_limit %}
-====== Time Limit ======
-Your code's execution is limited to **{{ time_limit }}**.
-Please optimize your model and parameters to ensure your code runs within this specified time constraint.
-{% endif %}
+{% include "scenarios.data_science.proposal.exp_gen.prompts_v2:scenario_description" %}
 
 hypothesis_gen:
 system: |-

@@ -320,24 +280,4 @@ task_gen:
 - Double-check that validation scores are saved correctly to `scores.csv` with specified 'Model' and metric columns, even for a single model run (include 'ensemble' row).
 
 user: |-
-# Competition Scenario Description
-{{ scenario_desc }}
-
-# Data Folder Structure (All files are under {% include "scenarios.data_science.share:scen.input_path" %})
-{{ data_folder_info }}
-
-# Current SOTA Implementation
-{{ sota_exp_desc }}
-
-# Proposed Hypothesis
-This sketch should implement the following hypotheses:
-
-{% for hypothesis in hypotheses %}
-## {{ hypothesis.problem_name }}
-**Why:** {{ hypothesis.problem_desc }}
-**Hypothesis:** {{ hypothesis.hypothesis }}
-
-{% endfor %}
-# Feedback from Previous Failed Experiments (e.g., experiments that did not pass evaluation, encountered bugs, or failed to surpass SOTA performance)
-
-{{ failed_exp_and_feedback_list_desc }}
+{% include "scenarios.data_science.proposal.exp_gen.prompts_v2:task_gen.user" %}
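The merge replaces duplicated template bodies in prompts_v3 with `{% include %}` directives pointing at the prompts_v2 copies. The `"module.path:key"` include syntax is resolved by RD-Agent's own template loader; in plain Jinja2 the same de-duplication can be sketched with a `DictLoader` (template names and bodies below are illustrative, not the real prompt files):

```python
from jinja2 import DictLoader, Environment

# Two prompt files where v3 reuses v2's body via include, mirroring the diff's
# de-duplication. The loader keys here are hypothetical stand-ins for RD-Agent's
# "module.path:key" template addressing.
templates = {
    "prompts_v2/scenario_description": "====== Background ======\n{{ background }}",
    "prompts_v3/scenario_description": "{% include 'prompts_v2/scenario_description' %}",
}
env = Environment(loader=DictLoader(templates))

# Rendering the v3 template transparently pulls in the v2 body.
rendered = env.get_template("prompts_v3/scenario_description").render(background="Kaggle")
print(rendered)  # ====== Background ======\nKaggle
```

Keeping a single source of truth for each prompt body means later wording fixes (like those in this commit) only need to land in prompts_v2.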

rdagent/scenarios/data_science/proposal/exp_gen/proposal.py

Lines changed: 2 additions & 2 deletions

@@ -731,7 +731,7 @@ def task_gen(
             component_desc=component_desc,
             workflow_check=not pipeline and hypotheses[0].component != "Workflow",
         )
-        user_prompt = T(".prompts_v3:task_gen.user").r(
+        user_prompt = T(".prompts_v2:task_gen.user").r(
             scenario_desc=scenario_desc,
             data_folder_info=data_folder_info,
             sota_exp_desc=sota_exp_desc,

@@ -774,7 +774,7 @@
         return exp

     def get_scenario_all_desc(self, trace: DSTrace, eda_output=None) -> str:
-        return T(".prompts_v3:scenario_description").r(
+        return T(".prompts_v2:scenario_description").r(
             background=trace.scen.background,
             submission_specifications=trace.scen.submission_specifications,
             evaluation=trace.scen.metric_description,
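The call sites follow RD-Agent's `T(path).r(**context)` pattern: look up a prompt template by path, then render it with keyword context. A hypothetical, simplified stand-in (the real `T` resolves dotted module paths and YAML keys; here a plain dict plays that role):

```python
from jinja2 import Template

# Hypothetical prompt registry standing in for RD-Agent's template resolution;
# the key and body are illustrative, not the actual prompts_v2.yaml contents.
PROMPTS = {
    ".prompts_v2:task_gen.user": "# Current SOTA Implementation & Feedback\n{{ sota_exp_desc }}",
}

class T:
    """Simplified sketch of the T(path).r(**context) helper used in proposal.py."""

    def __init__(self, path: str):
        self.source = PROMPTS[path]  # real T resolves "module.path:key" addresses

    def r(self, **context) -> str:
        return Template(self.source).render(**context)

out = T(".prompts_v2:task_gen.user").r(sota_exp_desc="baseline CV 0.81")
print(out)  # # Current SOTA Implementation & Feedback\nbaseline CV 0.81
```

Because the prompt is addressed by path at the call site, the commit's switch from prompts_v3 to prompts_v2 is a one-token change in each caller.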
