rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml (+12 −13)
@@ -5,8 +5,7 @@ scenario_problem:
 
     You will be provided with:
    1. A detailed competition scenario description;
-    2. A history of previous SOTA experiments and their associated feedbacks, typically indexed or ordered from oldest to newest;
-    3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution;
+    2. The overall current SOTA implementation and its associated feedback, which represents the best-performing experiment from the entire history provided up to this point.
 
     Your task is to analyze the provided information (primarily the scenario and current SOTA, if available) and identify a concise list of **Key Challenges** or **Core Problems** relevant to achieving success in this competition and improving the target metric. Aim for **FEWER BUT BETTER** challenges (e.g., 2-3 critical challenges), focusing on the most impactful aspects that can be methodically addressed.
 
@@ -46,8 +45,8 @@ feedback_problem:
 
     You will be provided with:
     1. A detailed competition scenario description;
-    2. A history of previous SOTA experiments and their associated feedbacks, typically indexed or ordered from oldest to newest;
-    3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution;
+    2. A history of previous successful experiments and their associated feedbacks, indexed or ordered from oldest to newest; the latest SOTA experiment accumulates all the improvements from the previous successful experiments.
+    3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution. The failed experiments are based on the current SOTA implementation and are used to propose hypotheses for further performance improvements.
     4. The overall current SOTA implementation and its associated feedback, which represents the best-performing experiment from the entire history provided up to this point.
 
     Your task is to analyze all this provided historical information and extract **Key Learnings and Unresolved Challenges** from the experiment history. These should guide concrete improvements in subsequent iterations.
@@ -99,7 +98,7 @@ feedback_problem:
   user: |-
     # Scenario Description
     {{ scenario_desc }}
-
+
     # Previous Experiments and Feedbacks
     {{ exp_and_feedback_list_desc }}
 
@@ -155,8 +154,8 @@ hypothesis_gen:
     The user is iteratively improving a Kaggle competition implementation. Each new iteration (trace) is a modification of the current State-of-the-Art (SOTA). If a new trace surpasses the current SOTA, it becomes the new SOTA. Otherwise, it's a failed experiment.
     You will be provided with:
     1. A detailed competition scenario description.
-    2. Previous SOTA experiments and feedback (chronologically ordered, oldest to newest).
-    3. Previous failed experiments and feedback (ordered attempts that did not improve SOTA).
+    2. A history of previous successful experiments and their associated feedbacks, indexed or ordered from oldest to newest; the latest SOTA experiment accumulates all the improvements from the previous successful experiments.
+    3. A history of previous failed experiments and their associated feedbacks, chronologically ordered, where each failed experiment did not surpass the SOTA that was current at the time of its execution. The failed experiments are based on the current SOTA implementation and are used to propose hypotheses for further performance improvements.
     4. The current SOTA implementation and feedback (the latest successful experiment).
     5. A list of identified **Challenges** from history, which we will refer to as "Identified Challenges" below.
 
@@ -275,10 +274,9 @@ task_gen:
 
     You will be provided with the following inputs:
     1. **Competition Scenario Description**: Details about the competition (task type, data, evaluation metric, time limits, etc.).
-    2. **Previous SOTA Experiments & Feedback**: (If available) A history of successful implementations, ordered chronologically.
-    3. **Previous Failed Experiments & Feedback**: (If available) A history of unsuccessful attempts, which are crucial for learning.
-    4. **Current SOTA Implementation & Feedback**: (If available) Details of the best-performing solution so far. **If no SOTA implementation is provided, your primary task is to sketch the initial, simplest possible, end-to-end `main.py` workflow.**
-    5. **Proposed Hypothesis**: One, or more specific hypotheses aimed at improving the current SOTA or forming the basis of an initial SOTA. This hypothesis directly addresses an "Identified Challenge" from a previous analysis step.
+    2. **Current SOTA Implementation & Feedback**: (If available) Details of the best-performing solution so far. **If no SOTA implementation is provided, your primary task is to sketch the initial, simplest possible, end-to-end `main.py` workflow.**
+    3. **Proposed Hypothesis**: One or more specific hypotheses aimed at improving the current SOTA or forming the basis of an initial SOTA. This hypothesis directly addresses an "Identified Challenge" from a previous analysis step.
+    4. **Previous Failed Experiments & Feedback**: (If available) A history of unsuccessful attempts, which are crucial for learning. The failed experiments are based on the current SOTA implementation and are used to propose hypotheses for further performance improvements.
 
     Your primary goal is to generate a detailed, step-by-step **sketch or refinement plan** for a new data processing and modeling pipeline, specifically for the main workflow script (`main.py`), that effectively implements the `Proposed Hypothesis`. This sketch will guide a developer to write the code correctly.
 
@@ -381,7 +379,7 @@ task_gen:
     # Data Folder Structure (All files are under {% include "scenarios.data_science.share:scen.input_path" %})
     {{ data_folder_info }}
 
-    # Current SOTA Implementation
+    # Current SOTA Implementation & Feedback
     {{ sota_exp_desc }}
 
     # Proposed Hypothesis
@@ -393,7 +391,8 @@ task_gen:
     **Hypothesis:** {{ hypothesis.hypothesis }}
 
     {% endfor %}
-    # Feedback from Previous Failed Experiments (e.g., experiments that did not pass evaluation, encountered bugs, or failed to surpass SOTA performance):
+    # Previous Failed Experiments & Feedback (e.g., experiments that did not pass evaluation, encountered bugs, or failed to surpass SOTA performance)
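Taken together, the hunks above edit prompt templates stored as YAML block scalars under two levels of keys: a top-level component key (`scenario_problem`, `feedback_problem`, `hypothesis_gen`, `task_gen`) holding `system` and `user` entries. As a reading aid, here is a minimal sketch of that assumed layout; the bodies are abbreviated placeholders, not the literal file contents:

```yaml
# Abbreviated sketch of the layout edited above; "..." elides prompt text.
feedback_problem:
  system: |-
    You will be provided with:
    1. A detailed competition scenario description;
    ...
  user: |-
    # Scenario Description
    {{ scenario_desc }}

    # Previous Experiments and Feedbacks
    {{ exp_and_feedback_list_desc }}
```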
rdagent/scenarios/data_science/proposal/exp_gen/prompts_v3.yaml (+2 −62)
@@ -87,47 +87,7 @@ feedback_problem:
     {{ sota_exp_desc }}
 
 scenario_description: |-
-  {% if use_raw_description -%}
-  ====== Background ======
-  {{ raw_description }}
-
-  {% else %}
-  ====== Background ======
-  {{ background }}
-
-  {% if eda_output is not none %}
-  ====== Data Overview (EDA) ======
-  {{ eda_output }}
-  {% endif %}
-
-  ====== Submission Format ======
-  Please ensure your submission adheres to the following specifications:
-  {{ submission_specifications }}
-
-  ====== Important Guidelines ======
-  Before submitting your results, please note the following:
-  - We have numerous tests in place to check your code.
-  - Ensure your submission is genuine.
-  - Do not manipulate data or return values solely to pass preliminary tests, as this will not lead to successful final evaluation.
-
-  {% endif %}
-
-  ====== Evaluation ======
-  {% if not use_raw_description and metric_name %}
-  The primary evaluation metric for this task is: **{{ metric_name }}**.
-  {% endif %}
-  This metric is considered better when it is **{% if metric_direction %}larger{% else %}smaller{% endif %}**.
-
-  {% if evaluation is not none %}
-  Additional Evaluation Details:
-  {{ evaluation }}
-  {% endif %}
-
-  {% if time_limit %}
-  ====== Time Limit ======
-  Your code's execution is limited to **{{ time_limit }}**.
-  Please optimize your model and parameters to ensure your code runs within this specified time constraint.
-  {% endif %}
+  {% include "scenarios.data_science.proposal.exp_gen.prompts_v2:scenario_description" %}
 
 hypothesis_gen:
   system: |-
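The 41 deleted template lines are collapsed into one cross-file include. Judging from the `dotted.file.path:key` form of the directive, the deleted block is assumed to survive as a top-level `scenario_description` entry in prompts_v2.yaml, roughly:

```yaml
# Assumed target of the include, in prompts_v2.yaml (abbreviated sketch,
# not shown in this diff; the real entry should carry the full template).
scenario_description: |-
  {% if use_raw_description -%}
  ====== Background ======
  {{ raw_description }}
  {% else %}
  ====== Background ======
  {{ background }}
  ...
  {% endif %}
```

Centralizing the template this way lets prompts_v3.yaml pick up future edits to the v2 version instead of carrying a copy that can drift.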
@@ -320,24 +280,4 @@ task_gen:
     - Double-check that validation scores are saved correctly to `scores.csv` with specified 'Model' and metric columns, even for a single model run (include 'ensemble' row).
 
   user: |-
-    # Competition Scenario Description
-    {{ scenario_desc }}
-
-    # Data Folder Structure (All files are under {% include "scenarios.data_science.share:scen.input_path" %})
-    {{ data_folder_info }}
-
-
-    # Current SOTA Implementation
-    {{ sota_exp_desc }}
-
-    # Proposed Hypothesis
-    This sketch should implement the following hypotheses:
-
-    {% for hypothesis in hypotheses %}
-    ## {{ hypothesis.problem_name }}
-    **Why:** {{ hypothesis.problem_desc }}
-    **Hypothesis:** {{ hypothesis.hypothesis }}
-
-    {% endfor %}
-    # Feedback from Previous Failed Experiments (e.g., experiments that did not pass evaluation, encountered bugs, or failed to surpass SOTA performance)
-    {{ failed_exp_and_feedback_list_desc }}
+    {% include "scenarios.data_science.proposal.exp_gen.prompts_v2:task_gen.user" %}
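Unlike the first include, this one points at a nested key, `task_gen.user`, so the prompt loader is assumed to split on `:` and then walk dotted keys into the YAML mapping; illustratively:

```yaml
# Assumed resolution of the include directive (illustrative only):
# "scenarios.data_science.proposal.exp_gen.prompts_v2:task_gen.user"
#   before ":" -> rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml
#   after  ":" -> mapping path task_gen -> user
task_gen:
  user: |-
    # Competition Scenario Description
    {{ scenario_desc }}
    ...
```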