Commit 76d8536

XianBW and peteryang1 authored
fix: align competion_full_desc and scenario_all_desc, remove redundant info in problems proposal (#808)
* align competition desc & scenario desc string
* remove competition_desc when having used scenario_desc in problem gen
* fix bug
* remove redundant competition desc in naive expgen
* improve proposal prompt
* modify phrase

Co-authored-by: Xu Yang <[email protected]>
1 parent b7d2c12 commit 76d8536

File tree

6 files changed (+44, -42 lines)

rdagent/scenarios/data_science/proposal/exp_gen/naive.py

Lines changed: 0 additions & 2 deletions
@@ -15,7 +15,6 @@ class NaiveExpGen(ExpGen):
     def gen(self, trace: DSTrace) -> DSExperiment:
         sota_exp = trace.sota_experiment()
         scenario_desc = trace.scen.get_scenario_all_desc()
-        competition_desc = trace.scen.get_competition_full_desc()
         sota_exp_desc = T("scenarios.data_science.share:describe.exp").r(
             exp=sota_exp, heading="Best of previous exploration of the scenario"
         )
@@ -28,7 +27,6 @@ def gen(self, trace: DSTrace) -> DSExperiment:
         sys_prompt = T(".naive:naive_gen.system").r()

         user_prompt = T(".naive:naive_gen.user").r(
-            competition_desc=competition_desc,
             sota_exp_desc=sota_exp_desc,
             scenario_desc=scenario_desc,
             exp_and_feedback_list_desc=exp_and_feedback_list_desc,

rdagent/scenarios/data_science/proposal/exp_gen/naive.yaml

Lines changed: 0 additions & 3 deletions
@@ -25,9 +25,6 @@ naive_gen:
     # Scenario Description
     {{ scenario_desc }}

-    # Competition Description
-    {{ competition_desc }}
-
     # Previous Experiments and Feedbacks:
     {{ exp_and_feedback_list_desc }}

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 15 additions & 11 deletions
@@ -1,7 +1,7 @@
 scenario_problem:
   system: |-
     {% include "scenarios.data_science.share:scen.role" %}
-    You will be given scenario and competition description and the current SOTA implementation and feedback.
+    You will be given the scenario description and the current SOTA implementation and feedback.
     Your task is to analyze the given information and extract the **Scenario Problems** from the given materials.

     ## Scenario Problems
@@ -25,9 +25,6 @@ scenario_problem:
     # Scenario Description
     {{ scenario_desc }}

-    # Competition Description
-    {{ competition_desc }}
-
     # Current SOTA Implementation
     {{ sota_exp_desc }}

@@ -98,8 +95,11 @@ hypothesis_gen:
       - If the problem relates to time/memory constraints, suggest smaller model sizes or alternative algorithms with reduced complexity.
       - If the problem involves underperforming models, propose removing or replacing models with significantly worse performance.
       - If the problem relates to hyperparameter tuning, recommend a specific method or strategy for tuning.
+    4. Specific and Non-Vague
+      - Avoid vague statements like "improve the model" or "optimize the pipeline." Instead, specify the exact changes to be made.
+      - No phrases like "for example" or "eg.," should be used in the hypothesis. Give a clear decision in the hypothesis.
     {% if enable_idea_pool %}
-    4. Idea Reference
+    5. Idea Reference
       - Each idea is a method, technique or trick that contributes to high performance from other competition implementation under similar problem. You are free to use them as an inspiration for your hypothesis proposal.
     {% endif %}

@@ -114,7 +114,7 @@ hypothesis_gen:
     Please score the proposed hypothesis from 1 to 10 for each of the following dimensions (where 1 means lowest and 10 means highest):
     1. Problem-Hypothesis Alignment: How well the hypothesis addresses the identified problem.
     2. Expected Impact: The estimated improvement after applying the hypothesis to current SOTA implementation.
-    3. Novelty: Degree of innovation compared to previous attempts. If the proposed hypothesis is very similar to previous experiments' hypothesis, assign low novelty score.
+    3. Novelty: Degree of innovation compared to previous attempts. If the proposed hypothesis is similar to previous experiments' hypothesis, assign novelty score to one.
     4. Feasibility: The ease of implementing the proposed hypothesis in the current SOTA implementation.
     5. Risk-Reward Balance: The exploration-exploitation balance of the proposed hypothesis.

@@ -147,10 +147,13 @@ task_gen:
     {{ task_specification }}

     ## Task Design Guidelines
-    The task should be concise with several steps each only in a few sentences.
-    DO NOT repeat the details which has already included in the SOTA code. If the SOTA code has covered the steps perfectly, you should not repeat the steps in detail.
-    DO NOT write any code in the task description!
-    Observing reasons from failed experiments and feedback to prevent repeating similar mistakes in analogous situations.
+    1. The task should be concise with several steps each only in a few sentences.
+    2. DO NOT repeat the details which has already included in the SOTA code. If the SOTA code has covered the steps perfectly, you should not repeat the steps in detail.
+    3. DO NOT write any code in the task description!
+    4. Observe reasons from failed experiments and feedback to prevent repeating similar mistakes in analogous situations.
+    5. Specific and Non-Vague
+      - Avoid vague statements like "choose a proper model" Instead, specify the exact task to be made.
+      - No phrases like "for example" or "eg.," should be used in the task. Give a clear decision in the task.

     ## [Partial Response Format 1] Task Output Format:
     {{ task_output_format }}
@@ -214,6 +217,7 @@ specification:
   problem: |-
     1. The problem should be specific and fine-grained. Avoid general or vague statements.
     2. The problem should technical or methodological. Focus on design and implementation flaws, not runtime errors.
+    3. The problem should be strictly aligned with the improvement of target metric. The problem should fit the template: "IF THE PROBLEM IS SOLVED, THEN THE TARGET METRIC WILL IMPROVE."

   hypothesis: |-
     1. The hypothesis should be precise, testable, and directly actionable. Avoid general or vague statements. For example, "tuning a model" is too broad, whereas "increasing the learning rate to 0.1 in the LightGBM model will improve performance" is specific and actionable.
@@ -230,7 +234,7 @@ output_format:
   problem: |-
     For each of the identified problem, you should strictly adhere to the following JSON schema.
     Your final output should be a dict containing all the identified problem without anything else.
-    Please respond at most five problems considering the most valuable and recently not explored.
+    Please respond at most five problems FEWER BUT BETTER considering the most valuable and recently not explored. Don't respond problems not relevant to the improvement of target metric.
     {
       "problem name 1": {
         "problem": "Description of the first issue in no more than three sentences.",

rdagent/scenarios/data_science/proposal/exp_gen/proposal.py

Lines changed: 1 addition & 4 deletions
@@ -226,14 +226,13 @@ def _f(user_prompt):


 class DSProposalV2ExpGen(ExpGen):
-    def identify_scenario_problem(self, scenario_desc: str, competition_desc: str, sota_exp_desc: str) -> Dict:
+    def identify_scenario_problem(self, scenario_desc: str, sota_exp_desc: str) -> Dict:
         sys_prompt = T(".prompts_v2:scenario_problem.system").r(
             problem_spec=T(".prompts_v2:specification.problem").r(),
             problem_output_format=T(".prompts_v2:output_format.problem").r(),
         )
         user_prompt = T(".prompts_v2:scenario_problem.user").r(
             scenario_desc=scenario_desc,
-            competition_desc=competition_desc,
             sota_exp_desc=sota_exp_desc,
         )
         response = APIBackend().build_messages_and_create_chat_completion(
@@ -445,7 +444,6 @@ def gen(self, trace: DSTrace, pipeline: bool = False) -> DSExperiment:
         else:
             eda_output = sota_exp.experiment_workspace.file_dict.get("EDA.md", None)
         scenario_desc = trace.scen.get_scenario_all_desc(eda_output=eda_output)
-        competition_desc = trace.scen.get_competition_full_desc()

         sota_exp_desc = T("scenarios.data_science.share:describe.exp").r(
             exp=sota_exp, heading="Best of previous exploration of the scenario"
@@ -463,7 +461,6 @@ def gen(self, trace: DSTrace, pipeline: bool = False) -> DSExperiment:
         # Step 1: Identify problems
         scen_problems = self.identify_scenario_problem(
             scenario_desc=scenario_desc,
-            competition_desc=competition_desc,
             sota_exp_desc=sota_exp_desc,
         )
         for problem_name in scen_problems:

rdagent/scenarios/data_science/scen/__init__.py

Lines changed: 12 additions & 11 deletions
@@ -92,17 +92,6 @@ def _analysis_competition_description(self):
         self.metric_name = response_json_analysis.get("Metric Name", "custom_metric")
         self.metric_direction_guess = response_json_analysis.get("Metric Direction", True)

-    def get_competition_full_desc(self) -> str:
-        return f"""Task Type: {self.task_type}
-Data Type: {self.data_type}
-Brief Description: {self.brief_description}
-Dataset Description: {self.dataset_description}
-Submission Specifications: {self.submission_specifications}
-Model Output Channel: {self.model_output_channel}
-Metric Evaluation Description: {self.metric_description}
-Metric Name: {self.metric_name}
-"""
-
     @property
     def background(self) -> str:
         background_template = T(".prompts:competition_background")
@@ -111,6 +100,7 @@ def background(self) -> str:
             data_type=self.data_type,
             brief_description=self.brief_description,
             dataset_description=self.dataset_description,
+            model_output_channel=self.model_output_channel,
             metric_description=self.metric_description,
         )
         return background_prompt
@@ -122,6 +112,17 @@ def rich_style_description(self) -> str:
             competition=self.competition,
         )

+    def get_competition_full_desc(self) -> str:
+        return T(".prompts:scenario_description").r(
+            background=self.background,
+            submission_specifications=self.submission_specifications,
+            evaluation=self.metric_description,
+            metric_name=self.metric_name,
+            metric_direction=self.metric_direction,
+            time_limit=None,
+            eda_output=None,
+        )
+
     def get_scenario_all_desc(self, eda_output=None) -> str:
         """
         eda_output depends on dynamic .md files from current workspace, not fixed.
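
Taken together, the removal of the old f-string and the new template-backed method mean both description strings now come from the same scenario_description template. A minimal stand-in sketch of that alignment is below (toy names, not the repository's classes; the body of get_scenario_all_desc is not part of this diff and is assumed to use the same template).

# Minimal stand-in sketch (NOT the repository's scenario class): it only
# illustrates the alignment introduced here, i.e. both description methods
# funnel through one scenario_description renderer instead of two divergent
# strings. All names below are illustrative.
from dataclasses import dataclass


def render_scenario_description(*, background, metric_name, metric_direction,
                                time_limit=None, eda_output=None):
    # Toy replacement for T(".prompts:scenario_description").r(...)
    parts = [background, f"Metric: {metric_name}"]
    if time_limit:
        parts.append(f"Time limit: {time_limit}")
    parts.append("The metric is better when it is "
                 + ("bigger" if metric_direction else "smaller") + ".")
    if eda_output is not None:
        parts.append(f"EDA:\n{eda_output}")
    return "\n".join(parts)


@dataclass
class ToyScen:
    background: str = "A toy competition."
    metric_name: str = "AUC"
    metric_direction: bool = True

    def get_competition_full_desc(self):
        # Mirrors the new method above: same template, no time limit / EDA.
        return render_scenario_description(
            background=self.background,
            metric_name=self.metric_name,
            metric_direction=self.metric_direction,
        )

    def get_scenario_all_desc(self, eda_output=None):
        # The real method is not shown in this diff; it is assumed to render
        # the same template, optionally enriched with workspace EDA output.
        return render_scenario_description(
            background=self.background,
            metric_name=self.metric_name,
            metric_direction=self.metric_direction,
            eda_output=eda_output,
        )


scen = ToyScen()
# With no EDA output the two descriptions coincide, which is why callers in
# this commit stop passing competition_desc alongside scenario_desc.
assert scen.get_competition_full_desc() == scen.get_scenario_all_desc()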

rdagent/scenarios/data_science/scen/prompts.yaml

Lines changed: 16 additions & 11 deletions
@@ -11,18 +11,16 @@ scenario_description: |-
   ------The name of the evaluation metric used------
   {{ metric_name }}

-  ------The time limit to your code------
+  {% if time_limit %}------The time limit to your code------
   You code running is limit to {{ time_limit }}, please change yor model type and model parameters to make sure your code can run within the time limit.

-  {% if evaluation is not none %}
-  ------Evaluation------
-  {{ evaluation }}
   {% endif %}
+  {% if evaluation is not none %}------Evaluation------
+  {{ evaluation }}

-  The evaluation metrics used is directed as:
-  {% if metric_direction %} The metric is better when it is bigger.
-  {% else %} The metric is better when it is smaller.
   {% endif %}
+  The evaluation metrics used is directed as:
+  The metric is better when it is {% if metric_direction %}bigger{% else %}smaller{% endif %}.

   {% if eda_output is not none %}------Data Overview(EDA)------
   {{ eda_output }}
@@ -60,11 +58,18 @@ competition_background: |-
   Your knowledge spans cutting-edge data analysis techniques, advanced machine learning algorithms, and their practical applications to solve complex real-world problems.
   You are dedicated to producing accurate, efficient, and innovative solutions.

-  The task type for this competition is {{ task_type }}.
-  The data type used in this competition is {{ data_type }}.
+  The task type for this competition is **{{ task_type }}**.
+  The data type used in this competition is **{{ data_type }}**.
+
   Briefly, the competition involves: {{ brief_description }}.
-  The dataset used in this competition is: {{ dataset_description }}.
-  The evaluation metric of this competition is: {{ metric_description }}.
+
+  The dataset used in this competition is:
+  {{ dataset_description }}.
+
+  Submission channel number to each sample is: {{ model_output_channel }}.
+
+  The evaluation metric of this competition is:
+  {{ metric_description }}.

 rich_style_description: |-
   ### {{ name }} Agent: Automated Feature Engineering & Model Tuning Evolution
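
The scenario_description edits above mainly wrap the time-limit block in {% if time_limit %} and collapse the metric-direction sentence into a single line. A standalone Jinja2 sketch of that behaviour follows, using a trimmed, paraphrased excerpt rendered directly with jinja2.Template rather than the repository's T(...) helper.

# Standalone Jinja2 sketch of the refactored conditionals; the template text
# below is a trimmed, paraphrased excerpt, not the full scenario_description.
from jinja2 import Template

excerpt = (
    "{% if time_limit %}------The time limit to your code------\n"
    "Your code is limited to {{ time_limit }}.\n"
    "{% endif %}"
    "The metric is better when it is "
    "{% if metric_direction %}bigger{% else %}smaller{% endif %}."
)

# With time_limit=None the whole time-limit block disappears, which is the
# point of the new {% if time_limit %} guard.
print(Template(excerpt).render(time_limit=None, metric_direction=True))
# -> The metric is better when it is bigger.

print(Template(excerpt).render(time_limit="2 hours", metric_direction=False))
# -> ------The time limit to your code------
#    Your code is limited to 2 hours.
#    The metric is better when it is smaller.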
