
Conversation

@pythonomar22 (Collaborator) commented Nov 21, 2025

Adding a KernelBenchDataset object and making the code cleaner.

The goal is a very clean dataset object that handles problem indexing and verification, no matter where the problem comes from (local / HF) and regardless of any special properties of the problem (subsets, etc.).

@simonguozirui changed the title from 0index2 to Dataset Object on Nov 29, 2025
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a unified dataset abstraction for KernelBench that provides a clean, consistent interface for accessing problems regardless of their source (local filesystem or HuggingFace). The design centers on a Problem dataclass and an abstract BaseDataset class with concrete implementations for both data sources.

Key changes:

  • New dataset abstraction with Problem dataclass and BaseDataset interface supporting both local and HuggingFace sources (see the sketch after this list)
  • Comprehensive test suite with 30+ unit tests covering dataset construction, problem access, subsetting, and compatibility
  • Migration of all existing scripts and utilities to use the new unified dataset API, eliminating source-specific branching logic
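
To make this concrete, here is a minimal sketch of the shape the abstraction takes, inferred from the file summaries and review comments below; any field or method not named elsewhere in this PR is an assumption, not the PR's actual code:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Problem:
    problem_id: int       # stable, 1-indexed ID (not filesystem order)
    name: str
    code: str             # reference architecture source
    path: Optional[str]   # local file path; None for HuggingFace-backed problems
    hash: str             # content hash, used for local/HF parity checks

class BaseDataset(ABC):
    @abstractmethod
    def get_problem_by_id(self, problem_id: int) -> Problem:
        ...

    @abstractmethod
    def __iter__(self) -> Iterator[Problem]:
        ...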

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.

Summary per file:

  • src/kernelbench/dataset.py — Core dataset abstraction with the Problem dataclass, BaseDataset interface, and LocalKernelBenchDataset/HuggingFaceKernelBenchDataset implementations
  • src/kernelbench/unit_tests/test_dataset.py — Comprehensive test suite with 30+ tests for hash functions, dataset construction, problem access, subsetting, iteration, and HuggingFace parity
  • src/kernelbench/eval.py — Updated to use the new dataset API; adds a fetch_baseline_time utility function
  • src/kernelbench/timing.py — Updated fetch_baseline_time to accept a BaseDataset parameter instead of a list
  • src/kernelbench/utils.py — Removed a commented-out import of datasets
  • src/kernelbench/frameworks.py — Removed a commented-out import of datasets
  • scripts/verify_bench.py — Refactored to iterate over dataset objects instead of directory paths
  • scripts/run_and_check.py — Simplified to use the unified dataset interface, eliminating HuggingFace-specific code paths
  • scripts/inspect_triton.py — Updated to use the new dataset API with a BaseDataset parameter
  • scripts/inspect_baseline.py — Refactored to use construct_kernelbench_dataset instead of directory-based loading
  • scripts/generate_samples.py — Unified dataset loading logic, eliminating separate HuggingFace and local code paths
  • scripts/generate_baseline_time_modal.py — Updated to use the new dataset API; removed the duplicate fetch_ref_arch_from_dataset function
  • scripts/generate_baseline_time.py — Updated to use the new dataset API and iterate over problem IDs instead of indices
  • scripts/generate_and_eval_single_sample_modal.py — Simplified problem fetching via the unified dataset interface
  • scripts/generate_and_eval_single_sample.py — Unified dataset loading and problem access across both data sources
  • scripts/eval_from_generations.py — Migrated to the new dataset API; simplified problem ID handling
  • scripts/benchmark_eval_analysis.py — Updated to align eval results with baseline results using problem IDs and names from the dataset
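
Given the HuggingFace-parity tests listed above, a hedged sketch of what such a check could look like (the problem_ids attribute and the local-by-default construction are assumptions; construct_kernelbench_dataset, get_problem_by_id, and problem.hash do appear in this PR):

from kernelbench.dataset import construct_kernelbench_dataset

def test_local_hf_parity():
    local = construct_kernelbench_dataset(level=1)                     # local source (assumed default)
    hf = construct_kernelbench_dataset(level=1, source="huggingface")  # HuggingFace source
    for pid in local.problem_ids:  # assumed attribute listing the stable IDs
        assert local.get_problem_by_id(pid).hash == hf.get_problem_by_id(pid).hash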


Comment on lines 87 to 89
# Dynamically import the module
spec = importlib.util.spec_from_file_location(
    module_name, problem.path

Copilot AI Jan 6, 2026


When using HuggingFace datasets, problem.path will be None, causing spec_from_file_location to fail with a TypeError or ValueError. This script assumes a local dataset, but should either validate that only local datasets are used or handle the None case explicitly.

Suggested change
-# Dynamically import the module
-spec = importlib.util.spec_from_file_location(
-    module_name, problem.path
+problem_path = getattr(problem, "path", None)
+if not problem_path:
+    raise ValueError(
+        f"Problem '{module_name}' does not have a local file path; "
+        "verify_bench.py only supports local datasets."
+    )
+# Dynamically import the module
+spec = importlib.util.spec_from_file_location(
+    module_name, problem_path
start, end = config.subset
problem_ids_to_run = [pid for pid in all_problem_ids if start <= pid <= end]
if not problem_ids_to_run:
        print(f"Warning: No problems found in subset range {config.subset}")

Copilot AI Jan 6, 2026


Inconsistent indentation: this line uses extra spaces while the rest of the file uses standard indentation. Should align with line 260 above.

Suggested change
-        print(f"Warning: No problems found in subset range {config.subset}")
+    print(f"Warning: No problems found in subset range {config.subset}")
Comment on lines 42 to 55
def fetch_ref_arch_from_problem_id(problem_id: int, dataset: "BaseDataset", with_name=False) -> Union[str, tuple[str, str]]:
    """
-    Fetches the reference architecture in string for a given problem_id
+    Fetches the reference architecture for a given problem_id from the dataset.
    """
    if isinstance(problem_id, str):
        problem_id = int(problem_id)

-    problem_path = problems[problem_id]
-
-    # problem_path = os.path.join(REPO_ROOT_PATH, problem)
-    if not os.path.exists(problem_path):
-        raise FileNotFoundError(f"Problem file at {problem_path} does not exist.")
-
-    ref_arch = utils.read_file(problem_path)
+    problem = dataset.get_problem_by_id(problem_id)
+    ref_arch = problem.code

    if not with_name:
        return ref_arch
    else:
-        return (problem_path, ref_arch)
+        return (problem.path, ref_arch)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value when with_name=True is inconsistent with the return type annotation. The annotation says tuple[str, str] but the function returns (problem.path, ref_arch) where problem.path could be None for HuggingFace datasets. The return type should be Union[str, tuple[Optional[str], str]] or the function should handle the None case more explicitly.
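
For reference, a sketch of the loosened annotation (signature taken from the diff above; illustrative only):

from typing import Optional, Union

def fetch_ref_arch_from_problem_id(
    problem_id: int, dataset: "BaseDataset", with_name: bool = False
) -> Union[str, tuple[Optional[str], str]]:
    ...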

Comment on lines 100 to 103
def inspect_torch_compile_triton(level_num, problem_id):
    ref_arch_name, ref_arch_src = fetch_ref_arch_from_level_problem_id(
        level_num, problem_id, with_name=True
    )
-    ref_arch_name = ref_arch_name.split("/")[-1]
+    ref_arch_name = os.path.basename(ref_arch_name)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When with_name=True, fetch_ref_arch_from_problem_id returns (problem.path, ref_arch) where problem.path can be None for HuggingFace datasets. This will cause os.path.basename(ref_arch_name) on line 103 to fail with a TypeError when ref_arch_name is None. The code should handle the None case or ensure it only works with local datasets.
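
One possible guard, sketched under the assumption that failing loudly is the desired behavior (this fix is not part of the PR):

ref_arch_name, ref_arch_src = fetch_ref_arch_from_level_problem_id(
    level_num, problem_id, with_name=True
)
if ref_arch_name is None:
    # HuggingFace-backed problems carry no local path; raise a clear error
    # instead of letting os.path.basename fail with a TypeError
    raise ValueError("inspect_triton.py requires a local dataset with file paths")
ref_arch_name = os.path.basename(ref_arch_name)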

Comment on lines 888 to 902
"""
if not os.path.exists(baseline_time_filepath):
raise FileNotFoundError(
f"Baseline time file not found at {baseline_time_filepath}"
)

with open(baseline_time_filepath, "r") as f:
baseline_json = json.load(f)

problem = dataset.get_problem_by_id(problem_id)
problem_name = problem.name

baseline_time = baseline_json[level_name].get(problem_name, None)
return baseline_time


Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fetch_baseline_time function is duplicated in both timing.py (line 395) and eval.py (line 883) with identical logic. This violates the DRY principle. Consider removing the duplicate from one file and importing it from the other, or consolidating both into a single location.

Suggested change
-    """
-    if not os.path.exists(baseline_time_filepath):
-        raise FileNotFoundError(
-            f"Baseline time file not found at {baseline_time_filepath}"
-        )
-    with open(baseline_time_filepath, "r") as f:
-        baseline_json = json.load(f)
-    problem = dataset.get_problem_by_id(problem_id)
-    problem_name = problem.name
-    baseline_time = baseline_json[level_name].get(problem_name, None)
-    return baseline_time
+    This function delegates to :func:`timing.fetch_baseline_time` to avoid
+    duplicating logic across modules.
+    """
+    return timing.fetch_baseline_time(
+        level_name=level_name,
+        problem_id=problem_id,
+        dataset=dataset,
+        baseline_time_filepath=baseline_time_filepath,
+    )
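
For reference, a usage sketch of the consolidated helper (keyword names taken from the suggestion above; the ds variable and JSON path are hypothetical):

baseline_time = fetch_baseline_time(
    level_name="level1",                                    # hypothetical level key
    problem_id=1,
    dataset=ds,                                             # any BaseDataset instance
    baseline_time_filepath="results/timing/baseline.json",  # hypothetical path
)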

        level_num, problem_id, with_name=True
    )
-    ref_arch_name = ref_arch_name.split("/")[-1]
+    ref_arch_name = os.path.basename(ref_arch_name)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable ref_arch_name is not used.

    )

-    ref_arch_name = ref_arch_name.split("/")[-1]
+    ref_arch_name = os.path.basename(ref_arch_name)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable ref_arch_name is not used.


from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, Optional, Union

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'Union' is not used.

Suggested change
-from typing import Iterator, Optional, Union
+from typing import Iterator, Optional
from pydantic import BaseModel

-from . import utils, timing
+from . import utils, timing, dataset

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'utils' is not used.

Suggested change
-from . import utils, timing, dataset
+from . import timing, dataset

-#from src.dataset import construct_kernelbench_dataset
-from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, read_file, create_inference_server_from_presets
+from kernelbench.dataset import construct_kernelbench_dataset
+from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, create_inference_server_from_presets

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'query_server' is not used.
Import of 'set_gpu_arch' is not used.

Suggested change
-from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, create_inference_server_from_presets
+from kernelbench.utils import extract_first_code, create_inference_server_from_presets

@simonguozirui (Collaborator) commented

Here is the new dataset object design to make KernelBench less confusing and easier to integrate with other frameworks.

Key Advantages

  • Source agnostic: switch between local and HuggingFace sources with one flag; validation checks live inside the dataset object, which removes a lot of duplicated code from the scripts
  • Robust alignment: eliminates mismatch bugs by using stable, 1-indexed problem_id lookups instead of relying on OS file ordering
  • Flexible filtering: native support for creating subsets via specific ID lists or index ranges
  • Representative subsets: built-in support for curated representative subsets

from kernelbench.dataset import construct_kernelbench_dataset

# 1. Flexible Initialization
# Subset by specific IDs
ds_list = construct_kernelbench_dataset(level=1, problem_ids=[1, 3, 5])
# Subset by a range
ds_range = construct_kernelbench_dataset(level=1, id_range=(10, 20))

# 2. Source switching
ds_hf = construct_kernelbench_dataset(level=1, source="huggingface")

# 3. Quick testing with Curated Representative Subsets
rep_ds = ds_hf.get_representative_subset() 
# result: a diverse ~15-kernel dataset (matmul, conv, norms, etc.)

# 4. Safe lookups
problem = ds_hf.get_problem_by_id(42)
print(f"Name: {problem.name}, Code Hash: {problem.hash}")

.env.example Outdated

# Google Gemini
-GEMINI_API_KEY=...
+GEMINI_API_KEY=AIzaSy<redacted leaked key>
Collaborator commented:

I'll properly take a review later! This looks exciting!

However, please fix this!!!!

Collaborator commented:

@ethanbc key disabled now, please DO NOT post API keys anywhere in public

Collaborator commented:

@pythonomar22 A decent rule of thumb is to only modify the .env file when you're experimenting / plugging in keys, and to keep .env in the .gitignore. When you're done, update .env.example if you added or removed env variables.

This prevents leaks like this one from happening :)
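
A minimal sketch of that setup, as a .gitignore excerpt (the comment is illustrative):

# .gitignore
.env    # real keys stay local; commit only .env.example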

@simonguozirui (Collaborator) commented

@ethanboneh tested locally and on Modal. Same behavior as before, but the dataset object is much easier and cleaner to instantiate, and it is easy for people to extend in the future. Merging.

@simonguozirui merged commit b15a6c9 into main on Jan 7, 2026