
Conversation

@pythonomar22 (Collaborator) commented Nov 21, 2025

Adding a KernelBenchDataset object and making the code cleaner.

The goal is a very clean dataset object that handles problem indexing and verification, no matter where the problem comes from (local / HF) and regardless of any special properties of the problem (subsets, etc.).

@simonguozirui changed the title from 0index2 to Dataset Object on Nov 29, 2025
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a unified dataset abstraction for KernelBench that provides a clean, consistent interface for accessing problems regardless of their source (local filesystem or HuggingFace). The design centers on a Problem dataclass and an abstract BaseDataset class with concrete implementations for both data sources.

Key changes:

  • New dataset abstraction with Problem dataclass and BaseDataset interface supporting both local and HuggingFace sources (see the sketch after this list)
  • Comprehensive test suite with 30+ unit tests covering dataset construction, problem access, subsetting, and compatibility
  • Migration of all existing scripts and utilities to use the new unified dataset API, eliminating source-specific branching logic
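
To make this concrete, here is a minimal sketch of the shape the abstraction takes, inferred from the file summaries and review comments below; any field or method not named elsewhere in this PR is an assumption, not the PR's actual code:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Problem:
    problem_id: int       # stable, 1-indexed ID (not filesystem order)
    name: str
    code: str             # reference architecture source
    path: Optional[str]   # local file path; None for HuggingFace-backed problems
    hash: str             # content hash, used for local/HF parity checks

class BaseDataset(ABC):
    @abstractmethod
    def get_problem_by_id(self, problem_id: int) -> Problem:
        ...

    @abstractmethod
    def __iter__(self) -> Iterator[Problem]:
        ...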

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.

Summary per file:

  • src/kernelbench/dataset.py — Core dataset abstraction with the Problem dataclass, BaseDataset interface, and LocalKernelBenchDataset/HuggingFaceKernelBenchDataset implementations
  • src/kernelbench/unit_tests/test_dataset.py — Comprehensive test suite with 30+ tests for hash functions, dataset construction, problem access, subsetting, iteration, and HuggingFace parity
  • src/kernelbench/eval.py — Updated to use the new dataset API; adds a fetch_baseline_time utility function
  • src/kernelbench/timing.py — Updated fetch_baseline_time to accept a BaseDataset parameter instead of a list
  • src/kernelbench/utils.py — Removed a commented-out import of datasets
  • src/kernelbench/frameworks.py — Removed a commented-out import of datasets
  • scripts/verify_bench.py — Refactored to iterate over dataset objects instead of directory paths
  • scripts/run_and_check.py — Simplified to use the unified dataset interface, eliminating HuggingFace-specific code paths
  • scripts/inspect_triton.py — Updated to use the new dataset API with a BaseDataset parameter
  • scripts/inspect_baseline.py — Refactored to use construct_kernelbench_dataset instead of directory-based loading
  • scripts/generate_samples.py — Unified dataset loading logic, eliminating separate HuggingFace and local code paths
  • scripts/generate_baseline_time_modal.py — Updated to use the new dataset API; removed the duplicate fetch_ref_arch_from_dataset function
  • scripts/generate_baseline_time.py — Updated to use the new dataset API and iterate over problem IDs instead of indices
  • scripts/generate_and_eval_single_sample_modal.py — Simplified problem fetching via the unified dataset interface
  • scripts/generate_and_eval_single_sample.py — Unified dataset loading and problem access across both data sources
  • scripts/eval_from_generations.py — Migrated to the new dataset API; simplified problem ID handling
  • scripts/benchmark_eval_analysis.py — Updated to align eval results with baseline results using problem IDs and names from the dataset
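
Given the HuggingFace-parity tests listed above, a hedged sketch of what such a check could look like (the problem_ids attribute and the local-by-default construction are assumptions; construct_kernelbench_dataset, get_problem_by_id, and problem.hash do appear in this PR):

from kernelbench.dataset import construct_kernelbench_dataset

def test_local_hf_parity():
    local = construct_kernelbench_dataset(level=1)                     # local source (assumed default)
    hf = construct_kernelbench_dataset(level=1, source="huggingface")  # HuggingFace source
    for pid in local.problem_ids:  # assumed attribute listing the stable IDs
        assert local.get_problem_by_id(pid).hash == hf.get_problem_by_id(pid).hash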


Comment on lines 87 to 89
# Dynamically import the module
spec = importlib.util.spec_from_file_location(
    module_name, problem.path

Copilot AI Jan 6, 2026


When using HuggingFace datasets, problem.path will be None, causing spec_from_file_location to fail with a TypeError or ValueError. This script assumes a local dataset, but should either validate that only local datasets are used or handle the None case explicitly.

Suggested change
-# Dynamically import the module
-spec = importlib.util.spec_from_file_location(
-    module_name, problem.path
+problem_path = getattr(problem, "path", None)
+if not problem_path:
+    raise ValueError(
+        f"Problem '{module_name}' does not have a local file path; "
+        "verify_bench.py only supports local datasets."
+    )
+# Dynamically import the module
+spec = importlib.util.spec_from_file_location(
+    module_name, problem_path
start, end = config.subset
problem_ids_to_run = [pid for pid in all_problem_ids if start <= pid <= end]
if not problem_ids_to_run:
        print(f"Warning: No problems found in subset range {config.subset}")

Copilot AI Jan 6, 2026


Inconsistent indentation: this line uses extra spaces while the rest of the file uses standard indentation. Should align with line 260 above.

Suggested change
-        print(f"Warning: No problems found in subset range {config.subset}")
+    print(f"Warning: No problems found in subset range {config.subset}")
Comment on lines 42 to 55
def fetch_ref_arch_from_problem_id(problem_id: int, dataset: "BaseDataset", with_name=False) -> Union[str, tuple[str, str]]:
    """
-    Fetches the reference architecture in string for a given problem_id
+    Fetches the reference architecture for a given problem_id from the dataset.
    """
    if isinstance(problem_id, str):
        problem_id = int(problem_id)

-    problem_path = problems[problem_id]
-
-    # problem_path = os.path.join(REPO_ROOT_PATH, problem)
-    if not os.path.exists(problem_path):
-        raise FileNotFoundError(f"Problem file at {problem_path} does not exist.")
-
-    ref_arch = utils.read_file(problem_path)
+    problem = dataset.get_problem_by_id(problem_id)
+    ref_arch = problem.code

    if not with_name:
        return ref_arch
    else:
-        return (problem_path, ref_arch)
+        return (problem.path, ref_arch)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value when with_name=True is inconsistent with the return type annotation. The annotation says tuple[str, str] but the function returns (problem.path, ref_arch) where problem.path could be None for HuggingFace datasets. The return type should be Union[str, tuple[Optional[str], str]] or the function should handle the None case more explicitly.
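
For reference, a sketch of the loosened annotation (signature taken from the diff above; illustrative only):

from typing import Optional, Union

def fetch_ref_arch_from_problem_id(
    problem_id: int, dataset: "BaseDataset", with_name: bool = False
) -> Union[str, tuple[Optional[str], str]]:
    ...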

Comment on lines 100 to 103
def inspect_torch_compile_triton(level_num, problem_id):
    ref_arch_name, ref_arch_src = fetch_ref_arch_from_level_problem_id(
        level_num, problem_id, with_name=True
    )
-    ref_arch_name = ref_arch_name.split("/")[-1]
+    ref_arch_name = os.path.basename(ref_arch_name)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When with_name=True, fetch_ref_arch_from_problem_id returns (problem.path, ref_arch) where problem.path can be None for HuggingFace datasets. This will cause os.path.basename(ref_arch_name) on line 103 to fail with a TypeError when ref_arch_name is None. The code should handle the None case or ensure it only works with local datasets.
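
One possible guard, sketched under the assumption that failing loudly is the desired behavior (this fix is not part of the PR):

ref_arch_name, ref_arch_src = fetch_ref_arch_from_level_problem_id(
    level_num, problem_id, with_name=True
)
if ref_arch_name is None:
    # HuggingFace-backed problems carry no local path; raise a clear error
    # instead of letting os.path.basename fail with a TypeError
    raise ValueError("inspect_triton.py requires a local dataset with file paths")
ref_arch_name = os.path.basename(ref_arch_name)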

Comment on lines 888 to 902
"""
if not os.path.exists(baseline_time_filepath):
raise FileNotFoundError(
f"Baseline time file not found at {baseline_time_filepath}"
)

with open(baseline_time_filepath, "r") as f:
baseline_json = json.load(f)

problem = dataset.get_problem_by_id(problem_id)
problem_name = problem.name

baseline_time = baseline_json[level_name].get(problem_name, None)
return baseline_time


Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fetch_baseline_time function is duplicated in both timing.py (line 395) and eval.py (line 883) with identical logic. This violates the DRY principle. Consider removing the duplicate from one file and importing it from the other, or consolidating both into a single location.

Suggested change
-    """
-    if not os.path.exists(baseline_time_filepath):
-        raise FileNotFoundError(
-            f"Baseline time file not found at {baseline_time_filepath}"
-        )
-    with open(baseline_time_filepath, "r") as f:
-        baseline_json = json.load(f)
-    problem = dataset.get_problem_by_id(problem_id)
-    problem_name = problem.name
-    baseline_time = baseline_json[level_name].get(problem_name, None)
-    return baseline_time
+    This function delegates to :func:`timing.fetch_baseline_time` to avoid
+    duplicating logic across modules.
+    """
+    return timing.fetch_baseline_time(
+        level_name=level_name,
+        problem_id=problem_id,
+        dataset=dataset,
+        baseline_time_filepath=baseline_time_filepath,
+    )
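
For reference, a usage sketch of the consolidated helper (keyword names taken from the suggestion above; the ds variable and JSON path are hypothetical):

baseline_time = fetch_baseline_time(
    level_name="level1",                                    # hypothetical level key
    problem_id=1,
    dataset=ds,                                             # any BaseDataset instance
    baseline_time_filepath="results/timing/baseline.json",  # hypothetical path
)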

        level_num, problem_id, with_name=True
    )
-    ref_arch_name = ref_arch_name.split("/")[-1]
+    ref_arch_name = os.path.basename(ref_arch_name)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable ref_arch_name is not used.

    )

-    ref_arch_name = ref_arch_name.split("/")[-1]
+    ref_arch_name = os.path.basename(ref_arch_name)

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable ref_arch_name is not used.


from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, Optional, Union

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'Union' is not used.

Suggested change
-from typing import Iterator, Optional, Union
+from typing import Iterator, Optional
from pydantic import BaseModel

-from . import utils, timing
+from . import utils, timing, dataset

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'utils' is not used.

Suggested change
-from . import utils, timing, dataset
+from . import timing, dataset

-#from src.dataset import construct_kernelbench_dataset
-from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, read_file, create_inference_server_from_presets
+from kernelbench.dataset import construct_kernelbench_dataset
+from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, create_inference_server_from_presets

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'query_server' is not used.
Import of 'set_gpu_arch' is not used.

Suggested change
-from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, create_inference_server_from_presets
+from kernelbench.utils import extract_first_code, create_inference_server_from_presets

@simonguozirui (Collaborator) commented

Here is the new dataset object design to make KernelBench less confusing and easier to integrate with other frameworks.

Key Advantages

  • Source agnostic: switch between local and HuggingFace sources with one flag; validation checks live inside the dataset object, which removes a lot of duplicated code from the scripts
  • Robust alignment: eliminates mismatch bugs by using stable, 1-indexed problem_id lookups instead of relying on OS file ordering
  • Flexible filtering: native support for creating subsets via specific ID lists or index ranges
  • Representative subsets: built-in support for curated representative subsets

from kernelbench.dataset import construct_kernelbench_dataset

# 1. Flexible Initialization
# Subset by specific IDs
ds_list = construct_kernelbench_dataset(level=1, problem_ids=[1, 3, 5])
# Subset by a range
ds_range = construct_kernelbench_dataset(level=1, id_range=(10, 20))

# 2. Source switching
ds_hf = construct_kernelbench_dataset(level=1, source="huggingface")

# 3. Quick testing with Curated Representative Subsets
rep_ds = ds_hf.get_representative_subset() 
# result: a diverse ~15-kernel dataset (matmul, conv, norms, etc.)

# 4. Safe lookups
problem = ds_hf.get_problem_by_id(42)
print(f"Name: {problem.name}, Code Hash: {problem.hash}")

.env.example Outdated

# Google Gemini
-GEMINI_API_KEY=...
+GEMINI_API_KEY=AIzaSy<redacted leaked key>
Collaborator commented:

I'll properly take a review later! This looks exciting!

However, please fix this!!!!

Collaborator commented:

@ethanbc key disabled now, please DO NOT post API keys anywhere in public

Collaborator commented:

@pythonomar22 A decent rule of thumb is to only modify the .env file when you're experimenting / plugging in keys, and to keep .env in the .gitignore. When you're done, update .env.example if you added or removed env variables.

This prevents leaks like this one from happening :)
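
A minimal sketch of that setup, as a .gitignore excerpt (the comment is illustrative):

# .gitignore
.env    # real keys stay local; commit only .env.example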

@simonguozirui (Collaborator) commented

@ethanboneh tested locally and on Modal. Same behavior as before, but the dataset object is much easier and cleaner to instantiate, and it is easy for people to extend in the future. Merging.

@simonguozirui merged commit b15a6c9 into main on Jan 7, 2026