Dataset Object #95
Conversation
Pull request overview
This PR introduces a unified dataset abstraction for KernelBench that provides a clean, consistent interface for accessing problems regardless of their source (local filesystem or HuggingFace). The design centers around a Problem dataclass and abstract BaseDataset class with concrete implementations for both data sources.
Key changes:
- New dataset abstraction with a `Problem` dataclass and `BaseDataset` interface supporting both local and HuggingFace sources
- Comprehensive test suite with 30+ unit tests covering dataset construction, problem access, subsetting, and compatibility
- Migration of all existing scripts and utilities to the new unified dataset API, eliminating source-specific branching logic
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| src/kernelbench/dataset.py | Core dataset abstraction with Problem dataclass, BaseDataset interface, and LocalKernelBenchDataset/HuggingFaceKernelBenchDataset implementations |
| src/kernelbench/unit_tests/test_dataset.py | Comprehensive test suite with 30+ tests for hash functions, dataset construction, problem access, subsetting, iteration, and HuggingFace parity |
| src/kernelbench/eval.py | Updated to use new dataset API, adds fetch_baseline_time utility function |
| src/kernelbench/timing.py | Updated fetch_baseline_time to accept BaseDataset parameter instead of list |
| src/kernelbench/utils.py | Removed commented import statement for datasets |
| src/kernelbench/frameworks.py | Removed commented import statement for datasets |
| scripts/verify_bench.py | Refactored to iterate over dataset objects instead of directory paths |
| scripts/run_and_check.py | Simplified to use unified dataset interface, eliminating HuggingFace-specific code paths |
| scripts/inspect_triton.py | Updated to use new dataset API with BaseDataset parameter |
| scripts/inspect_baseline.py | Refactored to use construct_kernelbench_dataset instead of directory-based loading |
| scripts/generate_samples.py | Unified dataset loading logic, eliminating separate HuggingFace and local code paths |
| scripts/generate_baseline_time_modal.py | Updated to use new dataset API and removed duplicate fetch_ref_arch_from_dataset function |
| scripts/generate_baseline_time.py | Updated to use new dataset API and iterate over problem IDs instead of indices |
| scripts/generate_and_eval_single_sample_modal.py | Simplified problem fetching using unified dataset interface |
| scripts/generate_and_eval_single_sample.py | Unified dataset loading and problem access across both data sources |
| scripts/eval_from_generations.py | Migrated to use new dataset API, simplified problem ID handling |
| scripts/benchmark_eval_analysis.py | Updated to align eval results with baseline results using problem IDs and names from dataset |
scripts/verify_bench.py
Outdated
```python
# Dynamically import the module
spec = importlib.util.spec_from_file_location(
    module_name, problem.path
```
Copilot AI (Jan 6, 2026):
When using HuggingFace datasets, problem.path will be None, causing spec_from_file_location to fail with a TypeError or ValueError. This script assumes a local dataset, but should either validate that only local datasets are used or handle the None case explicitly.
Suggested change:

```diff
+problem_path = getattr(problem, "path", None)
+if not problem_path:
+    raise ValueError(
+        f"Problem '{module_name}' does not have a local file path; "
+        "verify_bench.py only supports local datasets."
+    )
 # Dynamically import the module
 spec = importlib.util.spec_from_file_location(
-    module_name, problem.path
+    module_name, problem_path
```
scripts/generate_samples.py
Outdated
```python
start, end = config.subset
problem_ids_to_run = [pid for pid in all_problem_ids if start <= pid <= end]
if not problem_ids_to_run:
      print(f"Warning: No problems found in subset range {config.subset}")
```
Copilot AI (Jan 6, 2026):
Inconsistent indentation: this line uses extra spaces while the rest of the file uses standard indentation. Should align with line 260 above.
Suggested change:

```diff
-      print(f"Warning: No problems found in subset range {config.subset}")
+    print(f"Warning: No problems found in subset range {config.subset}")
```
src/kernelbench/eval.py
Outdated
```diff
 def fetch_ref_arch_from_problem_id(problem_id: int, dataset: "BaseDataset", with_name=False) -> Union[str, tuple[str, str]]:
     """
-    Fetches the reference architecture in string for a given problem_id
+    Fetches the reference architecture for a given problem_id from the dataset.
     """
     if isinstance(problem_id, str):
         problem_id = int(problem_id)

-    problem_path = problems[problem_id]
-
-    # problem_path = os.path.join(REPO_ROOT_PATH, problem)
-    if not os.path.exists(problem_path):
-        raise FileNotFoundError(f"Problem file at {problem_path} does not exist.")
-
-    ref_arch = utils.read_file(problem_path)
+    problem = dataset.get_problem_by_id(problem_id)
+    ref_arch = problem.code

     if not with_name:
         return ref_arch
     else:
-        return (problem_path, ref_arch)
+        return (problem.path, ref_arch)
```
Copilot AI (Jan 6, 2026):
The return value when with_name=True is inconsistent with the return type annotation. The annotation says tuple[str, str] but the function returns (problem.path, ref_arch) where problem.path could be None for HuggingFace datasets. The return type should be Union[str, tuple[Optional[str], str]] or the function should handle the None case more explicitly.
```python
def inspect_torch_compile_triton(level_num, problem_id):
    ref_arch_name, ref_arch_src = fetch_ref_arch_from_level_problem_id(
        level_num, problem_id, with_name=True
    )
    ref_arch_name = ref_arch_name.split("/")[-1]
    ref_arch_name = os.path.basename(ref_arch_name)
```
Copilot AI (Jan 6, 2026):
When with_name=True, fetch_ref_arch_from_problem_id returns (problem.path, ref_arch) where problem.path can be None for HuggingFace datasets. This will cause os.path.basename(ref_arch_name) on line 103 to fail with a TypeError when ref_arch_name is None. The code should handle the None case or ensure it only works with local datasets.
src/kernelbench/eval.py
Outdated
```python
    """
    if not os.path.exists(baseline_time_filepath):
        raise FileNotFoundError(
            f"Baseline time file not found at {baseline_time_filepath}"
        )

    with open(baseline_time_filepath, "r") as f:
        baseline_json = json.load(f)

    problem = dataset.get_problem_by_id(problem_id)
    problem_name = problem.name

    baseline_time = baseline_json[level_name].get(problem_name, None)
    return baseline_time
```
Copilot AI (Jan 6, 2026):
The fetch_baseline_time function is duplicated in both timing.py (line 395) and eval.py (line 883) with identical logic. This violates the DRY principle. Consider removing the duplicate from one file and importing it from the other, or consolidating both into a single location.
Suggested change:

```diff
-    """
-    if not os.path.exists(baseline_time_filepath):
-        raise FileNotFoundError(
-            f"Baseline time file not found at {baseline_time_filepath}"
-        )
-    with open(baseline_time_filepath, "r") as f:
-        baseline_json = json.load(f)
-    problem = dataset.get_problem_by_id(problem_id)
-    problem_name = problem.name
-    baseline_time = baseline_json[level_name].get(problem_name, None)
-    return baseline_time
+    This function delegates to :func:`timing.fetch_baseline_time` to avoid
+    duplicating logic across modules.
+    """
+    return timing.fetch_baseline_time(
+        level_name=level_name,
+        problem_id=problem_id,
+        dataset=dataset,
+        baseline_time_filepath=baseline_time_filepath,
+    )
```
```python
        level_num, problem_id, with_name=True
    )
    ref_arch_name = ref_arch_name.split("/")[-1]
    ref_arch_name = os.path.basename(ref_arch_name)
```
Copilot AI (Jan 6, 2026):
Variable ref_arch_name is not used.
```python
    )

    ref_arch_name = ref_arch_name.split("/")[-1]
    ref_arch_name = os.path.basename(ref_arch_name)
```
Copilot AI (Jan 6, 2026):
Variable ref_arch_name is not used.
src/kernelbench/dataset.py
Outdated
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, Optional, Union
```
Copilot AI (Jan 6, 2026):
Import of 'Union' is not used.
Suggested change:

```diff
-from typing import Iterator, Optional, Union
+from typing import Iterator, Optional
```
src/kernelbench/eval.py
Outdated
```diff
 from pydantic import BaseModel

-from . import utils, timing
+from . import utils, timing, dataset
```
Copilot AI (Jan 6, 2026):
Import of 'utils' is not used.
Suggested change:

```diff
-from . import utils, timing, dataset
+from . import timing, dataset
```
```diff
-#from src.dataset import construct_kernelbench_dataset
-from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, read_file, create_inference_server_from_presets
+from kernelbench.dataset import construct_kernelbench_dataset
+from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, create_inference_server_from_presets
```
Copilot AI (Jan 6, 2026):
Import of 'query_server' is not used.
Import of 'set_gpu_arch' is not used.
Suggested change:

```diff
-from kernelbench.utils import extract_first_code, query_server, set_gpu_arch, create_inference_server_from_presets
+from kernelbench.utils import extract_first_code, create_inference_server_from_presets
```
Here are the new key advantages.
.env.example
Outdated
```diff
 # Google Gemini
-GEMINI_API_KEY=...
+GEMINI_API_KEY=AIzaSyC2rq0xg1Ucpr7cXF5jh82RVBLi8MdnPcU
```
I'll properly take a review later! This looks exciting!
However, please fix this!!!!
@ethanbc key disabled now, please DO NOT post API key anywhere in public
@pythonomar22 A decent rule of thumb is to only modify the .env file when you're experimenting or plugging in keys, and to stick .env in the .gitignore. When you're done, update .env.example if you are adding or removing env variables.
This prevents this from happening :)
@ethanboneh Tested locally and on Modal. Same behavior as before, but much easier and cleaner to instantiate the dataset object, while making it easy for people to extend in the future. Merging.
Adding a KernelBench dataset object and making the code cleaner.
The goal is a very clean dataset object that handles problem indexing and verification, no matter where the problem comes from (local / HF), and regardless of any special properties of the problem (subsets, etc.).