fix: misleading default shown in PageConfig docstring / type stub #558

@kh3rld

Description

The docstring for PageConfig in the type stub (packages/python/kreuzberg/_internal_bindings.pyi) uses extract_pages=True in its example, which suggests that this is the default or expected usage. The actual default is extract_pages=False, so the example appears to contradict the documented default.

Location

packages/python/kreuzberg/_internal_bindings.pyi, around line 1059:

class PageConfig:
    r"""Page extraction and tracking configuration.
    ...
    Example:
        >>> from kreuzberg import ExtractionConfig, PageConfig
        >>> config = ExtractionConfig(pages=PageConfig(extract_pages=True))
    """

    extract_pages: bool       # Default: False
    insert_page_markers: bool # Default: False
    ...

The attribute comments document Default: False, but the example sets extract_pages=True without explaining why. This is confusing to users who assume the example demonstrates default or typical usage.

Expected behavior

The example should either:

  • Show the default (no-arg) constructor: PageConfig(), or
  • Explicitly annotate that extract_pages=True is being set to enable page tracking (i.e., it is non-default), and explain what it enables.

Additionally, since using result_format="element_based" without extract_pages=True produces incorrect page numbers (see Issue 1 & 2), the PageConfig docstring should note this interaction.
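One possible shape for the revised docstring is sketched below. It is only an illustration: the class is a plain dataclass stand-in, not the real binding class from _internal_bindings.pyi, and the wording of the note is a suggestion based on the behavior described in this issue.

```python
from dataclasses import dataclass


@dataclass
class PageConfig:
    r"""Page extraction and tracking configuration.

    Hypothetical stand-in for illustration; the real class lives in the
    kreuzberg bindings. Both options are off by default.

    Example:
        >>> PageConfig()  # defaults: page tracking disabled
        PageConfig(extract_pages=False, insert_page_markers=False)
        >>> PageConfig(extract_pages=True).extract_pages  # non-default: opt in
        True

    Note:
        As reported in this issue, result_format="element_based" yields
        incorrect page numbers unless extract_pages=True is set.
    """

    extract_pages: bool = False        # Default: False
    insert_page_markers: bool = False  # Default: False
```

Showing the no-argument constructor first makes the default visible, and the inline comment marks extract_pages=True as an explicit opt-in rather than typical usage.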


Environment

  • Platform: Windows (reported by user; also reproducible on other platforms)
  • Kreuzberg version: (reporter did not specify; please include your version)
  • Python version: (reporter did not specify)

Metadata

  • Labels: bug
  • Status: Done