Failed to yaml decode from prek==0.2.15 on VCR cassettes #1104

@jamesbraza

Description

Summary

Seen in this CI run from PaperQA:

check yaml....................................................................Failed
- hook id: check-yaml
- exit code: 1
  tests/cassettes/test_pdf_reader_match_doc_details.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 49, column 11)
  tests/cassettes/test_odd_client_requests.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_doi_search[paper_attributes1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 93, column 11)
  tests/cassettes/test_tricky_journal_quality_results[10.1016-j.bbcan.2023.188947-1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)

It seems check-yaml fails on VCR cassettes recorded with https://github.com/kiwicom/pytest-recording. To reproduce in PaperQA, run `prek run -a check-yaml`.
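
For context, here is a minimal sketch of why a `!!binary` scalar need not be valid UTF-8. In YAML, `!!binary` holds the base64 text of arbitrary bytes. The sketch assumes the cassette body is a gzip-compressed HTTP response, which is a guess, since the cassette contents are not shown in this report. The payload and the `is_valid_utf8` helper are hypothetical and use only the standard library.

```python
import base64
import gzip

# Hypothetical response body, gzip-compressed (assumption: the real
# cassettes hold compressed HTTP bodies; they are not shown in this issue).
raw_body = gzip.compress(b'{"status": "ok"}')

# A YAML !!binary scalar stores the base64 text of the raw bytes.
encoded = base64.b64encode(raw_body).decode("ascii")

# Decoding the base64 text recovers the original bytes exactly.
decoded = base64.b64decode(encoded)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The gzip header starts with bytes 0x1f 0x8b; 0x8b cannot start a UTF-8
# sequence, so the decoded payload is not valid UTF-8.
print(is_valid_utf8(decoded))

# The payload is still well-formed binary data and decompresses cleanly.
print(gzip.decompress(decoded))
```

If that assumption holds, the cassettes are well-formed YAML, and the check would need to treat `!!binary` content as bytes rather than as text.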

Platform

macOS Sequoia version 15.6.1

Version

prek 0.2.15 (11f369e 2025-11-17)

.pre-commit-config.yaml

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-yaml

Log file

DEBUG prek: 0.2.15 (11f369ed7 2025-11-17)
DEBUG Args: ["prek", "-vvv", "run", "-a", "check-yaml"]
DEBUG Git root: /Users/jamesbraza/code/paper-qa
DEBUG Found workspace root at `/Users/jamesbraza/code/paper-qa`
TRACE Include selectors: `check-yaml`
TRACE Skip selectors: ``
DEBUG discover{root="/Users/jamesbraza/code/paper-qa" config=None refresh=false}: Loaded workspace from cache
DEBUG discover{root="/Users/jamesbraza/code/paper-qa" config=None refresh=false}: Loading project configuration path=.pre-commit-config.yaml
TRACE discover{root="/Users/jamesbraza/code/paper-qa" config=None refresh=false}: close time.busy=1.18ms time.idle=2.00µs
TRACE Checking lock resource="store" path=/Users/jamesbraza/.cache/prek/.lock
DEBUG Acquired lock resource="store"
TRACE Skipping reading PEP 723 metadata for hook `black-jupyter` because it already has `additional_dependencies`
TRACE Skipping reading PEP 723 metadata for hook `codespell` because it already has `additional_dependencies`
TRACE Skipping reading PEP 723 metadata for hook `validate-pyproject` because it already has `additional_dependencies`
TRACE Skipping reading PEP 723 metadata for hook `jupytext` because it already has `additional_dependencies`
DEBUG Hooks going to run: ["check-yaml"]
TRACE Executing `/Users/jamesbraza/.cache/prek/hooks/python-uYwbn31xSL794TFVCByD/bin/python -I -c import sys
print(f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")
print(sys.base_exec_prefix)
 [...]`
DEBUG Found installed environment for hook `check-yaml` at `/Users/jamesbraza/.cache/prek/hooks/python-uYwbn31xSL794TFVCByD`
TRACE Released lock path=/Users/jamesbraza/.cache/prek/.lock
TRACE Executing `cd /Users/jamesbraza/code/paper-qa && /opt/homebrew/bin/git ls-files -z -- /Users/jamesbraza/code/paper-qa`
DEBUG All files in the workspace: 175
TRACE Executing `/opt/homebrew/bin/git diff -- /Users/jamesbraza/code/paper-qa`
TRACE Files for project `.` after filtered: 175
TRACE Files for hook `check-yaml` after filtered: 53
check yaml...............................................................DEBUG Running builtin hook: check-yaml
TRACE Resolved command: check-yaml
TRACE Executing `/opt/homebrew/bin/git diff -- /Users/jamesbraza/code/paper-qa`
Failed
- hook id: check-yaml
- duration: 0.01s
- exit code: 1
  tests/cassettes/test_tricky_journal_quality_results[10.1146-annurev.pathol.4.110807.092311-2].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_does_openalex_work[not-in-openalex].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_doi_search[paper_attributes1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 93, column 11)
  tests/cassettes/test_partitioning_fn_docs[False].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 44, column 11)
  tests/cassettes/test_pdf_reader_match_doc_details.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 49, column 11)
  tests/cassettes/test_title_search[paper_attributes1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_partitioning_fn_docs[True].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 44, column 11)
  tests/cassettes/test_crossref_journalquality_fields_filtering.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_equations[parse_pdf_to_pages1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 74, column 11)
  tests/cassettes/test_partly_embedded_texts[True].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 44, column 11)
  tests/cassettes/test_get_reasoning[deepseek-reasoner].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 91, column 11)
  tests/cassettes/test_tricky_journal_quality_results[10.1016-j.semcdb.2016.08.024-1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_does_openalex_work[oa-in-openalex2].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_does_openalex_work[oa-in-openalex1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_docs_lifecycle.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 87, column 11)
  tests/cassettes/test_partly_embedded_texts[False].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 44, column 11)
  tests/cassettes/test_nonduplicate_contexts.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 44, column 11)
  tests/cassettes/test_bulk_doi_search.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 425, column 11)
  tests/cassettes/test_doi_search[paper_attributes0].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 92, column 11)
  tests/cassettes/test_bad_titles.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 190, column 11)
  tests/cassettes/test_doi_search[paper_attributes3].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 91, column 11)
  tests/cassettes/test_doi_search[paper_attributes2].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 84, column 11)
  tests/cassettes/test_bulk_title_search.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_image_enrichment_normal_use.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 91, column 11)
  tests/cassettes/test_tricky_journal_quality_results[10.1073-pnas.1205508109-3].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_does_openalex_work[not-oa-in-openalex].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_tricky_journal_quality_results[10.1016-j.bbcan.2023.188947-1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_tricky_journal_quality_results[10.1186-1471-2148-11-4-2].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_maybe_is_text.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_equations[parse_pdf_to_pages0].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_doi_search[paper_attributes4].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 97, column 11)
  tests/cassettes/test_ensure_sequential_run.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 172, column 11)
  tests/cassettes/test_title_search[paper_attributes2].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_title_search[paper_attributes0].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_get_reasoning[openrouter-deepseek].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 91, column 11)
  tests/cassettes/test_arxiv_doi_is_used_when_available.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_author_matching.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_odd_client_requests.yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
  tests/cassettes/test_tricky_journal_quality_results[10.1038-s41598-018-27044-6-1].yaml: Failed to yaml decode (!!binary scalar is not valid UTF-8 at line 20, column 11)
