Feature Request: Add new generic_jsonl parser to support ingesting JSONL #1369

Closed
4 of 9 tasks
jimwins opened this issue Mar 1, 2024 · 3 comments
Labels
expected: release after next, good first ticket, help wanted, size: medium, status: backlog (Work is planned someday but is not the highest priority at the moment), touches: API/CLI/Spec, why: functionality (Intended to improve ArchiveBox functionality or features)

Comments

@jimwins
Contributor

jimwins commented Mar 1, 2024

Type

  • General question or discussion
  • Propose a brand new feature
  • Request modification of existing behavior or design

What is the problem that your feature request solves

JSONL is not supported.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

This should be a fairly simple addition to the generic_json parser. When the file fails to parse with json.load(), try parsing it again as JSONL before falling back to the case that skips the first line.
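
The fallback order described above could look roughly like the sketch below. `load_json_or_jsonl` is a hypothetical helper, not a function in ArchiveBox, and the existing "skip the first line" fallback is left out for brevity:

```python
import json


def load_json_or_jsonl(text):
    """Hypothetical helper: try whole-document JSON first, then fall back to JSONL."""
    try:
        # A one-line JSONL file also succeeds here and comes back as a single object.
        return json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: decode each non-empty line on its own.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


two_line_jsonl = '{"url": "https://example.com/a"}\n{"url": "https://example.com/b"}\n'
records = load_json_or_jsonl(two_line_jsonl)
```

Here `records` ends up as a list of two dicts, because decoding the whole text fails on the second line and the per-line branch takes over.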

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to have eventually

  • I'm willing to contribute dev time / money to fix this issue
  • I like ArchiveBox so far / would recommend it to a friend
  • I've had a lot of difficulty getting ArchiveBox set up
@pirate
Member

pirate commented Mar 1, 2024

Maybe we can do it as a separate parser? generic_jsonl

I think having more parsers, each narrower and more explicit, is likely the better approach going forward. It avoids the issues we've had in the past from trying to cram a bunch of workaround behaviors into a single parser.

@pirate added the size: medium, why: functionality, good first ticket, help wanted, status: backlog, touches: API/CLI/Spec, type: enhancement, and expected: release after next labels on Mar 1, 2024
@jimwins
Contributor Author

jimwins commented Mar 1, 2024

Yeah, now that I've played around with it, we do need it to be a distinct parser: a single-line JSONL file is also a valid JSON file, just not in the format that the generic_json parser expects. The two parsers can at least share the code for turning each JSON object into a Link, so that doesn't get duplicated.
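
A minimal sketch of that split, for illustration only. The names `link_from_json_object`, `parse_generic_json`, and `parse_generic_jsonl` are hypothetical, a plain dict stands in for ArchiveBox's Link type, and the sketch assumes the whole-document parser expects a JSON list of objects; it is not the code from the referenced commits:

```python
import json
from typing import Iterator


def link_from_json_object(obj: dict) -> dict:
    """Shared helper (hypothetical): turn one decoded JSON object into a link record."""
    url = obj.get("url") or obj.get("href")
    if not url:
        raise ValueError("JSON object has no URL field")
    return {"url": url, "title": obj.get("title")}


def parse_generic_json(text: str) -> Iterator[dict]:
    """Whole-document parser: assumes the file is one JSON list of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of objects")
    for obj in data:
        yield link_from_json_object(obj)


def parse_generic_jsonl(text: str) -> Iterator[dict]:
    """Line-by-line parser: one JSON object per non-empty line."""
    for line in text.splitlines():
        if line.strip():
            yield link_from_json_object(json.loads(line))


single_line_jsonl = '{"url": "https://example.com", "title": "Example"}'
# This text is valid JSON, but it decodes to a dict rather than a list,
# so only the line-by-line parser accepts it.
jsonl_links = list(parse_generic_jsonl(single_line_jsonl))
```

Both parsers go through the same per-object helper, so the field handling lives in one place while each parser stays narrow about the file shape it accepts.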

@pirate changed the title from "Feature Request: Add support for JSONL to generic_json parser" to "Feature Request: Add new generic_jsonl parser to support ingesting JSONL" on Mar 1, 2024
jimwins added a commit to jimwins/ArchiveBox that referenced this issue Mar 1, 2024
jimwins added a commit to jimwins/ArchiveBox that referenced this issue Mar 14, 2024
@pirate
Member

pirate commented Mar 22, 2024

Closing as completed, thanks @jimwins!

@pirate closed this as completed on Mar 22, 2024