Skip to content

Conversation

@fancycode
Copy link
Contributor

With this change, the names are decoded internally, so they can be compared directly when adding entries to dictionaries. On writing, the names are encoded if necessary.

Also removed some duplicate code for name encoding / decoding and simplified object type tests.

Follow-up to #776 to also speed up parsing dictionaries that contain key with a #.

@hhrutter
Copy link
Collaborator

hhrutter commented Jan 29, 2024

Is there any way to get into a discussion of an issue.
This especially applies for PRs with heavy, critical changes.
eg. What's the motivation for this, the issue at hand etc.

@fancycode
Copy link
Contributor Author

As written above, it's a follow-up to the previous change in #776, related to issue #775.

Even with the change from #776 you can construct a PDF with dictionaries that take ages to parse (see the updated test in

sb.WriteString("/Key#28#29 (Value)")
).

I'm happy to discuss the changes in this merge request - at least that's my understanding what merge requests are for. You can easily comment on individual lines, add a review or add global comments as we are doing right now.

@hhrutter
Copy link
Collaborator

It's easy to come up with spec compliant PDFs that pdfcpu will choke on but that's true for many pdfcpu processors out there. Instead of focusing on theoretical corner cases I'd like to spend my time on real word PDFs that are spec compliant and yet cause trouble.

Yet I am on board if this is about speeding up parsing of average but bigger PDF files.
It will take me some time to get to reviewing your proposal, please bear with me.

@hhrutter
Copy link
Collaborator

Do you think you can rebase this onto the latest commit?
Will help big time cutting a new release by the end of the day 😉

With this change, the names are decoded internally, so they be can compared
directly when adding entries to dictionaries. On writing, the names are
encoded if necessary.

Also removed some duplicate code for name encoding / decoding.
@fancycode fancycode force-pushed the improve-name-parsing branch from 50422c4 to 044a6c0 Compare February 28, 2024 10:26
@fancycode
Copy link
Contributor Author

Sure, just rebased the branch on fc87a22.

@hhrutter
Copy link
Collaborator

heads up... ValidationNone is gone in case you were using it..

hhrutter added a commit that referenced this pull request Feb 29, 2024
@hhrutter hhrutter merged commit 044a6c0 into pdfcpu:master Feb 29, 2024
@hhrutter
Copy link
Collaborator

Thanks!

@fancycode fancycode deleted the improve-name-parsing branch February 29, 2024 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants