Closed
Description
OCRmyPDF is really marvelous! Thanks!
I have one question regarding output file size: unless I explicitly select pdf as the output type, the output file is much larger (~4x) after running "ocrmypdf in.pdf out.pdf". The pages are scanned text, i.e. there are no gray pixels, only black or white ones. Only "--output-type pdf" keeps the file size similar to the input.
For the first page (the others are similar), "pdfimages -list in.pdf" gives:
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  icc     1   1  ccitt  no        17  0   600   601 88.3K 5.9%
The same command on out.pdf gives:
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  rgb     3   8  image  no        12  0   600   601  385K 1.1%
Even with "--optimize 3", out.pdf (saved as PDF/A) is still about twice the size of the input:
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  gray    1   1  image  no        34  0   600   601  203K  14%
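To make the comparison between the three listings easier to reproduce, a small script along the following lines could be used. It is only a sketch: the function names (parse_size, parse_listing, summarize) and the dictionary labels are hypothetical, the rows are copied from the listings above, and the assumption that the "K" suffix means 1024 bytes is mine. Note that it compares the embedded image stream sizes, not the overall file sizes.

```python
import re

# Column names as printed by "pdfimages -list"; "object ID" is two fields.
COLUMNS = ["page", "num", "type", "width", "height", "color", "comp", "bpc",
           "enc", "interp", "object", "id", "x_ppi", "y_ppi", "size", "ratio"]

# Assumption: size suffixes are binary multiples (K = 1024 bytes).
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '88.3K' to an approximate byte count."""
    match = re.fullmatch(r"([\d.]+)([BKMG])", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_listing(text):
    """Parse the data rows of a 'pdfimages -list' listing into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip header and separator lines; data rows start with a page number.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# First-page rows copied from the listings in this issue.
LISTINGS = {
    "in.pdf": "1 0 image 2697 4533 icc 1 1 ccitt no 17 0 600 601 88.3K 5.9%",
    "out.pdf (default)": "1 0 image 2697 4533 rgb 3 8 image no 12 0 600 601 385K 1.1%",
    "out.pdf (--optimize 3)": "1 0 image 2697 4533 gray 1 1 image no 34 0 600 601 203K 14%",
}


def summarize(listings, baseline="in.pdf"):
    """Return (color, bpc, enc, size factor relative to baseline) per listing."""
    base = parse_size(parse_listing(listings[baseline])[0]["size"])
    summary = {}
    for name, text in listings.items():
        row = parse_listing(text)[0]
        factor = parse_size(row["size"]) / base
        summary[name] = (row["color"], row["bpc"], row["enc"], factor)
    return summary


if __name__ == "__main__":
    for name, (color, bpc, enc, factor) in summarize(LISTINGS).items():
        print(f"{name}: color={color} bpc={bpc} enc={enc} size x{factor:.1f}")
```

With the rows above, the page image grows by a factor of roughly 4.4 in the default output and roughly 2.3 with "--optimize 3", and in both cases the encoding is no longer reported as ccitt.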
Is this image conversion obligatory for PDF/A? Or is there a way to keep the original image type and still generate PDF/A?