file size increase for pdf/a #293

@femifrak

OCRmyPDF is really marvelous! Thanks!

I have one question about output file size: unless I explicitly select pdf as the output type, the output is much larger (~4x) after "ocrmypdf in.pdf out.pdf". The pages are scanned text, so there are no gray pixels, only black or white ones. Only "--output-type pdf" keeps the file size close to the original.

For the first page (the others are similar) "pdfimages -list in.pdf" gives:

page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  icc     1   1  ccitt  no        17  0   600   601 88.3K 5.9%

out.pdf results in:

page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  rgb     3   8  image  no        12  0   600   601  385K 1.1%

Even with --optimize 3, out.pdf (saved as pdf/a) is about twice the size of in.pdf:

page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  gray    1   1  image  no        34  0   600   601  203K  14%
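
As a sanity check on the three listings above, here is a minimal Python sketch that recomputes the ratio column and the growth relative to the original image. It assumes the ratio is the stored size divided by the uncompressed raster size (width x height x components x bits per component), that K means 1024 bytes, and that row padding can be ignored; the helper names `raw_kib` and `ratio_percent` are made up for illustration and are not part of pdfimages or OCRmyPDF.

```python
# Image parameters copied from the pdfimages listings above.
WIDTH, HEIGHT = 2697, 4533
ORIGINAL_KIB = 88.3  # stored size of the CCITT image in in.pdf


def raw_kib(width, height, components, bits_per_component):
    """Uncompressed raster size in KiB (assumption: no row padding)."""
    return width * height * components * bits_per_component / 8 / 1024


def ratio_percent(stored_kib, width, height, components, bits_per_component):
    """Stored size as a percentage of the uncompressed raster size."""
    raw = raw_kib(width, height, components, bits_per_component)
    return 100 * stored_kib / raw


# label -> (stored size in KiB, components, bits per component)
images = {
    "in.pdf (ccitt, 1 bpc)": (88.3, 1, 1),
    "out.pdf (rgb, 8 bpc)": (385, 3, 8),
    "out.pdf with --optimize 3 (gray, 1 bpc)": (203, 1, 1),
}

for label, (stored, comp, bpc) in images.items():
    pct = ratio_percent(stored, WIDTH, HEIGHT, comp, bpc)
    growth = stored / ORIGINAL_KIB
    print(f"{label}: {pct:.1f}% of raw, {growth:.1f}x the original image")
```

Under those assumptions the percentages come out close to the listed 5.9%, 1.1% and 14%, and the image in out.pdf is roughly 4.4x (default) or 2.3x (--optimize 3) the size of the original CCITT image, which matches the overall file size growth I see.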

Is this image conversion mandatory for pdf/a? Or is there a way to keep the original image encoding AND still generate pdf/a?
