file size increase for pdf/a

OCRmyPDF is really marvelous! Thanks!

I have one question regarding output file size: unless I explicitly select pdf as the output type, the output file is much larger (~4x) after "ocrmypdf in.pdf out.pdf". The pages are scanned text, i.e. they contain no gray pixels, only black or white ones. Only "--output-type pdf" keeps the file size similar to the input.
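
In case it helps with reproducing this, here is a minimal sketch of how the variants could be compared. It assumes ocrmypdf is on the PATH and that in.pdf is in the current directory; the helper names `build_command` and `size_ratio` and the output file names are hypothetical, and only the flags mentioned in this issue are used.

```python
import os
import shutil
import subprocess


def build_command(src, dst, output_type=None, optimize=None):
    """Build an ocrmypdf argument list (hypothetical helper for this issue)."""
    cmd = ["ocrmypdf"]
    if output_type is not None:
        cmd += ["--output-type", output_type]
    if optimize is not None:
        cmd += ["--optimize", str(optimize)]
    return cmd + [src, dst]


def size_ratio(src, dst):
    """Return output file size divided by input file size."""
    return os.path.getsize(dst) / os.path.getsize(src)


# Output names are made up; the options mirror the three runs described here.
VARIANTS = {
    "out_default.pdf": {},
    "out_pdf.pdf": {"output_type": "pdf"},
    "out_opt3.pdf": {"optimize": 3},
}

# Only run when the tool and the input file are actually available.
if __name__ == "__main__" and shutil.which("ocrmypdf") and os.path.exists("in.pdf"):
    for dst, options in VARIANTS.items():
        subprocess.run(build_command("in.pdf", dst, **options), check=True)
        print(f"{dst}: {size_ratio('in.pdf', dst):.2f}x input size")
```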

For the first page (the others are similar), "pdfimages -list in.pdf" gives:

```
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  icc     1   1  ccitt  no        17  0   600   601 88.3K 5.9%
```

For out.pdf, the same command gives:

```
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  rgb     3   8  image  no        12  0   600   601  385K 1.1%
```

Even with "--optimize 3", out.pdf (saved as PDF/A) is still about twice the size of in.pdf:

```
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2697  4533  gray    1   1  image  no        34  0   600   601  203K  14%
```

Is converting the image obligatory for PDF/A? Or is there a way to keep the original image type AND generate PDF/A?
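
To put numbers on the three listings, here is a small sketch that parses the "pdfimages -list" rows quoted above and compares the image sizes. The function names and dictionary labels are made up for illustration, and the 1024-based reading of the K/M/G suffixes is an assumption (it cancels out in the ratios anyway).

```python
def parse_size(text):
    """Convert a pdfimages size such as '88.3K' to bytes (assumes 1024-based suffixes)."""
    units = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}
    if text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text)


def parse_row(row):
    """Pick the relevant columns out of one whitespace-separated data row."""
    fields = row.split()
    return {
        "color": fields[5],
        "comp": int(fields[6]),
        "bpc": int(fields[7]),
        "enc": fields[8],
        "size": parse_size(fields[14]),
    }


# Data rows copied from the three listings in this issue.
ROWS = {
    "in.pdf": "1 0 image 2697 4533 icc 1 1 ccitt no 17 0 600 601 88.3K 5.9%",
    "out.pdf (default)": "1 0 image 2697 4533 rgb 3 8 image no 12 0 600 601 385K 1.1%",
    "out.pdf (--optimize 3)": "1 0 image 2697 4533 gray 1 1 image no 34 0 600 601 203K 14%",
}

images = {name: parse_row(row) for name, row in ROWS.items()}
baseline = images["in.pdf"]["size"]

for name, img in images.items():
    ratio = img["size"] / baseline
    print(f"{name}: {img['color']}, {img['comp']} comp x {img['bpc']} bpc, "
          f"enc={img['enc']}, {ratio:.1f}x input image size")
```

So the image goes from 1-bit CCITT to 8-bit RGB by default (about 4.4x larger), and even with "--optimize 3" the 1-bit gray image is still about 2.3x the size of the CCITT original.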

file size increase for pdf/a #293
