[x] Your issue is based on the latest commit - Using latest release v0.8.0
[x] State your OS and OS version - macOS 14.6.1
Description
The WriteImageToDisk function is responsible for writing extracted image files to disk.
The filename it generates is not built from values that are guaranteed to be unique, so different images can end up with the same filename. As a result, extracted images can be overwritten.
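Judging by the extraction log further down, the name appears to follow the pattern `<base>_<page>_<id>.<ext>`. The page-level Id (img0, img1, ...) is not unique on its own. The object number shown in the Obj# column identifies the image's indirect object, so adding it to the name should separate two images that share an Id on the same page. The sketch below only illustrates this idea. `nameFromPageAndID` and `nameWithObjNr` are hypothetical helpers, not pdfcpu code, and the first one simply mimics the pattern seen in the log.

```go
package main

import "fmt"

// nameFromPageAndID mimics the pattern visible in the extraction log:
// <base>_<page>_<id>.<ext>. Hypothetical helper, not pdfcpu code.
func nameFromPageAndID(base string, page int, id, ext string) string {
	return fmt.Sprintf("%s_%d_%s.%s", base, page, id, ext)
}

// nameWithObjNr also embeds the image's object number, so two images on the
// same page that share an Id still get distinct names. Hypothetical helper.
func nameWithObjNr(base string, page, objNr int, id, ext string) string {
	return fmt.Sprintf("%s_%d_%d_%s.%s", base, page, objNr, id, ext)
}

func main() {
	// The three images listed for page 3 (object number, Id).
	type img struct {
		objNr int
		id    string
	}
	imgs := []img{{23, "img1"}, {24, "img0"}, {70, "img1"}}

	for _, im := range imgs {
		fmt.Println(nameFromPageAndID("8020932-report", 3, im.id, "png"), "->",
			nameWithObjNr("8020932-report", 3, im.objNr, im.id, "png"))
	}
}
```

Running it prints the colliding name on the left and a distinct name on the right for each image:

8020932-report_3_img1.png -> 8020932-report_3_23_img1.png
8020932-report_3_img0.png -> 8020932-report_3_24_img0.png
8020932-report_3_img1.png -> 8020932-report_3_70_img1.png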
Example
Listing the images in a sample PDF (which unfortunately I am unable to share) with pdfcpu image list 8020932-report.pdf:
Page Obj# │ Id   │ Type  SoftMask ImgMask │ Width │ Height │ ColorSpace Comp bpc Interp │   Size │ Filters
━━━━━━━━━━┿━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━┿━━━━━━━━━━━━
   3   23 │ img1 │ image    *             │  1000 │    367 │ DeviceRGB     3   8        │   1 KB │ FlateDecode
       24 │ img0 │ image                  │  1000 │    367 │ DeviceGray    1   8        │  10 KB │ FlateDecode
       70 │ img1 │ image    *             │   834 │     84 │ DeviceRGB     3   8        │   2 KB │ FlateDecode
This shows two different images on page 3 (objects 23 and 70) that share the same Id, img1.
Extracting the images from this page causes one of the files to be overwritten:
pdfcpu extract -m=i -p 3 8020932-report.pdf .
extracting images from 8020932-report.pdf into ./ ...
optimizing...
pages: 3
writing 8020932-report_3_img0.png
writing 8020932-report_3_img1.png
writing 8020932-report_3_img1.png
# Three images were extracted, but only two files exist:
❯ ls -laht *.png
-rw-r--r-- 1 user staff 1.6K 25 Aug 14:27 8020932-report_3_img1.png
-rw-r--r-- 1 user staff 14K 25 Aug 14:27 8020932-report_3_img0.png