<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>python programming on File Format Blog</title>
    <link>https://blog.fileformat.com/zh/tag/python-programming/</link>
    <description>Recent content in python programming on File Format Blog</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>zh</language>
    <lastBuildDate>Wed, 29 Jan 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.fileformat.com/zh/tag/python-programming/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Working with PDF Files in Python</title>
      <link>https://blog.fileformat.com/zh/programming/working-with-pdf-files-in-python/</link>
      <pubDate>Wed, 29 Jan 2025 00:00:00 +0000</pubDate>
      
      <guid>https://blog.fileformat.com/zh/programming/working-with-pdf-files-in-python/</guid>
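      <description>Learn how to use Python to extract text from PDFs, rotate PDF pages, merge multiple PDFs, split a PDF, and add watermarks to your PDFs, using Python libraries and simple code examples.</description>
      <content:encoded><![CDATA[<p><strong>Last Updated</strong>: 29 January 2025</p>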
<figure class="align-center ">
    <img loading="lazy" src="images/working-with-pdf-files-in-python.png#center"
         alt="Title - Working with PDF Files in Python"/> 
</figure>

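<p>This article shows you <strong>how to work with PDF files in Python</strong> using the <a href="https://pypi.org/project/pypdf/"><strong>pypdf</strong></a> library.</p>
<p>With <strong>pypdf</strong>, we will demonstrate how to perform the following operations in Python:</p>
<ul>
<li>Extract text from a PDF</li>
<li>Rotate PDF pages</li>
<li>Merge multiple PDFs</li>
<li>Split a PDF into separate files</li>
<li>Add a watermark to PDF pages</li>
</ul>
<p><em><strong>Note</strong>: This article covers a lot of ground, so feel free to jump to the section you need. Each operation has its own numbered section.</em></p>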
<figure class="align-center ">
    <img loading="lazy" src="images/pdf-manipulation-with-pypdf.webp#center"
         alt="Illustration - Working with PDF Files in Python"/> 
</figure>

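<h2 id="示例代码">Sample Code</h2>
<p>You can download all the sample code used in this article, together with the input and output files, from the following link.</p>
<ul>
<li><a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python">Code examples and input files for working with PDF files in Python</a></li>
</ul>
<h2 id="安装-pypdf">Installing pypdf</h2>
<p>To install pypdf, run the following command in your terminal or command prompt:</p>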
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>pip install pypdf
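</span></span></code></pre></div><p><strong>Note:</strong> The package is published on PyPI as <code>pypdf</code>, in lowercase. The import name <code>pypdf</code> used in Python code is case-sensitive.</p>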
<h2 id="1-使用-python-从-pdf-文件提取文本">1. Extract Text from a PDF File Using Python</h2>
<script type="application/javascript" src="https://gist.github.com/fileformat-blog-gists/e2b43a49dbad9e89745f8f9777817acb.js?file=extract-text-from-pdf-using-pypdf-in-python.py"></script>

<h3 id="代码解释"><strong>Code Explanation</strong></h3>
<p><strong>1. Create a PDF reader object</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>reader <span style="color:#f92672">=</span> PdfReader(pdf_file)
</span></span></code></pre></div><ul>
<li><code>PdfReader(pdf_file)</code> loads the PDF file into a <strong>reader object</strong>.</li>
<li>This object provides access to the pages and their content.</li>
</ul>
<p><strong>2. Iterate over the pages</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">for</span> page_number, page <span style="color:#f92672">in</span> enumerate(reader<span style="color:#f92672">.</span>pages, start<span style="color:#f92672">=</span><span style="color:#ae81ff">1</span>):
</span></span></code></pre></div><ul>
<li><code>reader.pages</code> gives the pages of the PDF as a list-like sequence.</li>
<li><code>enumerate(..., start=1)</code> assigns each page a <strong>page number starting from 1</strong>.</li>
</ul>
<p><strong>3. Print the extracted text</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>    print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Page </span><span style="color:#e6db74">{</span>page_number<span style="color:#e6db74">}</span><span style="color:#e6db74">:&#34;</span>)
</span></span><span style="display:flex;"><span>    print(page<span style="color:#f92672">.</span>extract_text())
</span></span><span style="display:flex;"><span>    print(<span style="color:#e6db74">&#34;-&#34;</span> <span style="color:#f92672">*</span> <span style="color:#ae81ff">50</span>)  <span style="color:#75715e"># Separator for readability</span>
</span></span></code></pre></div><ul>
<li><code>page.extract_text()</code> extracts the text content of the current page.</li>
<li>The script prints the <strong>page number</strong> followed by the extracted text.</li>
<li><code>&quot;-&quot; * 50</code> prints a separator line (<code>--------------------------------------------------</code>) to improve readability.</li>
</ul>
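<p>The three steps above can be combined roughly as follows. This is a minimal sketch rather than the article's exact script: the function names <code>format_page</code> and <code>extract_text</code> are illustrative and not part of pypdf, while <code>PdfReader</code>, <code>reader.pages</code>, and <code>page.extract_text()</code> are the library calls described above.</p>

```python
def format_page(page_number, text):
    """Return one page's text under a 'Page N:' header, followed by a separator line."""
    return f"Page {page_number}:\n{text}\n{'-' * 50}"


def extract_text(pdf_file):
    """Yield (page_number, text) pairs for every page of pdf_file."""
    # Imported lazily so format_page stays usable where pypdf is not installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_file)
    # Page numbers start at 1 to match what a PDF viewer shows.
    for page_number, page in enumerate(reader.pages, start=1):
        yield page_number, page.extract_text()
```

<p>Looping over <code>extract_text(path)</code> and printing <code>format_page(number, text)</code> for each pair gives the same layout as the walkthrough: a page header, the page text, and a 50-character separator. Pages without a text layer, such as scanned images, typically produce an empty string.</p>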
<h3 id="代码中使用的输入-pdf-文件">Input PDF File Used in the Code</h3>
<ul>
<li><strong>Input file:</strong> <a href="https://github.com/fileformat-blog-gists/code/blob/main/working-with-pdf-files-in-python/pdf-to-extract-text/">Download link</a></li>
</ul>
<h3 id="代码输出">Code Output</h3>
<script type="application/javascript" src="https://gist.github.com/fileformat-blog-gists/ab6976aa3a0fc2999093f5f9320a9e20.js?file=Output%20-%20extract-text-from-pdf-using-pypdf-in-python.txt"></script>

<h2 id="2-使用-python-旋转-pdf-页面">2. Rotate PDF Pages Using Python</h2>
<script type="application/javascript" src="https://gist.github.com/fileformat-blog-gists/760d480cfede4178296c353d60662e1a.js?file=rotate-pdf-page-using-pypdf-in-python.py"></script>

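<h3 id="代码解释-1">Code Explanation</h3>
<p>The code rotates the <strong>first page</strong> by <strong>90°</strong> clockwise and saves the modified PDF, leaving the other pages unchanged.</p>
<p><strong>1. Import the required classes</strong></p>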
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> pypdf <span style="color:#f92672">import</span> PdfReader, PdfWriter
</span></span></code></pre></div><ul>
<li><code>PdfReader</code>: reads the input PDF.</li>
<li><code>PdfWriter</code>: creates a new PDF with the modifications.</li>
</ul>
<p><strong>2. Define the input and output file paths</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>input_pdf <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pdf-to-rotate/input.pdf&#34;</span>
</span></span><span style="display:flex;"><span>output_pdf <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pdf-to-rotate/rotated_output.pdf&#34;</span>
</span></span></code></pre></div><ul>
<li>The script reads from <code>input.pdf</code> and saves the modified file as <code>rotated_output.pdf</code>.</li>
</ul>
<p><strong>3. Read the PDF and create a writer object</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>reader <span style="color:#f92672">=</span> PdfReader(input_pdf)
</span></span><span style="display:flex;"><span>writer <span style="color:#f92672">=</span> PdfWriter()
</span></span></code></pre></div><ul>
<li><code>reader</code> loads the existing PDF.</li>
<li><code>writer</code> collects the pages, rotated or unchanged, that go into the new PDF.</li>
</ul>
<p><strong>4. Rotate the first page by 90 degrees</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>page <span style="color:#f92672">=</span> reader<span style="color:#f92672">.</span>pages[<span style="color:#ae81ff">0</span>]
</span></span><span style="display:flex;"><span>page<span style="color:#f92672">.</span>rotate(<span style="color:#ae81ff">90</span>)  <span style="color:#75715e"># Rotate 90 degrees clockwise</span>
</span></span><span style="display:flex;"><span>writer<span style="color:#f92672">.</span>add_page(page)
</span></span></code></pre></div><ul>
<li>Takes the <strong>first page</strong>, rotates it by <strong>90 degrees</strong>, and adds it to the new PDF.</li>
</ul>
<p><strong>5. Add the remaining pages unchanged</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">for</span> i <span style="color:#f92672">in</span> range(<span style="color:#ae81ff">1</span>, len(reader<span style="color:#f92672">.</span>pages)):
</span></span><span style="display:flex;"><span>    writer<span style="color:#f92672">.</span>add_page(reader<span style="color:#f92672">.</span>pages[i])
</span></span></code></pre></div><ul>
<li>Loops over the remaining pages and adds them as they are.</li>
</ul>
<p><strong>6. Save the new PDF</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(output_pdf, <span style="color:#e6db74">&#34;wb&#34;</span>) <span style="color:#66d9ef">as</span> file:
</span></span><span style="display:flex;"><span>    writer<span style="color:#f92672">.</span>write(file)
</span></span></code></pre></div><ul>
<li>Opens <code>rotated_output.pdf</code> in binary write mode and saves the new PDF.</li>
</ul>
<p><strong>7. Print a confirmation message</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Rotated page saved to </span><span style="color:#e6db74">{</span>output_pdf<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><ul>
<li>Displays a success message.</li>
</ul>
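<p>The walkthrough above hard-codes the first page and a 90° angle. As a hedged generalization, the same calls can be wrapped in a function that takes the angle and the pages to rotate as parameters. The name <code>rotate_pages</code> and its parameters are hypothetical; the sketch assumes, as the pypdf documentation states, that <code>rotate()</code> expects a multiple of 90.</p>

```python
def rotate_pages(input_pdf, output_pdf, angle=90, page_indexes=(0,)):
    """Rotate the pages at page_indexes (zero-based) clockwise by angle degrees.

    All other pages are copied to output_pdf unchanged.
    """
    # Reject unsupported angles early, before any file is opened.
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")

    # Imported lazily so the check above runs even where pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_pdf)
    writer = PdfWriter()
    for index, page in enumerate(reader.pages):
        if index in page_indexes:
            page.rotate(angle)
        writer.add_page(page)

    with open(output_pdf, "wb") as file:
        writer.write(file)
```

<p>With the default arguments, <code>rotate_pages("pdf-to-rotate/input.pdf", "pdf-to-rotate/rotated_output.pdf")</code> does the same job as the steps above: it rotates only the first page by 90° clockwise.</p>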
<h3 id="代码中使用的输入-pdf-及其旋转输出">Input PDF Used in the Code and Its Rotated Output</h3>
<ul>
<li><strong>Input PDF file:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdf-to-rotate/">Download link</a></li>
<li><strong>Rotated output PDF file:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdf-to-rotate/rotated_output.pdf">Download link</a></li>
</ul>
<p><strong>Screenshot</strong>
<img loading="lazy" src="https://raw.githubusercontent.com/fileformat-blog-gists/content/main/working-with-pdf-files-in-python/rotated-pdf.png" alt="Screenshot of rotating a PDF page using Python"  />
</p>
<h2 id="3-使用-python-合并-pdf-文件">3. Merge PDF Files Using Python</h2>
<p>This Python script demonstrates how to <strong>merge multiple PDF files in a directory</strong> into a single PDF using the <strong>pypdf</strong> library.</p>
<script type="application/javascript" src="https://gist.github.com/fileformat-blog-gists/a1a571783e0f5e699678d1094bf1afa5.js?file=merge_pdf_files_using_pypdf_in_python.py"></script>

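<h3 id="代码解释-2">Code Explanation</h3>
<ul>
<li>The script merges all PDF files in the specified directory (<code>pdfs-to-merge</code>) into a single output file (<code>merged_output.pdf</code>).</li>
<li>It ensures the output directory exists and adds the pages of each PDF in sorted order of file name.</li>
<li>It writes the final merged file to the <code>output-dir</code> subdirectory.</li>
</ul>
<p><strong>Code Breakdown</strong></p>
<p><strong>1. Import the libraries</strong></p>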
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> os
</span></span><span style="display:flex;"><span><span style="color:#f92672">from</span> pypdf <span style="color:#f92672">import</span> PdfReader, PdfWriter
</span></span></code></pre></div><ul>
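<li><code>os</code>: used to interact with the file system, such as listing directories and building file paths.</li>
<li><code>PdfReader</code>: reads the contents of a PDF file.</li>
<li><code>PdfWriter</code>: creates and writes a new PDF file.</li>
</ul>
<p><strong>2. Define the directory and output file</strong></p>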
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>directory <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pdfs-to-merge&#34;</span>
</span></span><span style="display:flex;"><span>output_file <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;output-dir/merged_output.pdf&#34;</span>
</span></span></code></pre></div><ul>
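<li><code>directory</code>: the folder that contains the PDF files.</li>
<li><code>output_file</code>: the path and name of the merged PDF, relative to <code>directory</code>.</li>
</ul>
<p><strong>3. Create the output directory if it does not exist</strong></p>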
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>os<span style="color:#f92672">.</span>makedirs(os<span style="color:#f92672">.</span>path<span style="color:#f92672">.</span>join(directory, <span style="color:#e6db74">&#34;output-dir&#34;</span>), exist_ok<span style="color:#f92672">=</span><span style="color:#66d9ef">True</span>)
</span></span></code></pre></div><ul>
<li>This ensures that the <strong>output directory</strong> exists, creating it if necessary.</li>
</ul>
<p><strong>4. Create a PdfWriter object</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>writer <span style="color:#f92672">=</span> PdfWriter()
</span></span></code></pre></div><ul>
<li><code>writer</code> collects and combines the pages of all the PDFs.</li>
</ul>
<p><strong>5. Iterate over all PDF files in the directory</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">for</span> file_name <span style="color:#f92672">in</span> sorted(os<span style="color:#f92672">.</span>listdir(directory)):
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">if</span> file_name<span style="color:#f92672">.</span>endswith(<span style="color:#e6db74">&#34;.pdf&#34;</span>):
</span></span><span style="display:flex;"><span>        file_path <span style="color:#f92672">=</span> os<span style="color:#f92672">.</span>path<span style="color:#f92672">.</span>join(directory, file_name)
</span></span><span style="display:flex;"><span>        print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Adding: </span><span style="color:#e6db74">{</span>file_name<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><ul>
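<li>This loop goes through every entry in the specified directory and keeps only those whose names end in <code>.pdf</code>. <code>sorted()</code> makes it process them in alphabetical order.</li>
</ul>
<p><strong>6. Read each PDF and append its pages to the writer</strong></p>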
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>reader <span style="color:#f92672">=</span> PdfReader(file_path)
</span></span><span style="display:flex;"><span>writer<span style="color:#f92672">.</span>append(reader)
</span></span></code></pre></div><ul>
<li>For each PDF, <code>PdfReader</code> reads the file, and <code>writer.append(reader)</code> then appends all of its pages to <code>writer</code>.</li>
</ul>
<p><strong>7. Write the merged PDF to the output file</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>output_path <span style="color:#f92672">=</span> os<span style="color:#f92672">.</span>path<span style="color:#f92672">.</span>join(directory, output_file)
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(output_path, <span style="color:#e6db74">&#34;wb&#34;</span>) <span style="color:#66d9ef">as</span> output_pdf:
</span></span><span style="display:flex;"><span>    writer<span style="color:#f92672">.</span>write(output_pdf)
</span></span></code></pre></div><ul>
<li>Once all pages have been collected, <code>writer.write()</code> writes the merged PDF to the specified output path.</li>
</ul>
<p><strong>8. Print a confirmation message</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Merged PDF saved as: </span><span style="color:#e6db74">{</span>output_path<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><ul>
<li>Prints a success message confirming where the merged PDF was saved.</li>
</ul>
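<p>The eight steps above can be sketched as two small functions. The names <code>list_pdfs</code> and <code>merge_pdfs</code> are hypothetical. Two details differ from the walkthrough and are assumptions of this sketch: the extension check is case-insensitive, so <code>REPORT.PDF</code> is also picked up, and subdirectories are skipped explicitly.</p>

```python
import os


def list_pdfs(directory):
    """Return the paths of the PDF files directly inside directory, sorted by file name."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Case-insensitive extension check; directories are ignored.
        if name.lower().endswith(".pdf") and os.path.isfile(path):
            paths.append(path)
    return paths


def merge_pdfs(directory, output_path):
    """Append every PDF found by list_pdfs(directory) into one file at output_path."""
    # Imported lazily so list_pdfs stays usable where pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in list_pdfs(directory):
        writer.append(PdfReader(path))

    # Create the output folder if needed, then write the merged file.
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    with open(output_path, "wb") as output_pdf:
        writer.write(output_pdf)
```

<p>Calling <code>merge_pdfs("pdfs-to-merge", "pdfs-to-merge/output-dir/merged_output.pdf")</code> mirrors the layout used in the walkthrough. Because the output goes into a subdirectory, a merged file from an earlier run is not picked up as an input on the next run.</p>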
<h3 id="代码中使用的输入-pdf-文件及合并后的输出-pdf">Input PDF Files Used in the Code and the Merged Output PDF</h3>
<ul>
<li><strong>Input PDF files:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdfs-to-merge">Download link</a></li>
<li><strong>Merged output PDF:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdfs-to-merge/output-dir">Download link</a></li>
</ul>
<h2 id="4-使用-python-拆分-pdf">4. Split a PDF Using Python</h2>
<script type="application/javascript" src="https://gist.github.com/fileformat-blog-gists/0dee64422ac0dcf44cf027d90567bbf8.js?file=split-pdf-using-pypdf-in-python.py"></script>

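<h3 id="代码解释-3">Code Explanation</h3>
<p>The Python script above uses the <strong>pypdf</strong> library to split a PDF into individual pages. It first ensures the output directory exists, then reads the input PDF file. The script loops over every page, creates a new <strong>PdfWriter</strong> object for it, and saves the page as a separate PDF file. The output files are named sequentially (for example, <strong>page_1.pdf, page_2.pdf</strong>) and stored in the <strong><code>output-dir</code></strong> folder. Finally, it prints a confirmation message for each file it creates and a notice when it is done.</p>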
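<p>Since this section describes the script only in prose, here is a minimal sketch of the same idea. It is an approximation under stated assumptions, not the article's exact code: the names <code>page_file_name</code> and <code>split_pdf</code> are hypothetical, and the function returns the created paths instead of printing a message per file.</p>

```python
import os


def page_file_name(page_number):
    """Return the sequential output name for a page, e.g. page_1.pdf."""
    return f"page_{page_number}.pdf"


def split_pdf(input_pdf, output_dir):
    """Save every page of input_pdf as its own PDF in output_dir; return the new paths."""
    # Imported lazily so page_file_name stays usable where pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    os.makedirs(output_dir, exist_ok=True)
    reader = PdfReader(input_pdf)
    created = []
    for page_number, page in enumerate(reader.pages, start=1):
        # A fresh writer per page, so each output file holds exactly one page.
        writer = PdfWriter()
        writer.add_page(page)
        output_path = os.path.join(output_dir, page_file_name(page_number))
        with open(output_path, "wb") as file:
            writer.write(file)
        created.append(output_path)
    return created
```

<p>Returning the list of paths leaves it to the caller to decide whether to print a confirmation for each file or process the pages further.</p>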
<h3 id="输入-pdf-文件和拆分后的输出文件">Input PDF File and the Split Output Files</h3>
<ul>
<li><strong>Input PDF file:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdf-to-split">Download link</a></li>
<li><strong>Split output files:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdf-to-split/output-dir">Download link</a></li>
</ul>
<h2 id="5-使用-python-为-pdf-添加水印">5. Add a Watermark to a PDF Using Python</h2>
<p>You can add a watermark to a PDF with the pypdf library by overlaying a watermark PDF on an existing PDF. Make sure the watermark PDF has only one page so that it is applied correctly to every page of the main PDF.</p>
<script type="application/javascript" src="https://gist.github.com/fileformat-blog-gists/af057943580e2fcde6a635df34d7e39a.js?file=watermark-pdf-using-pypdf-in-python.py"></script>

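<h3 id="代码解释-4">Code Explanation</h3>
<p>The Python script above reads an input PDF, takes the page from a single-page watermark PDF, overlays the watermark on every page of the input PDF, and saves the final watermarked PDF.</p>
<p><strong>Code Breakdown</strong></p>
<p>Here is a brief explanation of each part:</p>
<p><strong>1. Import the required classes</strong></p>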
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">from</span> pypdf <span style="color:#f92672">import</span> PdfReader, PdfWriter
</span></span></code></pre></div><ul>
<li><strong><code>PdfReader</code></strong> reads an existing PDF.</li>
<li><strong><code>PdfWriter</code></strong> creates and writes a new PDF.</li>
</ul>
<p><strong>2. Define the file paths</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>input_pdf <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pdf-to-watermark/input.pdf&#34;</span>
</span></span><span style="display:flex;"><span>watermark_pdf <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pdf-to-watermark/watermark.pdf&#34;</span>
</span></span><span style="display:flex;"><span>output_pdf <span style="color:#f92672">=</span> <span style="color:#e6db74">&#34;pdf-to-watermark/output_with_watermark.pdf&#34;</span>
</span></span></code></pre></div><ul>
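<li><code>input_pdf</code>: the original PDF to be watermarked.</li>
<li><code>watermark_pdf</code>: a separate single-page PDF that serves as the watermark.</li>
<li><code>output_pdf</code>: the output file containing the watermarked pages.</li>
</ul>
<p><strong>3. Read the PDFs</strong></p>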
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>reader <span style="color:#f92672">=</span> PdfReader(input_pdf)
</span></span><span style="display:flex;"><span>watermark <span style="color:#f92672">=</span> PdfReader(watermark_pdf)
</span></span></code></pre></div><ul>
<li><code>reader</code>: reads the input PDF.</li>
<li><code>watermark</code>: reads the watermark PDF.</li>
</ul>
<p><strong>4. Create a writer object</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>writer <span style="color:#f92672">=</span> PdfWriter()
</span></span></code></pre></div><ul>
<li>This is used to build the final watermarked PDF.</li>
</ul>
<p><strong>5. Extract the watermark page</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>watermark_page <span style="color:#f92672">=</span> watermark<span style="color:#f92672">.</span>pages[<span style="color:#ae81ff">0</span>]
</span></span></code></pre></div><ul>
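<li>The watermark PDF is assumed to contain only <strong>one page</strong>, which is overlaid on every page.</li>
</ul>
<p><strong>6. Iterate over the input PDF pages and merge the watermark</strong></p>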
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">for</span> page <span style="color:#f92672">in</span> reader<span style="color:#f92672">.</span>pages:
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Merge the watermark with the current page</span>
</span></span><span style="display:flex;"><span>    page<span style="color:#f92672">.</span>merge_page(watermark_page)
</span></span><span style="display:flex;"><span>    
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># Add the merged page to the writer</span>
</span></span><span style="display:flex;"><span>    writer<span style="color:#f92672">.</span>add_page(page)
</span></span></code></pre></div><ul>
<li>Loops over every page of <code>input_pdf</code>.</li>
<li><strong><code>merge_page(watermark_page)</code></strong> overlays the watermark on the current page.</li>
<li>Adds the modified page to <code>writer</code>.</li>
</ul>
<p><strong>7. Save the watermarked PDF</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(output_pdf, <span style="color:#e6db74">&#34;wb&#34;</span>) <span style="color:#66d9ef">as</span> output_file:
</span></span><span style="display:flex;"><span>    writer<span style="color:#f92672">.</span>write(output_file)
</span></span></code></pre></div><ul>
<li>Writes the modified pages to a new PDF file.</li>
</ul>
<p><strong>8. Print a confirmation message</strong></p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>print(<span style="color:#e6db74">f</span><span style="color:#e6db74">&#34;Watermarked PDF saved as: </span><span style="color:#e6db74">{</span>output_pdf<span style="color:#e6db74">}</span><span style="color:#e6db74">&#34;</span>)
</span></span></code></pre></div><ul>
<li>Prints the output path as confirmation.</li>
</ul>
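<p>The breakdown above assumes the watermark PDF has exactly one page but never checks it. One possible way to make that assumption explicit is sketched below. The function name <code>add_watermark</code> and the <code>ValueError</code> check are additions of this sketch, not part of the article's script; the pypdf calls are the ones described above.</p>

```python
def add_watermark(input_pdf, watermark_pdf, output_pdf):
    """Overlay the single page of watermark_pdf on every page of input_pdf."""
    # Imported lazily so this module still loads where pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_pdf)
    watermark = PdfReader(watermark_pdf)

    # Fail loudly instead of silently ignoring extra watermark pages.
    if len(watermark.pages) != 1:
        raise ValueError("the watermark PDF must contain exactly one page")
    watermark_page = watermark.pages[0]

    writer = PdfWriter()
    for page in reader.pages:
        # merge_page modifies the page in place, drawing the watermark content on it.
        page.merge_page(watermark_page)
        writer.add_page(page)

    with open(output_pdf, "wb") as output_file:
        writer.write(output_file)
```

<p>The watermark page is not scaled or repositioned here, so this sketch works best when the watermark PDF uses the same page size as the input PDF.</p>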
<h3 id="输入-pdf水印-pdf-和输出带水印-pdf">Input PDF, Watermark PDF, and Watermarked Output PDF</h3>
<ul>
<li><strong>Input PDF file:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdf-to-watermark">Download link</a></li>
<li><strong>Watermark PDF file:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdf-to-watermark">Download link</a></li>
<li><strong>Watermarked output PDF file:</strong> <a href="https://github.com/fileformat-blog-gists/code/tree/main/working-with-pdf-files-in-python/pdf-to-watermark">Download link</a></li>
</ul>
<p><strong>Screenshot</strong>
<img loading="lazy" src="https://raw.githubusercontent.com/fileformat-blog-gists/content/main/working-with-pdf-files-in-python/watermark-pdf.png" alt="Screenshot of adding a watermark to a PDF using Python"  />
</p>
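<h2 id="结论">Conclusion</h2>
<p>In this guide, we explored essential PDF operations in Python: extracting text, rotating pages, merging, splitting, and adding watermarks. With these skills, you can build your own PDF manager and automate a variety of PDF tasks.</p>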
]]></content:encoded>
    </item>
    
  </channel>
</rss>
