Python Office Automation Tool | Convert All Your Documents to PDF in One Step


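This article presents a Python tool that batch-converts Word, Excel, PowerPoint, TXT, HTML, and image files to PDF, with built-in error handling and logging. The script is cleanly structured, runs on Windows, and suits office automation, document organization, and lesson preparation.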

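Preface

In everyday work and study we often need to convert files in many formats (Word, Excel, PowerPoint, TXT, HTML, and images) to PDF for archiving, printing, or sharing. Doing this by hand is slow and error-prone.

This article provides a complete Python script that automates batch conversion of common office documents and image formats. It includes error handling and logging, so every run can be traced and failures are easy to diagnose.

Once you have this tool in place, routine conversions take far less of your time.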

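Chapter 1: Why Batch-Convert Documents to PDF?

PDF is a cross-platform document format with the following advantages:

It can be viewed without the application that created it;
The layout stays the same across devices;
It supports security features such as encryption and digital signatures;
It is convenient to archive, print, and share.

With a large number of documents, however, converting them one by one is tedious. This is where a Python script for automated batch conversion becomes valuable.

It saves time and also reduces manual mistakes, which makes it a useful part of a digital office workflow.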

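Chapter 2: Supported File Types and How the Conversion Works

The script currently supports the following file types:

Microsoft Office:

Word (.doc / .docx)
Excel (.xls / .xlsx)
PowerPoint (.ppt / .pptx)

Text documents:

TXT (plain text)
HTML (web pages, .html / .htm)

Image files:

JPG, PNG, BMP, GIF, TIFF, TIF, and other common formats

📌 Technical notes:
win32com drives Microsoft Office to convert Word, Excel, and PowerPoint files;
FPDF handles TXT files, and pdfkit (a wrapper around wkhtmltopdf) handles HTML;
PIL (Pillow) handles image files;
Logging uses the standard library logging module;
Errors are caught with try-except and logged with full details.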
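Choosing the right converter for each file comes down to a dictionary lookup on the file extension. The following is a minimal sketch of that idea; the handler names `to_pdf_office` and `to_pdf_image` are hypothetical placeholders, not functions from the script.

```python
import os

def to_pdf_office(path):
    # Placeholder: the real script drives Microsoft Office through win32com here.
    return f"office:{path}"

def to_pdf_image(path):
    # Placeholder: the real script uses Pillow here.
    return f"image:{path}"

# Map lowercase extensions (without the dot) to converter functions.
HANDLERS = {
    "docx": to_pdf_office,
    "xlsx": to_pdf_office,
    "png": to_pdf_image,
    "jpg": to_pdf_image,
}

def pick_handler(path):
    """Return the converter for a path, or None if the type is unsupported."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    return HANDLERS.get(ext)
```

Supporting a new format then only requires one more dictionary entry, which is why the table approach is easier to extend than a chain of if/elif checks.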

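Chapter 3: Installing the Dependencies (Windows)

Some modules (such as win32com) are Windows-only, so the script targets Windows and requires Microsoft Office to be installed beforehand.

🔧 Install the Python packages with:

pip install pywin32 pillow fpdf pdfkit

You also need wkhtmltopdf, which pdfkit calls to convert HTML to PDF:

Download: https://wkhtmltopdf.org/downloads.html

After installing, add its bin directory to the PATH environment variable, for example:

C:\Program Files\wkhtmltopdf\bin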
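If you would rather not edit PATH, pdfkit can be pointed at the executable explicitly. The sketch below looks on PATH first and otherwise falls back to a fixed location; the fallback path is an assumption based on the example directory above, and `find_wkhtmltopdf` and `make_pdfkit_config` are hypothetical helper names.

```python
import shutil

def find_wkhtmltopdf(fallback=r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"):
    """Return the wkhtmltopdf executable found on PATH, else the fallback path."""
    # The fallback location is an assumption; adjust it to your installation.
    return shutil.which("wkhtmltopdf") or fallback

def make_pdfkit_config():
    # Imported lazily so the lookup above works even without pdfkit installed.
    import pdfkit
    return pdfkit.configuration(wkhtmltopdf=find_wkhtmltopdf())

# Usage (requires pdfkit and wkhtmltopdf):
# import pdfkit
# pdfkit.from_file("page.html", "page.pdf", configuration=make_pdfkit_config())
```

Running `find_wkhtmltopdf()` on its own is also a quick way to confirm whether the PATH change took effect in the current terminal.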

Chapter 4: The Python Script That Converts Everything to PDF

import os
import sys
import logging

from win32com import client as win32_client
from PIL import Image
from fpdf import FPDF
import pdfkit

# Configure logging: write to conversion.log and echo to the console.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('conversion.log', encoding='utf-8'),
        logging.StreamHandler(sys.stdout),
    ],
)


class DocumentConverter:
    def __init__(self, input_dir, output_dir):
        # Office COM calls need absolute paths, so normalize both directories.
        self.input_dir = os.path.abspath(input_dir)
        self.output_dir = os.path.abspath(output_dir)
        # Map each supported extension to its converter method.
        self.supported_extensions = {
            'doc': self._convert_word, 'docx': self._convert_word,
            'xls': self._convert_excel, 'xlsx': self._convert_excel,
            'ppt': self._convert_powerpoint, 'pptx': self._convert_powerpoint,
            'txt': self._convert_txt,
            'html': self._convert_html, 'htm': self._convert_html,
            'jpg': self._convert_image, 'jpeg': self._convert_image,
            'png': self._convert_image, 'bmp': self._convert_image,
            'gif': self._convert_image, 'tiff': self._convert_image,
            'tif': self._convert_image,
        }

    def _ensure_output_dir(self):
        """Make sure the output directory exists."""
        os.makedirs(self.output_dir, exist_ok=True)

    def _get_files(self):
        """Yield (path, extension) for every supported file under input_dir."""
        for root, _, files in os.walk(self.input_dir):
            for file in files:
                ext = os.path.splitext(file)[1].lower().lstrip('.')
                if ext in self.supported_extensions:
                    yield os.path.join(root, file), ext

    def _output_path(self, file_path, new_ext):
        """Build the output path: same file name, new extension."""
        name = os.path.splitext(os.path.basename(file_path))[0]
        return os.path.join(self.output_dir, f"{name}.{new_ext}")

    def _convert_word(self, file_path):
        """Convert a Word document to PDF."""
        word = None
        try:
            word = win32_client.Dispatch("Word.Application")
            doc = word.Documents.Open(file_path)
            doc.ExportAsFixedFormat(
                OutputFileName=self._output_path(file_path, 'pdf'),
                ExportFormat=17,    # wdExportFormatPDF
                OpenAfterExport=False,
                OptimizeFor=0,      # wdExportOptimizeForPrint
                CreateBookmarks=1,  # wdExportCreateHeadingBookmarks
            )
            doc.Close(False)  # close without saving changes
            logging.info(f"✅ Converted: {file_path}")
        except Exception:
            logging.exception(f"❌ Word conversion failed: {file_path}")
        finally:
            if word is not None:
                word.Quit()  # always release the Word process

    def _convert_excel(self, file_path):
        """Convert an Excel workbook to PDF."""
        excel = None
        try:
            excel = win32_client.Dispatch("Excel.Application")
            wb = excel.Workbooks.Open(file_path)
            # Positional arguments: Type=0 (xlTypePDF), then the file name.
            wb.ExportAsFixedFormat(0, self._output_path(file_path, 'pdf'))
            wb.Close(False)
            logging.info(f"✅ Converted: {file_path}")
        except Exception:
            logging.exception(f"❌ Excel conversion failed: {file_path}")
        finally:
            if excel is not None:
                excel.Quit()

    def _convert_powerpoint(self, file_path):
        """Convert a PowerPoint presentation to PDF."""
        powerpoint = None
        try:
            powerpoint = win32_client.Dispatch("PowerPoint.Application")
            presentation = powerpoint.Presentations.Open(file_path, WithWindow=False)
            presentation.SaveAs(self._output_path(file_path, 'pdf'), 32)  # 32 = ppSaveAsPDF
            presentation.Close()
            logging.info(f"✅ Converted: {file_path}")
        except Exception:
            logging.exception(f"❌ PPT conversion failed: {file_path}")
        finally:
            if powerpoint is not None:
                powerpoint.Quit()

    def _convert_txt(self, file_path):
        """Convert a TXT file to PDF."""
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()
            pdf = FPDF()
            pdf.add_page()
            pdf.set_auto_page_break(auto=True, margin=15)
            # The built-in Arial font covers Latin-1 only; other scripts
            # (for example Chinese) need a Unicode TTF font added to FPDF.
            pdf.set_font("Arial", size=12)
            for line in content.split('\n'):
                pdf.cell(0, 10, txt=line, ln=1)
            pdf.output(self._output_path(file_path, 'pdf'))
            logging.info(f"✅ Converted: {file_path}")
        except Exception:
            logging.exception(f"❌ TXT conversion failed: {file_path}")

    def _convert_html(self, file_path):
        """Convert an HTML file to PDF."""
        try:
            pdfkit.from_file(file_path, self._output_path(file_path, 'pdf'))
            logging.info(f"✅ Converted: {file_path}")
        except Exception:
            logging.exception(f"❌ HTML conversion failed: {file_path}")

    def _convert_image(self, file_path):
        """Convert an image file to a single-page PDF."""
        try:
            with Image.open(file_path) as image:
                # PDF output cannot store alpha or palette modes directly.
                rgb = image.convert("RGB") if image.mode != "RGB" else image
                rgb.save(self._output_path(file_path, 'pdf'), "PDF")
            logging.info(f"✅ Converted: {file_path}")
        except Exception:
            logging.exception(f"❌ Image conversion failed: {file_path}")

    def convert_all(self):
        """Run the batch conversion."""
        self._ensure_output_dir()
        count = 0
        for file_path, ext in self._get_files():
            logging.info(f"🔄 Converting: {file_path}")
            self.supported_extensions[ext](file_path)
            count += 1
        # Failures are logged above; this counts every file that was attempted.
        logging.info(f"📊 Processed {count} file(s)")


if __name__ == '__main__':
    import argparse

    parser = argparse.ArgumentParser(description="Batch-convert documents to PDF")
    parser.add_argument("--input", required=True, help="source folder")
    parser.add_argument("--output", required=True, help="destination folder")
    args = parser.parse_args()

    converter = DocumentConverter(args.input, args.output)
    converter.convert_all()
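The script opens every TXT file as UTF-8, so files saved in a legacy Windows code page fail with a decoding error. One possible workaround, sketched below, is to try a short list of encodings in order. The helper name `read_text_with_fallback` and the candidate list (`utf-8-sig`, then `gbk`) are assumptions; choose the encodings that match your own files.

```python
def read_text_with_fallback(path, encodings=("utf-8-sig", "gbk")):
    """Read a text file, trying each encoding in order until one decodes."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing bytes that cannot be decoded.
    return raw.decode("utf-8", errors="replace")
```

Decoding the file is only half of the problem: FPDF's built-in fonts cover Latin-1 only, so non-Latin text also needs a Unicode TrueType font registered with FPDF before it can be written to the page.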

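Chapter 5: How to Use the Script

✅ Usage:

Save the script above as batch_convert_to_pdf.py
Run it from a terminal:

python batch_convert_to_pdf.py --input D:\Documents --output D:\ConvertedPDFs

📁 Input and output:

The input folder should contain the documents to convert (subfolders are scanned too);
The output folder is created automatically if it does not exist;
Each result keeps the original file name with the extension changed to .pdf.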
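Because outputs are named after the original file name alone, two sources such as report.docx and report.xlsx (or files with the same name in different subfolders) map to the same report.pdf, and the later one overwrites the earlier one. The sketch below shows one way to avoid that; `unique_output_path` is a hypothetical helper, not part of the script, and the naming scheme (appending the source extension and a counter) is just one option.

```python
import os

def unique_output_path(output_dir, source_path, taken=None):
    """Build a PDF path that does not clash with earlier outputs.

    `taken` is a set of paths already handed out in this run; files that
    already exist on disk are treated as taken too.
    """
    taken = set() if taken is None else taken
    stem, ext = os.path.splitext(os.path.basename(source_path))
    # Keep the source extension in the name: report.docx -> report_docx.pdf
    base = f"{stem}_{ext.lstrip('.').lower()}" if ext else stem
    candidate = os.path.join(output_dir, f"{base}.pdf")
    n = 1
    while candidate in taken or os.path.exists(candidate):
        n += 1
        candidate = os.path.join(output_dir, f"{base}_{n}.pdf")
    taken.add(candidate)
    return candidate
```

Pass the same `taken` set for every file in a run so that names assigned earlier are remembered even before the PDFs are written.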

⚠️ Notes:

Windows only;
Microsoft Office must be installed;
Running with administrator privileges is recommended;
Adjust the log level, fonts, and page settings to suit your needs.

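Chapter 6: Highlights and Real-World Use Cases

✨ Highlights:

Detects the file type from the extension and picks the matching converter;
Scans nested folders recursively;
Logs every conversion, which makes problems easy to track down;
Supports recovery after an interruption in the sense that the log shows which files were already processed (the script does not resume automatically);
Easy to extend with new formats.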
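The script does not skip files that were converted in an earlier run. One way to approximate resuming, sketched here under that assumption, is to convert a file only when its PDF is missing or older than the source. `needs_conversion` is a hypothetical helper name.

```python
import os

def needs_conversion(source_path, output_path):
    """Return True when the PDF is missing or older than its source file."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(output_path) < os.path.getmtime(source_path)
```

Calling this check before each conversion makes a rerun after an interruption pick up roughly where the previous run stopped, and it also reconverts sources that were edited since the last run.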

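📌 Example use cases:

Students collecting course materials as PDFs;
Companies archiving contracts and reports in one place;
Developers generating documentation sets automatically;
Content creators packaging their work as PDFs;
Libraries and archives digitizing original documents.

With an efficient conversion tool like this, you can spend your attention on more important tasks.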

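Chapter 7: Common Problems and Solutions

❗ Problem 1: "ModuleNotFoundError"

A dependency is missing. Check that the following packages are installed:

pip install pywin32 pillow fpdf pdfkit

Also confirm that wkhtmltopdf is on the system PATH. A missing wkhtmltopdf is reported by pdfkit when an HTML file is converted, not as a ModuleNotFoundError.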
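To see which packages are missing without triggering the import error itself, a small check along these lines can help. Note that the import names differ from the pip package names; `missing_packages` and the `REQUIRED` table are hypothetical names for this sketch.

```python
import importlib.util

# Import name -> pip package name (pywin32 provides win32com, pillow provides PIL).
REQUIRED = {
    "win32com": "pywin32",
    "PIL": "pillow",
    "fpdf": "fpdf",
    "pdfkit": "pdfkit",
}

def missing_packages(required=REQUIRED):
    """Return the pip package names whose import module cannot be found."""
    return [pkg for mod, pkg in required.items()
            if importlib.util.find_spec(mod) is None]
```

Calling `missing_packages()` returns a list you can paste after `pip install`; an empty list means all four Python packages are importable.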

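❗ Problem 2: Word/Excel/PowerPoint conversion fails

The installed Office version may be incompatible, or its COM objects may not be registered correctly.

✅ What to try:

Restart Office or the system;
Run the script as administrator;
Update Office to the latest version.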
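Leftover Office processes from a failed run are another common cause of COM errors. A context manager that always quits the application can reduce that risk. This is a sketch, not the script's own approach: `office_app` is a hypothetical helper, and the `dispatch` parameter exists only so the cleanup logic can be exercised without Office installed.

```python
from contextlib import contextmanager

def _default_dispatch(prog_id):
    # Imported lazily: win32com is only available on Windows with pywin32.
    from win32com import client
    # DispatchEx starts a new Office process instead of attaching to one
    # the user may already have open.
    return client.DispatchEx(prog_id)

@contextmanager
def office_app(prog_id, dispatch=_default_dispatch):
    """Yield an Office application object and always quit it afterwards."""
    app = dispatch(prog_id)
    try:
        yield app
    finally:
        # Runs even when the conversion raises, so no process is left behind.
        app.Quit()

# Usage on Windows (path must be absolute):
# with office_app("Word.Application") as word:
#     doc = word.Documents.Open(r"D:\Documents\report.docx")
```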

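❗ Problem 3: Garbled characters or missing styles in HTML output

When the HTML is complex, pdfkit (through wkhtmltopdf) may not reproduce the page faithfully.

✅ What to try:

For garbled characters, declare the page encoding, for example with wkhtmltopdf's encoding option;
For missing local stylesheets or images, allow local file access, and adjust the pdfkit options to enable JavaScript if the page depends on it;
If the layout must be exact, export the PDF from a browser extension instead.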
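pdfkit passes a plain dictionary of options through to wkhtmltopdf. The sketch below collects options commonly used for encoding, local resources, and JavaScript. Their availability depends on the installed wkhtmltopdf version, and `build_pdfkit_options` and the default delay are assumptions made for illustration.

```python
def build_pdfkit_options(js_delay_ms=1000):
    """Return wkhtmltopdf options in the dict form that pdfkit accepts."""
    return {
        "encoding": "UTF-8",                   # treat the input as UTF-8
        "enable-local-file-access": None,      # allow local CSS and images (flag, no value)
        "enable-javascript": None,             # run page scripts before rendering (flag)
        "javascript-delay": str(js_delay_ms),  # wait this many ms for scripts to finish
    }

# Usage (requires pdfkit and wkhtmltopdf):
# import pdfkit
# pdfkit.from_file("page.html", "page.pdf", options=build_pdfkit_options())
```

Options that are simple switches take `None` as their value, while options with an argument take a string.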

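Summary

This Python script batch-converts Word, Excel, PowerPoint, TXT, HTML, and image files to PDF, with solid error handling and logging.

Whether you are a student, teacher, administrator, or developer, it can save you a great deal of time and let you focus on more valuable work.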