PyMuPDF 1.24.4 Chinese Documentation (6) (2): https://developer.aliyun.com/article/1559576
Extracting Fonts and Images
Extract fonts or images from selected PDF pages to a specified directory:
pymupdf extract -h
usage: fitz extract [-h] [-images] [-fonts] [-output OUTPUT]
                    [-password PASSWORD] [-pages PAGES]
                    input

--------------------- extract images and fonts to disk --------------------

positional arguments:
  input               PDF filename

optional arguments:
  -h, --help          show this help message and exit
  -images             extract images
  -fonts              extract fonts
  -output OUTPUT      output directory, defaults to current
  -password PASSWORD  password
  -pages PAGES        only consider these pages, format: 1,5-7,50-N
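Image filenames are built according to the naming scheme "img-xref.ext", where "ext" is the extension associated with the image and "xref" is the xref of the image's PDF object.
Font filenames consist of the font name and the associated extension. Any spaces in the font name are replaced by hyphens "-".
The output directory must already exist.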
Note
Except for creating the output directory, this feature is functionally equivalent to, and obsoletes, this script.
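The naming rules above can be sketched in Python. This is a minimal sketch, not the CLI's own code: `image_filename`, `font_filename` and `extract_images_and_fonts` are hypothetical helpers, and the extraction function assumes PyMuPDF is installed and relies on `Page.get_images()`, `Document.extract_image()`, `Page.get_fonts()` and `Document.extract_font()`. Page numbers are assumed to be 1-based, as in the `-pages` option.

```python
import os


def image_filename(xref: int, ext: str) -> str:
    """Hypothetical helper: build 'img-xref.ext' for an image object."""
    return f"img-{xref}.{ext}"


def font_filename(fontname: str, ext: str) -> str:
    """Hypothetical helper: font name plus extension, spaces replaced by '-'."""
    return f"{fontname.replace(' ', '-')}.{ext}"


def extract_images_and_fonts(pdf_path, out_dir, page_numbers, images=True, fonts=True):
    """Sketch: write images and/or fonts of the given 1-based pages to out_dir."""
    import pymupdf  # imported here so the naming helpers work without PyMuPDF

    # Like the CLI, do not create the output directory; it must exist.
    if not os.path.isdir(out_dir):
        raise FileNotFoundError(f"output directory {out_dir!r} does not exist")

    with pymupdf.open(pdf_path) as doc:
        done = set()  # an image or font xref may be shared by several pages
        for pno in page_numbers:
            page = doc[pno - 1]
            if images:
                for item in page.get_images(full=True):
                    xref = item[0]
                    if xref in done:
                        continue
                    done.add(xref)
                    info = doc.extract_image(xref)
                    if not info:
                        continue
                    path = os.path.join(out_dir, image_filename(xref, info["ext"]))
                    with open(path, "wb") as f:
                        f.write(info["image"])
            if fonts:
                for item in page.get_fonts(full=True):
                    xref = item[0]
                    if xref in done:
                        continue
                    done.add(xref)
                    basename, ext, _subtype, buffer = doc.extract_font(xref)
                    if ext == "n/a" or not buffer:
                        continue  # font is not embedded, nothing to write
                    path = os.path.join(out_dir, font_filename(basename, ext))
                    with open(path, "wb") as f:
                        f.write(buffer)
```

For example, `font_filename("Times New Roman", "ttf")` gives `Times-New-Roman.ttf`.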
Joining PDF Documents
To join several PDF files, specify:
pymupdf join -h
usage: fitz join [-h] -output OUTPUT [input [input ...]]

---------------------------- join PDF documents ---------------------------

positional arguments:
  input           input filenames

optional arguments:
  -h, --help      show this help message and exit
  -output OUTPUT  output filename

specify each input as 'filename[,password[,pages]]'
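Note
- Each input must be entered as "filename,password,pages". Password and pages are optional.
- If a "pages" entry is used, the password entry is required. If the PDF needs no password, specify two commas.
- The format of "pages" is the same as explained at the top of this section.
- Each input file is closed immediately after use. You can therefore use one of them as the output filename and overwrite it.
Example: to join the following files
- file1.pdf: all pages, back to front, no password
- file2.pdf: last page, then first page, password: "secret"
- file3.pdf: page 5 to the last page, no password
and store the result as output.pdf, enter this command: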
pymupdf join -o output.pdf file1.pdf,,N-1 file2.pdf,secret,N,1 file3.pdf,,5-N
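The input and page syntax can be sketched in Python. This is a minimal sketch under stated assumptions, not PyMuPDF's own parser: `parse_pages`, `parse_join_input` and `join_pdfs` are hypothetical names, "N" is taken to mean the last page, and a descending range such as "N-1" is taken to list pages back to front, matching the example above. `join_pdfs` assumes PyMuPDF is installed and uses `Document.authenticate()` and `Document.insert_pdf()`.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Turn a spec like '1,5-7,50-N' into a list of 1-based page numbers."""
    result = []
    for item in spec.replace(" ", "").replace("N", str(page_count)).split(","):
        first, _, second = item.partition("-")
        start = int(first)
        stop = int(second) if second else start
        if not (1 <= start <= page_count and 1 <= stop <= page_count):
            raise ValueError(f"bad page specification: {item!r}")
        step = 1 if stop >= start else -1  # descending range = back to front
        result.extend(range(start, stop + step, step))
    return result


def parse_join_input(arg: str):
    """Split 'filename[,password[,pages]]'; pages may itself contain commas."""
    filename, password, pages = (arg.split(",", 2) + ["", ""])[:3]
    return filename, password, pages


def join_pdfs(inputs, output):
    """Sketch of the join command; assumes PyMuPDF is installed."""
    import pymupdf

    result = pymupdf.open()  # new, empty PDF
    for arg in inputs:
        filename, password, pages = parse_join_input(arg)
        src = pymupdf.open(filename)
        if src.needs_pass and not src.authenticate(password):
            raise ValueError(f"cannot authenticate {filename!r}")
        if pages:
            for pno in parse_pages(pages, src.page_count):
                # insert_pdf() works with 0-based page numbers
                result.insert_pdf(src, from_page=pno - 1, to_page=pno - 1)
        else:
            result.insert_pdf(src)
        src.close()  # closed right away, so an input may double as the output
    result.save(output)
    result.close()
```

Splitting with `maxsplit=2` is what lets "file2.pdf,secret,N,1" keep "N,1" together as its pages entry.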
Low-Level Information
Display internal information about a PDF. Again, there are similarities to "mutool show":
pymupdf show -h
usage: fitz show [-h] [-password PASSWORD] [-catalog] [-trailer] [-metadata]
                 [-xrefs XREFS] [-pages PAGES]
                 input

------------------------- display PDF information -------------------------

positional arguments:
  input               PDF filename

optional arguments:
  -h, --help          show this help message and exit
  -password PASSWORD  password
  -catalog            show PDF catalog
  -trailer            show PDF trailer
  -metadata           show PDF metadata
  -xrefs XREFS        show selected objects, format: 1,5-7,N
  -pages PAGES        show selected pages, format: 1,5-7,50-N
Examples:
pymupdf show x.pdf
PDF is password protected

pymupdf show x.pdf -pass hugo
authentication unsuccessful

pymupdf show x.pdf -pass jorjmckie
authenticated as owner
file 'x.pdf', pages: 1, objects: 19, 58 MB, PDF 1.4, encryption: Standard V5 R6 256-bit AES
Document contains 15 embedded files.

pymupdf show FDA-1572_508_R6_FINAL.pdf -tr -m
'FDA-1572_508_R6_FINAL.pdf', pages: 2, objects: 1645, 1.4 MB, PDF 1.6, encryption: Standard V4 R4 128-bit AES
document contains 740 root form fields and is signed
------------------------------- PDF metadata ------------------------------
format: PDF 1.6
title: FORM FDA 1572
author: PSC Publishing Services
subject: Statement of Investigator
keywords: None
creator: PScript5.dll Version 5.2.2
producer: Acrobat Distiller 9.0.0 (Windows)
creationDate: D:20130522104413-04'00'
modDate: D:20190718154905-07'00'
encryption: Standard V4 R4 128-bit AES
------------------------------- PDF trailer -------------------------------
<<
  /DecodeParms <<
    /Columns 5
    /Predictor 12
  >>
  /Encrypt 1389 0 R
  /Filter /FlateDecode
  /ID [ <9252E9E39183F2A0B0C51BE557B8A8FC> <85227BE9B84B724E8F678E1529BA8351> ]
  /Index [ 1388 258 ]
  /Info 1387 0 R
  /Length 253
  /Prev 1510559
  /Root 1390 0 R
  /Size 1646
  /Type /XRef
  /W [ 1 3 1 ]
>>
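Much of this information is also available from Python. The sketch below is a hypothetical approximation of the `show` command, not its actual source: `describe_auth` and `show_pdf` are illustrative names, and the message wording only mimics the example output. It assumes the documented `Document.authenticate()` return codes (0 = failure, 2 = user password, 4 = owner password, 6 = both passwords equal) and uses `Document.metadata`, `pdf_catalog()`, `xref_object()`, `pdf_trailer()` and `xref_length()`.

```python
def describe_auth(rc: int) -> str:
    """Map a Document.authenticate() return code to a short message."""
    if rc == 0:
        return "authentication unsuccessful"
    if rc & 4:  # 4 (owner password) or 6 (both passwords are equal)
        return "authenticated as owner"
    return "authenticated as user"


def show_pdf(path, password=None, catalog=False, trailer=False, metadata=False):
    """Sketch: print a few low-level facts about a PDF (PyMuPDF required)."""
    import pymupdf

    doc = pymupdf.open(path)
    if doc.needs_pass:
        if not password:
            raise SystemExit("PDF is password protected")
        rc = doc.authenticate(password)
        print(describe_auth(rc))
        if rc == 0:
            raise SystemExit(1)
    # xref_length() also counts the unused entry 0, hence the "- 1"
    print(f"file {path!r}, pages: {doc.page_count}, objects: {doc.xref_length() - 1}")
    if metadata:
        for key, value in doc.metadata.items():
            print(f"{key}: {value}")
    if catalog:
        print(doc.xref_object(doc.pdf_catalog()))
    if trailer:
        print(doc.pdf_trailer())
    doc.close()
```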
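Embedded Files Commands
The following commands deal with embedded files, a feature that was completely removed from MuPDF, and hence from all of its command line tools, after MuPDF v1.14.
Information
Show the embedded file names (long or short format):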
pymupdf embed-info -h
usage: fitz embed-info [-h] [-name NAME] [-detail] [-password PASSWORD] input

--------------------------- list embedded files ---------------------------

positional arguments:
  input               PDF filename

optional arguments:
  -h, --help          show this help message and exit
  -name NAME          if given, report only this one
  -detail             show detail information
  -password PASSWORD  password
Example:
pymupdf embed-info some.pdf
'some.pdf' contains the following 15 embedded files.
20110813_180956_0002.jpg
20110813_181009_0003.jpg
20110813_181012_0004.jpg
20110813_181131_0005.jpg
20110813_181144_0006.jpg
20110813_181306_0007.jpg
20110813_181307_0008.jpg
20110813_181314_0009.jpg
20110813_181315_0010.jpg
20110813_181324_0011.jpg
20110813_181339_0012.jpg
20110813_181913_0013.jpg
insta-20110813_180944_0001.jpg
markiert-20110813_180944_0001.jpg
neue.datei
The detailed output for each entry looks like this:
name: neue.datei
filename: text-tester.pdf
ufilename: text-tester.pdf
desc: nur zum Testen!
size: 4639
length: 1566
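A rough Python equivalent of `embed-info` can be built on `Document.embfile_names()` and `Document.embfile_info()`. This is a sketch, not the command's source: `format_detail` and `list_embedded` are hypothetical, and the key list is taken from the detail output shown above. Filtering by that list is a deliberate choice, since the dictionary returned by `embfile_info()` may carry additional keys depending on the PyMuPDF version.

```python
DETAIL_KEYS = ("name", "filename", "ufilename", "desc", "size", "length")


def format_detail(info: dict) -> str:
    """Render an embfile_info()-style dict as 'key: value' lines."""
    return "\n".join(f"{key}: {info[key]}" for key in DETAIL_KEYS if key in info)


def list_embedded(path, name=None, detail=False, password=None):
    """Sketch of embed-info; assumes PyMuPDF is installed."""
    import pymupdf

    doc = pymupdf.open(path)
    if doc.needs_pass and not doc.authenticate(password or ""):
        raise SystemExit("authentication unsuccessful")
    names = [name] if name else doc.embfile_names()
    if not name:
        print(f"{path!r} contains the following {len(names)} embedded files.")
    for entry in names:
        print(format_detail(doc.embfile_info(entry)) if detail else entry)
    doc.close()
```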
Extract
Extract an embedded file like this:
pymupdf embed-extract -h
usage: fitz embed-extract [-h] -name NAME [-password PASSWORD]
                          [-output OUTPUT]
                          input

---------------------- extract embedded file to disk ----------------------

positional arguments:
  input               PDF filename

optional arguments:
  -h, --help          show this help message and exit
  -name NAME          name of entry
  -password PASSWORD  password
  -output OUTPUT      output filename, default is stored name
For details, see Document.embfile_get(). Example (see the previous section):
pymupdf embed-extract some.pdf -name neue.datei
Saved entry 'neue.datei' as 'text-tester.pdf'
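The same extraction can be sketched with `Document.embfile_get()` and `Document.embfile_info()`. The helper names `output_name` and `extract_embedded` are hypothetical, and treating the "stored name" as the entry's `filename` field is an assumption, although it agrees with the example above, where entry "neue.datei" was saved as "text-tester.pdf".

```python
def output_name(info: dict, output=None) -> str:
    """Use -output if given, else the filename stored in the entry (assumed)."""
    return output or info["filename"]


def extract_embedded(path, name, output=None, password=None):
    """Sketch of embed-extract; assumes PyMuPDF is installed."""
    import pymupdf

    doc = pymupdf.open(path)
    if doc.needs_pass and not doc.authenticate(password or ""):
        raise SystemExit("authentication unsuccessful")
    if name not in doc.embfile_names():
        raise SystemExit(f"no such embedded file {name!r}")
    target = output_name(doc.embfile_info(name), output)
    with open(target, "wb") as f:
        f.write(doc.embfile_get(name))  # embfile_get() returns the file's bytes
    print(f"Saved entry {name!r} as {target!r}")
    doc.close()
```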
Delete
Delete an embedded file like this:
pymupdf embed-del -h
usage: fitz embed-del [-h] [-password PASSWORD] [-output OUTPUT] -name NAME
                      input

--------------------------- delete embedded file --------------------------

positional arguments:
  input               PDF filename

optional arguments:
  -h, --help          show this help message and exit
  -password PASSWORD  password
  -output OUTPUT      output PDF filename, incremental save if none
  -name NAME          name of entry to delete
For details, see Document.embfile_del().
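The "incremental save if none" rule of `-output` can be sketched as follows. This is a hypothetical helper pair, not the CLI's source; it assumes PyMuPDF is installed and uses `Document.embfile_del()` plus `Document.save()` with `incremental=True`, which requires keeping the existing encryption via `pymupdf.PDF_ENCRYPT_KEEP`.

```python
def save_plan(input_path: str, output=None):
    """Return (target, incremental): a new file if -output is given, else in place."""
    if output:
        return output, False
    return input_path, True


def delete_embedded(path, name, output=None, password=None):
    """Sketch of embed-del; assumes PyMuPDF is installed."""
    import pymupdf

    doc = pymupdf.open(path)
    if doc.needs_pass and not doc.authenticate(password or ""):
        raise SystemExit("authentication unsuccessful")
    doc.embfile_del(name)
    target, incremental = save_plan(path, output)
    if incremental:
        # incremental saves append to the original file and keep its encryption
        doc.save(target, incremental=True, encryption=pymupdf.PDF_ENCRYPT_KEEP)
    else:
        doc.save(target)
    doc.close()
```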
Insert
Add a new embedded file using this command:
pymupdf embed-add -h
usage: fitz embed-add [-h] [-password PASSWORD] [-output OUTPUT] -name NAME
                      -path PATH [-desc DESC]
                      input

---------------------------- add embedded file ----------------------------

positional arguments:
  input               PDF filename

optional arguments:
  -h, --help          show this help message and exit
  -password PASSWORD  password
  -output OUTPUT      output PDF filename, incremental save if none
  -name NAME          name of new entry
  -path PATH          path to data for new entry
  -desc DESC          description of new entry
"NAME" must not already exist in the PDF. For details, see Document.embfile_add().
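A minimal sketch of adding an entry with `Document.embfile_add()`, checking up front that the name is new. `check_new_name` and `add_embedded` are hypothetical helpers; storing the basename of the data file as `filename` and `ufilename` is an assumption made for illustration only, not necessarily what the CLI does.

```python
def check_new_name(name: str, existing) -> None:
    """NAME must not already exist in the PDF; refuse duplicates early."""
    if name in existing:
        raise ValueError(f"entry {name!r} already exists")


def add_embedded(path, name, data_path, desc=None, output=None, password=None):
    """Sketch of embed-add; assumes PyMuPDF is installed."""
    import os
    import pymupdf

    doc = pymupdf.open(path)
    if doc.needs_pass and not doc.authenticate(password or ""):
        raise SystemExit("authentication unsuccessful")
    check_new_name(name, doc.embfile_names())
    with open(data_path, "rb") as f:
        buffer = f.read()
    filename = os.path.basename(data_path)  # assumption: store the basename
    doc.embfile_add(name, buffer, filename=filename, ufilename=filename, desc=desc)
    if output:
        doc.save(output)
    else:  # no -output: incremental save into the input file
        doc.save(path, incremental=True, encryption=pymupdf.PDF_ENCRYPT_KEEP)
    doc.close()
```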
Update
Update an existing embedded file using this command:
pymupdf embed-upd -h
usage: fitz embed-upd [-h] -name NAME [-password PASSWORD] [-output OUTPUT]
                      [-path PATH] [-filename FILENAME]
                      [-ufilename UFILENAME] [-desc DESC]
                      input

--------------------------- update embedded file --------------------------

positional arguments:
  input                 PDF filename

optional arguments:
  -h, --help            show this help message and exit
  -name NAME            name of entry
  -password PASSWORD    password
  -output OUTPUT        Output PDF filename, incremental save if none
  -path PATH            path to new data for entry
  -filename FILENAME    new filename to store in entry
  -ufilename UFILENAME  new unicode filename to store in entry
  -desc DESC            new description to store in entry

except '-name' all parameters are optional
To change only a file's meta information, simply omit "PATH". For details, see Document.embfile_upd().
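Passing only the options that were actually given is what makes "omit PATH" a metadata-only update. The sketch below shows this with `Document.embfile_upd(item, buffer=None, filename=None, ufilename=None, desc=None)`; `meta_changes` and `update_embedded` are hypothetical helpers and PyMuPDF is assumed to be installed.

```python
def meta_changes(filename=None, ufilename=None, desc=None) -> dict:
    """Keep only the options given on the command line; the rest stay unchanged."""
    given = {"filename": filename, "ufilename": ufilename, "desc": desc}
    return {key: value for key, value in given.items() if value is not None}


def update_embedded(path, name, data_path=None, filename=None, ufilename=None,
                    desc=None, output=None, password=None):
    """Sketch of embed-upd; assumes PyMuPDF is installed."""
    import pymupdf

    doc = pymupdf.open(path)
    if doc.needs_pass and not doc.authenticate(password or ""):
        raise SystemExit("authentication unsuccessful")
    kwargs = meta_changes(filename, ufilename, desc)
    if data_path:  # without data_path only the meta information changes
        with open(data_path, "rb") as f:
            kwargs["buffer"] = f.read()
    doc.embfile_upd(name, **kwargs)
    if output:
        doc.save(output)
    else:  # no -output: incremental save into the input file
        doc.save(path, incremental=True, encryption=pymupdf.PDF_ENCRYPT_KEEP)
    doc.close()
```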
Copy
Copy embedded files between PDFs:
pymupdf embed-copy -h
usage: fitz embed-copy [-h] [-password PASSWORD] [-output OUTPUT]
                       -source SOURCE [-pwdsource PWDSOURCE]
                       [-name [NAME [NAME ...]]]
                       input

--------------------- copy embedded files between PDFs --------------------

positional arguments:
  input                 PDF to receive embedded files

optional arguments:
  -h, --help            show this help message and exit
  -password PASSWORD    password of input
  -output OUTPUT        output PDF, incremental save to 'input' if omitted
  -source SOURCE        copy embedded files from here
  -pwdsource PWDSOURCE  password of 'source' PDF
  -name [NAME [NAME ...]]
                        restrict copy to these entries
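Copying can be sketched by reading each entry from the source with `embfile_get()` and `embfile_info()` and re-adding it with `embfile_add()`. This is a hypothetical sketch, not the CLI's code: `names_to_copy` and `copy_embedded` are illustrative names, and skipping entries whose name already exists in the receiving PDF is an assumption chosen because `embfile_add()` does not accept duplicate names.

```python
def names_to_copy(available, requested=None):
    """All source entries by default, otherwise only the requested ones."""
    if not requested:
        return list(available)
    missing = [n for n in requested if n not in available]
    if missing:
        raise ValueError(f"not in source: {missing}")
    return list(requested)


def copy_embedded(target_path, source_path, names=None, output=None,
                  password=None, pwdsource=None):
    """Sketch of embed-copy; assumes PyMuPDF is installed."""
    import pymupdf

    target = pymupdf.open(target_path)
    source = pymupdf.open(source_path)
    for doc, pw in ((target, password), (source, pwdsource)):
        if doc.needs_pass and not doc.authenticate(pw or ""):
            raise SystemExit(f"cannot authenticate {doc.name!r}")
    existing = set(target.embfile_names())
    for name in names_to_copy(source.embfile_names(), names):
        if name in existing:
            continue  # assumption: keep the receiving PDF's own entry
        info = source.embfile_info(name)
        target.embfile_add(name, source.embfile_get(name), filename=info["filename"],
                           ufilename=info["ufilename"], desc=info["desc"])
    source.close()
    if output:
        target.save(output)
    else:  # no -output: incremental save to 'input'
        target.save(target_path, incremental=True, encryption=pymupdf.PDF_ENCRYPT_KEEP)
    target.close()
```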