PyMuPDF 1.24.4 Chinese Documentation (III) (2): https://developer.aliyun.com/article/1559530
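How to Use Pixmaps: Gluing Images
This shows how pixmaps can be used for purely graphical, non-document purposes. The script reads an image file and creates a new image that consists of 3 × 4 tiles of the original: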
```python
import pymupdf

src = pymupdf.Pixmap("img-7edges.png")  # create pixmap from a picture
col = 3                                 # tiles per row
lin = 4                                 # tiles per column
tar_w = src.width * col                 # width of target
tar_h = src.height * lin                # height of target

# create target pixmap
tar_pix = pymupdf.Pixmap(src.colorspace, (0, 0, tar_w, tar_h), src.alpha)

# now fill target with the tiles
for i in range(col):
    for j in range(lin):
        src.set_origin(src.width * i, src.height * j)
        tar_pix.copy(src, src.irect)    # copy input to new loc

tar_pix.save("tar.png")
```
This is the input picture:
This is the output:
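The following stdlib-only sketch shows what the tile loop does at the byte level by performing the same gluing on a raw sample buffer. It assumes the layout of `Pixmap.samples`: rows stored top to bottom, `n` bytes per pixel, and no padding between rows. `glue_tiles` is a hypothetical helper for illustration. It is not part of PyMuPDF.

```python
def glue_tiles(samples: bytes, width: int, height: int, n: int,
               col: int, lin: int) -> bytes:
    """Repeat a width x height image col times across and lin times down.

    samples: row-major pixel bytes, n bytes per pixel (hypothetical helper).
    """
    stride = width * n              # bytes per source row
    tar_stride = stride * col       # bytes per target row
    target = bytearray(tar_stride * height * lin)
    for i in range(col):
        for j in range(lin):
            x0, y0 = width * i, height * j   # same tile origin as in the recipe
            for row in range(height):
                src = row * stride
                dst = (y0 + row) * tar_stride + x0 * n
                target[dst:dst + stride] = samples[src:src + stride]
    return bytes(target)


# a 1 x 2 grayscale "image" (one column, two rows), tiled 3 x 1
print(glue_tiles(b"\x0a\x0b", 1, 2, 1, 3, 1))   # b'\n\n\n\x0b\x0b\x0b'
```

Each source row is copied as one contiguous slice into its place in the wider target row.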
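How to Use Pixmaps: Making a Fractal
Here is another Pixmap example. It creates a Sierpinski carpet, a fractal that generalizes the Cantor set to two dimensions. Given a square carpet, mark its 9 sub-squares (3 × 3) and cut out the one in the center. Treat each of the remaining eight sub-squares in the same way, and continue ad infinitum. The end result is a set with area zero and fractal dimension 1.8928…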
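Both figures follow from the construction. Each step keeps 8 of the 9 sub-squares, and each kept square is scaled by 1/3. This gives the similarity dimension $D$ and the remaining area $A_n$ after $n$ steps:

```latex
D = \frac{\log 8}{\log 3} \approx 1.8928,
\qquad
A_n = \left(\tfrac{8}{9}\right)^{n} A_0 \;\longrightarrow\; 0 \quad (n \to \infty)
```

For a pixel version with n = 6, the image is 3^6 = 729 pixels wide. Of its 729^2 = 531441 pixels, 8^6 = 262144 remain unpunched.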
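This script creates an approximation of the carpet as a PNG image by punching holes down to one-pixel granularity. To increase the precision of the image, change the value of n: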
```python
import time

import pymupdf

if not list(map(int, pymupdf.VersionBind.split("."))) >= [1, 14, 8]:
    raise SystemExit("need PyMuPDF v1.14.8 for this script")
n = 6                      # depth (precision)
d = 3**n                   # edge length

t0 = time.perf_counter()
ir = (0, 0, d, d)          # the pixmap rectangle

pm = pymupdf.Pixmap(pymupdf.csRGB, ir, False)
pm.set_rect(pm.irect, (255, 255, 0))  # fill it with some background color

color = (0, 0, 255)        # color to fill the punch holes

# alternatively, define a 'fill' pixmap for the punch holes
# this could be anything, e.g. some photo image ...
fill = pymupdf.Pixmap(pymupdf.csRGB, ir, False)  # same size as 'pm'
fill.set_rect(fill.irect, (0, 255, 255))         # put some color in


def punch(x, y, step):
    """Recursively "punch a hole" in the central square of a pixmap.

    Arguments are top-left coords and the step width.

    Some alternative punching methods are commented out.
    """
    s = step // 3          # the new step
    # iterate through the 9 sub-squares
    # the central one will be filled with the color
    for i in range(3):
        for j in range(3):
            if i != j or i != 1:             # this is not the central cube
                if s >= 3:                   # recursing needed?
                    punch(x + i*s, y + j*s, s)   # recurse
            else:                            # punching alternatives are:
                pm.set_rect((x+s, y+s, x+2*s, y+2*s), color)  # fill with a color
                #pm.copy(fill, (x+s, y+s, x+2*s, y+2*s))      # copy from fill
                #pm.invert_irect((x+s, y+s, x+2*s, y+2*s))    # invert colors

    return

#==============================================================================
# main program
#==============================================================================
# now start punching holes into the pixmap
punch(0, 0, d)
t1 = time.perf_counter()
pm.save("sierpinski-punch.png")
t2 = time.perf_counter()
print("%g sec to create / fill the pixmap" % round(t1 - t0, 3))
print("%g sec to save the image" % round(t2 - t1, 3))
```
The result should look like this:
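The recursion in `punch()` can also be expressed without recursion. A pixel lies in a hole exactly when, at some level, both of its base-3 digits equal 1, meaning it falls into the central sub-square at that level. The following stdlib-only sketch uses this characterization to print a small text version of the carpet. This is an alternative formulation, not the recipe's method, and `is_punched` and `carpet` are hypothetical names.

```python
def is_punched(x: int, y: int) -> bool:
    """True if pixel (x, y) lies in a hole of the Sierpinski carpet."""
    while x > 0 or y > 0:
        if x % 3 == 1 and y % 3 == 1:   # central sub-square at this level
            return True
        x //= 3                         # move up one level
        y //= 3
    return False


def carpet(n: int) -> list[str]:
    """Text rendering of a 3**n x 3**n carpet: '#' kept, '.' punched."""
    d = 3 ** n
    return ["".join("." if is_punched(x, y) else "#" for x in range(d))
            for y in range(d)]


for row in carpet(2):
    print(row)
```

Any pixel can be tested independently this way, without walking the recursion.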
How to Interface with NumPy
This shows how to create a PNG file from a numpy array (several times faster than most other methods):
```python
import numpy as np
import pymupdf

#==============================================================================
# create a fun-colored width * height PNG with pymupdf and numpy
#==============================================================================
height = 150
width = 100
bild = np.ndarray((height, width, 3), dtype=np.uint8)

for i in range(height):
    for j in range(width):
        # one pixel (some fun coloring)
        bild[i, j] = [(i + j) % 256, i % 256, j % 256]

# get plain pixel data from numpy array (ndarray.tostring() is deprecated)
samples = bytearray(bild.tobytes())
pix = pymupdf.Pixmap(pymupdf.csRGB, width, height, samples, alpha=False)
pix.save("test.png")
```
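In the recipe, most of the work is done by the nested Python loop that fills `bild` one pixel at a time. The sketch below builds the same array without that loop, assuming plain NumPy broadcasting. The resulting `samples` can be passed to `pymupdf.Pixmap(pymupdf.csRGB, width, height, samples, alpha=False)` exactly as in the recipe.

```python
import numpy as np

height, width = 150, 100

# row and column indices shaped for broadcasting: (height, 1) and (1, width)
i = np.arange(height, dtype=np.uint32)[:, None]
j = np.arange(width, dtype=np.uint32)[None, :]

bild = np.empty((height, width, 3), dtype=np.uint8)
bild[..., 0] = (i + j) % 256   # red: broadcast to (height, width)
bild[..., 1] = i % 256         # green: same value along each row
bild[..., 2] = j % 256         # blue: same value down each column

samples = bytearray(bild.tobytes())   # row-major RGB bytes, 3 per pixel
```

The per-pixel arithmetic is the same as in the recipe. Only the iteration moves from Python into NumPy.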
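How to Add Images to a PDF Page
There are two methods to add images to a PDF page: Page.insert_image() and Page.show_pdf_page(). The two methods have things in common, but there are also differences.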
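| Criterion | Page.insert_image() | Page.show_pdf_page() |
|---|---|---|
| Displayable content | Image file, image in memory, Pixmap | PDF page |
| Display resolution | Image resolution | Vectorized (except raster page content) |
| Rotation | 0, 90, 180 or 270 degrees | Any angle |
| Clipping | No (full image only) | Yes |
| Keep aspect ratio | Yes (default option) | Yes (default option) |
| Transparency (watermarking) | Depends on the image | Depends on the page |
| Location / placement | Scaled to fit target rectangle | Scaled to fit target rectangle |
| Performance | Automatic prevention of duplicates | Automatic prevention of duplicates |
| Multi-page image support | No | Yes |
| Ease of use | Simple, intuitive | Simple, intuitive; **usable for all document types (including images!) after conversion to PDF** via Document.convert_to_pdf() |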
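The rows "keep aspect ratio" and "scaled to fit target rectangle" describe the same geometric step for both methods. The following stdlib-only sketch implements that step under two assumptions: the content is centered in the target rectangle, and a rotation of 90 or 270 degrees swaps width and height before fitting. `fit_rect` is a hypothetical helper, and PyMuPDF's internal placement may differ in detail.

```python
def fit_rect(src_w: float, src_h: float, target: tuple, rotate: int = 0) -> tuple:
    """Largest rect with the source aspect ratio inside target=(x0, y0, x1, y1)."""
    if rotate % 90:
        raise ValueError("rotate must be a multiple of 90")
    if rotate % 180:                     # 90 or 270: content is turned on its side
        src_w, src_h = src_h, src_w
    x0, y0, x1, y1 = target
    tw, th = x1 - x0, y1 - y0
    scale = min(tw / src_w, th / src_h)  # largest scale that fits both ways
    w, h = src_w * scale, src_h * scale
    left = x0 + (tw - w) / 2             # assumption: centered horizontally
    top = y0 + (th - h) / 2              # assumption: centered vertically
    return (left, top, left + w, top + h)


print(fit_rect(200, 100, (0, 0, 100, 100)))             # (0.0, 25.0, 100.0, 75.0)
print(fit_rect(200, 100, (0, 0, 100, 100), rotate=90))  # (25.0, 0.0, 75.0, 100.0)
```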
Basic code pattern for Page.insert_image(). Unless an existing image is being re-inserted, exactly one of the parameters filename / stream / pixmap must be given:
```python
page.insert_image(
    rect,                  # where to place the image (rect-like)
    filename=None,         # image in a file
    stream=None,           # image in memory (bytes)
    pixmap=None,           # image from pixmap
    mask=None,             # specify alpha channel separately
    rotate=0,              # rotate (int, multiple of 90)
    xref=0,                # re-use existing image
    oc=0,                  # control visibility via OCG / OCMD
    keep_proportion=True,  # keep aspect ratio
    overlay=True,          # put in foreground
)
```
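The rule "exactly one of filename / stream / pixmap" can be checked before calling `Page.insert_image()`. The following is a hypothetical guard function. It assumes that a non-zero `xref` (re-using an existing image) means the other three sources are not consulted; that assumption is not confirmed by the text above.

```python
def pick_image_source(filename=None, stream=None, pixmap=None, xref=0) -> str:
    """Name the image source an insert_image() call would use (hypothetical)."""
    if xref:  # assumption: re-inserting an existing image ignores the rest
        return "xref"
    given = [name for name, value in (("filename", filename),
                                      ("stream", stream),
                                      ("pixmap", pixmap))
             if value is not None]
    if len(given) != 1:
        raise ValueError("give exactly one of filename, stream or pixmap")
    return given[0]


print(pick_image_source(filename="logo.png"))  # filename
```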
Basic code pattern for Page.show_pdf_page(). Source and target PDF must be different Document objects (but they may be opened from the same file):
```python
page.show_pdf_page(
    rect,                  # where to place the image (rect-like)
    src,                   # source PDF
    pno=0,                 # page number in source PDF
    clip=None,             # only display this area (rect-like)
    rotate=0,              # rotate (float, any value)
    oc=0,                  # control visibility via OCG / OCMD
    keep_proportion=True,  # keep aspect ratio
    overlay=True,          # put in foreground
)
```
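How to Use Pixmaps: Checking Text Visibility
Whether or not a given text is actually visible on a page depends on a number of factors:
1. The text is not covered by another object but may have the same color as the background, e.g. white text on a white background.
2. The text may be covered by an image or vector graphics. Detecting this is an important capability, for example to uncover badly anonymized legal documents.
3. The text was created hidden. This technique is typically used by OCR tools to store the recognized text in an invisible layer of the page.
The following shows how to detect case 1 above, or case 2 if the covering object is single-colored: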
```python
# 'page' is an existing pymupdf.Page object
pix = page.get_pixmap(dpi=150)  # make page image with a decent resolution

# the following matrix transforms page to pixmap coordinates
mat = page.rect.torect(pix.irect)

# search for some string "needle"
rlist = page.search_for("needle")
# check the visibility for each hit rectangle
for rect in rlist:
    if pix.color_topusage(clip=rect * mat)[0] > 0.95:
        print("'needle' is invisible here:", rect)
```
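The method Pixmap.color_topusage() returns a tuple (ratio, pixel), where 0 < ratio <= 1 and pixel is the pixel value of the color. Note that we create the pixmap only once. If there are many hit rectangles, this saves a lot of processing time.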
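The logic of the code above is this: if the needle's rectangle is ("almost", i.e. > 95%) single-colored, the text cannot be visible. A typical result for visible text returns the background color (mostly white) with a ratio of about 0.7 to 0.8, for example (0.685, b'\xff\xff\xff').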
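The decision rule can be tried without PyMuPDF. The following stdlib-only sketch assumes the page image is a grid of RGB tuples. Here `color_topusage` is a hypothetical stand-in that mimics the (ratio, pixel) result of `Pixmap.color_topusage()` for a clip rectangle in pixel coordinates, although it returns the pixel as a tuple rather than bytes. `is_invisible` applies the 95% threshold.

```python
from collections import Counter

Pixel = tuple[int, int, int]


def color_topusage(pixels: list[list[Pixel]], clip: tuple[int, int, int, int]) -> tuple[float, Pixel]:
    """(ratio, pixel) of the most frequent color in clip=(x0, y0, x1, y1)."""
    x0, y0, x1, y1 = clip
    counts = Counter(pixels[y][x] for y in range(y0, y1) for x in range(x0, x1))
    pixel, count = counts.most_common(1)[0]   # clip must not be empty
    return count / sum(counts.values()), pixel


def is_invisible(pixels, clip, threshold=0.95) -> bool:
    ratio, _ = color_topusage(pixels, clip)
    return ratio > threshold                  # (almost) a single color


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
grid = [[WHITE] * 20 for _ in range(10)]      # 20 x 10 white "page"
for y in range(3, 7):                         # black "glyph" pixels, left half only
    for x in range(2, 8):
        grid[y][x] = BLACK

print(color_topusage(grid, (0, 0, 10, 10)))   # (0.76, (255, 255, 255))
print(is_invisible(grid, (0, 0, 10, 10)))     # False: text is visible
print(is_invisible(grid, (10, 0, 20, 10)))    # True: area is one color
```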
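This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to the licensing information at artifex.com or contact Artifex Software Inc., Mesa Street, Suite 108A, San Francisco, CA 94129, USA, for further information.
This documentation covers all versions up to 1.24.4.
Annotations
Original: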
pymupdf.readthedocs.io/en/latest/recipes-annotations.html
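How to Add and Modify Annotations
In PyMuPDF, new annotations can be added via Page methods. Once an annotation exists, it can be modified to a large extent using methods of the Annot class.
Annotations can only be inserted into PDF pages. Other document types do not support annotation insertion.
Unlike many other tools, PyMuPDF inserts annotations with a minimal set of properties. Setting attributes such as author, creation date or subject is left to the programmer.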
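Setting a creation date means producing a PDF date string. These strings have the general shape D:YYYYMMDDHHmmSS followed by a UTC offset. The stdlib-only sketch below uses the common +HH'mm' variant, and `pdf_date` is a hypothetical helper. The resulting string could then be passed to an annotation's metadata, e.g. via `Annot.set_info()`. PyMuPDF also offers `pymupdf.get_pdf_now()` for the current time.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format a datetime as D:YYYYMMDDHHmmSS+HH'mm' (naive = UTC offset 0)."""
    offset = dt.utcoffset() or timedelta(0)
    minutes = int(offset.total_seconds() // 60)
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    return f"D:{dt:%Y%m%d%H%M%S}{sign}{hh:02d}'{mm:02d}'"


print(pdf_date(datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc)))
# D:20240501123000+00'00'
```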
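For an overview of these capabilities, look at the following script. It fills a PDF page with most of the available annotation types. The following sections cover more special cases: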
```python
# -*- coding: utf-8 -*-
"""
-------------------------------------------------------------------------------
Demo script showing how annotations can be added to a PDF using PyMuPDF.

It contains the following annotation types:
Caret, Text, FreeText, text markers (underline, strike-out, highlight,
squiggle), Circle, Square, Line, PolyLine, Polygon, FileAttachment, Stamp
and Redaction.
There is some effort to vary appearances by adding colors, line ends,
opacity, rotation, dashed lines, etc.

Dependencies
------------
PyMuPDF v1.17.0
-------------------------------------------------------------------------------
"""
import gc
import sys

import pymupdf

print(pymupdf.__doc__)
# compare as integers: as strings, "9" would rank above "17"
if tuple(map(int, pymupdf.VersionBind.split(".")[:3])) < (1, 17, 0):
    sys.exit("PyMuPDF v1.17.0+ is needed.")

gc.set_debug(gc.DEBUG_UNCOLLECTABLE)

highlight = "this text is highlighted"
underline = "this text is underlined"
strikeout = "this text is striked out"
squiggled = "this text is zigzag-underlined"
red = (1, 0, 0)
blue = (0, 0, 1)
gold = (1, 1, 0)
green = (0, 1, 0)

displ = pymupdf.Rect(0, 50, 0, 50)
r = pymupdf.Rect(72, 72, 220, 100)
t1 = "têxt üsès Lätiñ charß,\nEUR: €, mu: µ, super scripts: ²³!"


def print_descr(annot):
    """Print a short description to the right of each annot rect."""
    annot.parent.insert_text(
        annot.rect.br + (10, -5), "%s annotation" % annot.type[1], color=red
    )


doc = pymupdf.open()
page = doc.new_page()
page.set_rotation(0)

annot = page.add_caret_annot(r.tl)
print_descr(annot)

r = r + displ
annot = page.add_freetext_annot(
    r,
    t1,
    fontsize=10,
    rotate=90,
    text_color=blue,
    fill_color=gold,
    align=pymupdf.TEXT_ALIGN_CENTER,
)
annot.set_border(width=0.3, dashes=[2])
annot.update(text_color=blue, fill_color=gold)
print_descr(annot)

r = annot.rect + displ
annot = page.add_text_annot(r.tl, t1)
print_descr(annot)

# Adding text marker annotations:
# first insert a unique text, then search for it, then mark it
pos = annot.rect.tl + displ.tl
page.insert_text(
    pos,                               # insertion point
    highlight,                         # inserted text
    morph=(pos, pymupdf.Matrix(-5)),   # rotate around insertion point
)
rl = page.search_for(highlight, quads=True)  # need a quad b/o tilted text
annot = page.add_highlight_annot(rl[0])
print_descr(annot)

pos = annot.rect.bl  # next insertion point
page.insert_text(pos, underline, morph=(pos, pymupdf.Matrix(-10)))
rl = page.search_for(underline, quads=True)
annot = page.add_underline_annot(rl[0])
print_descr(annot)

pos = annot.rect.bl
page.insert_text(pos, strikeout, morph=(pos, pymupdf.Matrix(-15)))
rl = page.search_for(strikeout, quads=True)
annot = page.add_strikeout_annot(rl[0])
print_descr(annot)

pos = annot.rect.bl
page.insert_text(pos, squiggled, morph=(pos, pymupdf.Matrix(-20)))
rl = page.search_for(squiggled, quads=True)
annot = page.add_squiggly_annot(rl[0])
print_descr(annot)

pos = annot.rect.bl
r = pymupdf.Rect(pos, pos.x + 75, pos.y + 35) + (0, 20, 0, 20)
annot = page.add_polyline_annot([r.bl, r.tr, r.br, r.tl])  # 'Polyline'
annot.set_border(width=0.3, dashes=[2])
annot.set_colors(stroke=blue, fill=green)
annot.set_line_ends(pymupdf.PDF_ANNOT_LE_CLOSED_ARROW, pymupdf.PDF_ANNOT_LE_R_CLOSED_ARROW)
annot.update(fill_color=(1, 1, 0))
print_descr(annot)

r += displ
annot = page.add_polygon_annot([r.bl, r.tr, r.br, r.tl])  # 'Polygon'
annot.set_border(width=0.3, dashes=[2])
annot.set_colors(stroke=blue, fill=gold)
annot.set_line_ends(pymupdf.PDF_ANNOT_LE_DIAMOND, pymupdf.PDF_ANNOT_LE_CIRCLE)
annot.update()
print_descr(annot)

r += displ
annot = page.add_line_annot(r.tr, r.bl)  # 'Line'
annot.set_border(width=0.3, dashes=[2])
annot.set_colors(stroke=blue, fill=gold)
annot.set_line_ends(pymupdf.PDF_ANNOT_LE_DIAMOND, pymupdf.PDF_ANNOT_LE_CIRCLE)
annot.update()
print_descr(annot)

r += displ
annot = page.add_rect_annot(r)  # 'Square'
annot.set_border(width=1, dashes=[1, 2])
annot.set_colors(stroke=blue, fill=gold)
annot.update(opacity=0.5)
print_descr(annot)

r += displ
annot = page.add_circle_annot(r)  # 'Circle'
annot.set_border(width=0.3, dashes=[2])
annot.set_colors(stroke=blue, fill=gold)
annot.update()
print_descr(annot)

r += displ
annot = page.add_file_annot(
    r.tl, b"just anything for testing", "testdata.txt"  # 'FileAttachment'
)
print_descr(annot)  # annot.rect

r += displ
annot = page.add_stamp_annot(r, stamp=10)  # 'Stamp'
annot.set_colors(stroke=green)
annot.update()
print_descr(annot)

r += displ + (0, 0, 50, 10)
rc = page.insert_textbox(
    r,
    "This content will be removed upon applying the redaction.",
    color=blue,
    align=pymupdf.TEXT_ALIGN_CENTER,
)
annot = page.add_redact_annot(r)
print_descr(annot)

doc.save(__file__.replace(".py", "-%i.pdf" % page.rotation), deflate=True)
```
This script should produce the following output:
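Version strings are best compared as integer tuples. Lists of strings compare element by element as text, so "9" sorts after "17" and 1.9.0 would count as newer than 1.17.0. The minimal sketch below assumes the version string consists only of dot-separated integers; a suffix such as "rc1" would need extra handling. `version_tuple` is a hypothetical helper.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """'1.24.4' -> (1, 24, 4); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


print(["1", "9", "0"] < ["1", "17", "0"])                 # False, although 1.9.0 is older
print(version_tuple("1.9.0") < version_tuple("1.17.0"))   # True
```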
How to Use FreeText
This script shows a couple of ways to deal with 'FreeText' annotations:
```python
# -*- coding: utf-8 -*-
import pymupdf

# some colors
blue = (0, 0, 1)
green = (0, 1, 0)
red = (1, 0, 0)
gold = (1, 1, 0)

# a new PDF with 1 page
doc = pymupdf.open()
page = doc.new_page()

# 3 rectangles, same size, above each other
r1 = pymupdf.Rect(100, 100, 200, 150)
r2 = r1 + (0, 75, 0, 75)
r3 = r2 + (0, 75, 0, 75)

# the text, Latin alphabet
t = "¡Un pequeño texto para practicar!"

# add 3 annots, modify the last one somewhat
a1 = page.add_freetext_annot(r1, t, text_color=red)
a2 = page.add_freetext_annot(r2, t, fontname="Ti", text_color=blue)
a3 = page.add_freetext_annot(r3, t, fontname="Co", text_color=blue, rotate=90)
a3.set_border(width=0)
a3.update(fontsize=8, fill_color=gold)

# save the PDF
doc.save("a-freetext.pdf")
```
The result looks like this:
PyMuPDF 1.24.4 Chinese Documentation (III) (4): https://developer.aliyun.com/article/1559533