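How to Use Images
Images can be referenced in the provided HTML source, or a reference to the desired image can be stored via the Python API. In either case, an Archive is required, which tells the Story where the image can be found.
Note
Images with binary content embedded in the HTML source are not supported.
We extend our “Hello World” example from above and display an image of our planet after the text. Assuming the image is named “world.jpg” and is located in the script’s folder, this is the modified version of the above Python API variant: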
import pymupdf

MEDIABOX = pymupdf.paper_rect("letter")
WHERE = MEDIABOX + (36, 36, -36, -36)

# create story, let it look at script folder for resources
story = pymupdf.Story(archive=".")
body = story.body  # access the body of its DOM

with body.add_paragraph() as para:
    # store desired content
    para.set_font("sans-serif").set_color("blue").add_text("Hello World!")

# another paragraph for our image:
with body.add_paragraph() as para:
    # store image in another paragraph
    para.add_image("world.jpg")

writer = pymupdf.DocumentWriter("output.pdf")

more = 1
while more:
    device = writer.begin_page(MEDIABOX)
    more, _ = story.place(WHERE)
    story.draw(device)
    writer.end_page()

writer.close()
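Because images embedded as binary content in the HTML are not supported, HTML that carries such images has to be converted before it is handed to a Story. The following is a minimal sketch using only the standard library. The helper name `externalize_data_uris` and the generated file names are hypothetical, and it assumes the images appear as double-quoted, base64-encoded `src="data:image/...;base64,..."` attributes.

```python
import base64
import pathlib
import re

# matches src="data:image/<subtype>;base64,<payload>" (double-quoted only)
DATA_URI = re.compile(r'src="data:image/([a-zA-Z0-9.+-]+);base64,([^"]+)"')


def externalize_data_uris(html, folder):
    """Write embedded base64 images to `folder` and reference them by name.

    Hypothetical helper for illustration, not part of PyMuPDF.
    Returns the rewritten HTML source.
    """
    folder = pathlib.Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    counter = 0

    def replace(match):
        nonlocal counter
        subtype, payload = match.group(1), match.group(2)
        ext = "jpg" if subtype == "jpeg" else subtype
        name = f"embedded-{counter}.{ext}"
        # decode the payload and store it as a regular image file
        (folder / name).write_bytes(base64.b64decode(payload))
        counter += 1
        return f'src="{name}"'

    return DATA_URI.sub(replace, html)
```

The rewritten HTML then only contains file references, so it could presumably be used the same way as above, with the folder passed as the `archive` argument of the Story.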
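How to Read External HTML and CSS for a Story
These cases are fairly straightforward.
As a general recommendation, HTML and CSS source files should be read as binary files and decoded before being used. Python’s pathlib.Path provides convenient methods for this: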
import pathlib
import pymupdf

htmlpath = pathlib.Path("myhtml.html")
csspath = pathlib.Path("mycss.css")

HTML = htmlpath.read_bytes().decode()
CSS = csspath.read_bytes().decode()

story = pymupdf.Story(html=HTML, user_css=CSS)
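`bytes.decode()` without arguments assumes UTF-8. If the source files may start with a byte order mark, or use another encoding, it can help to make the encoding explicit. A minimal sketch follows; the helper name `read_text_source` is an assumption made for illustration and is not part of PyMuPDF.

```python
import pathlib


def read_text_source(path, encoding="utf-8-sig"):
    """Read a file as bytes and decode it explicitly.

    "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    Hypothetical helper for illustration, not part of PyMuPDF.
    """
    return pathlib.Path(path).read_bytes().decode(encoding)
```

The returned strings can be passed to the Story exactly like `HTML` and `CSS` in the snippet above.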
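How to Output Database Content with Story Templates
This script demonstrates how to report the content of an SQL database using an HTML template.
The example SQL database contains two tables:
- Table “films” contains one row per film, with the fields **“title”, “director” and (release) “year”**.
- Table “actors” contains one row per actor and film title (fields (actor) “name” and (film) “title”).
The Story DOM consists of a template for one film, which reports the film data together with a list of its cast.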
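To make the two-table structure concrete, here is a small, self-contained sketch that builds such a database in memory and collects each film together with its cast. The column names follow the description above, the rows are made-up placeholders (not the content of filmfestival-sql.db), and the function name `films_with_cast` is hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway database for illustration
db.executescript(
    """
    CREATE TABLE films (title TEXT, director TEXT, year INTEGER);
    CREATE TABLE actors (name TEXT, title TEXT);
    """
)
# placeholder rows, invented for this sketch
db.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Film B", "Director 2", 2001), ("Film A", "Director 1", 1999)],
)
db.executemany(
    "INSERT INTO actors VALUES (?, ?)",
    [("Actor Y", "Film A"), ("Actor X", "Film A"), ("Actor Z", "Film B")],
)


def films_with_cast(connection):
    """Yield (title, director, year, [actor names]) sorted by film title."""
    films = connection.execute(
        "SELECT title, director, year FROM films ORDER BY title"
    ).fetchall()
    for title, director, year in films:
        # parameterized query: the title is passed as data, not pasted into SQL
        cast = connection.execute(
            "SELECT name FROM actors WHERE title = ? ORDER BY name", (title,)
        ).fetchall()
        yield title, director, year, [name for (name,) in cast]


report = list(films_with_cast(db))
```

Each tuple in `report` corresponds to one clone of the film template: three text fields plus the list of names that goes into the cast entry.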
Files:
docs/samples/filmfestival-sql.py
docs/samples/filmfestival-sql.db
See the recipe:
"""
This is a demo script for using PyMuPDF with its "Story" feature.

The following aspects are being covered here:

* The script produces a report of films that are stored in an SQL database
* The report format is provided as an HTML template

The SQL database contains two tables:
1. Table "films" which has the columns "title" (film title, str), "director"
   (str) and "year" (year of release, int).
2. Table "actors" which has the columns "name" (actor name, str) and "title"
   (the film title where the actor had been cast, str).

The script reads all content of the "films" table. For each film title it
reads all rows from table "actors" which took part in that film.

Comment 1
---------
To keep things easy and free from pesky technical detail, the relevant file
names inherit the name of this script:
- the database's filename is the script name with ".py" extension replaced
  by ".db".
- the output PDF similarly has script file name with extension ".pdf".

Comment 2
---------
The SQLITE database has been created using https://sqlitebrowser.org/, a free
multi-platform tool to maintain or manipulate SQLITE databases.
"""
import os
import sqlite3

import pymupdf

# ----------------------------------------------------------------------
# HTML template for the film report
# There are four placeholders coded as "id" attributes.
# One "id" allows locating the template part itself, the other three
# indicate where database text should be inserted.
# ----------------------------------------------------------------------
festival_template = (
    "<html><head><title>Just some arbitrary text</title></head>"
    '<body><h1 style="text-align:center">Hook Norton Film Festival</h1>'
    "<ol>"
    '<li id="filmtemplate">'
    '<b id="filmtitle"></b>'
    "<dl>"
    '<dt>Director<dd id="director">'
    '<dt>Release Year<dd id="filmyear">'
    '<dt>Cast<dd id="cast">'
    "</dl>"
    "</li>"
    "</ol>"
    "</body></html>"
)

# -------------------------------------------------------------------
# define database access
# -------------------------------------------------------------------
dbfilename = __file__.replace(".py", ".db")  # the SQLITE database file name
assert os.path.isfile(dbfilename), f'{dbfilename}'
database = sqlite3.connect(dbfilename)  # open database
cursor_films = database.cursor()  # cursor for selecting the films
cursor_casts = database.cursor()  # cursor for selecting actors per film

# select statement for the films - let SQL also sort it for us
select_films = """SELECT title, director, year FROM films ORDER BY title"""

# select statement for actors, a skeleton: sub-select by film title
# (the "?" placeholder is filled by sqlite3, so quotes in titles are safe)
select_casts = """SELECT name FROM actors WHERE film = ? ORDER BY name"""

# -------------------------------------------------------------------
# define the HTML Story and fill it with database data
# -------------------------------------------------------------------
story = pymupdf.Story(festival_template)
body = story.body  # access the HTML body detail
template = body.find(None, "id", "filmtemplate")  # find the template part

# read the films from the database and put them all in one Python list
# NOTE: instead we might fetch rows one by one (advisable for large volumes)
cursor_films.execute(select_films)  # execute cursor, and ...
films = cursor_films.fetchall()  # read out what was found

for title, director, year in films:  # iterate through the films
    film = template.clone()  # clone template to report each film
    film.find(None, "id", "filmtitle").add_text(title)  # put title in templ
    film.find(None, "id", "director").add_text(director)  # put director
    film.find(None, "id", "filmyear").add_text(str(year))  # put year

    # the actors reside in their own table - find the ones for this film title
    cursor_casts.execute(select_casts, (title,))  # execute cursor
    casts = cursor_casts.fetchall()  # read actors for the film
    # each actor name appears in its own tuple, so extract it from there
    film.find(None, "id", "cast").add_text("\n".join([c[0] for c in casts]))
    body.append_child(film)

template.remove()  # remove the template

# -------------------------------------------------------------------
# generate the PDF
# -------------------------------------------------------------------
writer = pymupdf.DocumentWriter(__file__.replace(".py", ".pdf"), "compress")
mediabox = pymupdf.paper_rect("a4")  # use pages in ISO-A4 format
where = mediabox + (72, 36, -36, -72)  # leave page borders

more = 1  # end of output indicator

while more:
    dev = writer.begin_page(mediabox)  # make a new page
    more, filled = story.place(where)  # arrange content for this page
    story.draw(dev, None)  # write content to page
    writer.end_page()  # finish the page

writer.close()  # close the PDF
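How to Integrate with Existing PDFs
Because a DocumentWriter can only write to a new file, a Story cannot be placed on existing pages. This script demonstrates a workaround for this limitation.
The basic idea is to let DocumentWriter write its output to a PDF in memory. Once the Story has finished, we re-open this memory PDF and put its pages at the desired positions on existing pages via the method Page.show_pdf_page().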
Files:
docs/samples/showpdf-page.py
See the recipe:
"""
Demo of Story class in PyMuPDF
-------------------------------

This script demonstrates how the results of a pymupdf.Story output can be
placed in a rectangle of an existing (!) PDF page.

"""
import io
import os

import pymupdf


def make_pdf(fileptr, text, rect, font="sans-serif", archive=None):
    """Make a memory DocumentWriter from HTML text and a rect.

    Args:
        fileptr: a Python file object. For example an io.BytesIO().
        text: the text to output (HTML format)
        rect: the target rectangle. Will use its width / height as mediabox
        font: (str) font family name, default sans-serif
        archive: pymupdf.Archive parameter. To be used if e.g. images or special
                fonts should be used.
    Returns:
        The matrix to convert page rectangles of the created PDF back
        to rectangle coordinates in the parameter "rect".
        Normal use will expect to fit all the text in the given rect.
        However, if an overflow occurs, this function will output multiple
        pages, and the caller may decide to either accept or retry with
        changed parameters.
    """
    # use input rectangle as the page dimension
    mediabox = pymupdf.Rect(0, 0, rect.width, rect.height)
    # this matrix converts mediabox back to input rect
    matrix = mediabox.torect(rect)

    story = pymupdf.Story(text, archive=archive)
    body = story.body
    body.set_properties(font=font)
    writer = pymupdf.DocumentWriter(fileptr)
    while True:
        device = writer.begin_page(mediabox)
        more, _ = story.place(mediabox)
        story.draw(device)
        writer.end_page()
        if not more:
            break
    writer.close()
    return matrix


# -------------------------------------------------------------
# We want to put this in a given rectangle of an existing page
# -------------------------------------------------------------
HTML = """
<p>PyMuPDF is a great package! And it still improves significantly from one version to the next one!</p>
<p>It is a Python binding for <b>MuPDF</b>, a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit.<br> Both are maintained and developed by Artifex Software, Inc.</p>
<p>Via MuPDF it can access files in PDF, XPS, OpenXPS, CBZ, EPUB, MOBI and FB2 (e-books) formats,<br> and it is known for its top
<b><i>performance</i></b> and <b><i>rendering quality</i></b>.</p>"""

# Make a PDF page for demo purposes
root = os.path.abspath(f"{__file__}/..")
doc = pymupdf.open(f"{root}/mupdf-title.pdf")
page = doc[0]

WHERE = pymupdf.Rect(50, 100, 250, 500)  # target rectangle on existing page

fileptr = io.BytesIO()  # let DocumentWriter use this as its file

# -------------------------------------------------------------------
# call DocumentWriter and Story to fill our rectangle
matrix = make_pdf(fileptr, HTML, WHERE)
# -------------------------------------------------------------------
src = pymupdf.open("pdf", fileptr)  # open DocumentWriter output PDF
if src.page_count > 1:  # target rect was too small
    raise ValueError("target WHERE too small")

# its page 0 contains our result
page.show_pdf_page(WHERE, src, 0)

doc.ez_save(f"{root}/mupdf-title-after.pdf")
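The matrix returned by `make_pdf` maps coordinates on the temporary page back into the target rectangle. The following plain-Python sketch illustrates that arithmetic only. It assumes the common PDF convention of a six-number matrix (a, b, c, d, e, f) and uses hypothetical helper names; it is not PyMuPDF's implementation of `Rect.torect()`.

```python
def rect_to_rect(src, dst):
    """Return a matrix (a, b, c, d, e, f) that maps rectangle src onto dst.

    Rectangles are (x0, y0, x1, y1) tuples. Only scaling and translation
    are needed, so b and c are zero.
    """
    sx = (dst[2] - dst[0]) / (src[2] - src[0])  # horizontal scale
    sy = (dst[3] - dst[1]) / (src[3] - src[1])  # vertical scale
    return (sx, 0.0, 0.0, sy, dst[0] - src[0] * sx, dst[1] - src[1] * sy)


def transform(point, matrix):
    """Apply the matrix to a point: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


WHERE = (50, 100, 250, 500)  # target rectangle from the script above
mediabox = (0, 0, WHERE[2] - WHERE[0], WHERE[3] - WHERE[1])  # same size, at origin
matrix = rect_to_rect(mediabox, WHERE)
```

Because the temporary page has exactly the width and height of the target rectangle, the scale factors are 1 and the matrix reduces to a shift by the rectangle's top-left corner.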
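How to Make Multi-Columned Layouts and Access Fonts from Package pymupdf-fonts
This script outputs an article (taken from Wikipedia) that contains text and multiple images, and uses a two-column page layout.
In addition, two “Ubuntu” font families from package pymupdf-fonts are used instead of the default Base-14 fonts.
Another feature used here is that all data, the images and the article HTML, are jointly stored in a ZIP file.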
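As a rough illustration of keeping the article HTML and its images together in one ZIP file, the sketch below builds such a bundle in memory and reads the HTML back with the standard `zipfile` module. The member names and contents are placeholders invented for this example; they are not the content of quickfox.zip.

```python
import io
import zipfile

# build a small ZIP bundle in memory (placeholder content)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("article.html", '<p>Text</p><img src="picture.jpg">')
    bundle.writestr("picture.jpg", b"placeholder image bytes")  # not a real JPEG

# re-open the bundle and read the HTML source as bytes, then decode it
buffer.seek(0)
myzip = zipfile.ZipFile(buffer)
HTML = myzip.read("article.html").decode()
names = sorted(myzip.namelist())
```

In the recipe, the opened `zipfile.ZipFile` object is additionally wrapped in a `pymupdf.Archive`, which is how the Story locates the images referenced by the HTML.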
Files:
docs/samples/quickfox.py
docs/samples/quickfox.zip
See the recipe:
"""
This is a demo script using PyMuPDF's Story class to output text as a PDF with
a two-column page layout.

The script demonstrates the following features:
* How to fill columns or table cells of complex page layouts
* How to embed images
* How to modify existing, given HTML sources for output (text indent, font size)
* How to use fonts defined in package "pymupdf-fonts"
* How to use ZIP files as Archive

--------------
The example is taken from the somewhat modified Wikipedia article
https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog.
--------------
"""

import io
import os
import zipfile

import pymupdf

thisdir = os.path.dirname(os.path.abspath(__file__))
myzip = zipfile.ZipFile(os.path.join(thisdir, "quickfox.zip"))
arch = pymupdf.Archive(myzip)

if pymupdf.fitz_fontdescriptors:
    # we want to use the Ubuntu fonts for sans-serif and for monospace
    CSS = pymupdf.css_for_pymupdf_font("ubuntu", archive=arch, name="sans-serif")
    CSS = pymupdf.css_for_pymupdf_font("ubuntm", CSS=CSS, archive=arch, name="monospace")
else:
    # No pymupdf-fonts available.
    CSS = ""

docname = __file__.replace(".py", ".pdf")  # output PDF file name

HTML = myzip.read("quickfox.html").decode()

# make the Story object
story = pymupdf.Story(HTML, user_css=CSS, archive=arch)

# --------------------------------------------------------------
# modify the DOM somewhat
# --------------------------------------------------------------
body = story.body  # access HTML body
body.set_properties(font="sans-serif")  # and give it our font globally

# modify certain nodes
para = body.find("p", None, None)  # find relevant nodes (here: paragraphs)
while para is not None:
    para.set_properties(  # method MUST be used for existing nodes
        indent=15,
        fontsize=13,
    )
    para = para.find_next("p", None, None)

# choose PDF page size
MEDIABOX = pymupdf.paper_rect("letter")
# text appears only within this subrectangle
WHERE = MEDIABOX + (36, 36, -36, -36)

# --------------------------------------------------------------
# define page layout within the WHERE rectangle
# --------------------------------------------------------------
COLS = 2  # layout: 2 cols 1 row
ROWS = 1
TABLE = pymupdf.make_table(WHERE, cols=COLS, rows=ROWS)
# fill the cells of each page in this sequence:
CELLS = [TABLE[i][j] for i in range(ROWS) for j in range(COLS)]

fileobject = io.BytesIO()  # let DocumentWriter write to memory
writer = pymupdf.DocumentWriter(fileobject)  # define the writer

more = 1
while more:  # loop until all input text has been written out
    dev = writer.begin_page(MEDIABOX)  # prepare a new output page
    for cell in CELLS:
        # content may be complete after any cell, ...
        if more:  # so check this status first
            more, _ = story.place(cell)
            story.draw(dev)
    writer.end_page()  # finish the PDF page

writer.close()  # close DocumentWriter output

# for housekeeping work re-open from memory
doc = pymupdf.open("pdf", fileobject)
doc.ez_save(docname)
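The page geometry used by the script can be followed with plain arithmetic. The sketch below shrinks a US Letter page (612 x 792 points) by 36-point margins and splits the result into two columns. The function `split_rect` is a hypothetical stand-in that works on plain tuples; it only mimics the way the script indexes the result of `pymupdf.make_table` (rows first, then columns) and is not PyMuPDF code.

```python
def split_rect(rect, cols, rows):
    """Split (x0, y0, x1, y1) into rows x cols equal cells.

    Returns a list of rows, each a list of cell tuples.
    """
    x0, y0, x1, y1 = rect
    width = (x1 - x0) / cols
    height = (y1 - y0) / rows
    return [
        [
            (x0 + j * width, y0 + i * height, x0 + (j + 1) * width, y0 + (i + 1) * height)
            for j in range(cols)
        ]
        for i in range(rows)
    ]


MEDIABOX = (0, 0, 612, 792)  # US Letter in points
MARGINS = (36, 36, -36, -36)  # added component-wise, as in MEDIABOX + (...)
WHERE = tuple(m + d for m, d in zip(MEDIABOX, MARGINS))

COLS, ROWS = 2, 1
TABLE = split_rect(WHERE, cols=COLS, rows=ROWS)
# same fill order as in the script: row by row, left to right
CELLS = [TABLE[i][j] for i in range(ROWS) for j in range(COLS)]
```

The left column is filled first and the right column second, which is why the output loop checks `more` before every cell.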
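How to Make a Layout that Wraps Around Predefined “No-Go Areas”
This is a demo script using PyMuPDF’s Story class to output text as a PDF with a two-column page layout.
The script demonstrates the following features:
- Lay out text around images of an existing (“target”) PDF.
- Based on a few global parameters, areas on each page are identified that can receive text laid out by a Story.
- These global parameters are not stored anywhere in the target PDF and must therefore be provided in some way:
  - The width of the border(s) on each page.
  - The font size to use for the text. This value determines whether the provided text fits into the empty spaces of the (fixed) pages of the target PDF. It cannot be predicted in any way. The script ends with an exception if the target PDF does not have enough pages, and prints a warning message if not all pages receive at least some text. In both cases, the FONTSIZE value (a float) can be changed.
  - Use of a two-column page layout for the text.
- The layout creates a temporary (memory) PDF. Its produced page content (the text) is used to overlay the corresponding target page. If the text requires more pages than are available in the target PDF, an exception is raised. If not all target pages receive at least some text, a warning is printed.
- The script reads “image-no-go.pdf” in its own folder. This is the “target” PDF. It contains 2 pages with 2 images each (from the original article), which are positioned at places that create a broad overall test coverage. Otherwise the pages are empty.
- The script produces “quickfox-image-no-go.pdf”, which contains the original pages and image positions, but with the text laid out around them.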
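The core of identifying usable areas is a small interval computation: within one column, every vertical range occupied by an image is blocked, and what remains are free stripes. The sketch below is a simplified, self-contained variant on plain numbers. The function name and signature are assumptions for illustration, and unlike the recipe it sorts the blocked ranges by their top edge and returns an empty list when nothing is free.

```python
def free_stripes(col_top, col_bottom, blocked, min_height):
    """Return (y0, y1) stripes of a column that are not covered by images.

    blocked: iterable of (y0, y1) vertical ranges occupied by images
    min_height: stripes not taller than this are dropped (e.g. the font size)
    """
    stripes = []
    top = col_top  # top of the current candidate stripe
    for y0, y1 in sorted(blocked):
        if y0 > top + min_height:  # enough room above this image
            stripes.append((top, y0))
        top = max(top, y1)  # continue below the image
    if top + min_height < col_bottom:  # room left down to the column bottom
        stripes.append((top, col_bottom))
    return stripes


# column from y=36 to y=756 with two images in it
stripes = free_stripes(36, 756, [(400, 500), (100, 200)], min_height=12.5)
```

Each stripe, combined with the column's left and right edges, yields one rectangle that can be passed to `story.place()`.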
Files:
docs/samples/quickfox-image-no-go.py
docs/samples/quickfox-image-no-go.pdf
docs/samples/quickfox.zip
See the recipe:
"""
This is a demo script using PyMuPDF's Story class to output text as a PDF with
a two-column page layout.

The script demonstrates the following features:
* Layout text around images of an existing ("target") PDF.
* Based on a few global parameters, areas on each page are identified, that
  can be used to receive text laid out by a Story.
* These global parameters are not stored anywhere in the target PDF and
  must therefore be provided in some way.
  - The width of the border(s) on each page.
  - The fontsize to use for text. This value determines whether the provided
    text will fit in the empty spaces of the (fixed) pages of target PDF. It
    cannot be predicted in any way. The script ends with an exception if
    target PDF has not enough pages, and prints a warning message if not all
    pages receive at least some text. In both cases, the FONTSIZE value
    can be changed (a float value).
  - Use of a 2-column page layout for the text.
* The layout creates a temporary (memory) PDF. Its produced page content
  (the text) is used to overlay the corresponding target page. If text
  requires more pages than are available in target PDF, an exception is raised.
  If not all target pages receive at least some text, a warning is printed.
* The script reads "image-no-go.pdf" in its own folder. This is the "target" PDF.
  It contains 2 pages with each 2 images (from the original article), which are
  positioned at places that create a broad overall test coverage. Otherwise the
  pages are empty.
* The script produces "quickfox-image-no-go.pdf" which contains the original
  pages and image positions, but with the original article text laid out
  around them.

Note:
--------------
This script version uses just image positions to derive "No-Go areas" for
laying out the text. Other PDF object types are detectable by PyMuPDF and may
be taken instead or in addition, without influencing the layout.
The following are candidates for other such "No-Go areas". Each can be detected
and located by PyMuPDF:
* Annotations
* Drawings
* Existing text

--------------
The text and images are taken from the somewhat modified Wikipedia article
https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog.
--------------
"""

import io
import os
import zipfile

import pymupdf

thisdir = os.path.dirname(os.path.abspath(__file__))
myzip = zipfile.ZipFile(os.path.join(thisdir, "quickfox.zip"))

docname = os.path.join(thisdir, "image-no-go.pdf")  # "no go" input PDF file name
outname = os.path.join(thisdir, "quickfox-image-no-go.pdf")  # output PDF file name
BORDER = 36  # global parameter
FONTSIZE = 12.5  # global parameter
COLS = 2  # number of text columns, global parameter


def analyze_page(page):
    """Compute MediaBox and rectangles on page that are free to receive text.

    Notes:
        Assume a BORDER around the page, make 2 columns of the resulting
        sub-rectangle and extract the rectangles of all images on page.
        For demo purposes, the image rectangles are taken as "NO-GO areas"
        on the page when writing text with the Story.
        The function returns free areas for each of the columns.

    Returns:
        (page.number, mediabox, CELLS), where CELLS is a list of free cells.
    """
    prect = page.rect  # page rectangle - will be our MEDIABOX later
    where = prect + (BORDER, BORDER, -BORDER, -BORDER)
    TABLE = pymupdf.make_table(where, rows=1, cols=COLS)

    # extract rectangles covered by images on this page
    IMG_RECTS = sorted(  # image rects on page (sort top-left to bottom-right)
        [pymupdf.Rect(item["bbox"]) for item in page.get_image_info()],
        key=lambda b: (b.y1, b.x0),
    )

    def free_cells(column):
        """Return free areas in this column."""
        free_stripes = []  # y-value pairs wrapping a free area stripe
        # intersecting images: block complete intersecting column stripe
        col_imgs = [(b.y0, b.y1) for b in IMG_RECTS if abs(b & column) > 0]
        s_y0 = column.y0  # top y-value of column
        for y0, y1 in col_imgs:  # an image stripe
            if y0 > s_y0 + FONTSIZE:  # image starts below last free btm value
                free_stripes.append((s_y0, y0))  # store as free stripe
            s_y0 = y1  # start of next free stripe

        if s_y0 + FONTSIZE < column.y1:  # enough room to column bottom
            free_stripes.append((s_y0, column.y1))

        if free_stripes == []:  # covers "no image in this column"
            free_stripes.append((column.y0, column.y1))

        # make available cells of this column
        CELLS = [pymupdf.Rect(column.x0, y0, column.x1, y1) for (y0, y1) in free_stripes]
        return CELLS

    # collection of available Story rectangles on page
    CELLS = []
    for i in range(COLS):
        CELLS.extend(free_cells(TABLE[0][i]))

    return page.number, prect, CELLS


HTML = myzip.read("quickfox.html").decode()

# --------------------------------------------------------------
# Make the Story object
# --------------------------------------------------------------
story = pymupdf.Story(HTML)

# modify the DOM somewhat
body = story.body  # access HTML body
body.set_properties(font="sans-serif")  # and give it our font globally

# modify certain nodes
para = body.find("p", None, None)  # find relevant nodes (here: paragraphs)
while para is not None:
    para.set_properties(  # method MUST be used for existing nodes
        indent=15,
        fontsize=FONTSIZE,
    )
    para = para.find_next("p", None, None)

# we remove all image references, because the target PDF already has them
img = body.find("img", None, None)
while img is not None:
    next_img = img.find_next("img", None, None)
    img.remove()
    img = next_img

page_info = {}  # contains MEDIABOX and free CELLS per page
doc = pymupdf.open(docname)
for page in doc:
    pno, mediabox, cells = analyze_page(page)
    page_info[pno] = (mediabox, cells)
doc.close()  # close target PDF for now - re-open later

fileobject = io.BytesIO()  # let DocumentWriter write to memory
writer = pymupdf.DocumentWriter(fileobject)  # define output writer

more = 1  # stop if this ever becomes zero
pno = 0  # count output pages
while more:  # loop until all HTML text has been written
    try:
        MEDIABOX, CELLS = page_info[pno]
    except KeyError:  # too much text space required: reduce fontsize?
        raise ValueError("text does not fit on target PDF")
    dev = writer.begin_page(MEDIABOX)  # prepare a new output page
    for cell in CELLS:  # iterate over free cells on this page
        if not more:  # need to check this for every cell
            continue
        more, _ = story.place(cell)
        story.draw(dev)
    writer.end_page()  # finish the PDF page
    pno += 1

writer.close()  # close DocumentWriter output

# Re-open writer output, read its pages and overlay target pages with them.
# The generated pages have same dimension as their targets.
src = pymupdf.open("pdf", fileobject)
doc = pymupdf.open(docname)  # re-open the target PDF by its file name
for page in doc:  # overlay every target page with the prepared text
    if page.number >= src.page_count:
        print(f"Text only uses {src.page_count} target pages!")
        continue  # story did not need all target pages?

    # overlay target page
    page.show_pdf_page(page.rect, src, page.number)

    # DEBUG start --- draw the text rectangles
    # mb, cells = page_info[page.number]
    # for cell in cells:
    #     page.draw_rect(cell, color=(1, 0, 0))
    # DEBUG stop ---

doc.ez_save(outname)
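How to Output an HTML Table
Outputting HTML tables is supported as follows:
- Flat table layouts (“rows x columns”) are supported; the “colspan” / “rowspan” attributes are not.
- The table header tag th supports the “scope” attribute with the values “row” or “col”. The respective text is set in bold by default.
- Column widths are computed automatically based on column content. They cannot be set directly.
- Table cells may contain images, which are taken into account by the column width computation.
- Row heights are computed automatically based on row content, leading to multi-line rows where needed.
- The potentially multiple lines of a table row are always kept together on one page (respectively “where” rectangle) and are not split.
- The table header row is only shown on the first page / “where” rectangle.
- The “style” attribute is ignored when given directly in HTML table elements. Styling for a table and its elements must happen in a separate CSS source or within the style tag.
- Styling for tr elements is not supported and is ignored. Therefore, a table-wide grid or alternating row background colors are not supported. One of the following example scripts, however, shows an easy way to deal with this limitation.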
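In practice, the rules above mean producing a flat table without “colspan” / “rowspan” and without inline “style” attributes, and keeping all styling in a separate CSS string. The following minimal sketch generates such a pair of strings with the standard library only. The function name `build_table_html` and the sample rows are made up for illustration; the CSS rules are modeled on the ones in the table01.py recipe.

```python
import html


def build_table_html(header, rows):
    """Return HTML for a flat table: one header row, then one row per tuple.

    Cell content is escaped, and no "style", "colspan" or "rowspan"
    attributes are generated.
    """
    head = "".join(f'<th scope="col">{html.escape(h)}</th>' for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# styling goes into a CSS source, not into attributes of the table elements
CSS = "th { background-color: #aaa; } body { font-family: sans-serif; }"

HTML = build_table_html(("KEY", "TYPE"), [("Length", "integer"), ("A<B", "name")])
```

The two strings would then be handed to the Story as its HTML source and its `user_css` argument, as the recipes in this section do.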
Files:
docs/samples/table01.py
This script reflects basic features.
See the recipe:
"""
Demo script for basic HTML table support in Story objects

Outputs a table with three columns that fits on one Letter page.
The content of each row is filled via the Story's template mechanism.
Column widths and row heights are automatically computed by MuPDF.
Some styling via a CSS source is also demonstrated:

- The table header row has a gray background
- Each cell shows a border at its top
- The Story's body uses the sans-serif font family
- The text of one of the columns is set to blue

Dependencies
-------------
PyMuPDF v1.22.0 or later
"""
import pymupdf

table_text = (  # the content of each table row
    (
        "Length",
        "integer",
        """(Required) The number of bytes from the beginning of the line following the keyword stream to the last byte just before the keyword endstream. (There may be an additional EOL marker, preceding endstream, that is not included in the count and is not logically part of the stream data.) See “Stream Extent,” above, for further discussion.""",
    ),
    (
        "Filter",
        "name or array",
        """(Optional) The name of a filter to be applied in processing the stream data found between the keywords stream and endstream, or an array of such names. Multiple filters should be specified in the order in which they are to be applied.""",
    ),
    (
        "FFilter",
        "name or array",
        """(Optional; PDF 1.2) The name of a filter to be applied in processing the data found in the stream's external file, or an array of such names. The same rules apply as for Filter.""",
    ),
    (
        "FDecodeParms",
        "dictionary or array",
        """(Optional; PDF 1.2) A parameter dictionary, or an array of such dictionaries, used by the filters specified by FFilter. The same rules apply as for DecodeParms.""",
    ),
    (
        "DecodeParms",
        "dictionary or array",
        """(Optional) A parameter dictionary or an array of such dictionaries, used by the filters specified by Filter. If there is only one filter and that filter has parameters, DecodeParms must be set to the filter's parameter dictionary unless all the filter's parameters have their default values, in which case the DecodeParms entry may be omitted. If there are multiple filters and any of the filters has parameters set to nondefault values, DecodeParms must be an array with one entry for each filter: either the parameter dictionary for that filter, or the null object if that filter has no parameters (or if all of its parameters have their default values). If none of the filters have parameters, or if all their parameters have default values, the DecodeParms entry may be omitted. (See implementation note 7 in Appendix H.)""",
    ),
    (
        "DL",
        "integer",
        """(Optional; PDF 1.5) A non-negative integer representing the number of bytes in the decoded (defiltered) stream. It can be used to determine, for example, whether enough disk space is available to write a stream to a file.\nThis value should be considered a hint only; for some stream filters, it may not be possible to determine this value precisely.""",
    ),
    (
        "F",
        "file specification",
        """(Optional; PDF 1.2) The file containing the stream data. If this entry is present, the bytes between stream and endstream are ignored, the filters are specified by FFilter rather than Filter, and the filter parameters are specified by FDecodeParms rather than DecodeParms. However, the Length entry should still specify the number of those bytes. (Usually, there are no bytes and Length is 0.) (See implementation note 46 in Appendix H.)""",
    ),
)

# Only a minimal HTML source is required to provide the Story's working
HTML = """
<html>
<body><h2>TABLE 3.4 Entries common to all stream dictionaries</h2>
<table>
    <tr>
        <th>KEY</th><th>TYPE</th><th>VALUE</th>
    </tr>
    <tr id="row">
        <td id="col0"></td><td id="col1"></td><td id="col2"></td>
    </tr>
</table>
</body>
</html>
"""

"""
---------------------------------------------------------------------
Just for demo purposes, set:
- header cell background to gray
- text color in col1 to blue
- a border line at the top of all table cells
- all text to the sans-serif font
---------------------------------------------------------------------
"""
CSS = """th {
    background-color: #aaa;
}

td[id="col1"] {
    color: blue;
}

td, tr {
    border: 1px solid black;
    border-right-width: 0px;
    border-left-width: 0px;
    border-bottom-width: 0px;
}

body {
    font-family: sans-serif;
}
"""

story = pymupdf.Story(HTML, user_css=CSS)  # define the Story
body = story.body  # access the HTML <body> of it
template = body.find(None, "id", "row")  # find the template with name "row"
parent = template.parent  # access its parent i.e., the <table>

for col0, col1, col2 in table_text:
    row = template.clone()  # make a clone of the row template
    # add text to each cell in the duplicated row
    row.find(None, "id", "col0").add_text(col0)
    row.find(None, "id", "col1").add_text(col1)
    row.find(None, "id", "col2").add_text(col2)
    parent.append_child(row)  # add new row to <table>

template.remove()  # remove the template

# Story is ready - output it via a writer
writer = pymupdf.DocumentWriter(__file__.replace(".py", ".pdf"), "compress")
mediabox = pymupdf.paper_rect("letter")  # size of one output page
where = mediabox + (36, 36, -36, -36)  # use this sub-area for the content
more = True  # detects end of output

while more:
    dev = writer.begin_page(mediabox)  # start a page, returning a device
    more, filled = story.place(where)  # compute content fitting into "where"
    story.draw(dev)  # output it to the page
    writer.end_page()  # finalize the page

writer.close()  # close the output
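docs/samples/national-capitals.py
An advanced script that extends the table output options with simple additional code:
- Multi-page output that simulates repeating header rows
- Alternating table row background colors
- Table rows and columns delimited by gridlines
- Table rows dynamically generated / filled with data from an SQL database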