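Use the @font-face statement to define font files in CSS syntax. You need a separate @font-face for every combination of font weight and font style (e.g. bold or italic) you want to support. The following example uses the well-known MS Comic Sans font in its four variants: regular, bold, italic and bold-italic.

Because these four font files are located in the system folder C:/Windows/Fonts, the method needs an Archive definition that points to this folder: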

# How to use your own fonts with method Page.insert_htmlbox().
import pymupdf
# Example text
text = """Lorem ipsum dolor sit amet, consectetur adipisici elit, sed
 eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad
 minim veniam, quis nostrud exercitation <b>ullamco <i>laboris</i></b>
 nisi ut aliquid ex ea commodi consequat. Quis aute iure
 <span style="color: red;">reprehenderit</span>
 in <span style="color: green;font-weight:bold;">voluptate</span> velit
 esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat
 cupiditat non proident, sunt in culpa qui
 <a href="">officia</a> deserunt mollit anim id
 est laborum."""
# We need an Archive object to show where font files are located.
# We intend to use the font family "MS Comic Sans".
arch = pymupdf.Archive("C:/Windows/Fonts")
# These statements define which font file to use for regular, bold,
# italic and bold-italic text.
# We assign an arbitrary common font-family for all 4 font files.
# The Story algorithm will select the right file as required.
# We request to use "comic" throughout the text.
css = """
@font-face {font-family: comic; src: url(comic.ttf);}
@font-face {font-family: comic; src: url(comicbd.ttf);font-weight: bold;}
@font-face {font-family: comic; src: url(comicz.ttf);font-weight: bold;font-style: italic;}
@font-face {font-family: comic; src: url(comici.ttf);font-style: italic;}
* {font-family: comic;}
"""
doc = pymupdf.Document()
page = doc.new_page(width=150, height=150)  # make small page
page.insert_htmlbox(page.rect, text, css=css, archive=arch)
doc.subset_fonts(verbose=True)  # build subset fonts to reduce file size
doc.ez_save(__file__.replace(".py", ".pdf")) 
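Writing one @font-face rule per weight/style combination by hand is repetitive. The following sketch generates the same rules from a small table. The helper name `font_face_css` and the layout of the table are illustrative assumptions, not part of PyMuPDF; the file names are the ones used in the example above.

```python
# Illustrative helper (not a PyMuPDF API): build one @font-face rule
# for each weight/style combination of a font family.
def font_face_css(family, files):
    """files maps (weight, style) to a font file name inside the Archive."""
    rules = []
    for (weight, style), filename in files.items():
        rule = f"@font-face {{font-family: {family}; src: url({filename});"
        if weight != "normal":
            rule += f"font-weight: {weight};"
        if style != "normal":
            rule += f"font-style: {style};"
        rules.append(rule + "}")
    # Request the family for all elements, as in the example above.
    rules.append(f"* {{font-family: {family};}}")
    return "\n".join(rules)


comic_files = {
    ("normal", "normal"): "comic.ttf",
    ("bold", "normal"): "comicbd.ttf",
    ("bold", "italic"): "comicz.ttf",
    ("normal", "italic"): "comici.ttf",
}
css = font_face_css("comic", comic_files)
print(css)
```

The resulting string can be passed as the `css` argument, together with an Archive that points to the folder containing the font files.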



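This example combines multiple requirements:

  • Rotate the text by 90 degrees anti-clockwise.
  • Use a font from package pymupdf-fonts. You will see that the respective CSS definitions are a lot simpler in this case.
  • Align the text with the "justify" option.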
# How to use a pymupdf font with method Page.insert_htmlbox().
import pymupdf
# Example text
text = """Lorem ipsum dolor sit amet, consectetur adipisici elit, sed
 eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad
 minim veniam, quis nostrud exercitation <b>ullamco <i>laboris</i></b>
 nisi ut aliquid ex ea commodi consequat. Quis aute iure
 <span style="color: red;">reprehenderit</span>
 in <span style="color: green;font-weight:bold;">voluptate</span> velit
 esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat
 cupiditat non proident, sunt in culpa qui
 <a href="">officia</a> deserunt mollit anim id
 est laborum."""
# This is similar to font file support. However, we can use a convenience
# function for creating required CSS definitions.
# We still need an Archive for finding the font binaries.
arch = pymupdf.Archive()
# We request to use "myfont" throughout the text.
css = pymupdf.css_for_pymupdf_font("ubuntu", archive=arch, name="myfont")
css += "* {font-family: myfont;text-align: justify;}"
doc = pymupdf.Document()
page = doc.new_page(width=150, height=150)
page.insert_htmlbox(page.rect, text, css=css, archive=arch, rotate=90)
doc.ez_save(__file__.replace(".py", ".pdf")) 

### How to Write Text Lines


import pymupdf
doc = pymupdf.open()  # new PDF; pass a filename to open an existing one
page = doc.new_page()  # new or existing page via doc[n]
p = pymupdf.Point(50, 72)  # start point of 1st line
text = "Some text,\nspread across\nseveral lines."
# the same result is achievable by
# text = ["Some text", "spread across", "several lines."]
rc = page.insert_text(p,  # bottom-left of 1st char
                     text,  # the text (honors '\n')
                     fontname = "helv",  # the default font
                     fontsize = 11,  # the default font size
                     rotate = 0,  # also available: 90, 180, 270
                     )
print("%i lines printed on page %i." % (rc, page.number))
doc.save("text.pdf")

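With this method, only the number of lines is controlled so that the text does not extend beyond the page height. Surplus lines are not written, and the number of lines actually written is returned. The calculation uses a line height derived from the fontsize and a bottom margin of 36 points (0.5 inches).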
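The line-count limit described above can be estimated with a little arithmetic. This is a rough sketch only: it assumes a line height of 1.2 times the font size (the actual factor depends on the font) and the function name `max_lines` is hypothetical. The number returned by insert_text() remains the authoritative value.

```python
import math

# Assumptions for this sketch: line height = 1.2 * fontsize (the actual
# factor depends on the font), and a 36 pt bottom margin as described above.
def max_lines(page_height, start_y, fontsize=11, factor=1.2, margin=36):
    """Estimate how many baselines fit between start_y and the bottom margin."""
    line_height = fontsize * factor
    limit = page_height - margin  # lowest permitted baseline
    if start_y > limit:
        return 0
    return math.floor((limit - start_y) / line_height) + 1


# A4 page (842 pt high), first baseline at y = 72, default font size 11
print(max_lines(842, 72))
```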


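Line width, by contrast, is not checked: the part of a line that extends beyond the page is simply not visible. For built-in fonts, however, the line width can be calculated beforehand; see get_text_length().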

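Here is another example. It inserts 4 text strings using the four different rotation options, and thereby shows how the text insertion point must be chosen to achieve the desired result: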

import pymupdf
doc = pymupdf.open()  # new, empty PDF
page = doc.new_page()
# the text strings, each having 3 lines
text1 = "rotate=0\nLine 2\nLine 3"
text2 = "rotate=90\nLine 2\nLine 3"
text3 = "rotate=-90\nLine 2\nLine 3"
text4 = "rotate=180\nLine 2\nLine 3"
red = (1, 0, 0) # the color for the red dots
# the insertion points, each with a 25 pix distance from the corners
p1 = pymupdf.Point(25, 25)
p2 = pymupdf.Point(page.rect.width - 25, 25)
p3 = pymupdf.Point(25, page.rect.height - 25)
p4 = pymupdf.Point(page.rect.width - 25, page.rect.height - 25)
# create a Shape to draw on
shape = page.new_shape()
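# draw the insertion points as red, filled dots
shape.draw_circle(p1, 1)
shape.draw_circle(p2, 1)
shape.draw_circle(p3, 1)
shape.draw_circle(p4, 1)
shape.finish(width=0.3, color=red, fill=red)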
# insert the text strings
shape.insert_text(p1, text1)
shape.insert_text(p3, text2, rotate=90)
shape.insert_text(p2, text3, rotate=-90)
shape.insert_text(p4, text4, rotate=180)
# store our work to the page
shape.commit()
doc.ez_save(__file__.replace(".py", ".pdf"))
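The choice of insertion points in the script above follows from the direction in which text advances for each rotation. The table below is a sketch of that reasoning, not PyMuPDF code; the names `ROTATION_DIRECTIONS` and `directions` are hypothetical, and the directions are inferred from the anti-clockwise rotation convention and the corners used in the example.

```python
# Directions are given in page coordinates, where y grows downwards.
# Each entry: (direction in which characters advance,
#              direction in which successive lines advance)
ROTATION_DIRECTIONS = {
    0: ((1, 0), (0, 1)),      # text runs right, lines stack downwards
    90: ((0, -1), (1, 0)),    # text runs up, lines stack to the right
    180: ((-1, 0), (0, -1)),  # text runs left, lines stack upwards
    270: ((0, 1), (-1, 0)),   # text runs down, lines stack to the left
}


def directions(rotate):
    """Return (text direction, line direction) for a multiple of 90 degrees."""
    return ROTATION_DIRECTIONS[rotate % 360]  # -90 is treated like 270


for angle in (0, 90, -90, 180):
    text_dir, line_dir = directions(angle)
    print(f"rotate={angle:4}: text {text_dir}, lines {line_dir}")
```

This is why rotate=90 starts at the bottom-left corner (text runs upwards) while rotate=-90 starts at the top-right corner (text runs downwards): in each case the text grows into the page.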



This script fills 4 different rectangles with text, each time choosing a different rotation value:

import pymupdf
doc = pymupdf.open()  # new PDF; pass a filename to open an existing one
page = doc.new_page()  # new page, or choose doc[n]
# write in this overall area
rect = pymupdf.Rect(100, 100, 300, 150)
# partition the area in 4 equal sub-rectangles
CELLS = pymupdf.make_table(rect, cols=4, rows=1)
t1 = "text with rotate = 0."  # these texts will be written
t2 = "text with rotate = 90."
t3 = "text with rotate = 180."
t4 = "text with rotate = 270."
text = [t1, t2, t3, t4]
red = pymupdf.pdfcolor["red"]  # some colors
gold = pymupdf.pdfcolor["gold"]
blue = pymupdf.pdfcolor["blue"]
# We use a Shape object (something like a canvas) to output the text and
# the rectangles surrounding it for demonstration.
shape = page.new_shape()  # create Shape
for i in range(len(CELLS[0])):
    shape.draw_rect(CELLS[0][i])  # draw rectangle
    shape.insert_textbox(
        CELLS[0][i], text[i], fontname="hebo", color=blue, rotate=90 * i
    )
shape.finish(width=0.3, color=red, fill=gold)
shape.commit()  # write all stuff to the page
doc.ez_save(__file__.replace(".py", ".pdf")) 

The example above relies on some defaults: font size 11 and left text alignment.
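The script addresses its sub-rectangles as CELLS[0][i], i.e. row 0, column i. The following sketch shows the partitioning idea in plain Python with coordinate tuples. It is an illustration under the assumption of equal-sized cells, not the library's implementation, and `split_rect` is a hypothetical name.

```python
# Illustrative re-implementation of the partitioning idea (not PyMuPDF code):
# split a rectangle (x0, y0, x1, y1) into rows x cols equal cells.
def split_rect(rect, cols=1, rows=1):
    x0, y0, x1, y1 = rect
    cell_w = (x1 - x0) / cols
    cell_h = (y1 - y0) / rows
    return [
        [
            (x0 + c * cell_w, y0 + r * cell_h,
             x0 + (c + 1) * cell_w, y0 + (r + 1) * cell_h)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


cells = split_rect((100, 100, 300, 150), cols=4, rows=1)
for i, cell in enumerate(cells[0]):
    print(f"cell {i}: {cell}, rotate={90 * i}")
```

Each of the four cells is 50 points wide and 50 points high, so the rotated texts have the same amount of room as the unrotated ones.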

### How to Fill a Box with HTML Text


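Method Page.insert_htmlbox() does not only accept HTML tags; the text may also contain styling instructions that influence font, font weight (bold), style (italic), color and much more.

It is also possible to mix multiple fonts and languages, to output HTML tables and to insert images and URI links.

For even more styling flexibility, an additional CSS source may be given.

The method is based on the Story class. Therefore, complex script systems like Devanagari, Nepali, Tamil and many more are supported and written correctly, thanks to the HarfBuzz library, which provides this so-called **"text shaping"** feature.

Any fonts required to output the characters are automatically pulled in from the Google NOTO font library as a fallback (when the optionally supplied user fonts do not contain some glyphs).

As a small glimpse of the features offered here, we will output the following HTML-enriched text: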

import pymupdf
rect = pymupdf.Rect(100, 100, 400, 300)
text = """Lorem ipsum dolor sit amet, consectetur adipisici elit, sed
 eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad
 minim veniam, quis nostrud exercitation <b>ullamco <i>laboris</i></b>
 nisi ut aliquid ex ea commodi consequat. Quis aute iure
 <span style="color: #f00;">reprehenderit</span>
 in <span style="color: #0f0;font-weight:bold;">voluptate</span> velit
 esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat
 cupiditat non proident, sunt in culpa qui
 <a href="">officia</a> deserunt mollit anim id
 est laborum."""
doc = pymupdf.Document()
page = doc.new_page()
page.insert_htmlbox(rect, text, css="* {font-family: sans-serif;font-size:14px;}")
doc.ez_save(__file__.replace(".py", ".pdf")) 

Please note how the "css" parameter is used to globally select the default "sans-serif" font and a font size of 14.


### How to Output HTML Tables and Images

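Here is another example that outputs a table with this method. This time, all styling is included in the HTML source itself. Please also note how an image can be included, even within a table cell: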

import pymupdf
import os
filedir = os.path.dirname(__file__)
text = """
<style>
body {
 font-family: sans-serif;
}
td, th {
 border: 1px solid blue;
 border-right: none;
 border-bottom: none;
 padding: 5px;
 text-align: center;
}
table {
 border-right: 1px solid blue;
 border-bottom: 1px solid blue;
 border-spacing: 0;
}
</style>
<p><b>Some Colors</b></p>
<table>
<tr>
 <td><img src="img-cake.png" width=50></td>
 <td>Between<br>Gray and Purple</td>
</tr>
</table>
"""
doc = pymupdf.Document()
page = doc.new_page()
rect = page.rect + (36, 36, -36, -36)
# we must specify an Archive because of the image
page.insert_htmlbox(rect, text, archive=pymupdf.Archive("."))
doc.ez_save(__file__.replace(".py", ".pdf")) 
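When the table content comes from data rather than hand-written markup, the HTML can be generated in Python. The helper below is a sketch, not part of PyMuPDF; the function name `html_table` and the sample data are hypothetical.

```python
from html import escape

# Illustrative helper (not part of PyMuPDF): turn rows of cell values
# into HTML table markup. Header and body cells are escaped as plain text.
def html_table(header, rows):
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{escape(h)}</th>" for h in header) + "</tr>")
    for row in rows:
        parts.append("<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical sample data
markup = html_table(["Name", "Points"], [["A & B", 10], ["C", 7]])
print(markup)
```

The resulting string can be combined with a style block like the one above and passed as the text argument of the method. Cells that should contain markup such as an img tag would need to bypass the escaping step.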




import pymupdf
greetings = (
    "Hello, World!",  # english
    "Hallo, Welt!",  # german
    "سلام دنیا!",  # persian
    "வணக்கம், உலகம்!",  # tamil
    "สวัสดีชาวโลก!",  # thai
    "Привіт Світ!",  # ukrainian
    "שלום עולם!",  # hebrew
    "ওহে বিশ্ব!",  # bengali
    "你好世界!",  # chinese
    "こんにちは世界!",  # japanese
    "안녕하세요, 월드!",  # korean
    "नमस्कार, विश्व !",  # sanskrit
    "हैलो वर्ल्ड!",  # hindi
)
doc = pymupdf.open()  # new, empty PDF
page = doc.new_page()
rect = (50, 50, 200, 500)
# join greetings into one text string
text = " ... ".join([t for t in greetings])
# the output of the above is simple:
page.insert_htmlbox(rect, text)
doc.save(__file__.replace(".py", ".pdf"))



## How to Extract Text with Color


import pymupdf
doc = pymupdf.open("input.pdf")  # placeholder file name: the PDF to examine
for page in doc:
    text_blocks = page.get_text("dict", flags=pymupdf.TEXTFLAGS_TEXT)["blocks"]
    for block in text_blocks:
        for line in block["lines"]:
            for span in line["spans"]:
                text = span["text"]
                color = pymupdf.sRGB_to_rgb(span["color"])
                print(f"Text: {text}, Color: {color}") 
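The "color" value of a span is a single integer of the form 0xRRGGBB. The following plain-Python sketch shows how such an integer decomposes into its red, green and blue components; the function names are hypothetical and the code only illustrates the idea behind the conversion used above.

```python
# A span color is an integer of the form 0xRRGGBB.
def srgb_to_rgb_tuple(srgb):
    """Split an sRGB integer into (red, green, blue), each 0..255."""
    return (srgb >> 16) & 0xFF, (srgb >> 8) & 0xFF, srgb & 0xFF


def rgb_to_hex(rgb):
    """Format an (r, g, b) triple as a CSS-style hex string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


color = srgb_to_rgb_tuple(0xFF0000)
print(color, rgb_to_hex(color))
```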


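This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to the licensing information or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.

This documentation covers all versions up to 1.24.4.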

