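The idea for this article comes from a practice example in this year's OpenAI cookbook, summarizing_long_documents, whose goal is to demonstrate how to summarize large documents with a controllable level of detail.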

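If we ask a large language model to summarize a long document (for example, 10k tokens or more), feeding the document directly to the model tends to yield a relatively short summary whose length is not proportional to the length of the document. For example, the summary of a 20k-token document will not be twice as long as the summary of a 10k-token document. This article addresses the problem by splitting the document into parts and summarizing each part separately. After multiple queries to the model, the full summary can be reassembled. By controlling the number and size of the text chunks, we ultimately control the level of detail in the output.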
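
As a rough illustration of the idea above, the sketch below splits a list of paragraphs into a chosen number of parts and "summarizes" each part. The helper names (split_into_parts, first_sentence, summarize_in_parts) are hypothetical and are not part of the cookbook code, and first_sentence is only a stand-in for a real model call so that the example runs without a server.

```python
from typing import Callable, List


def split_into_parts(paragraphs: List[str], num_parts: int) -> List[List[str]]:
    """Split paragraphs into num_parts contiguous groups of roughly equal size."""
    num_parts = max(1, min(num_parts, len(paragraphs)))
    base, extra = divmod(len(paragraphs), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        parts.append(paragraphs[start:start + size])
        start += size
    return parts


def first_sentence(text: str) -> str:
    """Stand-in for an LLM call (assumption): keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def summarize_in_parts(paragraphs: List[str], num_parts: int,
                       summarize_one: Callable[[str], str] = first_sentence) -> str:
    """Summarize each part on its own, then join the partial summaries."""
    parts = split_into_parts(paragraphs, num_parts)
    return "\n".join(summarize_one(" ".join(part)) for part in parts)


paragraphs = [
    "Alpha is first. It has details.",
    "Beta is second. It has details.",
    "Gamma is third. It has details.",
    "Delta is fourth. It has details.",
]
coarse = summarize_in_parts(paragraphs, 1)  # one part, one short summary
fine = summarize_in_parts(paragraphs, 4)    # four parts, four partial summaries
print(coarse)
print(fine)
```

With a fixed-length "summary" per call, the combined output grows with the number of parts, which is the lever the rest of the article uses to control detail.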

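The tools and model used in this article are:

Large language model: Qwen2 in GGUF format

Tool 1: Ollama, which serves the GGUF model behind an OpenAI-compatible API

Tool 2: transformers, whose newer GGUF support lets us load the tokenizer directly from a GGUF file; it is used to measure document length and to split the document.

Best practices

Run the Qwen2 model (for details, see the article 《魔搭社区GGUF模型怎么玩！看这篇就够了》, a guide to using GGUF models from the ModelScope community)

Copy the model path and create a meta file named "ModelFile" with the following content: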

FROM /mnt/workspace/qwen2-7b-instruct-q5_k_m.gguf
# set the temperature to 0.7 [higher is more creative, lower is more coherent]
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER repeat_penalty 1.05
TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
{{ .Response }}"""
# set the system message
SYSTEM """
You are a helpful assistant.
"""

Use the ollama create command to create the custom model, then run it:

ollama create myqwen2 --file ./ModelFile
ollama run myqwen2

Install dependencies and read the document to summarize

import os
from typing import List, Tuple, Optional
from openai import OpenAI
from transformers import AutoTokenizer
from tqdm import tqdm
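# load doc (explicit UTF-8 so the result does not depend on the platform's default encoding)
with open("data/artificial_intelligence_wikipedia.txt", "r", encoding="utf-8") as file:
    artificial_intelligence_wikipedia_text = file.read()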

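Load the encoding and check the document length

Hugging Face transformers supports loading the single-file GGUF format, so that GGUF models can be further trained or fine-tuned and then converted back to GGUF for use in the ggml ecosystem. A GGUF file typically contains the configuration attributes, the tokenizer, other metadata, and all the tensors to be loaded into the model. Reference: https://huggingface.co/docs/transformers/gguf

Model architectures supported at the time of writing: llama, mistral, qwen2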

# load encoding and check the length of dataset
encoding = AutoTokenizer.from_pretrained("/mnt/workspace/cherry/", gguf_file="qwen2-7b-instruct-q5_k_m.gguf")
# print the token count so it is also shown when run as a script rather than in a notebook
print(len(encoding.encode(artificial_intelligence_wikipedia_text)))

Call the LLM through its OpenAI-compatible API

client = OpenAI(
    base_url = 'http://127.0.0.1:11434/v1',
    api_key='ollama', # required, but unused
)
def get_chat_completion(messages, model='myqwen2'):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content

Splitting the document

We define a few functions that split a large document into smaller parts.

def tokenize(text: str) -> List[int]:
    return encoding.encode(text)
# This function chunks a text into smaller pieces based on a maximum token count and a delimiter.
def chunk_on_delimiter(input_string: str,
                       max_tokens: int, delimiter: str) -> List[str]:
    chunks = input_string.split(delimiter)
    combined_chunks, _, dropped_chunk_count = combine_chunks_with_no_minimum(
        chunks, max_tokens, chunk_delimiter=delimiter, add_ellipsis_for_overflow=True
    )
    if dropped_chunk_count > 0:
        print(f"warning: {dropped_chunk_count} chunks were dropped due to overflow")
    combined_chunks = [f"{chunk}{delimiter}" for chunk in combined_chunks]
    return combined_chunks
# This function combines text chunks into larger blocks without exceeding a specified token count. It returns the combined text blocks, their original indices, and the count of chunks dropped due to overflow.
def combine_chunks_with_no_minimum(
        chunks: List[str],
        max_tokens: int,
        chunk_delimiter="\n\n",
        header: Optional[str] = None,
        add_ellipsis_for_overflow=False,
) -> Tuple[List[str], List[List[int]], int]:
    dropped_chunk_count = 0
    output = []  # list to hold the final combined chunks
    output_indices = []  # list to hold the indices of the final combined chunks
    candidate = (
        [] if header is None else [header]
    )  # list to hold the current combined chunk candidate
    candidate_indices = []
    for chunk_i, chunk in enumerate(chunks):
        chunk_with_header = [chunk] if header is None else [header, chunk]
        if len(tokenize(chunk_delimiter.join(chunk_with_header))) > max_tokens:
            print("warning: chunk overflow")
            if (
                    add_ellipsis_for_overflow
                    and len(tokenize(chunk_delimiter.join(candidate + ["..."]))) <= max_tokens
            ):
                candidate.append("...")
                dropped_chunk_count += 1
            continue  # this case would break downstream assumptions
        # estimate token count with the current chunk added
        extended_candidate_token_count = len(tokenize(chunk_delimiter.join(candidate + [chunk])))
        # If the token count exceeds max_tokens, add the current candidate to output and start a new candidate
        if extended_candidate_token_count > max_tokens:
            output.append(chunk_delimiter.join(candidate))
            output_indices.append(candidate_indices)
            candidate = chunk_with_header  # re-initialize candidate
            candidate_indices = [chunk_i]
        # otherwise keep extending the candidate
        else:
            candidate.append(chunk)
            candidate_indices.append(chunk_i)
    # add the remaining candidate to output if it's not empty
    if (header is not None and len(candidate) > 1) or (header is None and len(candidate) > 0):
        output.append(chunk_delimiter.join(candidate))
        output_indices.append(candidate_indices)
    return output, output_indices, dropped_chunk_count
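
To make the greedy packing step easier to follow, here is a simplified sketch of the same idea. It is not the article's function: pack_pieces and count_tokens are hypothetical names, headers, ellipsis handling and index tracking are left out, and whitespace-separated words stand in for tokenizer tokens so that the example runs without a GGUF file.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokenizer tokens.
    return len(text.split())


def pack_pieces(pieces: List[str], max_tokens: int, delimiter: str = "\n\n") -> List[str]:
    """Greedily merge consecutive pieces so each merged block stays within max_tokens."""
    blocks: List[str] = []
    candidate: List[str] = []
    for piece in pieces:
        if count_tokens(piece) > max_tokens:
            # A single oversized piece cannot fit in any block, so it is skipped.
            continue
        if candidate and count_tokens(delimiter.join(candidate + [piece])) > max_tokens:
            # Adding this piece would overflow: close the current block, start a new one.
            blocks.append(delimiter.join(candidate))
            candidate = [piece]
        else:
            candidate.append(piece)
    if candidate:
        blocks.append(delimiter.join(candidate))
    return blocks


pieces = ["one two three", "four five", "six seven eight nine", "ten"]
blocks = pack_pieces(pieces, max_tokens=5)
print(blocks)
```

With a limit of 5 "tokens", the first two pieces are merged into one block and the last two into another. A piece that is larger than the limit on its own is dropped, which is why the article's version prints a warning in that case.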

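The summarization function

We can now define a utility that summarizes text with a controllable level of detail (note the detail parameter).

The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count according to the detail parameter. It then splits the text into chunks and summarizes each chunk.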

def summarize(text: str,
              detail: float = 0,
              model: str = 'myqwen2',
              additional_instructions: Optional[str] = None,
              minimum_chunk_size: Optional[int] = 500,
              chunk_delimiter: str = "\n",
              summarize_recursively=False,
              verbose=False):
    """
    Summarizes a given text by splitting it into chunks, each of which is summarized individually. 
    The level of detail in the summary can be adjusted, and the process can optionally be made recursive.
    Parameters:
    - text (str): The text to be summarized.
    - detail (float, optional): A value between 0 and 1 indicating the desired level of detail in the summary.
      0 leads to a higher level summary, and 1 results in a more detailed summary. Defaults to 0.
    - model (str, optional): The model to use for generating summaries. Defaults to 'myqwen2'.
    - additional_instructions (Optional[str], optional): Additional instructions to provide to the model for customizing summaries.
    - minimum_chunk_size (Optional[int], optional): The minimum size for text chunks. Defaults to 500.
    - chunk_delimiter (str, optional): The delimiter used to split the text into chunks. Defaults to a newline character.
    - summarize_recursively (bool, optional): If True, summaries are generated recursively, using previous summaries for context.
    - verbose (bool, optional): If True, prints detailed information about the chunking process.
    Returns:
    - str: The final compiled summary of the text.
    The function first determines the number of chunks by interpolating between a minimum and a maximum chunk count based on the `detail` parameter. 
    It then splits the text into chunks and summarizes each chunk. If `summarize_recursively` is True, each summary is based on the previous summaries, 
    adding more context to the summarization process. The function returns a compiled summary of all chunks.
    """
    # check detail is set correctly
    assert 0 <= detail <= 1
    # interpolate the number of chunks to get the specified level of detail
    max_chunks = len(chunk_on_delimiter(text, minimum_chunk_size, chunk_delimiter))
    min_chunks = 1
    num_chunks = int(min_chunks + detail * (max_chunks - min_chunks))
    # adjust chunk_size based on interpolated number of chunks
    document_length = len(tokenize(text))
    chunk_size = max(minimum_chunk_size, document_length // num_chunks)
    text_chunks = chunk_on_delimiter(text, chunk_size, chunk_delimiter)
    if verbose:
        print(f"Splitting the text into {len(text_chunks)} chunks to be summarized.")
        print(f"Chunk lengths are {[len(tokenize(x)) for x in text_chunks]}")
    # set system message
    system_message_content = "Rewrite this text in summarized form."
    if additional_instructions is not None:
        system_message_content += f"\n\n{additional_instructions}"
    accumulated_summaries = []
    for chunk in tqdm(text_chunks):
        if summarize_recursively and accumulated_summaries:
            # Creating a structured prompt for recursive summarization
            accumulated_summaries_string = '\n\n'.join(accumulated_summaries)
            user_message_content = f"Previous summaries:\n\n{accumulated_summaries_string}\n\nText to summarize next:\n\n{chunk}"
        else:
            # Directly passing the chunk for summarization without recursive context
            user_message_content = chunk
        # Constructing messages based on whether recursive summarization is applied
        messages = [
            {"role": "system", "content": system_message_content},
            {"role": "user", "content": user_message_content}
        ]
        # Assuming this function gets the completion and works as expected
        response = get_chat_completion(messages, model=model)
        accumulated_summaries.append(response)
    # Compile final summary from partial summaries
    final_summary = '\n\n'.join(accumulated_summaries)
    return final_summary
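
The interpolation inside summarize can be looked at in isolation. The sketch below copies only that arithmetic into a hypothetical helper, plan_chunks, and the document length (10,000 tokens) and maximum chunk count (20) are made-up numbers chosen for illustration, not measurements from the article's document.

```python
from typing import Tuple


def plan_chunks(document_length: int, max_chunks: int, detail: float,
                minimum_chunk_size: int = 500) -> Tuple[int, int]:
    """Return (target number of chunks, chunk size in tokens) for a given detail level."""
    assert 0 <= detail <= 1
    min_chunks = 1
    # Linear interpolation between min_chunks and max_chunks, truncated to an integer.
    num_chunks = int(min_chunks + detail * (max_chunks - min_chunks))
    # The chunk size never drops below the configured minimum.
    chunk_size = max(minimum_chunk_size, document_length // num_chunks)
    return num_chunks, chunk_size


# Hypothetical document: 10,000 tokens, at most 20 chunks of the minimum size.
for detail in (0, 0.25, 0.5, 1):
    print(detail, plan_chunks(10_000, 20, detail))
```

With these numbers, detail=0 targets a single 10,000-token chunk, detail=0.25 targets 5 chunks of 2,000 tokens, and detail=1 targets 20 chunks of 500 tokens. The number of chunks actually produced can differ slightly, because chunk boundaries must fall on the delimiter.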

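We can now use this utility to produce summaries at different levels of detail. As detail increases from 0 to 1, we get progressively longer summaries of the underlying document. A higher value of detail yields a more detailed summary because the utility first splits the document into more chunks. Each chunk is then summarized, and the final summary is the concatenation of all chunk summaries.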

summary_with_detail_0 = summarize(artificial_intelligence_wikipedia_text, detail=0, verbose=True)

summary_with_detail_pt25 = summarize(artificial_intelligence_wikipedia_text, detail=0.25, verbose=True)

The utility also accepts additional instructions.

summary_with_additional_instructions = summarize(artificial_intelligence_wikipedia_text, detail=0.1,
                                                 additional_instructions="Write in point form and focus on numerical data.")
print(summary_with_additional_instructions)

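Finally, note that the utility supports recursive summarization, in which each summary is based on the previous summaries, adding more context to the summarization process. Enable it by setting the summarize_recursively parameter to True. This is computationally more expensive, but it can improve the consistency and coherence of the combined summary.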

recursive_summary = summarize(artificial_intelligence_wikipedia_text, detail=0.1, summarize_recursively=True)
print(recursive_summary)
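
To show why the recursive mode costs more, the sketch below rebuilds only the prompt-construction logic with a stand-in for the model. build_messages, summarize_chunks_recursively and fake_complete are hypothetical names introduced for illustration; fake_complete just reports the size of the prompt it received instead of calling Ollama.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_messages(chunk: str, previous_summaries: List[str],
                   system_prompt: str = "Rewrite this text in summarized form.") -> List[Message]:
    """Build the chat messages for one chunk, prepending earlier summaries if there are any."""
    if previous_summaries:
        joined = "\n\n".join(previous_summaries)
        user_content = f"Previous summaries:\n\n{joined}\n\nText to summarize next:\n\n{chunk}"
    else:
        user_content = chunk
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


def summarize_chunks_recursively(chunks: List[str],
                                 complete: Callable[[List[Message]], str]) -> List[str]:
    """Summarize chunks in order, feeding all earlier summaries into each new prompt."""
    summaries: List[str] = []
    for chunk in chunks:
        summaries.append(complete(build_messages(chunk, summaries)))
    return summaries


def fake_complete(messages: List[Message]) -> str:
    # Stand-in for the model (assumption): report the size of the user prompt.
    return f"summary of {len(messages[1]['content'])} chars"


chunks = ["first chunk", "second chunk", "third chunk"]
summaries = summarize_chunks_recursively(chunks, fake_complete)
print(summaries)
```

Each later prompt carries every earlier summary, so the input grows with the number of chunks already processed. That growth is the source of the extra cost, and it also means the accumulated summaries must still fit in the model's context window.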
