vLLM is an open-source library for fast inference and serving of large language models (LLMs), i.e. models with billions or more parameters that process natural language text. vLLM does not define a model of its own; it loads pretrained (and possibly fine-tuned) models such as the Qwen series and runs them with high throughput, so they can be applied to tasks like text classification, machine translation, sentiment analysis, and question answering. The example below uses vLLM to load Qwen-1.8B from ModelScope and run batched offline generation:
from vllm import LLM, SamplingParams
import os
# Set the environment variable so vLLM downloads the model from ModelScope
os.environ['VLLM_USE_MODELSCOPE'] = 'True'
llm = LLM(model="qwen/Qwen-1_8B", trust_remote_code=True)
prompts = [
"Hello, my name is",
"today is a sunny day,",
"The capital of France is",
"The future of AI is",
]
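# Nucleus sampling with temperature 0.8; stop generating at the end-of-text token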
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, stop=["<|endoftext|>"])
outputs = llm.generate(prompts, sampling_params)
# print the output
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Making sense of vLLM takes a basic grasp of deep learning and natural language processing. In deep learning, a model improves by training on large amounts of data; vLLM does not train models itself, it takes an already trained language model and runs inference on it efficiently.
To use vLLM, install the vllm package (it is built on top of PyTorch) and load a model through its LLM class. You can then feed it input text and generate output text, for example to answer questions, translate text, or produce summaries.
Here is a simple vLLM usage example:
from vllm import LLM, SamplingParams
# Load the model (the same Qwen-1.8B model used above; VLLM_USE_MODELSCOPE is still set)
llm = LLM(model="qwen/Qwen-1_8B", trust_remote_code=True)
# Input text
input_text = "What is the capital of France?"
# Generate the output text
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
output = llm.generate([input_text], sampling_params)[0]
# Print the result
print(output.outputs[0].text)
vLLM is a very useful tool for a wide range of natural language processing tasks. The more complete example below downloads the chat-tuned Qwen-1.8B-Chat model from ModelScope, builds prompts with Qwen's ChatML chat template, and carries the conversation history into a second round of questions:
import sys
from vllm import LLM, SamplingParams
import os
from modelscope import AutoTokenizer, snapshot_download
# Download the model snapshot from ModelScope
model_dir = snapshot_download("qwen/Qwen-1_8B-Chat")
sys.path.insert(0, model_dir)
from qwen_generation_utils import (
    HistoryType,
    make_context,
    decode_tokens,
    get_stop_words_ids,
    StopWordsLogitsProcessor,
)
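# Load the downloaded model snapshot with vLLM and the matching tokenizer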
llm = LLM(model=model_dir, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
prompts = [
"Hello, my name is Alia",
"Today is a sunny day,",
"The capital of France is",
"Introduce YaoMing to me.",
]
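# Same sampling settings as before, but cap each reply at 128 tokens and also stop at the ChatML control token <|im_start|>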
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128, stop=['<|endoftext|>', '<|im_start|>'])
inputs = []
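# make_context (from the Qwen repository) wraps each prompt in Qwen's ChatML template
# (system message + user turn) and returns both the raw text and the token ids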
for prompt in prompts:
    raw_text, context_tokens = make_context(
        tokenizer,
        prompt,
        history=[],
        system="You are a helpful assistant.",
        chat_format='chatml',
    )
    inputs.append(context_tokens)
# Call generate with prompt_token_ids, which already contain the chat-template information
outputs = llm.generate(prompt_token_ids=inputs, sampling_params=sampling_params)
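# Keep each (prompt, reply) pair so the follow-up questions can refer back to it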
histories = []
for prompt, output in zip(prompts, outputs):
    history = []
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
    history.append((prompt, generated_text))
    histories.append(history)
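# Second-round questions that only make sense together with the first-round history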
prompts_new = [
    'What is my name again?',
    'What did I say the weather is today?',
    'What is the city you mentioned just now?',
    'How tall is he?'
]
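# Rebuild the ChatML context, this time passing the first-round history to make_context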
inputs = []
for prompt, history in zip(prompts_new, histories):
    raw_text, context_tokens = make_context(
        tokenizer,
        prompt,
        history=history,
        system="You are a helpful assistant.",
        chat_format='chatml',
    )
    inputs.append(context_tokens)
outputs = llm.generate(prompt_token_ids=inputs, sampling_params=sampling_params)
# print the output
for prompt, output in zip(prompts_new, outputs):
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")