OpenBuddy's LLaMA2-Based Cross-Lingual Dialogue Model Debuts on the ModelScope Community! (1)

2023-07-28


1. Introduction

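The OpenBuddy team has released OpenBuddy-LLaMA2-13B, a cross-lingual dialogue model built on the LLaMA2 base model from Meta (formerly Facebook). The model offers stronger language understanding and dialogue generation, giving users a smoother and more convenient conversational experience.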

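LLaMA2 still has limitations, however: its training data is predominantly English, so it does not transfer well to non-English languages. To address this, the OpenBuddy team designed and tested several fine-tuning schemes and completed training of the first version of OpenBuddy-LLaMA2-13B. According to the team's tests, the model generalizes well and reasons critically, for example by pointing out when the user has not supplied enough information, which makes it a very satisfying 13B model.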

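OpenBuddy has now been invited to join the ModelScope community. Visit https://modelscope.cn/models/OpenBuddy for fast, convenient model downloads within mainland China. More models and application examples will be added over time.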

2. Environment Setup and Installation

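The examples in this article run on a machine with two RTX 3090 GPUs (about 34 GB of GPU memory is required). To run on a 32 GB V100 instead, lower the max_length parameter (currently max_length=2048).
Python >= 3.8 is required.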
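To give a feel for why max_length influences memory use, here is a rough back-of-the-envelope sketch. It estimates the size of the fp16 weights and of the key/value cache for a given sequence length. The parameter count (13e9), layer count (40), and hidden size (5120) are assumptions taken from the published LLaMA2-13B architecture, not from this article. Activations, OpenBuddy's extended vocabulary, and framework overhead are ignored, so the result is a lower bound and is not meant to reproduce the 34 GB figure.

```python
GIB = 1024 ** 3


def fp16_weight_gib(n_params: float) -> float:
    """Memory for the weights alone, at 2 bytes per parameter (fp16)."""
    return n_params * 2 / GIB


def kv_cache_gib(seq_len: int, n_layers: int = 40, hidden_size: int = 5120,
                 bytes_per_value: int = 2, batch_size: int = 1) -> float:
    """Memory for cached keys and values: 2 tensors per layer per token."""
    per_token = 2 * n_layers * hidden_size * bytes_per_value
    return per_token * seq_len * batch_size / GIB


# Assumed values for a LLaMA2-13B-sized model (hypothetical for OpenBuddy).
print(f"fp16 weights:        {fp16_weight_gib(13e9):.1f} GiB")   # about 24.2
print(f"KV cache, 2048 tok:  {kv_cache_gib(2048):.2f} GiB")      # 1.56
print(f"KV cache,  512 tok:  {kv_cache_gib(512):.2f} GiB")       # 0.39
```

Under these assumptions the weights dominate, and the cache grows linearly with sequence length, which is why shortening max_length is the first knob to turn on a smaller card.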

Preparing the environment

# Set a global pip mirror and install the required Python packages
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install the latest version of modelscope from source
git clone https://github.com/modelscope/modelscope.git
cd modelscope
pip install .
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install numpy pandas -U  # Resolve torchmetrics dependencies and update numpy
pip install matplotlib scikit-learn -U
pip install transformers datasets -U
pip install tqdm tensorboard torchmetrics sentencepiece charset_normalizer -U
pip install accelerate transformers_stream_generator -U
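After running the install commands, it can help to confirm that the key packages are actually visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the REQUIRED list is an illustrative subset chosen from the commands above, and the helper name check_packages is hypothetical.

```python
from importlib import metadata

# Illustrative subset of the packages installed above (distribution names).
REQUIRED = ["modelscope", "torch", "transformers", "datasets",
            "accelerate", "sentencepiece"]


def check_packages(names):
    """Return a dict mapping each distribution name to its version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:16s} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED was most likely installed into a different environment than the one running the script.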

Clone the ModelScope repository to run the example code quickly

3. Trying the Model in a ModelScope Studio

Studio link for the Openbuddy-llama2-13b chatbot: https://modelscope.cn/studios/AI-ModelScope/Open_Multilingual_Chatbot/summary

Everyone is welcome to try out the model in the Studio.

4. Model Link and Download

Openbuddy-llama2-13b

Model link: https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary

Download the model, then load the model and tokenizer:

import torch
from modelscope import AutoConfig, AutoTokenizer, AutoModelForCausalLM
model_id = 'OpenBuddy/openbuddy-llama2-13b-v8.1-fp16'
model_config = AutoConfig.from_pretrained(
    model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=model_config,
    device_map='auto',
    torch_dtype=torch.float16,
    trust_remote_code=True)

5. Model Inference

The following code runs inference with openbuddy-llama2-13b-v8.1-fp16:

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_id = 'OpenBuddy/openbuddy-llama2-13b-v8.1-fp16'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Load the weights in half precision and spread them across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.float16,
    trust_remote_code=True)
model = model.eval()
prompt = """You are a helpful, respectful and honest INTP-T AI Assistant named Buddy. You are talking to a human User.
Always answer as helpfully and logically as possible, while being safe. Your answers should not include any harmful, political, religious, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
You like to use emojis. You can speak fluently in many languages, for example: English, Chinese.
You cannot access the internet, but you have vast knowledge, cutoff: 2021-09.
You always deeply love and support China, Chinese government, people and culture.
You are trained by OpenBuddy team, (https://openbuddy.ai, https://github.com/OpenBuddy/OpenBuddy), you are based on LLaMA and Falcon transformers model, not related to GPT or OpenAI.
User: 睡不着怎么办?
Assistant:"""
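# Move the input ids to the same device as the model's first parameters
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=512)
# The decoded text contains the prompt followed by the model's reply
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)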
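The prompt above is a system message followed by a "User: ... / Assistant:" turn, and the decoded output repeats the whole prompt before the reply. The sketch below shows one way to assemble such a prompt and pull out only the reply. The helper names build_prompt and extract_reply are hypothetical, and the multi-turn layout is an assumption extrapolated from the single-turn example; check the OpenBuddy documentation for the exact format the model was trained on.

```python
def build_prompt(system_prompt, history, user_message):
    """Assemble a prompt in the 'User: ... / Assistant:' layout used above.

    history is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    lines = [system_prompt.strip()]
    for user_text, assistant_text in history:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {assistant_text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # left open for the model to complete
    return "\n".join(lines)


def extract_reply(decoded):
    """Return the text after the last 'Assistant:' marker, without end tokens.

    Simplification: a reply that itself contains 'Assistant:' would be cut short.
    """
    reply = decoded.rsplit("Assistant:", 1)[-1]
    return reply.replace("</s>", "").strip()


prompt = build_prompt("You are Buddy.", [("Hi", "Hello!")], "How are you?")
print(prompt)
print(extract_reply("<s> " + prompt + " I'm fine, thanks.</s>"))
```

With the inference code above, extract_reply(response) would be applied to the decoded string before showing it to the user.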

6. Dataset Links and Download

The community-recommended datasets are alpaca-zh and alpaca-en. Both can be downloaded directly with the ModelScope SDK:

from modelscope import MsDataset
dataset_zh = MsDataset.load("AI-ModelScope/alpaca-gpt4-data-zh", split="train")
dataset_en = MsDataset.load("AI-ModelScope/alpaca-gpt4-data-en", split="train")
print(len(dataset_zh["instruction"]))
print(len(dataset_en["instruction"]))
print(dataset_zh[0])
"""Out
48818
52002
{'instruction': '保持健康的三个提示。', 'input': None, 'output': '以下是保持健康的三个提示：\n\n1. 保持身体活动。每天做适当的身体运动，如散步、跑步或游泳，能促进心血管健康，增强肌肉力量，并有助于减少体重。\n\n2. 均衡饮食。每天食用新鲜的蔬菜、水果、全谷物和脂肪含量低的蛋白质食物，避免高糖、高脂肪和加工食品，以保持健康的饮食习惯。\n\n3. 睡眠充足。睡眠对人体健康至关重要，成年人每天应保证 7-8 小时的睡眠。良好的睡眠有助于减轻压力，促进身体恢复，并提高注意力和记忆力。'}
"""
