EvalScope stress test: the `random` dataset returns results normally, but the registration class for the `gsm8k` dataset cannot be found

First, the DeepSeek-R1-Distill-Llama-70B model was deployed into docker-a on a local server. Then, in a conda environment inside docker-b, evalscope was used to stress-test the model. With the dataset set to `random`, results are returned normally, but after switching the dataset to `gsm8k`, EvalScope cannot find the registration class for the `gsm8k` dataset.
The log returned in the shell is as follows:

(evalscope) root@yjz-eval-1750728279:/yjz_spacec/eval-muxi-poc/dataset-test# python performance-test.py
2025-07-02 16:32:24,694 - evalscope - INFO - Save the result to: ./outputs/20250702_163224/
2025-07-02 16:32:24,694 - evalscope - INFO - Starting benchmark with args:
2025-07-02 16:32:24,694 - evalscope - INFO - {
    "model": "/models/deepseek/DeepSeek-R1-Distill-Llama-70B/",
    "model_id": "",
    "attn_implementation": null,
    "api": "openai",
    "tokenizer_path": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "port": 8877,
    "url": "http://10.118.17.119:9000/v1/chat/completions",
    "headers": {},
    "connect_timeout": 600,
    "read_timeout": 600,
    "api_key": null,
    "no_test_connection": false,
    "number": 2,
    "parallel": 1,
    "rate": -1,
    "log_every_n_query": 10,
    "debug": false,
    "wandb_api_key": null,
    "swanlab_api_key": null,
    "name": null,
    "outputs_dir": "./outputs/20250702_163224/",
    "max_prompt_length": 1024,
    "min_prompt_length": 1024,
    "prefix_length": 0,
    "prompt": null,
    "query_template": null,
    "apply_chat_template": true,
    "dataset": "gsm8k",
    "dataset_path": null,
    "frequency_penalty": null,
    "repetition_penalty": null,
    "logprobs": null,
    "max_tokens": 1024,
    "min_tokens": 1024,
    "n_choices": null,
    "seed": 0,
    "stop": null,
    "stop_token_ids": null,
    "stream": true,
    "temperature": 0.0,
    "top_p": null,
    "top_k": null,
    "extra_args": {
        "ignore_eos": true
    }
}
2025-07-02 16:32:24,753 - evalscope - INFO - Test connection successful.
2025-07-02 16:32:26,411 - evalscope - ERROR - Exception in async function 'benchmark': 'gsm8k'
Traceback (most recent call last):
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/utils/handler.py", line 17, in async_wrapper
    return await func(*args, **kwargs)
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/benchmark.py", line 197, in benchmark
    async for request in get_requests(args):
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/benchmark.py", line 75, in get_requests
    async for request in generator:
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/benchmark.py", line 41, in generate_requests_from_dataset
    message_generator_class = DatasetRegistry(args.dataset)
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/plugin/registry.py", line 20, in __call__
    return self.get_class(name)
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/plugin/registry.py", line 14, in get_class
    return self._registry[name]
KeyError: 'gsm8k'
2025-07-02 16:32:26,525 - asyncio - ERROR - Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<statistic_benchmark_metric() running at /opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/utils/handler.py:14>>
sys:1: RuntimeWarning: coroutine 'statistic_benchmark_metric' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
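The traceback above bottoms out in a plain dictionary lookup: `DatasetRegistry.get_class` indexes an internal `_registry` dict with the dataset name, so any name that was never registered surfaces as a bare `KeyError`. A minimal self-contained sketch of that pattern (an illustration only, not EvalScope's actual source):

```python
# Stand-in sketch of the registry pattern behind the KeyError above
# (illustration only, not EvalScope's actual source code).

class DatasetRegistry:
    _registry = {}

    @classmethod
    def register(cls, name):
        # Decorator that maps a dataset name to its plugin class.
        def wrapper(target):
            cls._registry[name] = target
            return target
        return wrapper

    @classmethod
    def get_class(cls, name):
        # A name that was never registered raises KeyError -- the 'gsm8k' failure.
        return cls._registry[name]


@DatasetRegistry.register('random')
class RandomDatasetPlugin:
    pass


DatasetRegistry.get_class('random')    # resolves to RandomDatasetPlugin
# DatasetRegistry.get_class('gsm8k')   # would raise KeyError: 'gsm8k'
```

So the error is not about the gsm8k data files at all: the name `'gsm8k'` simply was never put into the perf module's dataset registry.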

The Python file I ran is shown below. How should I modify it?


# Short context, low concurrency: 1k input, 1k output, dataset switched to gsm8k
from evalscope.perf.main import run_perf_benchmark
from evalscope.perf.arguments import Arguments

task_cfg = Arguments(
    parallel=[1],               # request concurrency; a list of concurrency levels may be passed
    number=[2],                 # number of requests per concurrency level, paired with `parallel`
    model='/models/deepseek/DeepSeek-R1-Distill-Llama-70B/',      # model name; must match the `model` field of the curl request on the compute server
    url='http://10.118.17.119:9000/v1/chat/completions',          # request URL of the compute server; the service must be deployed via docker beforehand
    api='openai',               # API format to use; defaults to openai
    dataset='gsm8k',            # dataset name (switched from `random` to `gsm8k`)
    min_tokens=1*1024,          # minimum number of tokens to generate; not all model services support this parameter
    max_tokens=1*1024,          # maximum number of tokens that may be generated
    prefix_length=0,            # prompt prefix length; defaults to 0, only effective for the random dataset
    min_prompt_length=1*1024,   # minimum input prompt length; defaults to 0, shorter prompts are discarded
    max_prompt_length=1*1024,   # maximum input prompt length; defaults to 131072, longer prompts are discarded
    tokenizer_path='deepseek-ai/DeepSeek-R1-Distill-Llama-70B',   # path to the model's tokenizer, used to count tokens
    extra_args={'ignore_eos': True}  # extra request parameters; this one ignores the end-of-sequence token
)
results = run_perf_benchmark(task_cfg)

aliyun2776170548 2025-07-02 16:46:14