EvalScope stress test: the `random` dataset returns results normally, but the registration class for the `gsm8k` dataset cannot be found

First, the DeepSeek-R1-Distill-Llama-70B model was deployed into docker-a on a local server. Then, in a conda environment inside docker-b, evalscope was used to stress-test the model. With the dataset set to `random`, results are returned normally, but after switching the dataset to `gsm8k`, EvalScope cannot find the registration class for the `gsm8k` dataset.
The log returned in the shell is as follows:

(evalscope) root@yjz-eval-1750728279:/yjz_spacec/eval-muxi-poc/dataset-test# python performance-test.py
2025-07-02 16:32:24,694 - evalscope - INFO - Save the result to: ./outputs/20250702_163224/
2025-07-02 16:32:24,694 - evalscope - INFO - Starting benchmark with args:
2025-07-02 16:32:24,694 - evalscope - INFO - {
    "model": "/models/deepseek/DeepSeek-R1-Distill-Llama-70B/",
    "model_id": "",
    "attn_implementation": null,
    "api": "openai",
    "tokenizer_path": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "port": 8877,
    "url": "http://10.118.17.119:9000/v1/chat/completions",
    "headers": {},
    "connect_timeout": 600,
    "read_timeout": 600,
    "api_key": null,
    "no_test_connection": false,
    "number": 2,
    "parallel": 1,
    "rate": -1,
    "log_every_n_query": 10,
    "debug": false,
    "wandb_api_key": null,
    "swanlab_api_key": null,
    "name": null,
    "outputs_dir": "./outputs/20250702_163224/",
    "max_prompt_length": 1024,
    "min_prompt_length": 1024,
    "prefix_length": 0,
    "prompt": null,
    "query_template": null,
    "apply_chat_template": true,
    "dataset": "gsm8k",
    "dataset_path": null,
    "frequency_penalty": null,
    "repetition_penalty": null,
    "logprobs": null,
    "max_tokens": 1024,
    "min_tokens": 1024,
    "n_choices": null,
    "seed": 0,
    "stop": null,
    "stop_token_ids": null,
    "stream": true,
    "temperature": 0.0,
    "top_p": null,
    "top_k": null,
    "extra_args": {
        "ignore_eos": true
    }
}
2025-07-02 16:32:24,753 - evalscope - INFO - Test connection successful.
2025-07-02 16:32:26,411 - evalscope - ERROR - Exception in async function 'benchmark': 'gsm8k'
Traceback (most recent call last):
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/utils/handler.py", line 17, in async_wrapper
    return await func(*args, **kwargs)
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/benchmark.py", line 197, in benchmark
    async for request in get_requests(args):
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/benchmark.py", line 75, in get_requests
    async for request in generator:
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/benchmark.py", line 41, in generate_requests_from_dataset
    message_generator_class = DatasetRegistry(args.dataset)
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/plugin/registry.py", line 20, in __call__
    return self.get_class(name)
  File "/opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/plugin/registry.py", line 14, in get_class
    return self._registry[name]
KeyError: 'gsm8k'
2025-07-02 16:32:26,525 - asyncio - ERROR - Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<statistic_benchmark_metric() running at /opt/conda/envs/evalscope/lib/python3.10/site-packages/evalscope/perf/utils/handler.py:14>>
sys:1: RuntimeWarning: coroutine 'statistic_benchmark_metric' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
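The traceback above bottoms out in a plain dictionary lookup: `DatasetRegistry.get_class` indexes an internal `_registry` dict with the dataset name, so any name that was never registered surfaces as a bare `KeyError`. A minimal self-contained sketch of that pattern (an illustration only, not EvalScope's actual source):

```python
# Stand-in sketch of the registry pattern behind the KeyError above
# (illustration only, not EvalScope's actual source code).

class DatasetRegistry:
    _registry = {}

    @classmethod
    def register(cls, name):
        # Decorator that maps a dataset name to its plugin class.
        def wrapper(target):
            cls._registry[name] = target
            return target
        return wrapper

    @classmethod
    def get_class(cls, name):
        # A name that was never registered raises KeyError -- the 'gsm8k' failure.
        return cls._registry[name]


@DatasetRegistry.register('random')
class RandomDatasetPlugin:
    pass


DatasetRegistry.get_class('random')    # resolves to RandomDatasetPlugin
# DatasetRegistry.get_class('gsm8k')   # would raise KeyError: 'gsm8k'
```

So the error is not about the gsm8k data files at all: the name `'gsm8k'` simply was never put into the perf module's dataset registry.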

The Python file I ran is shown below. How should I modify it?


# Short context, low concurrency: 1k input, 1k output, dataset switched to gsm8k
from evalscope.perf.main import run_perf_benchmark
from evalscope.perf.arguments import Arguments

task_cfg = Arguments(
    parallel=[1],               # request concurrency; a list of concurrency levels may be passed
    number=[2],                 # number of requests per concurrency level, paired with `parallel`
    model='/models/deepseek/DeepSeek-R1-Distill-Llama-70B/',      # model name; must match the `model` field of the curl request on the compute server
    url='http://10.118.17.119:9000/v1/chat/completions',          # request URL of the compute server; the service must be deployed via docker beforehand
    api='openai',               # API format to use; defaults to openai
    dataset='gsm8k',            # dataset name (switched from `random` to `gsm8k`)
    min_tokens=1*1024,          # minimum number of tokens to generate; not all model services support this parameter
    max_tokens=1*1024,          # maximum number of tokens that may be generated
    prefix_length=0,            # prompt prefix length; defaults to 0, only effective for the random dataset
    min_prompt_length=1*1024,   # minimum input prompt length; defaults to 0, shorter prompts are discarded
    max_prompt_length=1*1024,   # maximum input prompt length; defaults to 131072, longer prompts are discarded
    tokenizer_path='deepseek-ai/DeepSeek-R1-Distill-Llama-70B',   # path to the model's tokenizer, used to count tokens
    extra_args={'ignore_eos': True}  # extra request parameters; this one ignores the end-of-sequence token
)
results = run_perf_benchmark(task_cfg)

aliyun2776170548 2025-07-02 16:46:14