Loading a dataset under WSL (Ubuntu 22.04) fails with the following error:
Downloading and preparing dataset FrDoc2BotRetrieval/DAMO_ConvAI to file:///home/mike/.cache/modelscope/hub/datasets/DAMO_ConvAI/FrDoc2BotRetrieval/master/meta/modelscope___dataset_builder/DAMO_ConvAI-a66a14fd0003a8b5/master/train...
Traceback (most recent call last):
File "/home/mike/experience/DAMO-ConvAI/acl23doc2dial/train_retrieval.py", line 10, in <module>
fr_train_dataset = MsDataset.load(
File "/home/mike/anaconda3/envs/dialdoc2023/lib/python3.9/site-packages/modelscope/msdatasets/ms_dataset.py", line 255, in load
dataset_inst = RemoteDataLoaderManager(
File "/home/mike/anaconda3/envs/dialdoc2023/lib/python3.9/site-packages/modelscope/msdatasets/data_loader/data_loader_manager.py", line 132, in load_dataset
oss_data_loader.process()
File "/home/mike/anaconda3/envs/dialdoc2023/lib/python3.9/site-packages/modelscope/msdatasets/data_loader/data_loader.py", line 75, in process
self._prepare_and_download()
File "/home/mike/anaconda3/envs/dialdoc2023/lib/python3.9/site-packages/modelscope/msdatasets/data_loader/data_loader.py", line 140, in _prepare_and_download
self.dataset = self.data_files_manager.fetch_data_files(
File "/home/mike/anaconda3/envs/dialdoc2023/lib/python3.9/site-packages/modelscope/msdatasets/data_files/data_files_manager.py", line 114, in fetch_data_files
return builder.as_dataset()
File "/home/mike/anaconda3/envs/dialdoc2023/lib/python3.9/site-packages/datasets/builder.py", line 1051, in as_dataset
raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
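This looks like it is caused by the fsspec version being too new; the failing version is 2023.12.1.
Rolling back to 2023.9.2 works. Could you check whether the calls need an extra argument after this library's upgrade?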
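To check which fsspec an environment has before loading the dataset, something like the following might help. It is only a sketch: the helper names `parse_version` and `newer_than_known_good` are made up for illustration, and the only data points behind it are the two versions in this report (2023.9.2 works, 2023.12.1 fails). Whether the releases in between are affected is not known from this thread.

```python
from importlib import metadata

# From this report: 2023.9.2 loads fine, 2023.12.1 fails.
# Releases between those two are untested here (assumption: treat them as suspect).
KNOWN_GOOD = (2023, 9, 2)


def parse_version(version: str) -> tuple:
    """Turn a calendar version such as '2023.12.1' into (2023, 12, 1).

    Only plain numeric versions are handled; pre-release suffixes are not.
    """
    return tuple(int(part) for part in version.split(".")[:3])


def newer_than_known_good(version: str) -> bool:
    """True if `version` is newer than the release reported to work."""
    return parse_version(version) > KNOWN_GOOD


if __name__ == "__main__":
    try:
        installed = metadata.version("fsspec")
    except metadata.PackageNotFoundError:
        print("fsspec is not installed in this environment")
    else:
        relation = "newer than" if newer_than_known_good(installed) else "not newer than"
        print(f"fsspec {installed} is {relation} 2023.9.2")
```

If the installed version turns out to be newer, pinning it with `pip install fsspec==2023.9.2` reproduces the rollback described above.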
Failing path (fsspec 2023.12.1):
file:///home/mike/.cache/modelscope/hub/datasets/DAMO_ConvAI/FrDoc2BotRetrieval/master/meta/modelscope___dataset_builder/DAMO_ConvAI-a66a14fd0003a8b5/master/train
Working path (fsspec 2023.9.2):
/home/mike/.cache/modelscope/hub/datasets/DAMO_ConvAI/FrDoc2BotRetrieval/master/meta/modelscope___dataset_builder/DAMO_ConvAI-a66a14fd0003a8b5/master/train
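The two locations above differ only in the `file://` scheme prefix. The sketch below shows that relationship using the standard library. The helper name `to_plain_path` and the shortened example path are hypothetical, and this is not a patch for datasets or ModelScope, just an illustration of how the two forms map onto each other.

```python
from urllib.parse import urlparse


def to_plain_path(location: str) -> str:
    """Return the filesystem path for a file:// URL; pass other strings through.

    Percent-encoded characters are left as-is (not decoded).
    """
    parsed = urlparse(location)
    if parsed.scheme == "file":
        return parsed.path
    return location


if __name__ == "__main__":
    # Hypothetical shortened path, standing in for the long cache path above.
    print(to_plain_path("file:///home/mike/.cache/example"))
    print(to_plain_path("/home/mike/.cache/example"))
```

Both calls print the same plain path, which matches the form seen with the older fsspec.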
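It does look like the problem is tied to the fsspec version. In your case, rolling fsspec back to 2023.9.2 fixed dataset loading. If you would like to solve this without downgrading fsspec, you can try the following: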
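The traceback shows the exception being raised in as_dataset in datasets/builder.py, and the two paths above differ only in the file:// prefix, so WSL (Windows Subsystem for Linux) itself is unlikely to be the cause. As a workaround, you can try downloading the dataset files yourself and then loading them with ModelScope.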
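First, find the download link for the dataset. The dataset's details, including the download link, should be available in the official ModelScope documentation. You can then download it with the requests library. Here is an example: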
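import requests

url = "dataset download link"  # placeholder: replace with the real link
response = requests.get(url, timeout=60)
response.raise_for_status()  # stop on an HTTP error instead of saving an error page
with open("dataset file name", "wb") as f:  # placeholder file name
    f.write(response.content)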
Once the download finishes, you can load the dataset with ModelScope:
from modelscope.msdatasets import MsDataset
fr_train_dataset = MsDataset.load('dataset file name')  # placeholder: path of the file saved above