Please fix ModelScope's file download module properly. Right now hardly any model downloads successfully!

The error that is finally raised:
Max retries exceeded with url ...

Full error message:
https://www.modelscope.cn/api/v1/models/tclf90/glm-4-9b-chat-GPTQ-Int8/repo?Revision=master&FilePath=model-00001-of-00003.safetensors (Caused by ChunkedEncodingError(ProtocolError('Connection broken: IncompleteRead(3804784 bytes read, 163967376 more expected)', IncompleteRead(3804784 bytes read, 163967376 more expected))))

My analysis:

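1) I looked at the file download source, file_download.py. Two methods in this file do most of the download work:

   1) parallel_download: parallel download, triggered when the size of the model file to download is greater than or equal to MODELSCOPE_PARALLEL_DOWNLOAD_THRESHOLD_MB * 1000 * 1000 bytes;

   2) http_get_file: starts a chunked HTTP download that resumes via the Range header. It contains an infinite-loop handler: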
    ...
    while True:
        try:
            downloaded_size = temp_file.tell()
            get_headers['Range'] = 'bytes=%d-' % downloaded_size
            r = requests.get(
                url,
                stream=True,
                headers=get_headers,
                cookies=cookies,
                timeout=API_FILE_DOWNLOAD_TIMEOUT)
            r.raise_for_status()
            content_length = r.headers.get('Content-Length')
            total = int(
                content_length) if content_length is not None else None
            progress = tqdm(
                unit='B',
                unit_scale=True,
                unit_divisor=1024,
                total=total,
                initial=downloaded_size,
                desc='Downloading',
            )
            for chunk in r.iter_content(
                    chunk_size=API_FILE_DOWNLOAD_CHUNK_SIZE):
                if chunk:  # filter out keep-alive new chunks
                    progress.update(len(chunk))
                    temp_file.write(chunk)
            progress.close()
            break
        except (Exception) as e:  # no matter what happen, we will retry.
            retry = retry.increment('GET', url, error=e)
            retry.sleep()
    ...

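Put simply, this code is written so that, as long as the file has not finished downloading, it keeps retrying no matter what error occurs.

However, given the "Max retries exceeded ..." error quoted at the top of this post, I conclude that there is still a cap on the number of retries, for example 99. The cap comes from retry.increment(): the same retry object is carried through every iteration and is never reset, so each failure uses up one attempt even when the previous request made progress.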
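To make the point above concrete, here is a minimal sketch of why a `while True` loop like this one still terminates. It assumes `retry` behaves like a urllib3-style `Retry` object, which is what the `increment('GET', url, error=e)` and `sleep()` calls suggest: `increment` returns a new object with one attempt fewer and raises once the budget is exhausted. `RetryBudget`, `MaxRetriesExceeded`, `download` and `fetch` are hypothetical stand-ins written for this illustration, not ModelScope or urllib3 code.

```python
class MaxRetriesExceeded(Exception):
    """Hypothetical stand-in for the 'Max retries exceeded' error."""


class RetryBudget:
    """Hypothetical, simplified stand-in for a urllib3-style Retry object."""

    def __init__(self, total):
        self.total = total

    def increment(self, method, url, error=None):
        # Return a new budget with one attempt fewer, or raise once the
        # counter drops below zero.
        remaining = self.total - 1
        if remaining < 0:
            raise MaxRetriesExceeded(
                'Max retries exceeded with url: %s (Caused by %r)'
                % (url, error)) from error
        return RetryBudget(remaining)


def download(url, fetch, retry):
    while True:  # looks infinite ...
        try:
            fetch(url)
            break
        except Exception as e:
            # ... but increment() raises from inside the except block once
            # the budget is spent, and nothing catches that exception, so
            # it propagates out of the loop. The budget is also never reset
            # after a partially successful request.
            retry = retry.increment('GET', url, error=e)
```

With `RetryBudget(total=2)` and a `fetch` that always fails, the third failure raises `MaxRetriesExceeded`, which matches the shape of the error at the top of this post.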

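2) Based on the analysis above, the model file download code has an obvious bug: the download eventually aborts because the maximum number of retries is exceeded. This is a very bad user experience. By contrast, with Hugging Face's transformers module I have never had a model download ultimately fail, whether calling the hf_download API or running huggingface-cli download.

Please analyze the problems in this code thoroughly and fix them as soon as possible.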
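One possible direction, offered only as a sketch and not as ModelScope's actual fix: count a retry against the budget only when a request made no progress at all, and start a fresh budget whenever new bytes arrive. `download_with_resume`, `fetch_range` and `max_stalled_retries` are hypothetical names for this illustration. `fetch_range(offset)` stands in for a Range request that yields chunks starting at byte `offset`. Backoff sleeps are omitted for brevity.

```python
def download_with_resume(fetch_range, out, total_size, max_stalled_retries=5):
    """Resume from out.tell() until total_size bytes have been written.

    Only consecutive attempts that write zero new bytes count against
    max_stalled_retries; any progress resets the counter.
    """
    stalled = 0
    while out.tell() < total_size:
        before = out.tell()
        error = None
        try:
            for chunk in fetch_range(before):
                out.write(chunk)
        except Exception as exc:  # broken connection, timeout, ...
            error = exc
        if out.tell() > before:
            stalled = 0  # progress was made: start a fresh budget
            continue
        stalled += 1
        if stalled > max_stalled_retries:
            raise RuntimeError(
                'no progress after %d retries' % max_stalled_retries
            ) from error
    return out.tell()
```

With this shape, a large file on a flaky link can survive more disconnects than the retry limit, as long as each reconnect delivers some data. A server that is really down still fails after a bounded number of attempts.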

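Next comes the infinite-loop logic, and this is where the problem lies. Since the download is written as an infinite loop (the only exit condition is that the file has finished downloading), why does the code also impose a maximum retry count???

What kind of garbage code is this???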

    while True:
        try:
            with open(file_name, 'rb+') as f:
                f.seek(start)
                print("URL:", url)
                r = requests.get(
                    url,
                    stream=True,
                    headers=get_headers,
                    cookies=cookies,
                    timeout=API_FILE_DOWNLOAD_TIMEOUT)
                for chunk in r.iter_content(
                        chunk_size=API_FILE_DOWNLOAD_CHUNK_SIZE):
                    if chunk:  # filter out keep-alive new chunks
                        f.write(chunk)
            progress.update(end - start)
            break
        except (Exception) as e:  # no matter what exception, we will retry.
            retry = retry.increment('GET', url, error=e)
            logger.warning('Downloading: %s failed, reason: %s will retry' %
                           (model_file_name, e))
            retry.sleep()

