一开始我用的代码是:
from sentence_transformers import SentenceTransformer model = SentenceTransformer('all-MiniLM-L6-v2')
好几次都在下载了一小部分之后失败了。
所以改为提前将模型下载到本地(wget稳定性更强,可以无限retry,我下pytorch_model.bin重试了8次):
- 这个模型的网址很容易找到:https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
- 挨个下载文件到本地:
mkdir /data/pretrained_model/all-MiniLM-L6-v2 wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/config.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/pytorch_model.bin wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/data_config.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/config_sentence_transformers.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/modules.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/sentence_bert_config.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/special_tokens_map.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer_config.json wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/train_script.py wget -P /data/pretrained_model/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt mkdir /data/pretrained_model/all-MiniLM-L6-v2/1_Pooling wget -P /data/pretrained_model/all-MiniLM-L6-v2/1_Pooling https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/1_Pooling/config.json
然后代码直接改成:
from sentence_transformers import SentenceTransformer model = SentenceTransformer('/data/pretrained_model/all-MiniLM-L6-v2')
其他一切不变即可。