Restoring from checkpoint failed: Assign requires shapes of both tensors to match. lhs shape= [700,8] rhs shape= [660,8]

Summary: restoring the model failed because a feature's parameters no longer match the checkpoint.

Error message:
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1558, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1627, in _build
build_save=build_save, build_restore=build_restore)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1188, in _build_internal
restore_sequentially, reshape)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 783, in _AddShardedRestoreOps
name="restore_shard"))
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 752, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 278, in restore
self.op.get_shape().is_fully_defined())
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 236, in assign
validate_shape=validate_shape)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 62, in assign
use_locking=use_locking, name=name)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3401, in create_op
op_def=op_def)
File "/worker/venv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1771, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [700,8] rhs shape= [660,8]
[node save/Assign_7 (defined at /worker/tensorflow_jobs/easy_rec/python/model/easy_rec_estimator.py:74) = Assign[T=DT_FLOAT, _class=["loc:@attr_value_names_embedding/embedding_weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

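The message shows that this is a "Restoring from checkpoint failed" error: restoring the model from the ckpt fails because the attr_value_names parameters in the current model differ from those saved in the ckpt. The variable attr_value_names_embedding/embedding_weights has shape [700, 8] in the current graph (lhs) but shape [660, 8] in the ckpt (rhs). The first dimension of an embedding table usually follows the feature's vocabulary or bucket size, so the likely cause is that the attr_value_names feature configuration changed after the ckpt was saved. Restoring the configuration that was used to write the ckpt, or training into a new, empty model directory, should avoid the mismatch.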
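To confirm which variables disagree, it can help to list the shapes stored in the ckpt and compare them with the shapes in the current graph. The following is a minimal sketch, not part of the original post: the helper names load_checkpoint_shapes and find_shape_mismatches are hypothetical, and the sample shapes are copied from the error message above. It relies on tf.train.list_variables, which returns (name, shape) pairs for a checkpoint.

```python
def load_checkpoint_shapes(ckpt_dir_or_file):
    """Return {variable_name: shape} for every variable stored in a checkpoint.

    Hypothetical helper. TensorFlow is imported lazily so that the
    comparison function below can be used without TensorFlow installed.
    """
    import tensorflow as tf
    return {name: list(shape)
            for name, shape in tf.train.list_variables(ckpt_dir_or_file)}


def find_shape_mismatches(graph_shapes, ckpt_shapes):
    """List variables whose shape differs between the graph and the checkpoint.

    Both arguments map variable names to shapes (lists of ints).
    Variables that are missing from the checkpoint are skipped.
    """
    mismatches = []
    for name, graph_shape in sorted(graph_shapes.items()):
        ckpt_shape = ckpt_shapes.get(name)
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches.append((name, list(graph_shape), list(ckpt_shape)))
    return mismatches


# Sample shapes taken from the error message: lhs is the graph, rhs the ckpt.
graph_shapes = {"attr_value_names_embedding/embedding_weights": [700, 8]}
ckpt_shapes = {"attr_value_names_embedding/embedding_weights": [660, 8]}

for name, lhs, rhs in find_shape_mismatches(graph_shapes, ckpt_shapes):
    print("%s: graph %s vs checkpoint %s" % (name, lhs, rhs))
```

In a real TensorFlow 1.x job, ckpt_shapes would come from load_checkpoint_shapes(model_dir), and graph_shapes could be built after the model graph is constructed, for example with {v.op.name: v.shape.as_list() for v in tf.global_variables()}. Every name the comparison reports is a variable that the Saver will fail to assign.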
