TensorFlow安装教程

简介: TensorFlow安装教程

1. TensorFlow 2最新版安装(本文撰写时为2.9.0)


官方安装指南:Install TensorFlow 2

用pip安装的指南:Install TensorFlow with pip

TensorFlow基础的系统环境等要求可直接在该网站上查看,已经2022年了,一般电脑都不会这么老吧。


新建anaconda虚拟环境:conda create -n envtf2 python==3.8(Python版本需要是3.7-3.10,本文以3.8为例,主要是因为我需要用3.8版本来安装另一个包)

激活虚拟环境:conda activate envtf2

如果要使用cuda,首先确定本机安装有NVIDIA GPU driver:nvidia-smi(一般都会有的吧,没有的话到得了这一步吗)

安装指定的cudatoolkit和cudnn版本:conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0

有两种指定配置路径的方式:

①临时的,每次会话都需要先激活虚拟环境然后:export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/

②自动在每次激活虚拟环境后执行此操作(我没有试过,我一直都用的是上面那种方式):


mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh


更新pip:pip install --upgrade pip

安装TensorFlow:pip install tensorflow

检验CPU版TensorFlow是否可用:python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

(我的服务器有4张卡)

image.png

检验GPU版TensorFlow是否可用:python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

image.png


注意,以上操作是在终端上进行的,不能直接放到jupyter notebook。一个失败的例子:

image.png

image.png

image.png

在jupyter notebook上,我直接调用!export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/也不行,用os.environ['LD_LIBRARY_PATH']也不行,用$env也不行,就把我整得相当困惑。我看了一下,好像如果用jupyter notebook的话就必须要修改jupyter内核才能用,但是我修改了jupyter kernelspec list路径中的kernel.json后仍然不行。(参考自python - How to set env variable in Jupyter notebook - Stack Overflow)

其他我在网上有看到一些使用全局配置解决此问题的方法,但是我这个服务器上还需要运行别的版本的别的项目,总之不太方便用这个。一般来说我对此问题的解决方法就是不用jupyter notebook来跑TF项目。我看的这些资料可资参考:

解决TensorFlow在terminal中正常但在jupyter notebook中报错的方案 - stardsd - 博客园

Add CUDA Library Path to Jupyterhub Notebook - AIML - wiki.ucar.edu

install pytorch with jupyter - 知乎


所以jupyter notebook上要成功使用TensorFlow GPU功能的话就必须要先在命令行上激活虚拟环境,然后export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/,然后调用jupyter notebook命令打开jupyter notebook,这样就能直接正常使用了。

(注意:如果仅安装了ipykernel包,那么VSCode中可以打开notebook文件,但是无法使用jupyter notebook打开能够在浏览器中打开的网页,因此需要安装jupyterlab:pip install jupyterlab(参考Project Jupyter | Installing Jupyter)。VSCode即使在远程服务器上也可以把端口转到本地使用localhost域名在本地浏览器打开,挺方便的)

运行成功的效果:

image.png


2. TensorFlow 1.14 + Keras 2.3.1(安装时间:2022.8.17)


这个是苏神bert4keras(https://github.com/bojone/bert4keras)的配置。


见TensorFlow官网(使用 pip 安装 TensorFlow),仅TensorFlow2.2以上支持Python3.8以上,所以我需要一个Python3.7的环境。

新建anaconda虚拟环境:conda create -n envtf114 python=3.7 pip

安装GPU版TensorFlow:pip install tensorflow-gpu==1.14


试用如下代码(来自tensorflow-gpu1.14代码测试_爱听许嵩歌的博客-CSDN博客_tensorflow-gpu测试代码):


import tensorflow as tf
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')
    b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')
with tf.device('/gpu:2'):
    c = a + b
# 注意:allow_soft_placement=True表明:计算设备可自行选择,如果没有这个参数,会报错。
# 因为不是所有的操作都可以被放在GPU上,如果强行将无法放在GPU上的操作指定到GPU上,将会报错。
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
# sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
sess.run(tf.global_variables_initializer())
print(sess.run(c))


报错:


Traceback (most recent call last):
  File "trytf1.py", line 1, in <module>
    import tensorflow as tf
  File "env_path/lib/python3.7/site-packages/tensorflow/__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "env_path/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 52, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "env_path/lib/python3.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 16, in <module>
    from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
  File "env_path/lib/python3.7/site-packages/tensorflow/core/framework/node_def_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "env_path/lib/python3.7/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "env_path/lib/python3.7/site-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "env_path/lib/python3.7/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 42, in <module>
    serialized_options=None, file=DESCRIPTOR),
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 560, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates


嗯虽然不知道发生了什么总之我从善如流地照着改(参考1. Downgrade the protobuf package to 3.20.x or lower._weixin_44834086的博客-CSDN博客):


pip install protobuf==3.19.0

image.png

然后重新运行代码,这回的输出信息变成了:


env_path/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
env_path/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
env_path/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
env_path/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
env_path/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
env_path/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
env_path/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
env_path/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
env_path/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
env_path/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
env_path/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
env_path/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From trytf1.py:11: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From trytf1.py:11: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-08-17 15:27:08.829308: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-08-17 15:27:08.865801: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1800000000 Hz
2022-08-17 15:27:08.867967: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f031fff480 executing computations on platform Host. Devices:
2022-08-17 15:27:08.868039: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2022-08-17 15:27:08.871550: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2022-08-17 15:27:09.628470: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f0332f1040 executing computations on platform CUDA. Devices:
2022-08-17 15:27:09.628540: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2022-08-17 15:27:09.628560: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): Tesla T4, Compute Capability 7.5
2022-08-17 15:27:09.628580: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (2): Tesla T4, Compute Capability 7.5
2022-08-17 15:27:09.628597: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (3): Tesla T4, Compute Capability 7.5
2022-08-17 15:27:09.645921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:3b:00.0
2022-08-17 15:27:09.650885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:5e:00.0
2022-08-17 15:27:09.652426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:b1:00.0
2022-08-17 15:27:09.653863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:d9:00.0
2022-08-17 15:27:09.654104: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2022-08-17 15:27:09.654223: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2022-08-17 15:27:09.654332: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2022-08-17 15:27:09.654437: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2022-08-17 15:27:09.654540: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2022-08-17 15:27:09.654661: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2022-08-17 15:27:09.740681: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-08-17 15:27:09.740741: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2022-08-17 15:27:09.740824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-17 15:27:09.740848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 2 3 
2022-08-17 15:27:09.740868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y Y Y 
2022-08-17 15:27:09.740886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N Y Y 
2022-08-17 15:27:09.740904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2:   Y Y N Y 
2022-08-17 15:27:09.740921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3:   Y Y Y N 
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:2 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:3 -> device: XLA_GPU device
2022-08-17 15:27:09.742912: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:2 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:3 -> device: XLA_GPU device
WARNING:tensorflow:From trytf1.py:13: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
add: (Add): /job:localhost/replica:0/task:0/device:CPU:0
2022-08-17 15:27:09.746145: I tensorflow/core/common_runtime/placer.cc:54] add: (Add)/job:localhost/replica:0/task:0/device:CPU:0
init: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
2022-08-17 15:27:09.746214: I tensorflow/core/common_runtime/placer.cc:54] init: (NoOp)/job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2022-08-17 15:27:09.746257: I tensorflow/core/common_runtime/placer.cc:54] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2022-08-17 15:27:09.746294: I tensorflow/core/common_runtime/placer.cc:54] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2022-08-17 15:27:09.748161: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
[2. 4. 6.]


其他略,总之那一堆无法打开so文件就说明cuda安装有问题,无法使用GPU。


TensorFlow版本对应GPU版本(图源https://www.tensorflow.org/install/source#gpu):

208c1227b4974e5383378d66f47882f6.png

所以首先安装所需的cudnn和cuda:conda install -c conda-forge cudatoolkit=10.0 cudnn=7.4


报了个非常诡异的bug:


Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): failed
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
    Traceback (most recent call last):
      File "anaconda3/lib/python3.9/site-packages/urllib3/response.py", line 700, in _update_chunk_length
        self.chunk_left = int(line, 16)
    ValueError: invalid literal for int() with base 16: b''
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "anaconda3/lib/python3.9/site-packages/urllib3/response.py", line 441, in _error_catcher
        yield
      File "anaconda3/lib/python3.9/site-packages/urllib3/response.py", line 767, in read_chunked
        self._update_chunk_length()
      File "anaconda3/lib/python3.9/site-packages/urllib3/response.py", line 704, in _update_chunk_length
        raise InvalidChunkLength(self, line)
    urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "anaconda3/lib/python3.9/site-packages/requests/models.py", line 760, in generate
        for chunk in self.raw.stream(chunk_size, decode_content=True):
      File "anaconda3/lib/python3.9/site-packages/urllib3/response.py", line 575, in stream
        for line in self.read_chunked(amt, decode_content=decode_content):
      File "anaconda3/lib/python3.9/site-packages/urllib3/response.py", line 796, in read_chunked
        self._original_response.close()
      File "anaconda3/lib/python3.9/contextlib.py", line 137, in __exit__
        self.gen.throw(typ, value, traceback)
      File "anaconda3/lib/python3.9/site-packages/urllib3/response.py", line 458, in _error_catcher
        raise ProtocolError("Connection broken: %r" % e, e)
    urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "anaconda3/lib/python3.9/site-packages/conda/exceptions.py", line 1114, in __call__
        return func(*args, **kwargs)
      File "anaconda3/lib/python3.9/site-packages/conda/cli/main.py", line 86, in main_subshell
        exit_code = do_call(args, p)
      File "anaconda3/lib/python3.9/site-packages/conda/cli/conda_argparse.py", line 90, in do_call
        return getattr(module, func_name)(args, parser)
      File "anaconda3/lib/python3.9/site-packages/conda/cli/main_install.py", line 20, in execute
        install(args, parser, 'install')
      File "anaconda3/lib/python3.9/site-packages/conda/cli/install.py", line 259, in install
        unlink_link_transaction = solver.solve_for_transaction(
      File "anaconda3/lib/python3.9/site-packages/conda/core/solve.py", line 152, in solve_for_transaction
        unlink_precs, link_precs = self.solve_for_diff(update_modifier, deps_modifier,
      File "anaconda3/lib/python3.9/site-packages/conda/core/solve.py", line 195, in solve_for_diff
        final_precs = self.solve_final_state(update_modifier, deps_modifier, prune, ignore_pinned,
      File "anaconda3/lib/python3.9/site-packages/conda/core/solve.py", line 300, in solve_final_state
        ssc = self._collect_all_metadata(ssc)
      File "anaconda3/lib/python3.9/site-packages/conda/common/io.py", line 86, in decorated
        return f(*args, **kwds)
      File "anaconda3/lib/python3.9/site-packages/conda/core/solve.py", line 463, in _collect_all_metadata
        index, r = self._prepare(prepared_specs)
      File "anaconda3/lib/python3.9/site-packages/conda/core/solve.py", line 1058, in _prepare
        reduced_index = get_reduced_index(self.prefix, self.channels,
      File "anaconda3/lib/python3.9/site-packages/conda/core/index.py", line 287, in get_reduced_index
        new_records = SubdirData.query_all(spec, channels=channels, subdirs=subdirs,
      File "anaconda3/lib/python3.9/site-packages/conda/core/subdir_data.py", line 139, in query_all
        result = tuple(concat(executor.map(subdir_query, channel_urls)))
      File "anaconda3/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
        yield fs.pop().result()
      File "anaconda3/lib/python3.9/concurrent/futures/_base.py", line 446, in result
        return self.__get_result()
      File "anaconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
        raise self._exception
      File "anaconda3/lib/python3.9/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "anaconda3/lib/python3.9/site-packages/conda/core/subdir_data.py", line 131, in <lambda>
        subdir_query = lambda url: tuple(SubdirData(Channel(url), repodata_fn=repodata_fn).query(
      File "anaconda3/lib/python3.9/site-packages/conda/core/subdir_data.py", line 144, in query
        self.load()
      File "anaconda3/lib/python3.9/site-packages/conda/core/subdir_data.py", line 209, in load
        _internal_state = self._load()
      File "anaconda3/lib/python3.9/site-packages/conda/core/subdir_data.py", line 374, in _load
        raw_repodata_str = fetch_repodata_remote_request(
      File "anaconda3/lib/python3.9/site-packages/conda/core/subdir_data.py", line 700, in fetch_repodata_remote_request
        resp = session.get(join_url(url, filename), headers=headers, proxies=session.proxies,
      File "anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 542, in get
        return self.request('GET', url, **kwargs)
      File "anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 529, in request
        resp = self.send(prep, **send_kwargs)
      File "anaconda3/lib/python3.9/site-packages/requests/sessions.py", line 687, in send
        r.content
      File "anaconda3/lib/python3.9/site-packages/requests/models.py", line 838, in content
        self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
      File "anaconda3/lib/python3.9/site-packages/requests/models.py", line 763, in generate
        raise ChunkedEncodingError(e)
    requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
`$ /home/wanghuijuan/anaconda3/bin/conda install -c conda-forge cudatoolkit=10.0 cudnn=7.4`
  environment variables:
                 CIO_TEST=<not set>
        CONDA_DEFAULT_ENV=
                CONDA_EXE=anaconda3/bin/conda
             CONDA_PREFIX=
           CONDA_PREFIX_1=anaconda3
           CONDA_PREFIX_2=
    CONDA_PROMPT_MODIFIER=()
         CONDA_PYTHON_EXE=anaconda3/bin/python
               CONDA_ROOT=anaconda3
              CONDA_SHLVL=3
           CURL_CA_BUNDLE=<not set>
                     PATH=/anaconda3/condabin:/home/wanghuijuan/.vscode
                          -server/bin/6d9b74a70ca9c7733b29f0456fd8195364076dda/bin/remote-cli:/u
                          sr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:
                          /usr/local/games:/snap/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
     active environment : 
    active env location : 
            shell level : 3
       user config file : .condarc
 populated config files : 
          conda version : 4.13.0
    conda-build version : 3.21.8
         python version : 3.9.12.final.0
       virtual packages : __cuda=11.4=0
                          __linux=4.15.0=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment :anaconda3  (writable)
      conda av data dir :anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                         https://conda.anaconda.org/conda-forge/noarch
                         https://repo.anaconda.com/pkgs/main/linux-64
                         https://repo.anaconda.com/pkgs/main/noarch
                         https://repo.anaconda.com/pkgs/r/linux-64
                         https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/wanghuijuan/anaconda3/pkgs
                          /home/wanghuijuan/.conda/pkgs
       envs directories : /home/wanghuijuan/anaconda3/envs
                          /home/wanghuijuan/.conda/envs
               platform : linux-64
             user-agent : conda/4.13.0 requests/2.27.1 CPython/3.9.12 Linux/4.15.0-136-generic ubuntu/18.04.4 glibc/2.27
                UID:GID : 1018:1014
             netrc file : None
           offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
If submitted, this report will be used by core maintainers to improve
future releases of conda.
Would you like conda to send this report to the core maintainers?
[y/N]: y
Upload did not complete.
Thank you for helping to improve conda.
Opt-in to always sending reports (and not see this message again)
by running
    $ conda config --set report_errors true



不知道发生了什么,总之换一种安装方式好了(参考TensorFlow-gpu安装和测试(TensorFlow-gpu1.14+Cuda10)_爱学习的小龙的博客-CSDN博客_tensorflowgpu测试):


wget -P files/install_packages https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/linux-64/cudatoolkit-10.0.130-0.conda
conda install files/install_packages/cudatoolkit-10.0.130-0.conda


重新运行Python代码。和之前一样的输出部分就不写了,直接从不一样的地方开始:


2022-08-17 16:15:43.407219: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-08-17 16:15:43.409338: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-08-17 16:15:43.411111: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2022-08-17 16:15:43.411878: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2022-08-17 16:15:43.415478: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2022-08-17 16:15:43.418072: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2022-08-17 16:15:43.424901: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-08-17 16:15:43.435064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1, 2, 3
2022-08-17 16:15:43.435492: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-08-17 16:15:43.441476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-17 16:15:43.442070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 2 3 
2022-08-17 16:15:43.442448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y Y Y 
2022-08-17 16:15:43.443431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N Y Y 
2022-08-17 16:15:43.444206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2:   Y Y N Y 
2022-08-17 16:15:43.444586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3:   Y Y Y N 
2022-08-17 16:15:43.452440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2446 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2022-08-17 16:15:43.462938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 5244 MB memory) -> physical GPU (device: 1, name: Tesla T4, pci bus id: 0000:5e:00.0, compute capability: 7.5)
2022-08-17 16:15:43.469831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 14259 MB memory) -> physical GPU (device: 2, name: Tesla T4, pci bus id: 0000:b1:00.0, compute capability: 7.5)
2022-08-17 16:15:43.483509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 14259 MB memory) -> physical GPU (device: 3, name: Tesla T4, pci bus id: 0000:d9:00.0, compute capability: 7.5)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:2 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:3 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla T4, pci bus id: 0000:3b:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla T4, pci bus id: 0000:5e:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla T4, pci bus id: 0000:b1:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla T4, pci bus id: 0000:d9:00.0, compute capability: 7.5
2022-08-17 16:15:43.490300: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:1 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:2 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:3 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla T4, pci bus id: 0000:3b:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla T4, pci bus id: 0000:5e:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla T4, pci bus id: 0000:b1:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla T4, pci bus id: 0000:d9:00.0, compute capability: 7.5
WARNING:tensorflow:From /home/wanghuijuan/whj_code1/trytf1.py:13: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
add: (Add): /job:localhost/replica:0/task:0/device:GPU:2
2022-08-17 16:15:43.495600: I tensorflow/core/common_runtime/placer.cc:54] add: (Add)/job:localhost/replica:0/task:0/device:GPU:2
init: (NoOp): /job:localhost/replica:0/task:0/device:GPU:0
2022-08-17 16:15:43.495642: I tensorflow/core/common_runtime/placer.cc:54] init: (NoOp)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2022-08-17 16:15:43.495664: I tensorflow/core/common_runtime/placer.cc:54] a: (Const)/job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
2022-08-17 16:15:43.495682: I tensorflow/core/common_runtime/placer.cc:54] b: (Const)/job:localhost/replica:0/task:0/device:CPU:0
[2. 4. 6.]


运行TensorFlow代码时需要在代码前加上这些:


import tensorflow as tf
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   
os.environ["CUDA_VISIBLE_DEVICES"] = "2"    #这里是gpu的序号,指定使用的gpu对象
config = tf.ConfigProto()
config.gpu_options.allow_growth = True


Keras就自动安装好了。


直接用bert4keras代码来举个实际用例:


pip install bert4keras
wget -P /data/pretrained_model/chinese_L-12_H-768_A-12 https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip
unzip /data/pretrained_model/chinese_L-12_H-768_A-12/chinese_L-12_H-768_A-12.zip -d /data/pretrained_model/chinese_L-12_H-768_A-12


代码(改自https://github.com/bojone/bert4keras/blob/master/examples/basic_extract_features.py):


import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import time
import numpy as np
from bert4keras.backend import keras
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer
from bert4keras.snippets import to_array
config_path = '/data/pretrained_model/chinese_L-12_H-768_A-12/chinese_L-12_H-768_A-12/bert_config.json'
checkpoint_path = '/data/pretrained_model/chinese_L-12_H-768_A-12/chinese_L-12_H-768_A-12/bert_model.ckpt'
dict_path = '/data/pretrained_model/chinese_L-12_H-768_A-12/chinese_L-12_H-768_A-12/vocab.txt'
tokenizer = Tokenizer(dict_path, do_lower_case=True)  # 建立分词器
model = build_transformer_model(config_path, checkpoint_path)  # 建立模型,加载权重
# 编码测试
token_ids, segment_ids = tokenizer.encode(u'语言模型')
token_ids, segment_ids = to_array([token_ids], [segment_ids])
print('\n ===== predicting =====\n')
print(model.predict([token_ids, segment_ids]))
"""
输出:
[[[-0.63251007  0.2030236   0.07936534 ...  0.49122632 -0.20493352
    0.2575253 ]
  [-0.7588351   0.09651865  1.0718756  ... -0.6109694   0.04312154
    0.03881441]
  [ 0.5477043  -0.792117    0.44435206 ...  0.42449304  0.41105673
    0.08222899]
  [-0.2924238   0.6052722   0.49968526 ...  0.8604137  -0.6533166
    0.5369075 ]
  [-0.7473459   0.49431565  0.7185162  ...  0.3848612  -0.74090636
    0.39056838]
  [-0.8741375  -0.21650358  1.338839   ...  0.5816864  -0.4373226
    0.56181806]]]
"""
time.sleep(100)


time.sleep()命令是为了停留一下,显式用nvidia-smi命令看这个程序只在卡2上占用空间,以及具体占用了多大的GPU(占了14G,还是相当大的)


相关实践学习
部署Stable Diffusion玩转AI绘画(GPU云服务器)
本实验通过在ECS上从零开始部署Stable Diffusion来进行AI绘画创作,开启AIGC盲盒。
相关文章
|
6月前
|
TensorFlow 算法框架/工具 Python
最新版tensorflow安装教程,pip安装+手动安装
最新版tensorflow安装教程,pip安装+手动安装
548 1
|
Ubuntu TensorFlow 算法框架/工具
最新 Tensorflow 2.2极简安装教程 | 学习笔记
快速学习最新 Tensorflow 2.2极简安装教程
最新 Tensorflow 2.2极简安装教程 | 学习笔记
|
TensorFlow 算法框架/工具 异构计算
Tensorflow2.0安装教程 (CPU版本,windows环境)
Anaconda创建虚拟环境报错:An HTTP error occurred when trying to retrieve this URL
543 0
Tensorflow2.0安装教程 (CPU版本,windows环境)
|
机器学习/深度学习 数据挖掘 TensorFlow
TensorFlow 中文资源精选,官方网站,安装教程,入门教程,实战项目,学习路径。
Awesome-TensorFlow-Chinese TensorFlow 中文资源全集,学习路径推荐: 官方网站,初步了解。
1965 0
|
11天前
|
机器学习/深度学习 人工智能 算法
猫狗宠物识别系统Python+TensorFlow+人工智能+深度学习+卷积网络算法
宠物识别系统使用Python和TensorFlow搭建卷积神经网络,基于37种常见猫狗数据集训练高精度模型,并保存为h5格式。通过Django框架搭建Web平台,用户上传宠物图片即可识别其名称,提供便捷的宠物识别服务。
142 55
|
29天前
|
机器学习/深度学习 数据采集 数据可视化
TensorFlow,一款由谷歌开发的开源深度学习框架,详细讲解了使用 TensorFlow 构建深度学习模型的步骤
本文介绍了 TensorFlow,一款由谷歌开发的开源深度学习框架,详细讲解了使用 TensorFlow 构建深度学习模型的步骤,包括数据准备、模型定义、损失函数与优化器选择、模型训练与评估、模型保存与部署,并展示了构建全连接神经网络的具体示例。此外,还探讨了 TensorFlow 的高级特性,如自动微分、模型可视化和分布式训练,以及其在未来的发展前景。
70 5
|
1月前
|
机器学习/深度学习 人工智能 算法
基于Python深度学习的【垃圾识别系统】实现~TensorFlow+人工智能+算法网络
垃圾识别分类系统。本系统采用Python作为主要编程语言,通过收集了5种常见的垃圾数据集('塑料', '玻璃', '纸张', '纸板', '金属'),然后基于TensorFlow搭建卷积神经网络算法模型,通过对图像数据集进行多轮迭代训练,最后得到一个识别精度较高的模型文件。然后使用Django搭建Web网页端可视化操作界面,实现用户在网页端上传一张垃圾图片识别其名称。
84 0
基于Python深度学习的【垃圾识别系统】实现~TensorFlow+人工智能+算法网络
|
1月前
|
机器学习/深度学习 人工智能 算法
【手写数字识别】Python+深度学习+机器学习+人工智能+TensorFlow+算法模型
手写数字识别系统,使用Python作为主要开发语言,基于深度学习TensorFlow框架,搭建卷积神经网络算法。并通过对数据集进行训练,最后得到一个识别精度较高的模型。并基于Flask框架,开发网页端操作平台,实现用户上传一张图片识别其名称。
91 0
【手写数字识别】Python+深度学习+机器学习+人工智能+TensorFlow+算法模型
|
1月前
|
机器学习/深度学习 人工智能 算法
基于深度学习的【蔬菜识别】系统实现~Python+人工智能+TensorFlow+算法模型
蔬菜识别系统,本系统使用Python作为主要编程语言,通过收集了8种常见的蔬菜图像数据集('土豆', '大白菜', '大葱', '莲藕', '菠菜', '西红柿', '韭菜', '黄瓜'),然后基于TensorFlow搭建卷积神经网络算法模型,通过多轮迭代训练最后得到一个识别精度较高的模型文件。在使用Django开发web网页端操作界面,实现用户上传一张蔬菜图片识别其名称。
94 0
基于深度学习的【蔬菜识别】系统实现~Python+人工智能+TensorFlow+算法模型
|
1月前
|
机器学习/深度学习 人工智能 TensorFlow
基于TensorFlow的深度学习模型训练与优化实战
基于TensorFlow的深度学习模型训练与优化实战
92 0