# TensorFlow: Multi-GPU Processing with Data Parallelism

## Multi-GPU processing with data parallelism

If you write your software in a language like C++ for a single CPU core, making it run on multiple GPUs in parallel would require rewriting the software from scratch. But this is not the case with TensorFlow. Because of its symbolic nature, TensorFlow can hide all that complexity, making it effortless to scale your program across many CPUs and GPUs.

```python
import tensorflow as tf

with tf.device(tf.DeviceSpec(device_type='CPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
```

The same thing can just as easily be done on a GPU:

```python
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b
```

But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half:
```python
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)

split_c = []
for i in range(2):
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        split_c.append(split_a[i] + split_b[i])

c = tf.concat(split_c, axis=0)
```


Let's rewrite this in a more general form so that we can replace addition with any other set of operations:


```python
def make_parallel(fn, num_gpus, **kwargs):
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))

    return tf.concat(out_split, axis=0)

def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
```

You can replace the model with any function that takes a set of tensors as input and returns a tensor as a result, with the condition that both the inputs and the output are batched. Note that we also added a variable scope and set `reuse` to true for every device after the first. This makes sure that we use the same variables for processing all the splits, which will come in handy in our next example.
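The split/apply/concatenate pattern inside `make_parallel` is independent of TensorFlow itself. As a sanity check of the logic (not a multi-GPU implementation), here is a plain NumPy sketch; `make_parallel_np` is a hypothetical helper name used only for illustration:

```python
import numpy as np

def make_parallel_np(fn, num_shards, **kwargs):
    # Split each batched input along axis 0 into num_shards pieces.
    in_splits = {k: np.array_split(v, num_shards) for k, v in kwargs.items()}
    # Apply fn to each shard independently (this is the part that
    # make_parallel places on a separate GPU).
    out_split = [fn(**{k: v[i] for k, v in in_splits.items()})
                 for i in range(num_shards)]
    # Stitch the per-shard outputs back into one batch.
    return np.concatenate(out_split, axis=0)

a = np.random.rand(1000, 100)
b = np.random.rand(1000, 100)
c = make_parallel_np(lambda a, b: a + b, 2, a=a, b=b)
```

Because addition is elementwise, splitting the batch, adding each half, and concatenating gives exactly the same result as adding the full tensors.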

Let’s look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training we not only need to compute the forward pass but also need to compute the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy.
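The reason this works: for a loss that is a mean over examples, the gradient over the full batch equals the average of the gradients computed over equal-sized shards, so each GPU can compute the gradients for its own shard independently. A quick NumPy check of this identity on a least-squares loss (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

def grad(X, y, w):
    # Gradient of the mean squared error (1/n) * sum((Xw - y)^2) w.r.t. w.
    return 2.0 / len(y) * X.T @ (X @ w - y)

# Gradient over the full batch.
full = grad(X, y, w)

# Average of the gradients over two equal-sized shards.
shard_grads = [grad(X[:4], y[:4], w), grad(X[4:], y[4:], w)]
averaged = sum(shard_grads) / 2
```

`full` and `averaged` agree to numerical precision, which is exactly what lets data-parallel training split the backward pass across devices.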

Recall from the first item that we wanted to fit a second-degree polynomial to a set of samples. We reorganized the code a bit to keep the bulk of the operations in the model function:

```python
import numpy as np
import tensorflow as tf

def model(x, y):
    w = tf.get_variable("w", shape=[3, 1])

    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)

    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss))

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
```

Now let’s use the make_parallel function that we just wrote to parallelize this. We only need to change two lines of the code above:

```python
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss),
    colocate_gradients_with_ops=True)
```

The only thing that we need to change to parallelize the backpropagation of gradients is to set the colocate_gradients_with_ops flag to true. This ensures that each gradient op runs on the same device as its corresponding forward op.
