2.5 Installing TensorFlow
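As a reminder, we recommend installing TensorFlow inside a Virtualenv. After installing TensorFlow and GPU (graphics processing unit) support, verify that the installation works.
Because some users run into problems when installing GPU support for TensorFlow, the rest of this section describes how to install it.
First, register as a developer on the NVIDIA Developer website.
Next, install CUDA Toolkit 9.0. The link on tensorflow.org always points to the latest CUDA release, which is currently 9.2. Do not use 9.2 until TensorFlow supports it; use the CUDA 9.0 release linked above.
Likewise, download and install cuDNN v7.1.4 for CUDA 9.0. The link on tensorflow.org points to the latest cuDNN, which is v7.1.4 for CUDA 9.2. After installing, run the following command: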
    $ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Sep__1_21:08:03_CDT_2017
    Cuda compilation tools, release 9.0, V9.0.176
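The nvcc check above can also be scripted. The following is a minimal sketch rather than part of the original procedure: the helper name parse_nvcc_release and the constant EXPECTED_CUDA are illustrative assumptions, and the sample text is copied from the output shown above.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release (for example "9.0") found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Sample text copied from the nvcc output in this section.
sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Copyright (c) 2005-2017 NVIDIA Corporation\n"
    "Built on Fri_Sep__1_21:08:03_CDT_2017\n"
    "Cuda compilation tools, release 9.0, V9.0.176\n"
)

EXPECTED_CUDA = "9.0"  # assumption: the toolkit version this section installs

release = parse_nvcc_release(sample)
status = "OK" if release == EXPECTED_CUDA else "MISMATCH"
print("CUDA release: %s (%s)" % (release, status))
```

To check a live system, the sample string could be replaced with the decoded result of subprocess.check_output(["nvcc", "-V"]), assuming nvcc is on the PATH.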
Next, run the command "$ nvidia-smi", which produces the following output:
    Fri Jun 15 22:21:08 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.130                Driver Version: 384.130                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro K600         Off  | 00000000:05:00.0 Off |                  N/A |
    | 25%   48C    P0    N/A /  N/A |      0MiB /   979MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    Device 0: "Quadro 600"
      CUDA Driver Version / Runtime Version          9.0 / 9.0
      CUDA Capability Major/Minor version number:    2.1
      Total amount of global memory:                 962 MBytes (1009254400 bytes)
Quadro M6000
    $ ./bin/x86_64/linux/release/deviceQuery
    ./bin/x86_64/linux/release/deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 1 CUDA Capable device(s)

    Device 0: "Quadro M6000 24GB"
      CUDA Driver Version / Runtime Version          9.0 / 9.0
      CUDA Capability Major/Minor version number:    5.2
      Total amount of global memory:                 24467 MBytes (25655836672 bytes)
      (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores
      GPU Max Clock rate:                            1114 MHz (1.11 GHz)
      Memory Clock rate:                             3305 Mhz
      Memory Bus Width:                              384-bit
      L2 Cache Size:                                 3145728 bytes
      Maximum Texture Dimension Size (x, y, z)       1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x, y, z): (1024, 1024, 64)
      Max dimension size of a grid size (x, y, z):   (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     Yes
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Disabled
      Device supports Unified Addressing (UVA):      Yes
      Supports Cooperative Kernel Launch:            No
      Supports MultiDevice Co-op Kernel Launch:      No
      Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
    Result = PASS

    $ nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.130                Driver Version: 384.130                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro M6000 24GB   Off  | 00000000:04:00.0  On |                  Off |
    | 25%   41C    P8    20W / 250W |    488MiB / 24467MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      2183      G   /usr/lib/xorg/Xorg                           319MiB |
    |    0      3796      G   compiz                                        92MiB |
    |    0      6095      G   ...-token=32ADD0D4261B4355966B2810A61BBF37    72MiB |
    +-----------------------------------------------------------------------------+
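The nvidia-smi tables above are meant for reading by eye. For a scripted check, nvidia-smi also offers a CSV query mode. The following is a minimal sketch under stated assumptions: the helper names parse_gpu_rows and query_gpus are illustrative, and the sample line is hand-built from the values in the Quadro M6000 table rather than captured from a real run.

```python
import csv
import subprocess

# nvidia-smi's CSV query mode; the three field names are standard query properties.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(csv_text):
    """Turn "name, driver, memory" CSV lines into a list of dictionaries."""
    rows = []
    for fields in csv.reader(csv_text.strip().splitlines(), skipinitialspace=True):
        if len(fields) == 3:
            name, driver, memory = [field.strip() for field in fields]
            rows.append({"name": name, "driver": driver, "memory": memory})
    return rows


def query_gpus():
    """Run nvidia-smi and parse its output (requires an installed NVIDIA driver)."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return parse_gpu_rows(output)


# Hypothetical sample line assembled from the table values shown above.
sample = "Quadro M6000 24GB, 384.130, 24467 MiB\n"
for gpu in parse_gpu_rows(sample):
    print("%(name)s | driver %(driver)s | %(memory)s" % gpu)
```

On a machine with the driver installed, calling query_gpus() instead of parsing the sample should return one dictionary per GPU.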
Finally, install TensorFlow with GPU support:
    (tensorflow)$ pip install --upgrade tensorflow      # for Python 2.7
    (tensorflow)$ pip3 install --upgrade tensorflow     # for Python 3.n
    (tensorflow)$ pip install --upgrade tensorflow-gpu  # for Python 2.7 and GPU
    (tensorflow)$ pip3 install --upgrade tensorflow-gpu # for Python 3.n and GPU
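The four commands differ only in the pip executable and the package name. The choice can be sketched as a small helper. This is an illustration only: the function name pip_install_command is hypothetical, and the snippet only builds the command without running it.

```python
import sys


def pip_install_command(python_major, use_gpu):
    """Build the install command matching the four variants listed above."""
    pip = "pip3" if python_major >= 3 else "pip"
    package = "tensorflow-gpu" if use_gpu else "tensorflow"
    return [pip, "install", "--upgrade", package]


# Show the GPU variant for the interpreter running this script.
command = pip_install_command(sys.version_info[0], use_gpu=True)
print(" ".join(command))
```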
Then verify the installation from the Python interpreter inside the virtual environment:

    (tensorflow) $ python
    Python 2.7.12 (default, Dec 4 2017, 14:50:18)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> hello = tf.constant("hello")
    >>> sess = tf.Session()
    2018-06-20 06:54:34.284161: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2018-06-20 06:54:34.460555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
    name: Quadro M6000 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.114
    pciBusID: 0000:04:00.0
    totalMemory: 23.89GiB freeMemory: 23.29GiB
    2017-05-20 06:54:34.460600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
    2017-05-20 06:54:34.708584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    2017-05-20 06:54:34.708635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
    2017-05-20 06:54:34.708644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
    2017-05-20 06:54:34.709069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22598 MB memory) -> physical GPU (device: 0, name: Quadro M6000 24GB, pci bus id: 0000:04:00.0, compute capability: 5.2)
    >>> print(sess.run(hello))
    hello
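Instead of reading the log lines by eye, the GPU check can be done programmatically. The following is a minimal sketch, assuming the TensorFlow 1.x installation described in this section. The helper names gpu_device_names and list_tensorflow_gpus are illustrative; device_lib.list_local_devices() is the TensorFlow call that enumerates the devices.

```python
def gpu_device_names(devices):
    """Return the names of the entries whose device_type is "GPU"."""
    return [device.name for device in devices if device.device_type == "GPU"]


def list_tensorflow_gpus():
    """Ask TensorFlow which local devices it can see and keep only the GPUs."""
    # Imported lazily so that gpu_device_names() works without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


if __name__ == "__main__":
    try:
        gpus = list_tensorflow_gpus()
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print("GPUs visible to TensorFlow: %s" % (gpus or "none"))
```

An empty list from a tensorflow-gpu installation would suggest rechecking the driver, CUDA, and cuDNN versions from the earlier steps.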