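Introduction
When installing TensorFlow by following https://tensorflow.google.cn/install, pay close attention to the version of every dependency, such as Python, Bazel, and Protocol Buffers. A single mismatched version is enough to break the installation.
Installing the Python package with pip is the easier route. Building from source is more involved, especially if you want the C/C++ API as shared libraries, and even after a successful build you still have to avoid ABI conflicts with other libraries. The notes at https://github.com/rangsimanketkaew/tensorflow-cpp-api can save a lot of time.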
Installing with pip (TensorFlow 2)
Python 3.6 to 3.9 is currently supported:
$ sudo apt update
$ sudo apt install python3-dev python3-pip python3-venv
$ python3 --version
$ pip3 --version
# Requires the latest pip
$ pip install --upgrade pip
# Current stable release for CPU and GPU
$ pip install tensorflow
# Or try the preview build (unstable)
$ pip install tf-nightly
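Because the supported interpreter range is narrow, it can help to check the running Python before installing anything. The following is a minimal sketch, assuming the 3.6 to 3.9 range stated above; the names `MIN_SUPPORTED`, `MAX_SUPPORTED`, and `is_supported` are illustrative and not part of any TensorFlow tooling.

```python
import sys

# Range taken from the text above; adjust it for other TensorFlow releases.
MIN_SUPPORTED = (3, 6)
MAX_SUPPORTED = (3, 9)


def is_supported(version=None):
    """Return True if the (major, minor) pair lies inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "supported" if is_supported() else "outside the 3.6 to 3.9 range"
    print("Python {}: {}".format(current, verdict))
```

Comparing tuples rather than strings keeps the check correct for two-digit minor versions such as 3.10.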
Verify the installation:
$ ipython
In [1]: import tensorflow as tf
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
In [2]: tf.version.VERSION
Out[2]: '2.11.0'
In [3]: print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(229.19952, shape=(), dtype=float32)
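For scripted setups, a non-interactive check can be handy. The sketch below only looks the package up with the standard library's `importlib.util.find_spec` and does not import it, so it says nothing about whether the binary actually runs on the local CPU; `describe_install` is a hypothetical helper name.

```python
import importlib.util


def describe_install(package="tensorflow"):
    """Return a one-line status for a top-level package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return "{}: not installed in this environment".format(package)
    return "{}: found at {}".format(package, spec.origin)


if __name__ == "__main__":
    print(describe_install())
```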
Installing with Docker
$ docker pull tensorflow/tensorflow:latest # Download latest stable image
$ docker run -it -p 8888:8888 tensorflow/tensorflow:latest-jupyter # Start Jupyter server
Building from source
Building from source involves a few points to watch out for.
This walkthrough builds from the r2.6 branch:
$ git clone git@github.com:tensorflow/tensorflow.git
$ cd tensorflow/
$ git checkout r2.6
Check which protobuf version this branch depends on:
$ grep -i protobuf tensorflow/workspace2.bzl
protobuf-3.9.2
The locally installed version does not match, so downgrade to 3.9.2:
$ protoc --version
libprotoc 3.14.0
$ git clone https://github.com/protocolbuffers/protobuf.git
$ cd protobuf/
$ git checkout v3.9.2
$ ./autogen.sh
$ ./configure
$ make
$ make install
$ protoc --version
libprotoc 3.9.2
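The version comparison above is easy to get wrong by eye, since 3.14.0 is newer than 3.9.2 even though it sorts lower as a string. The following is a small sketch that parses the `protoc --version` output and compares it numerically. The required version `(3, 9, 2)` comes from the text; the function names are illustrative.

```python
import re
import shutil
import subprocess

REQUIRED = (3, 9, 2)  # version pinned by the r2.6 branch, per the text above


def parse_protoc_version(output):
    """Extract a numeric tuple from text such as 'libprotoc 3.14.0'."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError("unrecognized protoc output: {!r}".format(output))
    return tuple(int(part) for part in match.group(1).split("."))


def matches_required(output, required=REQUIRED):
    """Return True only when the installed version equals the required one."""
    return parse_protoc_version(output) == required


if __name__ == "__main__":
    if shutil.which("protoc") is None:
        print("protoc is not on PATH")
    else:
        result = subprocess.run(["protoc", "--version"],
                                stdout=subprocess.PIPE, universal_newlines=True)
        print(result.stdout.strip(), "->", matches_required(result.stdout))
```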
Now the build can start:
$ bazel --version
bazel 3.7.2
$ ./configure # answer N to every prompt this time
$ bazel build --config=opt //tensorflow:libtensorflow.so # C
$ bazel build --config=opt //tensorflow:libtensorflow_framework.so # framework base
$ ls -1 bazel-bin/tensorflow/libtensorflow*.so*
$ bazel build --config=opt //tensorflow:libtensorflow_cc.so # C++
$ bazel build --config=cuda //tensorflow:libtensorflow_cc.so # with CUDA
$ bazel build --config=opt //tensorflow:libtensorflow_framework.so # framework base
$ ls -1 bazel-bin/tensorflow/libtensorflow*.so*
$ bazel build --config=opt //tensorflow:install_headers # headers
$ bazel build --config=opt //tensorflow/tools/lib_package:libtensorflow
$ ls -1 bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz
$ bazel build //tensorflow/tools/pip_package:build_pip_package # py
$ ls -1 bazel-bin/tensorflow/tools/pip_package/build_pip_package
$ mkdir tensorflow_pkg
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ./tensorflow_pkg
$ ls -1 ./tensorflow_pkg/tensorflow-2.9.3-cp37-cp37m-linux_x86_64.whl
$ pip3 install --user ./tensorflow_pkg/tensorflow-2.9.3-cp37-cp37m-linux_x86_64.whl
Version mismatches in Python, Bazel, and other dependencies can all cause the build to fail; see https://github.com/rangsimanketkaew/tensorflow-cpp-api for reference.
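One mismatch that is easy to detect up front is a wheel built for a different Python than the one that will install it. The sketch below reads the Python tag from a wheel file name such as the one above and compares it with an interpreter version. It assumes a CPython tag of the form `cpXY`, and the helper names are hypothetical.

```python
import sys


def wheel_tags(filename):
    """Split 'name-version-python-abi-platform.whl' into named fields."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields are always the compatibility tags.
    head, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    head_parts = head.split("-")
    return {
        "name": head_parts[0],
        "version": head_parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(filename, version_info=None):
    """Return True if the wheel's Python tag equals cp<major><minor>."""
    major, minor = tuple(version_info or sys.version_info)[:2]
    return wheel_tags(filename)["python"] == "cp{}{}".format(major, minor)


if __name__ == "__main__":
    wheel = "tensorflow-2.9.3-cp37-cp37m-linux_x86_64.whl"
    print(wheel_tags(wheel), matches_interpreter(wheel))
```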
Setting up API paths and variables
First install the header files and shared libraries:
$ mkdir -p /usr/local/tensorflow/lib/
$ rsync -av bazel-bin/tensorflow/include /usr/local/tensorflow/
$ rsync -av bazel-bin/tensorflow/libtensorflow*.so* /usr/local/tensorflow/lib/
$ export LD_LIBRARY_PATH=/usr/local/tensorflow/lib
# for Mac
$ rsync -av bazel-bin/tensorflow/libtensorflow*.dylib* /usr/local/tensorflow/lib/
$ export DYLD_LIBRARY_PATH=/usr/local/tensorflow/lib
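Note that the `export` lines above replace any existing value of the variable. If other libraries already rely on it, prepending is safer. The following is a minimal sketch of launching a program with the TensorFlow library directory prepended for that one process only. `prepend_path` and `run_with_tf_libs` are illustrative names, and the directory is the install location used in the text.

```python
import os
import subprocess

TF_LIB_DIR = "/usr/local/tensorflow/lib"  # install location used in the text


def prepend_path(new_dir, existing):
    """Return a search path with new_dir first and earlier entries preserved."""
    kept = [p for p in existing.split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir] + kept)


def run_with_tf_libs(command, variable="LD_LIBRARY_PATH"):
    """Run command with TF_LIB_DIR prepended to the given loader variable."""
    env = dict(os.environ)
    env[variable] = prepend_path(TF_LIB_DIR, env.get(variable, ""))
    return subprocess.run(command, env=env, check=False).returncode


if __name__ == "__main__":
    print(prepend_path(TF_LIB_DIR, os.environ.get("LD_LIBRARY_PATH", "")))
```

On a Mac the same helper could be called with `variable="DYLD_LIBRARY_PATH"`.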
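One thing deserves special mention. If you are working on a Mac and copy only the .so shared libraries to the target directory, without copying the .dylib files along with them, you will run into puzzling errors: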
$ otool -L test
test:
@rpath/libtensorflow_cc.so.2 (compatibility version 0.0.0, current version 0.0.0)
@rpath/libtensorflow_framework.so.2 (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.23.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
$ ./test
dyld[54114]: Library not loaded: @rpath/libtensorflow_framework.2.dylib
Referenced from: /usr/local/tensorflow/lib/libtensorflow_cc.so.2.6.5
Reason: tried: '/usr/local/tensorflow/lib//libtensorflow_framework.2.dylib' (no such file),
'/usr/local/tensorflow/lib/../_solib_darwin_x86_64/_U_S_Stensorflow_Clibtensorflow_Ucc.so.2.6.5___Utensorflow/libtensorflow_framework.2.dylib' (no such file),
'/usr/local/tensorflow/lib/libtensorflow_framework.2.dylib' (no such file),
'/usr/local/lib/libtensorflow_framework.2.dylib' (no such file),
'/usr/lib/libtensorflow_framework.2.dylib' (no such file)
The program links against the .so files, yet the loader complains that a .dylib cannot be found. The root cause is visible here:
$ otool -L /usr/local/tensorflow/lib/libtensorflow_cc.so.2.6.5
/usr/local/tensorflow/lib/libtensorflow_cc.so.2.6.5:
@rpath/libtensorflow_cc.so.2 (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.23.0)
@rpath/libtensorflow_framework.2.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1858.112.0)
/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 60158.100.133)
/System/Library/Frameworks/IOKit.framework/Versions/A/IOKit (compatibility version 1.0.0, current version 275.0.0)
/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1858.112.0)
/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
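As the output shows, although this is a .so file, it depends on a .dylib file underneath. If you follow other tutorials step by step and copy only the .so files, the program compiles fine but fails to run, which is needlessly frustrating.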
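This kind of missing dependency can be caught before running the program. The sketch below parses `otool -L` output, collects every `@rpath/` entry, and reports the ones that are absent from a library directory. It is an illustration that works on captured text, and the function names are assumptions rather than an existing tool.

```python
import os


def rpath_dependencies(otool_output):
    """Return the file names that `otool -L` lists under @rpath."""
    names = []
    for line in otool_output.splitlines():
        # Each dependency line looks like: "<path> (compatibility version ...)".
        entry = line.strip().split(" (", 1)[0]
        if entry.startswith("@rpath/"):
            names.append(entry[len("@rpath/"):])
    return names


def missing_dependencies(otool_output, lib_dir):
    """Return the @rpath entries that do not exist inside lib_dir."""
    return [name for name in rpath_dependencies(otool_output)
            if not os.path.exists(os.path.join(lib_dir, name))]


SAMPLE = """/usr/local/tensorflow/lib/libtensorflow_cc.so.2.6.5:
\t@rpath/libtensorflow_cc.so.2 (compatibility version 0.0.0, current version 0.0.0)
\t/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.23.0)
\t@rpath/libtensorflow_framework.2.dylib (compatibility version 0.0.0, current version 0.0.0)
"""

if __name__ == "__main__":
    print(missing_dependencies(SAMPLE, "/usr/local/tensorflow/lib"))
```

The first `@rpath/` entry is the library's own install name, so it normally resolves once the library itself has been copied.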
Calling the C API
#include <stdio.h>
#include <tensorflow/c/c_api.h>
int main() {
    printf("Hello from TensorFlow C library version %s\n", TF_Version());
    return 0;
}
Compile:
$ gcc test_c.c -I/usr/local/tensorflow/include/ -L/usr/local/tensorflow/lib/ -ltensorflow -ltensorflow_framework -o test_c
$ ./test_c
Hello from TensorFlow C library version 2.6.5
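The same `TF_Version()` call can also be reached without compiling anything, which makes for a quick smoke test of the installed library. The following is a sketch using Python's `ctypes`, assuming `libtensorflow.so` sits in the directory used above and that its own dependencies (such as `libtensorflow_framework`) are visible to the loader. `tf_c_version` is a hypothetical helper name.

```python
import ctypes


def tf_c_version(library_path="/usr/local/tensorflow/lib/libtensorflow.so"):
    """Return the string reported by TF_Version(), or None if loading fails."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        # Library missing, or one of its dependencies could not be resolved.
        return None
    # C signature: const char* TF_Version(void)
    lib.TF_Version.restype = ctypes.c_char_p
    lib.TF_Version.argtypes = []
    return lib.TF_Version().decode("utf-8")


if __name__ == "__main__":
    print(tf_c_version())
```

A `None` result here usually points at the same path problems that would make the compiled program fail at start-up.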
Calling the C++ API
#include <tensorflow/core/platform/env.h>
#include <tensorflow/core/public/session.h>
#include <iostream>
using namespace std;
using namespace tensorflow;
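int main()
{
    // NewSession allocates the session; the caller owns the pointer.
    Session* session = nullptr;
    Status status = NewSession(SessionOptions(), &session);
    if (!status.ok()) {
        cout << status.ToString() << "\n";
        return 1;
    }
    cout << "Session successfully created.\n";
    delete session;  // release the session before exiting
    return 0;
}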
Compile and run:
$ g++ -std=c++17 test_c++.cpp -I/usr/local/tensorflow/include/ -L/usr/local/tensorflow/lib/ -ltensorflow_cc -ltensorflow_framework -o test_c++
$ ./test_c++
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Session successfully created.
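Building together with the TensorFlow source
Another approach is to skip building the shared libraries in advance and compile your program inside the TensorFlow source tree, which can produce either an executable or a shared library. The advantage is fewer shared-library dependencies at run time, which simplifies deployment. The drawbacks are that the output pulls in a lot of unneeded code and becomes very large, and that the build is slow.
For example, with a directory layout like this: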
tensorflow-src/tensorflow/test-tf/
tensorflow-src/tensorflow/test-tf/BUILD
tensorflow-src/tensorflow/test-tf/test-tf.cpp
Contents of BUILD:
cc_binary(
name = "test-tf",
srcs = ["test-tf.cpp"],
deps = [
"//tensorflow/core:tensorflow",
],
)
Contents of test-tf.cpp:
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"
#include <iostream>
using namespace std;
using namespace tensorflow;
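int main()
{
    // NewSession allocates the session; the caller owns the pointer.
    Session* session = nullptr;
    Status status = NewSession(SessionOptions(), &session);
    if (!status.ok()) {
        cout << status.ToString() << "\n";
        return 1;
    }
    cout << "Session successfully created.\n";
    delete session;  // release the session before exiting
    return 0;
}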
Build and run:
$ cd ~/tensorflow-src/
$ ./configure
$ cd tensorflow/test-tf/
$ bazel build --config=opt :test-tf
$ cd ~/tensorflow-src/bazel-bin/tensorflow/test-tf/
$ ./test-tf
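Summary
Python is TensorFlow's best-supported language, and no official prebuilt C++ download is provided. Still, training a model offline in Python and then serving it online from a system written in C++ is a real need. The process is tortuous, but there is no way around it.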
References