TensorFlow study notes

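Introduction
Install
From model building to Android
Converting Caffe to TensorFlow
Nodejs
Close reading of the source code
skflow(Scikit Flow)
Code structure
Training log
- 1 Data preparation
- 2 Parameter analysis
Future work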

Introduction

TensorFlow can be used from C, C++ and Python.

It is best to install from source (this is required when developing an Android app).
Build the shared library (the output is placed under bazel-bin/tensorflow):
# ubuntu(.so)
bazel build //tensorflow:libtensorflow.so

# mac(.dylib)
bazel build //tensorflow
The workflow consists of the following parts:

construction phase -> assembles a graph

execution phase -> session

save & load -> Saver, Supervisor (GraphDef)

visualize the graph -> summary, TensorBoard
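The split between a construction phase and an execution phase can be illustrated without TensorFlow. The toy below is a minimal sketch under that assumption: the `Node` class and `run` function are hypothetical stand-ins (not TensorFlow APIs) for graph building and `Session.run`. Wiring nodes computes nothing; values only appear when a node is fetched, and the feed dictionary can override any node by name, much like `feed_dict`.

```python
class Node:
    """A toy graph node: a name, a function and the nodes it reads from."""

    def __init__(self, name, fn=None, inputs=()):
        self.name = name
        self.fn = fn              # None marks a placeholder that must be fed
        self.inputs = list(inputs)


def run(fetch, feed):
    """Toy 'execution phase': evaluate `fetch`, using values from `feed` first."""
    cache = {}

    def evaluate(node):
        if node.name in feed:     # fed values override any node, like feed_dict
            return feed[node.name]
        if node.name not in cache:
            if node.fn is None:
                raise KeyError("placeholder %r was not fed" % node.name)
            cache[node.name] = node.fn(*[evaluate(n) for n in node.inputs])
        return cache[node.name]

    return evaluate(fetch)


# Construction phase: nothing is computed here, nodes are only wired together.
x = Node("x")                                  # placeholder
w = Node("w", lambda: 3.0)                     # constant
y = Node("y", lambda a, b: a * b, [x, w])
z = Node("z", lambda a: a + 1.0, [y])

print(run(z, {"x": 2.0}))    # 7.0
print(run(z, {"y": 10.0}))   # 11.0, an intermediate node overridden by the feed
```

Evaluation is memoized per `run` call, so a node shared by several consumers is computed once per execution, which loosely mirrors how a session evaluates each op once per step.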

To modify an imported graph, for example to replace one of its tensors (graph "surgery"), tf.import_graph_def() provides the only supported way, through its optional input_map argument. Say you want to replace the tensor "DecodeJpeg:0" with your new variable. You would do something like the following:

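import tensorflow as tf

graph_def = ...  # a tf.GraphDef, e.g. parsed from a .pb file
tf_new_image = tf.constant(...)
# Every use of "DecodeJpeg:0" inside graph_def is rewired to tf_new_image
_ = tf.import_graph_def(graph_def, input_map={"DecodeJpeg:0": tf_new_image})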

4.2 Retraining with an existing network as part of a new network (ref3)

To keep the original parameters unchanged, exclude them from the optimization, as follows:

opt.minimize(loss, var_list=<subset of variables you want to train>)

Otherwise, simply import the parameters and train as usual.
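What var_list does can be sketched in plain Python: gradients are computed for every parameter, but only the listed ones are updated. This is a framework-free illustration, not TensorFlow code; the linear model `y = a*x + b`, the parameter names `a`/`b` and the `train` helper are all hypothetical. The "pretrained" slope `a` is frozen and only the bias `b` is trained.

```python
def train(params, trainable, data, lr=0.1, steps=200):
    """Gradient descent on the MSE of y = a*x + b, updating only `trainable` keys."""
    params = dict(params)  # work on a copy; the caller's dict stays untouched
    n = len(data)
    for _ in range(steps):
        grads = {"a": 0.0, "b": 0.0}
        for x, y in data:
            err = params["a"] * x + params["b"] - y
            grads["a"] += 2 * err * x / n
            grads["b"] += 2 * err / n
        # Analogue of var_list: parameters not listed are never modified
        for name in trainable:
            params[name] -= lr * grads[name]
    return params


data = [(x, 2.0 * x + 1.0) for x in range(-2, 3)]  # generated with a=2, b=1
pretrained = {"a": 2.0, "b": 0.0}
result = train(pretrained, trainable=["b"], data=data)
print(result)  # "a" stays exactly 2.0, "b" converges towards 1.0
```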

4.3 retrain models

5 Set layer-wise learning rate

How to set layer-wise learning rate in Tensorflow?

import tensorflow as tf

var_list1 = [...]  # variables from the first 5 layers (fill in your own)
var_list2 = [...]  # the rest of the variables
opt1 = tf.train.GradientDescentOptimizer(0.00001)
opt2 = tf.train.GradientDescentOptimizer(0.0001)
# Compute all gradients once, then split them between the two optimizers
grads = tf.gradients(loss, var_list1 + var_list2)
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
train_op = tf.group(train_op1, train_op2)
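The same idea without TensorFlow: the gradient is computed once for all parameters, and each group of parameters is then stepped with its own learning rate. This is a minimal sketch, assuming plain SGD; the helper `apply_layerwise_sgd` and the layer names `conv1`, `conv2`, `fc` are hypothetical.

```python
def apply_layerwise_sgd(params, grads, groups):
    """One SGD step where each group of parameter names has its own learning rate.

    groups: list of (names, learning_rate) pairs, e.g. [(early_layers, 1e-5), (rest, 1e-4)]
    """
    new_params = dict(params)
    for names, lr in groups:
        for name in names:
            new_params[name] = params[name] - lr * grads[name]
    return new_params


params = {"conv1": 1.0, "conv2": 1.0, "fc": 1.0}
grads = {"conv1": 10.0, "conv2": 10.0, "fc": 10.0}  # computed once for all parameters
groups = [(["conv1", "conv2"], 0.00001), (["fc"], 0.0001)]
new_params = apply_layerwise_sgd(params, grads, groups)
print(new_params)  # early layers move 10x less than "fc"
```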

Debugging stage

tools

Note

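When debugging with tfdbg, do not launch the program from a .sh file, otherwise it fails with the error cbreak() returned ERR. Run the Python script directly from the command line instead.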

Testing stage

1 Importing a .pb file (re: classify_image)

There are two ways to write this:

import tensorflow as tf
from tensorflow.python.platform import gfile

# Method 1 (not usable when you also need to build a network?)
with tf.Session() as sess:
  # tf.gfile.FastGFile works as well
  with gfile.FastGFile('/path/to/.pbfile', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  # as_default() only takes effect when used as a context manager
  with sess.graph.as_default():
    tf.import_graph_def(graph_def)

# Method 2 (node names get the "import/" prefix; tf.import_graph_def adds it
# by default in Method 1 too, pass name='' to keep the original names)
with tf.Graph().as_default() as imported_graph:
  with gfile.FastGFile('/path/to/.pbfile', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def)
sess = tf.Session(graph=imported_graph)

To find out which names to use, list them with the following statements:

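for node in sess.graph_def.node:
  print(node.name)
for op in tf.get_default_graph().get_operations():
  print(op.name)
for op in sess.graph.get_operations():  # same as above when sess.graph is the default graph
  print(op.name)

# "XXX" and "name" are placeholders for tensor names of the form "<op name>:0"
softmax_tensor = sess.graph.get_tensor_by_name("XXX")
prediction = sess.run(softmax_tensor, {"name": input_data})  # feed_dict must be a dict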

C++ development

loading-a-tensorflow-graph-with-the-c-api

1 Creating and using a graph

// ... add a series of ops to `root` (a tensorflow::Scope)
tensorflow::GraphDef graph_def;
TF_RETURN_IF_ERROR(root.ToGraphDef(&graph_def));

std::unique_ptr<tensorflow::Session> session(
    tensorflow::NewSession(tensorflow::SessionOptions()));
TF_RETURN_IF_ERROR(session->Create(graph_def));
std::vector<tensorflow::Tensor> out_tensors;
TF_RETURN_IF_ERROR(session->Run({},             // inputs (none here)
                                {output_name},  // which node we want to get the output from
                                {},             // target nodes to run without fetching
                                &out_tensors    // where to put the output data
                                ));

2 Using a .pb file (re: label_image)

// Method 1:
Status LoadGraph(string graph_file_name, Session** session) {
  tensorflow::GraphDef graph_def;
  TF_RETURN_IF_ERROR(
        ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def)
  );
  TF_RETURN_IF_ERROR(NewSession(SessionOptions(), session));
  TF_RETURN_IF_ERROR((*session)->Create(graph_def));
  return Status::OK();
}

// Method 2:
Status LoadGraph(string graph_file_name, std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  TF_RETURN_IF_ERROR(
        ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def)
  );
  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR((*session)->Create(graph_def));
  return Status::OK();
}

Build notes:

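Check the contents of the BUILD file (e.g. the target name must match the .cc file)
Put the whole project folder somewhere under "tensorflow/tensorflow"
LICENSE file (needed?)
Do not mix the project with unrelated files
For a single image, remember to expand dims (add the batch dimension)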

3 Building the TensorFlow shared library

Create a new folder in the TensorFlow repo at tensorflow/tensorflow/libtensorflow/.
tensorflow/tensorflow/libtensorflow/
tensorflow/tensorflow/libtensorflow/BUILD
Inside this folder we’re going to create a new BUILD file which will contain a single call to cc_binary with the linkshared option set to 1 so that we get a .so from the build. The name of the binary must end in .so or it will not work.
cc_binary(
    name = "libtensorflow.so",
    linkshared = 1,
    deps = [
        "//tensorflow/core:tensorflow",
    ]
)
From the root of the repository, run ./configure .

Compile the shared library with bazel build --config=opt //tensorflow/libtensorflow:libtensorflow.so and locate the generated file from the repo’s root: bazel-bin/tensorflow/libtensorflow/libtensorflow.so
Note

If you’re on OS X and using Node.js you’ll need to rename the shared library from libtensorflow.so to libtensorflow.dylib

If you compile with GPU support, the command is bazel build --config=opt --config=cuda //tensorflow/libtensorflow:libtensorflow.so

4 Debugging C++ code

Add debug information at build time with bazel build -c dbg //<path to src>:<target_name>, then debug with gdb.

For more on debugging, see TENSORFLOW_DEBUG.md.

Debugging the TensorFlow source with LLDB on macOS

Android implementation

Setting up the Android environment

If you have not downloaded TensorFlow yet, it is best to clone it like this:

$ git clone https://github.com/tensorflow/tensorflow.git --recurse-submodules

Then follow the Readme.md in "tensorflow/examples/android" (download mirror for the SDK and NDK in China: androiddevtools).
The compiled .so files (located under the .cache folder) can be used for development in Android Studio.

Tips

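Look up the name of the .so file in the BUILD file, then find its location on Ubuntu with the locate command.
TensorFlow's mobile implementation: mobile (https://www.tensorflow.org/mobile.html)
Some errors come from environment variable settings. For example, if the error mentions /usr/local/bin/gcc, removing /usr/local/bin from PATH fixes it, presumably because of a gcc conflict. After changing environment variables, make sure the change has actually taken effect!
When repeatedly modifying and rebuilding, run bazel clean to clear the previous build results and avoid unexplained errors.
For problems caused by the Bazel version, add --incompatible_load_argument_is_label=false
If the .so files of CUDA etc. cannot be found, add --action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" to the build.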

Note

If the source tree was previously configured to build a GPU .whl file, re-run configure now, because building the Android code does not need the GPU: "the android demo should not need CUDA. Is it possible during configuration you configured TF to build with CUDA? That would add --config=cuda automatically to the build."

Development

bazel build produces a .so file (under the bazel-out or .cache folder), which can be used for development in Android Studio (AS).

Reference: Android TensorFlow Machine Learning Example

Following the README in tensorflow/contrib/android, build the .jar and .so files.
Create an Android sample project in Android Studio.
Put label file (imagenet_comp_graph_label_strings.txt) and pre-trained model (tensorflow_inception_graph.pb) into assets folder.
Put the .jar file in the libs folder, right-click it and choose Add As Library.

Create jniLibs folder in main directory and put .so in jniLibs/armeabi-v7a folder.

├── app
│   ├── build.gradle
│   ├── libs
│   │   └── libandroid_tensorflow_inference_java.jar
│   └── src
│       ├── androidTest
│       ├── main
│       │   ├── AndroidManifest.xml
│       │   ├── assets
│       │   │   ├── imagenet_comp_graph_label_strings.txt
│       │   │   └── tensorflow_inception_graph.pb
│       │   ├── java
│       │   │   └── <main java code>
│       │   ├── jniLibs
│       │   │   └── armeabi-v7a
│       │   │       └── libtensorflow_inference.so
│       │   └── res
│       └── test
├── assets
│   └──<test image>
├── build.gradle
├── gradle
├── gradle.properties
├── gradlew
├── gradlew.bat
└── settings.gradle

Notes

Keep the .h and .cc files in sync when modifying them.

Replace the .pb and .txt files together.

Common errors

No OpKernel was registered to Support:

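The cause is that some ops used by the model were not added to the BUILD file. There are two fixes:

1 Add the files that contain the corresponding ops to the BUILD file

2 If possible, save a model that does not contain that op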

Converting Caffe to TensorFlow

Conversion code: caffe-tensorflow

Nodejs

Loading a TensorFlow model from Node.js

Close reading of the source code

Basic usage notes

For tensors A and B:
- A * B: element-wise product
- tf.multiply(A, B): element-wise product
- tf.matmul(A, B): matrix product
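The difference between the two kinds of product is easy to check by hand. The plain-Python sketch below (lists of lists standing in for 2-D tensors; the helpers `elementwise` and `matmul` are hypothetical names) shows what each operation computes for same-shaped matrices. It deliberately ignores broadcasting, which `A * B` and `tf.multiply` also support.

```python
def elementwise(a, b):
    """Like A * B or tf.multiply(A, B): multiply entries at the same position."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Like tf.matmul(A, B): row-by-column dot products; needs len(a[0]) == len(b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(elementwise(A, B))  # [[5, 12], [21, 32]]
print(matmul(A, B))       # [[19, 22], [43, 50]]
```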

Notes

tf.shape() returns the dynamic shape; tensor.shape returns the static shape.

Seq2Seq

tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py (version 1.11)

After a TensorFlow upgrade (r1.3 and later?), the former tf.nn.seq2seq code was moved to tf.contrib.legacy_seq2seq. This API will probably be deprecated, since a new and more flexible API has been developed under tf.contrib.seq2seq.

The functions in the file are structured as follows; six seq2seq functions are implemented in total:

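model_with_buckets()
- seq2seq functions
  1. basic_rnn_seq2seq(): the simplest version; inputs and outputs are both already embedded vectors. The state vector of the encoder's last step is used as the decoder's initial state. The encoder and decoder use the same kind of RNN cell but do not share weights.
     - rnn_decoder()
  2. tied_rnn_seq2seq(): same as 1, but the encoder and decoder share weights.
  3. embedding_rnn_seq2seq(): same as 1, but inputs and outputs are ids; the function internally creates separate embedding matrices for the encoder and the decoder.
     - embedding_rnn_decoder()
  4. embedding_tied_rnn_seq2seq(): same as 2, but inputs and outputs are ids; the function internally creates one embedding matrix that the encoder and decoder share.
  5. embedding_attention_seq2seq(): same as 3, plus an attention mechanism.
     - embedding_attention_decoder()
       - attention_decoder()
         - attention()
  6. one2many_rnn_seq2seq(): one encoder feeding several decoders (multi-task).
- loss functions
  - sequence_loss_by_example()
  - sequence_loss()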
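The two loss functions can be sketched in plain Python to show how padding is handled. This is a framework-free sketch of my understanding of the default behaviour (average_across_timesteps=True, average_across_batch=True), not the legacy_seq2seq source: per example, the weighted cross-entropy is summed over time steps and divided by that example's total weight, and sequence_loss then averages over the batch. Padded steps get weight 0 and therefore contribute nothing. All helper names and the toy data are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target):
    """-log softmax(logits)[target], computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def sequence_loss_by_example(logits, targets, weights):
    """Per-example weighted cross-entropy, averaged over time steps.

    logits: [time][batch][num_classes]; targets and weights: [time][batch]
    """
    batch = len(targets[0])
    losses = []
    for b in range(batch):
        total = sum(w[b] * softmax_cross_entropy(l[b], t[b])
                    for l, t, w in zip(logits, targets, weights))
        total_weight = sum(w[b] for w in weights) + 1e-12  # avoid division by zero
        losses.append(total / total_weight)
    return losses


def sequence_loss(logits, targets, weights):
    """Sum of the per-example losses divided by the batch size."""
    per_example = sequence_loss_by_example(logits, targets, weights)
    return sum(per_example) / len(per_example)


# 2 time steps, batch of 2, 2 classes; example 1 is padded at step 2 (weight 0)
logits = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [5.0, -5.0]]]
targets = [[0, 1],
           [1, 1]]
weights = [[1.0, 1.0],
           [1.0, 0.0]]
print(sequence_loss_by_example(logits, targets, weights))  # both close to log(2)
print(sequence_loss(logits, targets, weights))             # close to log(2)
```

The padded step of example 1 would cost about 10 nats, but its weight of 0 removes it from both the numerator and the normalizer.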

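model_with_buckets() exists to reduce computation and speed the model up. It is needed because this code is fairly old and in places still uses functions such as static_rnn(); in newer TensorFlow versions, after dynamic_rnn was introduced, this is no longer necessary. The method builds one model per bucket (all these models share parameters), and during training each batch of sequences is fed to the model of the matching length. It helps to compare this with today's dynamic_rnn: dynamic_rnn pads each batch to the length of the longest sample in that batch, whereas bucketing groups the data by length beforehand, during preprocessing.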
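The two padding strategies can be compared on toy data. The sketch below is plain Python and only covers data grouping, not the per-bucket models themselves; the helpers `pad`, `pad_per_batch` and `bucketize`, the pad id 0, the bucket sizes and the choice to drop over-long sequences are all assumptions made for illustration.

```python
def pad(seq, length, pad_id=0):
    """Right-pad `seq` with `pad_id` up to `length`."""
    return seq + [pad_id] * (length - len(seq))


def pad_per_batch(batch, pad_id=0):
    """dynamic_rnn style: pad every sequence to the longest one in this batch."""
    longest = max(len(s) for s in batch)
    return [pad(s, longest, pad_id) for s in batch]


def bucketize(sequences, bucket_sizes, pad_id=0):
    """Bucketing: put each sequence into the smallest bucket that fits, padded to that size."""
    buckets = {size: [] for size in bucket_sizes}
    for s in sequences:
        for size in sorted(bucket_sizes):
            if len(s) <= size:
                buckets[size].append(pad(s, size, pad_id))
                break
        # a sequence longer than the largest bucket matches no bucket and is dropped
    return buckets


data = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10, 11, 12]]
print(pad_per_batch(data[:3]))  # [[1, 2, 0], [3, 4, 5], [6, 0, 0]]
print(bucketize(data, [3, 6]))  # {3: [[1, 2, 0], [3, 4, 5], [6, 0, 0]], 6: [[7, 8, 9, 10, 11, 12]]}
```

With bucketing, the padding length is fixed ahead of time per bucket, so a static graph can be built for each size; per-batch padding decides the length at run time, which is what a dynamic RNN can accommodate.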

Related examples:
- tensorflow/contrib/eager/python/examples/generative_examples
- tensorflow/contrib/eager/python/examples/nmt_with_attention
- models/research/textsum
- nmt