Reproducing the 2019 FastDepth Paper (TensorFlow)

type

Post

status

Published

date

Apr 7, 2022

slug

2019FastDepth复现tensorflow版本

summary

This post is a TensorFlow reproduction of the 2019 FastDepth paper.

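Paper notes

Overview

Monocular depth estimation based on deep learning has been one of the more popular research directions in recent years. At ICRA 2019, Diana Wofk et al. from MIT proposed FastDepth, a depth estimation algorithm for embedded systems that greatly improves the model's computational efficiency while preserving accuracy (source).

Paper: [1903.03273v1] FastDepth: Fast Monocular Depth Estimation on Embedded Systems (arxiv.org)

Official PyTorch version (evaluation and testing only, no training code): https://github.com/dwofk/fast-depth

Notes (this part will be filled in as I keep reading)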

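Abstract

Depth sensing is a critical function for robotic tasks such as localization, mapping and obstacle detection. There is a growing interest in depth estimation from a single RGB image, due to the relatively low cost and size of monocular cameras. However, state-of-the-art monocular depth estimation algorithms are based on fairly complex deep neural networks that are too slow for real-time inference on an embedded platform, for instance one mounted on a micro aerial vehicle.

In this paper, we address the problem of fast depth estimation on embedded systems. We propose an efficient and lightweight encoder-decoder network architecture and apply network pruning to further reduce computational complexity and latency. In particular, we focus on the design of a low-latency decoder. Our method demonstrates that it is possible to achieve accuracy similar to prior work on depth estimation, but at inference speeds that are an order of magnitude faster. Our proposed network, FastDepth, runs at 178 fps on an NVIDIA Jetson TX2 GPU and at 27 fps when using only the TX2 CPU, with active power consumption under 10 W. FastDepth achieves close to state-of-the-art accuracy on the NYU Depth v2 dataset. To the best of the authors' knowledge, this paper demonstrates real-time monocular depth estimation using a deep neural network with the lowest latency and highest throughput on an embedded platform that can be carried by a micro aerial vehicle.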

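The problem addressed: fast depth estimation on embedded systems

Proposes an efficient, lightweight encoder-decoder network whose decoder is specifically designed for low latency

Accuracy is comparable to the previous baselines while being an order of magnitude faster (SOTA)

Experiments confirm the high frame rate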

Previous work

The related work section of this paper is well written and worth reading closely.

Introduction

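This is a fairly pure technical report with hardly any mathematical derivation, so the focus here is on reproduction.

This paper proposes a low-latency, efficient, high-throughput and high-accuracy depth estimation algorithm that runs on embedded systems. We propose an efficient encoder-decoder network architecture with a focus on low-latency design. Our approach uses MobileNet as the encoder and nearest-neighbor interpolation with depthwise separable convolution in the decoder. We apply NetAdapt, a state-of-the-art network pruning method, and use the TVM compiler stack to further reduce inference runtime on the target embedded platform.

Encoder-decoder structure: MobileNet as the encoder; the decoder uses depthwise separable convolutions with nearest-neighbor interpolation

The hand-designed network has redundant parameters, so it is pruned during training

Because embedded platforms at the time had no optimization for MobileNet's separable convolutions, these were additionally optimized with the TVM compiler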
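To make the appeal of depthwise separable convolutions concrete, here is a small parameter-count sketch. The helper names are made up for illustration, and biases and batch normalization are ignored; the example numbers are the 5x5, 1024-to-512-channel stage that opens the decoder in this post's listing.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes the channels
    return depthwise + pointwise


# First decoder stage of the listing: 5 x 5 kernel, 1024 -> 512 channels.
std = standard_conv_params(5, 1024, 512)
sep = separable_conv_params(5, 1024, 512)
print(std, sep, round(std / sep, 1))
```

For this stage the separable version needs roughly 24 times fewer weights than a standard convolution of the same kernel size.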

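Network architecture

Looking at this network again in 2021, nothing about it is new. The main thing to note is the three skip connections: the skip paths apply no convolution, so the two tensors must have identical dimensions when they are added. Also, the dataset images are not 224*224, so they have to be downsampled before being fed in. The decoder design was found by trial and error, as described in the experiments section of the paper.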
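The dimension constraint on the skip connections can be checked without building the model. The sketch below is plain Python and only does shape bookkeeping: it assumes each decoder stage is a 'same'-padded convolution followed by 2x upsampling, as in this post's listing, and uses the MobileNet activation shapes noted in that listing's comments. The function name is hypothetical.

```python
# (height, width, channels) of the MobileNet activations for a 224 x 224 x 3 input.
ENCODER_OUT = (7, 7, 1024)            # conv_pw_13_relu
SKIPS = {
    1: (28, 28, 256),                 # conv_pw_5_relu, added after decoder stage 1
    2: (56, 56, 128),                 # conv_pw_3_relu, added after decoder stage 2
    3: (112, 112, 64),                # conv_pw_1_relu, added after decoder stage 3
}
DECODER_FILTERS = [512, 256, 128, 64, 32]


def trace_decoder(shape, filters, skips):
    """Return the output shape of every decoder stage, checking the skip shapes."""
    shapes = []
    for stage, f in enumerate(filters):
        h, w, _ = shape
        shape = (h * 2, w * 2, f)     # 'same' conv to f channels, then 2x upsampling
        if stage in skips and skips[stage] != shape:
            raise ValueError(f"stage {stage}: {shape} cannot be added to {skips[stage]}")
        shapes.append(shape)
    return shapes


decoder_shapes = trace_decoder(ENCODER_OUT, DECODER_FILTERS, SKIPS)
print(decoder_shapes)
```

Changing a filter count or a skip layer so that the shapes no longer line up makes the function raise, which is the same failure the element-wise add would produce at model-build time.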

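Make sure you understand the transposed convolution used for upsampling, and the fractionally-strided convolution it corresponds to when stride>1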
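As a toy illustration of that correspondence, the following 1-D sketch in plain Python computes a transposed convolution in two ways: by its scatter-add definition, and as a stride-1 convolution over an input with zeros inserted between samples (the "fractionally-strided" view). Both function names are made up for this example, and no padding or output cropping is applied.

```python
def transposed_conv1d(x, kernel, stride):
    """Scatter-add definition: each input sample stamps a scaled copy of the kernel."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += xi * kj
    return out


def fractionally_strided_conv1d(x, kernel, stride):
    """Insert (stride - 1) zeros between samples, then run a full stride-1 convolution."""
    dilated = []
    for i, xi in enumerate(x):
        dilated.append(xi)
        if i < len(x) - 1:
            dilated.extend([0.0] * (stride - 1))
    k = len(kernel)
    padded = [0.0] * (k - 1) + dilated + [0.0] * (k - 1)
    flipped = kernel[::-1]            # true convolution flips the kernel
    return [sum(padded[n + m] * flipped[m] for m in range(k))
            for n in range(len(padded) - k + 1)]


signal = [1.0, 2.0, 3.0]
print(transposed_conv1d(signal, [1.0, 1.0], stride=2))
print(fractionally_strided_conv1d(signal, [1.0, 2.0, 3.0], stride=2))
```

With a kernel of ones whose length equals the stride, the transposed convolution simply repeats every sample, which is nearest-neighbor upsampling, the operation UpSampling2D performs by default in the listing of this post.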

The difference between unpooling, upsampling and deconvolution - Jianshu (jianshu.com)

[Machine Learning] Transposed convolution explained in detail - 花与人间事同's blog, CSDN

Monocular image depth estimation algorithm: FastDepth - Jianshu (jianshu.com)

Reproduction

Code first.

The code below is also in the gitee smart_sever repository as fastdepth_train_test_model.py


from tensorflow.keras.layers import Conv2D, UpSampling2D, SeparableConv2D, BatchNormalization, Activation, add
from tensorflow.keras.models import Model
from tensorflow.keras import optimizers
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.applications.mobilenet import MobileNet
# Load image data stored in .h5 files
import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Layer
import cv2
import h5py as h5
import datetime
import os

# DataGenerator settings
input_size=(224,224)
shuffle=True
train_txt_path='/data0/lijiaqi/traindata/nyudepthv2/train.txt'
val_txt_path='/data0/lijiaqi/traindata/nyudepthv2/val.txt'

# Number of steps per training epoch; assigned automatically in the training DataGenerator's __init__, no need to change it here
train_steps_per_epoch=10
val_steps_per_epoch=10

class DataGenerator(Sequence):
    """
    Custom Keras data generator based on Sequence
    """

    def __init__(self, txt_path , to_fit, batch_size=5, shuffle=shuffle):
        """Initialise the generator.

        :param txt_path: path of the txt file that lists the h5 data paths
        :param to_fit: whether to also return the target y
        :param batch_size: batch size
        :param shuffle: whether to shuffle the data after every epoch
        """

        self.data_txt_path = txt_path
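        # data_path_list: list of h5 data paths (each line keeps its trailing newline)
        with open(txt_path, 'r') as f:
            self.data_path_list = f.readlines()
        # data_num_all: number of samples in the training/validation set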
        self.data_num_all=len(self.data_path_list)
        self.batch_size = batch_size
        
        # For the training DataGenerator, initialise steps_per_epoch
        if txt_path==train_txt_path:
            self.steps_per_epoch=self.data_num_all // self.batch_size
            global train_steps_per_epoch
            train_steps_per_epoch=self.steps_per_epoch
            # The training set uses the given batch_size as is
            self.batch_size = batch_size
        elif txt_path == val_txt_path:
            # The validation batch_size is derived so that the validation set fills val_steps_per_epoch steps
            global val_steps_per_epoch
            self.batch_size = self.data_num_all // val_steps_per_epoch
            
        
        self.indexes_all=np.arange(self.data_num_all)
        self.to_fit = to_fit
        self.shuffle = shuffle
        self.on_epoch_end()
        self.input_size=input_size
        self.n_channels=3


    def __getitem__(self, index):
        """Generate one batch of data
        :param index: batch index
        :return: training images and labels
        """
        # Indices of the samples in this batch
        indexes_batch = self.indexes_all[index * self.batch_size:min((index + 1) * self.batch_size,self.data_num_all)]
        # Paths of the samples in this batch
        batch_path_list = [self.data_path_list[k] for k in indexes_batch]
        # Load the data
        X = self._generate_X(batch_path_list)
        if self.to_fit:
            y = self._generate_y(batch_path_list)
            return X, y
        else:
            return X


    def __len__(self):
        """Number of batches per epoch (a trailing partial batch is dropped)
        """
        return self.data_num_all // self.batch_size


    def on_epoch_end(self):
        """Rebuild (and optionally reshuffle) the indices after every epoch
        """
        self.indexes_all = np.arange(self.data_num_all)
        if self.shuffle:
            np.random.shuffle(self.indexes_all)


    def _generate_X(self, batch_path_list):
        """Generate one batch of images
        :param batch_path_list: list of data paths in this batch
        :return: one batch of images, shape (batch_size, w, h, channel)
        """
        # Allocate the batch
        X = np.empty((self.batch_size, *self.input_size, self.n_channels))
        # Fill it sample by sample
        for i, path in enumerate(batch_path_list):
            # strip() removes the trailing newline left by readlines()
            X[i,] = self._load_h5(path.strip(),'x')
        return X


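    def _generate_y(self, batch_path_list):
        """Generate one batch of depth maps (the targets)
        :param batch_path_list: list of data paths in this batch
        :return: one batch of depth maps, shape (batch_size, w, h, 1)
        """
        # Allocate the batch
        y = np.empty((self.batch_size, *self.input_size, 1))
        # Fill it sample by sample
        for i, path in enumerate(batch_path_list):
            # strip() removes the trailing newline left by readlines()
            y[i,] = self._load_h5(path.strip(),'y')
        return y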


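    def _load_h5(self, path, xory):
        """Read one sample from an h5 file: the RGB image ('x') or the depth map ('y')
        """
        # The context manager closes the file once the arrays have been read
        with h5.File(path, "r") as h5_file:
            if xory == 'x':
                # Reverse the axes so that the channel axis comes last
                # (the two spatial axes are swapped too, consistently with the depth map below)
                rgb_raw = np.transpose(np.array(h5_file['rgb']), (2, 1, 0))
                # Downsample to the standard 224*224*3 input
                rgb_downsample = cv2.resize(rgb_raw, dsize=(224, 224))
                return rgb_downsample
            elif xory == 'y':
                # Swap the two spatial axes of the depth map
                depth_raw = np.transpose(np.array(h5_file['depth']), (1, 0))
                # Downsample to 224*224, then add a channel axis for the standard 224*224*1 output
                depth_downsample = cv2.resize(depth_raw, dsize=(224, 224))
                depth_downsample = np.expand_dims(depth_downsample, 2)
                return depth_downsample
            else:
                raise ValueError("xory must be 'x' or 'y', got %r" % (xory,))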




class FastDepth:
    def __init__(self):
        self.build_net()

    def _SDWConv(self, filters, kernel):
        # Depthwise separable convolution followed by batch normalization and ReLU
        def f(x):
            x = SeparableConv2D(filters, kernel, padding='same')(x)
            x = BatchNormalization()(x)
            x = Activation('relu')(x)

            return x

        return f

    def _encoder(self):
        self.MN = MobileNet(input_shape=(224, 224, 3),
                            weights=None,
                            include_top=False)

        # 7*7*1024
        latent = self.MN.get_layer('conv_pw_13_relu').output

        return latent

    def _decoder(self, x):
        # 14*14*512
        x1 = self._SDWConv(512, (5, 5))(x)
        x1 = UpSampling2D()(x1)

        # 28*28*256
        x2 = self._SDWConv(256, (5, 5))(x1)
        x2 = UpSampling2D()(x2)
        s2 = self.MN.get_layer('conv_pw_5_relu').output
        x2 = add([x2, s2])

        # 56*56*128
        x3 = self._SDWConv(128, (5, 5))(x2)
        x3 = UpSampling2D()(x3)
        s3 = self.MN.get_layer('conv_pw_3_relu').output
        x3 = add([x3, s3])

        # 112*112*64
        x4 = self._SDWConv(64, (5, 5))(x3)
        x4 = UpSampling2D()(x4)
        s4 = self.MN.get_layer('conv_pw_1_relu').output
        x4 = add([x4, s4])

        # 224*224*32
        x5 = self._SDWConv(32, (5, 5))(x4)
        x5 = UpSampling2D()(x5)

        return x5

    def build_net(self):
        latent = self._encoder()
        out = self._decoder(latent)
        out_dense = Conv2D(1, (1, 1))(out)

        self.model = Model(inputs=self.MN.input, outputs=out_dense)


if __name__ == '__main__':


    # Generators
    # 72.76
    train_batchsize=64
    train_steps_per_epoch=10
    val_steps_per_epoch=20
    model_save_filepath="/data0/lijiaqi/smart_server/2019fastdepth/model_parameter_save"
    training_generator = DataGenerator(train_txt_path, to_fit=True,batch_size=train_batchsize)
    validation_generator = DataGenerator(val_txt_path, to_fit=True)
    net = FastDepth()
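    # Print the network parameters
    #net.model.summary()
    # Note: Keras' `decay` is learning-rate decay, not L2 weight decay;
    # 'accuracy' carries little meaning for a regression loss such as mse
    net.model.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9, decay=0.0001, nesterov=False),
              loss='mse',
              metrics=['accuracy'])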

    # checkpoint = ModelCheckpoint(model_save_filepath+'/'+datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),monitor='val_loss', save_weights_only=True,verbose=1,save_best_only=False, period=1)
    
    # if os.path.exists(filepath):
    #     model.load_weights(filepath)
    #     # If the previously saved weights load successfully, print the message below
    #     print("checkpoint_loaded")
    
    print("compile finished")
    print("train_steps_per_epoch:",train_steps_per_epoch)
    print("validation_generator.batch_size:",validation_generator.batch_size)

    # test for DataGenerator
    # Enumerated as an iterator, training_generator yields tuples of the form (iteration index i, (train_x, train_y))
    # for i,(train_x,train_y) in enumerate(training_generator):
    #     print(train_x.shape)
    #     print(train_y.shape)
    #     print(i)
    #     break

    # train with fit_generator (deprecated since TF 2.1, where Model.fit accepts a Sequence directly)
    history=net.model.fit_generator(generator=training_generator,
                            validation_data=validation_generator,
                            epochs=100,
                            max_queue_size=64,
                            steps_per_epoch=train_steps_per_epoch,
                            validation_steps=val_steps_per_epoch,
                            callbacks=[ModelCheckpoint(os.path.join(model_save_filepath, 'model_{epoch:04d}_{val_loss:.6f}.hdf5'), 
                                                        monitor='val_loss', save_weights_only=True, 
                                                        verbose=1,save_best_only=False, period=1)],
                            verbose=1)
    accy=history.history['accuracy']
    lossy = history.history['loss']
    np_accy = np.array(accy).reshape((1,len(accy))) # reshape so it can be stacked with other records into one matrix for saving
    np_lossy =np.array(lossy).reshape((1,len(lossy)))
    #np_out = np.concatenate([np_accy,np_lossy],axis=0)
    np.savetxt('history_acc.txt',np_accy)
    np.savetxt('history_loss.txt',np_lossy)
    print("History files saved")

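Environment setup

My earlier setup was broken and kept throwing errors; hls got it working right away and exported the following yml file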

fastDepth.yml

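I changed it slightly afterwards. It installs directly on the 156 server; on an ordinary computer you may need to check the version numbers and install the packages manually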

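Dataset

TensorFlow has no dataloader as convenient as PyTorch's dataset. I originally built a hybrid that used a PyTorch dataset with Keras for training, but it was rejected because the MAI competition requires pure TensorFlow.

The nyudepthv2 dataset used here is organised like most datasets

train.txt and val.txt were generated by me and contain the absolute paths of all the h5 files. They were produced with creat_data_index.py, which is also in the repository, so the code is not pasted here. Things to watch are the os path handling and reading the h5 files (you have to get the value for the corresponding key first). It also covers reordering the image channels and downsampling with resize: a bunch of small APIs that are quite useful for data preprocessing and worth revisiting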
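Since the index script itself is not shown, here is a minimal sketch of how such an index file could be produced. It is not the author's creat_data_index.py; the function name and the directory layout are assumptions, and it simply walks a directory and writes one absolute .h5 path per line.

```python
import os


def write_h5_index(root_dir, txt_path):
    """Write the absolute path of every .h5 file under root_dir, one per line."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith('.h5'):
                paths.append(os.path.abspath(os.path.join(dirpath, name)))
    paths.sort()                      # deterministic order across runs
    with open(txt_path, 'w') as f:
        for p in paths:
            f.write(p + '\n')         # every line, including the last, ends with a newline
    return len(paths)
```

Calling write_h5_index once for the training directory and once for the validation directory would give files of the kind that train_txt_path and val_txt_path point to in the listing above.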

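DataGenerator

The comments are fairly detailed, so just read the corresponding code above

Reference link (with runnable code):

A detailed example of how to use data generators in Keras - orDream's blog, CSDN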
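The index logic behind such a generator can be tried without TensorFlow. The class below is a framework-free stand-in written for illustration only (IndexBatcher is a made-up name, not a Keras class): it implements the same three methods a Keras Sequence relies on, but returns sample indices instead of loaded images.

```python
import random


class IndexBatcher:
    """Framework-free stand-in for the index logic of a Keras Sequence."""

    def __init__(self, num_samples, batch_size, shuffle=True, seed=0):
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.on_epoch_end()

    def __len__(self):
        # Full batches only; a trailing partial batch is dropped.
        return self.num_samples // self.batch_size

    def __getitem__(self, index):
        start = index * self.batch_size
        return self.indexes[start:start + self.batch_size]

    def on_epoch_end(self):
        # Rebuild the index list and reshuffle it for the next epoch.
        self.indexes = list(range(self.num_samples))
        if self.shuffle:
            self.rng.shuffle(self.indexes)


batcher = IndexBatcher(num_samples=23, batch_size=5)
batches = [batcher[i] for i in range(len(batcher))]
print(len(batcher), batches)
```

With 23 samples and a batch size of 5, each epoch yields 4 batches and leaves 3 samples unused; because the indices are reshuffled every epoch, the unused samples differ from epoch to epoch.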

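Network architecture

The notes section above covers this in detail. The difficult parts here are the basic usage of the keras.model framework and the several skip connections, which deserve repeated reading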

Train


The training code is the if __name__ == '__main__': block at the end of the listing above.

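Straight to the things to watch out for

The optimizers in the compile step can take custom parameters and so on (see the official Chinese documentation, whose coverage of this part is passable (link))

Note that fit_generator needs steps_per_epoch to be defined by you. It can be set inside the DataGenerator's __init__, which updates the global steps_per_epoch variable to match the training batch size. The validation set's validation_steps must be defined as well; making it the same as steps_per_epoch is fine

fit_generator saves the weights of every epoch through the callbacks mechanism. This took a while to get right; for the path, following the code above generally works

verbose=0 prints no per-epoch results, verbose=1 prints a progress bar

fit_generator returns a history; save accuracy and loss as in the code above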
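The arithmetic behind steps_per_epoch and the validation batch size, as the listing derives them, can be sketched as follows. The function name and the dataset sizes are hypothetical and only serve to show the integer divisions involved.

```python
def epoch_schedule(num_train, train_batch_size, num_val, val_steps):
    """Return (steps_per_epoch, val_batch_size) using the listing's integer divisions."""
    steps_per_epoch = num_train // train_batch_size   # full training batches per epoch
    val_batch_size = num_val // val_steps             # spread the validation set over val_steps
    if steps_per_epoch == 0 or val_batch_size == 0:
        raise ValueError("not enough samples for the requested batch size / steps")
    return steps_per_epoch, val_batch_size


# Hypothetical dataset sizes, for illustration only.
steps, val_bs = epoch_schedule(num_train=1000, train_batch_size=64,
                               num_val=200, val_steps=20)
print(steps, val_bs)
```

Because both divisions round down, a few samples at the end of each set are skipped in every epoch unless the sizes divide evenly.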

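Other notes

This train script still lacks loading of already-trained weights. It will not be added in this post; look it up in the repository later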
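As a hedged sketch of the missing step (not the code from the repository), the checkpoint files written by the ModelCheckpoint callback in the listing carry the epoch and val_loss in their names, so the best one can be picked by parsing file names. The helper name and the regular expression are assumptions based on the 'model_{epoch:04d}_{val_loss:.6f}.hdf5' pattern used above.

```python
import os
import re

# Matches file names produced by 'model_{epoch:04d}_{val_loss:.6f}.hdf5'.
CKPT_RE = re.compile(r'^model_(\d{4,})_(\d+\.\d{6})\.hdf5$')


def best_checkpoint(filenames):
    """Return (filename, epoch, val_loss) with the lowest val_loss, or None."""
    best = None
    for name in filenames:
        m = CKPT_RE.match(os.path.basename(name))
        if m is None:
            continue                  # skip files that are not checkpoints
        epoch, val_loss = int(m.group(1)), float(m.group(2))
        if best is None or val_loss < best[2]:
            best = (name, epoch, val_loss)
    return best


print(best_checkpoint(['model_0001_0.250000.hdf5', 'model_0002_0.125000.hdf5']))
```

The selected name, joined with the checkpoint directory, could then be passed to net.model.load_weights before training resumes, as the commented-out lines in the listing suggest.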

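The usual workflow for getting training code running is:

Write the network body, then after compile call summary to check that the parameters are right

Feed a few random arrays as training data to check that the input, output and intermediate sizes are as expected

Write the DataGenerator and enumerate it, printing each batch to check that the generated data is right

Plug everything in and start training, then polish details such as checkpointing along the way

Later I need to write up a general PyTorch framework (a to-do for a future post)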