ZhuoYi

From Psych 221 Image Systems Engineering

Revision as of 05:09, 19 November 2020

Introduction

Semantic segmentation with CNNs requires a large volume of high-quality training data to achieve high performance, which poses challenges for storage and compute resources. Lower-quality data, however, introduces unwanted artifacts that can destroy important image information. To navigate this trade-off, we need to better understand how the quality of training data affects segmentation performance. The goal of this project is to study how training data quality affects semantic segmentation network performance.

Methods

In this project, several experiments are conducted to study the connection between the performance of a semantic segmentation algorithm and the quality of 2D images and 3D lidar point clouds. For 2D images, attributes such as compression ratio and resolution are explored; for 3D lidar data, resolution and data channels are studied.

Compute Hardware

Machine: Macbook Pro

Processor: 2.9 GHz Quad-Core Intel Core i7

Memory: 16 GB 2133 MHz LPDDR3

Software

Program: Matlab R2020b

Toolboxes: Deep Learning Toolbox and Lidar Toolbox


Experiment Data Flow

The following diagram shows the data flow shared by all experiments in this project. There are four stages: stage one is data loading and label generation; stage two is data preprocessing and partitioning; stage three is network training; and the final stage is network evaluation.
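The four stages above can be sketched in Python as a simple pipeline. This is an illustrative skeleton only; the function names and stub bodies are hypothetical and do not correspond to the project's actual MATLAB code.

```python
def load_data_and_generate_labels(files):
    """Stage 1: load raw data and derive pixel labels (stubbed)."""
    return [(f, "label") for f in files]

def preprocess_and_partition(samples):
    """Stage 2: preprocess and split into train/val/test (60/20/20)."""
    n = len(samples)
    return (samples[:int(0.6 * n)],
            samples[int(0.6 * n):int(0.8 * n)],
            samples[int(0.8 * n):])

def train_network(train_set, val_set):
    """Stage 3: train a segmentation network (stubbed)."""
    return {"train_size": len(train_set), "val_size": len(val_set)}

def evaluate_network(net, test_set):
    """Stage 4: evaluate the trained network on held-out data (stubbed)."""
    return {"test_size": len(test_set), **net}

def run_experiment(files):
    data = load_data_and_generate_labels(files)
    train_set, val_set, test_set = preprocess_and_partition(data)
    net = train_network(train_set, val_set)
    return evaluate_network(net, test_set)
```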

Dataset

The experiments use two different types of data: 2D images and 3D lidar point clouds. The image data is collected on a highway from a front-facing camera mounted on the ego vehicle, and the lidar data is collected from an Ouster OS1 lidar sensor on the same vehicle. The camera and lidar data are approximately time-synced, and the sensors are calibrated to estimate their intrinsic and extrinsic parameters.

To train a semantic segmentation network, a pixel-labeled dataset is required. In this project, both the 2D and 3D pixel label datasets are generated from bounding box label data. This introduces unwanted artifacts, since all the resulting pixel labels are rectangular. In addition, only 2 classes (car/background) are used in the 2D image experiments, and 3 classes (car/truck/background) in the 3D lidar experiments. Note also that background pixels dominate most of the images.

[[File:Bbox.png]]
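Deriving pixel labels from bounding boxes amounts to rasterizing each box into a label mask. A minimal NumPy sketch of this idea (the function name and box format are assumptions, not the project's actual code) also shows why the labels are rectangular and why background dominates:

```python
import numpy as np

def boxes_to_label_mask(height, width, boxes, class_ids, background_id=0):
    """Rasterize bounding boxes into a per-pixel label mask.

    boxes: list of (x, y, w, h) in pixel coordinates.
    class_ids: class label for each box (e.g. 1 = car).
    Pixels outside every box keep the background label, which is
    why background dominates most images in such a dataset.
    """
    mask = np.full((height, width), background_id, dtype=np.uint8)
    for (x, y, w, h), cid in zip(boxes, class_ids):
        # Every labeled region is an axis-aligned rectangle: this is
        # the source of the artifacts mentioned above.
        mask[y:y + h, x:x + w] = cid
    return mask
```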

Since Matlab supports only single-core CPU training on Mac, the maximum numbers of epochs and iterations are limited to 10 and 1800, respectively, to reduce compute time. For the same reason, a total of 600 images is selected for both the image and lidar datasets; the last experiment, on lidar data channels, uses 800 images to improve the metrics. The images are randomly shuffled before being partitioned into training (60%), validation (20%), and test (20%) sets. Consequently, the experimental results in this project focus on relative performance metrics.
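The shuffle-then-partition step can be sketched as follows. This is a Python illustration of the 60/20/20 split described above, not the project's MATLAB code; the fixed seed is an assumption added for reproducibility.

```python
import random

def split_dataset(items, seed=0, train_frac=0.6, val_frac=0.2):
    """Randomly shuffle, then partition into train/val/test splits."""
    items = list(items)
    random.Random(seed).shuffle(items)  # shuffle before partitioning
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],                      # training (60%)
            items[n_train:n_train + n_val],       # validation (20%)
            items[n_train + n_val:])              # testing (20%)
```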

Networks and Metrics

For the 2D image experiments, a Deeplab v3+ network is built with weights initialized from a pre-trained Resnet-18 network. For the 3D lidar experiments, a SqueezeSegV2 semantic segmentation network is trained on 3-D organized lidar point cloud data. All experiments use compute time, class accuracy, and IoU to evaluate how data quality affects network training and segmentation performance.
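Class accuracy and IoU can both be computed from a confusion matrix over predicted and ground-truth label maps. A minimal NumPy sketch (assuming dense integer label maps; a per-class division-by-zero guard is omitted for brevity):

```python
import numpy as np

def class_metrics(pred, target, num_classes):
    """Per-class accuracy and IoU from predicted and ground-truth labels."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        conf[t, p] += 1                         # rows: truth, cols: prediction
    tp = np.diag(conf).astype(float)
    accuracy = tp / conf.sum(axis=1)            # TP / (TP + FN)
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp
    iou = tp / union                            # TP / (TP + FP + FN)
    return accuracy, iou
```

IoU is the stricter metric here: with dominant background pixels, overall pixel accuracy can look high even when cars are segmented poorly, which is why per-class IoU is reported.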

Results

Conclusions

Appendix
