An Overview of Hyperspectral Colon Tissue Cell Classification

From Psych 221 Image Systems Engineering
Revision as of 02:24, 22 March 2012 by imported>Psych2012 (Classification)

Introduction

Why Classification?

The colon is the upper part of the large intestine and the rectum is the lower part. In practice, colon cancer and rectal cancer are treated as separate cancers, and colorectal (or bowel) cancer is the collective name for both. Colorectal cancer is caused by the uncontrolled growth of tissue cells in either the colon or the rectum. It is the third most commonly diagnosed cancer, after lung and breast cancer.

Yet 80% of colorectal cancer cases can be treated if caught at an early stage. It is therefore important to discriminate between normal and malignant tissue cells of the human colon, so that the malignant cells can then be dealt with.

Hyperspectral sensors

Hyperspectral sensor data

The high spectral resolution of hyperspectral sensors preserves important aspects of the spectrum. Hyperspectral sensors rely on the simple fact that any body with a temperature above absolute zero either emits or reflects absorbed energy in certain frequency bands, which makes it possible to tell different materials apart.

The image data provided by a hyperspectral sensor is visualized as a 3D cube, where the face is a function of the spatial coordinates, f(x,y), and the depth is a function of wavelength, d(λ). The data can also be seen as a stack of 2D images: each image covers a range of the electromagnetic spectrum and is known as a spectral band, and each spatial point on the face is characterized by its own spectrum. Stacking the band images forms the three-dimensional hyperspectral data cube used for processing and analysis.
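The cube layout described above can be sketched with a NumPy array. This is only an illustration: the axis order (row, column, band), the tiny cube size, the synthetic values and the helper names are all assumptions, not part of any sensor's data format.

```python
import numpy as np

# Hypothetical cube: 4 x 5 pixels and 6 spectral bands, filled with synthetic values.
rows, cols, bands = 4, 5, 6
cube = np.arange(rows * cols * bands, dtype=float).reshape(rows, cols, bands)

def pixel_spectrum(cube, x, y):
    """Return the spectrum (one value per band) at spatial point (x, y)."""
    return cube[y, x, :]

def band_image(cube, k):
    """Return the 2D image for spectral band k."""
    return cube[:, :, k]

def as_pixel_matrix(cube):
    """Flatten the spatial axes: one row per pixel, one column per band."""
    r, c, b = cube.shape
    return cube.reshape(r * c, b)

spectrum = pixel_spectrum(cube, x=2, y=1)   # shape (bands,)
image = band_image(cube, 3)                 # shape (rows, cols)
pixels = as_pixel_matrix(cube)              # shape (rows * cols, bands)
```

The flattened pixel matrix is the form that per-pixel methods such as PCA, ICA and clustering usually operate on.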

Pattern recognition (Tissue classification)

The detection of malignant cells can be viewed as a typical example of a pattern recognition problem. Pattern recognition in images consists of three independent steps, which can be applied to the tissue classification problem as follows:

1. Image segmentation: Objects (tissue cells) contained in the image scene are separated from the background, and the constituent parts of the tissue cells are separated from one another.
2. Feature extraction: The characteristics of each object are quantified. These features should contain enough discriminant information to distinguish normal tissue from malignant tissue.
3. Classification: Each tissue cell is assigned to one of the target classes, normal or malignant.


The rest of this overview is organized around these three steps.

Methods

Dimensionality Reduction

Dimensionality reduction sample

Before hyperspectral imagery is segmented, an intermediate dimensionality reduction step is often applied. The goal is to eliminate redundancy in the data while preserving the discriminant features needed by the segmentation, detection or classification algorithms that follow. Dimensionality reduction also eases the high computational cost that comes with the large size of hyperspectral image data. A common approach to dimensionality reduction in data mining is Singular Value Decomposition (SVD); here we discuss PCA and ICA instead.

Two categories of methods for reduction:
(1) Linear methods: principal component analysis, factor analysis and independent component analysis.
(2) Non-linear methods: curvilinear component analysis, curvilinear distance analysis and multi-dimensional scaling.

Principal component analysis (PCA)

Principal component analysis (PCA) is a statistical multivariate data analysis tool that attempts to find the natural coordinate axes of a multidimensional dataset. It represents higher-dimensional data on lower-dimensional orthogonal axes such that the representation is decorrelated. This can be viewed as transforming the original data into a new vector space whose basis vectors are linear combinations of the original basis vectors. Equivalently, PCA projects the multivariate data onto orthogonal axes that are the eigenvectors of the covariance matrix of the original data.
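The eigenvector description above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the processing pipeline of any particular study; the function name `pca`, the data sizes and the noise level are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """PCA of X (n_samples x n_features) via eigenvectors of the covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center each feature (band)
    cov = np.cov(Xc, rowvar=False)             # n_features x n_features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]     # orthonormal principal axes
    scores = Xc @ components                   # decorrelated projection
    explained = eigvals / eigvals.sum()        # fraction of variance per axis
    return scores, components, explained

# Synthetic "spectra": 500 pixels, 10 bands driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

scores, components, explained = pca(X, n_components=2)
```

Here `explained` gives the share of total variance along each principal axis, which is the quantity used when deciding how many components to keep.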

Independent component analysis (ICA)

Independent component analysis (ICA) extends the concept of traditional multivariate data analysis techniques to determine the hidden components in the data. Unlike PCA, ICA does not merely attempt to find a decorrelated lower-dimensional representation for the data but also attempts to discover statistically independent components.

ICA relies on several assumptions:
(1) The components should be mutually independent.
(2) The number of data dimensions must be equal to or greater than the number of hidden independent components.
(3) The independent components should have a nongaussian distribution.

ICA can also be seen as an enhancement of PCA and factor analysis. Algorithms such as FastICA and FlexICA are available for estimating the independent components.
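To make the idea concrete, here is a minimal sketch of a symmetric FastICA-style fixed-point iteration with a tanh nonlinearity, written in plain NumPy. It is a simplified teaching version under stated assumptions (as many mixtures as sources, full-rank mixing, synthetic nongaussian sources); the function names and parameters are illustrative and it is not a substitute for a library implementation.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples x n_features) and rescale to identity covariance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvecs / np.sqrt(eigvals)

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components of X; returns an n_samples x d array."""
    Z = whiten(X)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.normal(size=(d, d)))
    for _ in range(n_iter):
        Y = Z @ W.T                         # current component estimates
        G = np.tanh(Y)                      # nonlinearity g
        G_prime = 1.0 - G ** 2              # its derivative g'
        # Fixed-point update per row: E[z g(w.z)] - E[g'(w.z)] w
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T

# Two synthetic nongaussian sources (uniform and Laplace), linearly mixed.
rng = np.random.default_rng(1)
n = 5000
S = np.column_stack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])     # assumed mixing matrix
X = S @ A.T
Y = fast_ica(X)
```

The recovered components match the sources only up to order and sign, which is inherent to ICA rather than a limitation of this sketch.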

Segmentation

At a microscopic level, human colon tissue cells can be characterised as having four constituent parts: nuclei, cytoplasm, lamina propria, and lumen. According to the NCI’s (National Cancer Institute) dictionary of cancer related medical terms, these constituent parts are defined as:
(i) Nuclei: the core central part of a cell, containing DNA, which controls its growth
(ii) Cytoplasm: the fluid inside a cell but outside the cell's nucleus. Most chemical reactions in a cell take place in the cytoplasm
(iii) Lamina propria: a type of connective tissue found under the thin layer of tissues covering a mucous membrane
(iv) Lumen: the cavity or channel within a tube or tubular organ such as a blood vessel or the intestine

Segmentation labels these parts before classification. There are two approaches to segmenting hyperspectral image data:
(1) spatial analysis
(2) spectral analysis

Spatial analysis: Wavelet based segmentation

Wavelets are special mathematical functions used to represent a signal/image at a resolution matched to its scale. Unlike the conventional Fourier transform, which uses sines and cosines of varying amplitude and frequency as its basis functions, the wavelet transform uses scalable wavelet functions. A variety of such functions exist, and a suitable wavelet can be selected for a given application depending on the signal/image characteristics to be represented.

One of the most important applications of wavelet theory is multiresolution analysis, which lets us exploit signal/image characteristics matched to a particular scale that might go undetected by other analysis techniques. The input to this method is the output of PCA dimensionality reduction.
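As a small illustration of multiresolution analysis, the sketch below applies a two-level 2D Haar transform and summarizes each detail subband by its mean energy, a common kind of texture feature. The choice of the Haar wavelet, the energy feature and all function names are assumptions made for brevity; the source does not say which wavelet or features were used.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the orthonormal 2D Haar transform (image sides must be even)."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    approx = (a + b + c + d) / 2.0      # coarse, half-resolution image
    row_diff = (a + b - c - d) / 2.0    # differences between adjacent rows
    col_diff = (a - b + c - d) / 2.0    # differences between adjacent columns
    diag_diff = (a - b - c + d) / 2.0   # diagonal differences
    return approx, (row_diff, col_diff, diag_diff)

def haar_pyramid(img, levels):
    """Repeat the transform on the approximation to get coarser scales."""
    details = []
    approx = img
    for _ in range(levels):
        approx, d = haar_dwt2(approx)
        details.append(d)
    return approx, details

def subband_energies(details):
    """Mean energy of every detail subband, finest scale first."""
    return [float(np.mean(band ** 2)) for level in details for band in level]

# Synthetic 8 x 8 image standing in for a first-principal-component image.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
approx, details = haar_pyramid(img, levels=2)
features = subband_energies(details)
```

Each level halves the resolution, so the detail subbands at different levels describe structure at different scales.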

Experiments show, however, that the conventional wavelet texture analysis method can lose necessary discriminant information. The method is therefore suitable only when the data projected onto the first principal component contains 80% or more of the total variance. That said, it is not uncommon in hyperspectral colon tissue imagery for 80% or more of the variance to be concentrated in the first principal component.
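The 80% criterion can be checked directly from the covariance eigenvalues. The sketch below is illustrative only: the cube layout (row, column, band), the function names and the synthetic test cubes are assumptions.

```python
import numpy as np

def first_pc_variance_fraction(cube):
    """Fraction of total variance captured by the first principal component."""
    rows, cols, bands = cube.shape
    X = cube.reshape(rows * cols, bands)                 # one row per pixel
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))  # ascending order
    return float(eigvals[-1] / eigvals.sum())

def wavelet_route_suitable(cube, threshold=0.80):
    """Apply the 80% rule of thumb from the text."""
    return first_pc_variance_fraction(cube) >= threshold

rng = np.random.default_rng(0)
rows, cols, bands = 16, 16, 8

# Synthetic cube where a single spatial pattern drives every band.
base = rng.uniform(size=(rows, cols))
profile = np.linspace(1.0, 2.0, bands)
dominated = base[:, :, None] * profile + 0.01 * rng.normal(size=(rows, cols, bands))

# Synthetic cube of independent noise, with no dominant direction.
noise_only = rng.normal(size=(rows, cols, bands))

frac_dominated = first_pc_variance_fraction(dominated)
frac_noise = first_pc_variance_fraction(noise_only)
```

In this toy setup the first cube passes the check and the second does not, which mirrors the situation where spectral methods would be preferred.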

Spectral analysis: ICA based segmentation

An alternative way to segment hyperspectral data is spectral analysis. This approach works with the spectral signature of each point on the face of the data cube.

The image data cube is transformed to a lower dimension equal to the number of independent components, or regions, in the image scene. Since a colon cell image consists of four different types of regions, the new spectral dimension of the image cube is 4. A K-means clustering algorithm can then be applied to the extracted components to cluster the results.
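The clustering step can be sketched as follows, assuming the cube has already been reduced to four components per pixel. The K-means below is a bare-bones NumPy version with a deterministic farthest-point initialization (a simplifying choice for reproducibility); the synthetic four-quadrant image and all names are assumptions.

```python
import numpy as np

def farthest_point_init(X, k):
    """Pick k initial centers: the first sample, then repeatedly the farthest sample."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain K-means (Lloyd's algorithm) on X (n_samples x n_features)."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)           # assign each pixel to nearest center
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # assignments have stabilized
            break
        centers = new_centers
    return labels, centers

# Synthetic 10 x 10 component image: four quadrants, one dominant component each.
rng = np.random.default_rng(0)
truth = np.zeros((10, 10), dtype=int)
truth[:5, 5:] = 1
truth[5:, :5] = 2
truth[5:, 5:] = 3
components = np.eye(4)[truth] * 5.0 + 0.1 * rng.normal(size=(10, 10, 4))

labels, centers = kmeans(components.reshape(-1, 4), k=4)
label_image = labels.reshape(10, 10)            # segmentation map
```

The cluster indices are arbitrary, so mapping them to nuclei, cytoplasm, lamina propria and lumen would still need a separate labeling step.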

Segmentation with preprocessing:
One of the assumptions for performing ICA on higher-dimensional data is that the components are nongaussian. High-emphasis filtering is therefore worth considering as a preprocessing step. In image enhancement, high-emphasis filtering is normally used to restore edges in a blurred image; here it can be used to make the distribution more supergaussian and so improve the segmentation.
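A rough sketch of these two ingredients follows: a spatial-domain high-emphasis filter (the image plus a boosted high-pass residual) and excess kurtosis as a simple nongaussianity measure, where positive values indicate a supergaussian distribution. The box-blur low-pass, the gains `a` and `b`, and the function names are assumptions; the source does not specify the filter design.

```python
import numpy as np

def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2*radius+1)^2 window with edge padding."""
    padded = np.pad(img, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros_like(img, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / size ** 2

def high_emphasis(img, a=1.0, b=2.0):
    """Keep the image (gain a) and add a boosted high-pass residual (gain b)."""
    img = np.asarray(img, dtype=float)
    return a * img + b * (img - box_blur(img))

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; zero for a Gaussian."""
    v = np.asarray(values, dtype=float).ravel()
    v = v - v.mean()
    return float(np.mean(v ** 4) / np.mean(v ** 2) ** 2 - 3.0)

# A vertical step edge: the filter should overshoot on both sides of the edge.
step = np.zeros((6, 6))
step[:, 3:] = 1.0
sharpened = high_emphasis(step)

# Reference distributions for the kurtosis measure.
rng = np.random.default_rng(0)
k_laplace = excess_kurtosis(rng.laplace(size=20000))   # supergaussian
k_uniform = excess_kurtosis(rng.uniform(size=20000))   # subgaussian
```

Comparing `excess_kurtosis` before and after filtering on real band images is one way to check whether the preprocessing actually moves the data in the intended direction.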

Classification

The goal is to discriminate between normal and malignant sections of a tissue. The segmentation labels provide information that can be used to achieve reasonable discrimination accuracy. One option is a machine learning method: train a classifier on known examples, then run the trained classifier on unseen test samples to evaluate how well it discriminates.
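The train-then-test procedure can be sketched with a very simple classifier. The nearest-centroid rule, the three-dimensional synthetic feature vectors and the 70/30 split below are all assumptions chosen for brevity; the source does not name a specific classifier or feature set.

```python
import numpy as np

class NearestCentroid:
    """Assign each sample to the class whose training mean is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return float(np.mean(y_true == y_pred))

# Synthetic feature vectors: class 0 = normal, class 1 = malignant (hypothetical).
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.5, size=(100, 3))
malignant = rng.normal(3.0, 0.5, size=(100, 3))
X = np.vstack([normal, malignant])
y = np.array([0] * 100 + [1] * 100)

# Shuffle, then hold out 30% of the samples for testing.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
X_train, y_train = X[:140], y[:140]
X_test, y_test = X[140:], y[140:]

clf = NearestCentroid().fit(X_train, y_train)
acc = accuracy(y_test, clf.predict(X_test))
```

Keeping the test samples out of training is what makes `acc` an estimate of performance on unseen tissue rather than a measure of memorization.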


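$$A_{xy}(r_i) = \log\left[\frac{R^{o}_{xy}(r_i)}{R_{xy}(r_i)}\right] \tag{1}$$

$$C_{kj} = S_{k}^{r_i} R_{j}^{r_i} + e_{kj} \tag{2}$$

$$S_{k}^{r_i} = \left(P_{k}^{r_i\,t} P_{k}^{r_i}\right) P_{k}^{r_i\,t} \tag{3}$$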
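Equation (1) can be evaluated element-wise over a cube. The sketch below reads it as an absorbance-style log ratio between a reference measurement and the sample measurement; that reading, the base-10 logarithm, the `eps` guard and the variable names are all assumptions, since the source does not define the symbols.

```python
import numpy as np

def absorbance(sample, reference, eps=1e-12):
    """Element-wise log10(reference / sample), following the form of equation (1)."""
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # eps guards against division by zero in dark pixels (an added safeguard).
    return np.log10(reference / np.maximum(sample, eps))

# Synthetic 2 x 2 pixel cube with 3 bands.
reference = np.full((2, 2, 3), 100.0)   # hypothetical reference measurement
sample = np.full((2, 2, 3), 10.0)       # hypothetical sample measurement
A = absorbance(sample, reference)
```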

Results & Conclusions



Reference