An Overview of Hyperspectral Colon Tissue Cell Classification

From Psych 221 Image Systems Engineering

Project Title

An Overview of Hyperspectral Colon Tissue Cell Classification

Note: this page is a tutorial that supplements the project "Algorithms for calculating the Oxygen concentration in pig organs using Hyperspectral Images".

Introduction

Why Classification?

The colon is the upper part of the large intestine, and the rectum is the lower part. In practice, colon cancer and rectal cancer are treated as separate cancers; colorectal (or bowel) cancer is the collective name for both. Colorectal cancer is caused by the uncontrolled growth of tissue cells in the colon or rectum. It is the third most commonly diagnosed cancer, after lung and breast cancer.

Yet 80% of colorectal cancer cases can be treated if caught at an early stage. It is therefore important to discriminate between normal and malignant tissue cells of the human colon, so that the malignant cells can then be dealt with.

Hyperspectral sensors

Hyperspectral sensor data

The high spectral resolution of hyperspectral sensors preserves important aspects of the spectrum. Hyperspectral sensors rely on the simple fact that any body with a temperature above absolute zero emits or reflects energy in characteristic frequency bands. Because these bands differ from material to material, it becomes possible to segment different materials.

The image data provided by hyperspectral sensors is visualized as a 3D cube, where the face is a function of the spatial coordinates, f(x,y), and the depth is a function of wavelength, d(λ). Each spatial point on the face is characterized by its own spectrum. The data can also be seen as a stack of 2D images, in which each image represents a range of the electromagnetic spectrum and is known as a spectral band. These images are combined to form the three-dimensional hyperspectral data cube used for processing and analysis.
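
As a minimal sketch of this layout, assume the cube is held in a NumPy array indexed as (row, column, band). The array shape and the random values below are illustrative assumptions, not data from the project.

```python
import numpy as np

# Hypothetical cube: 64 x 64 spatial points and 20 spectral bands (synthetic values).
rows, cols, bands = 64, 64, 20
rng = np.random.default_rng(0)
cube = rng.random((rows, cols, bands))

# One spectral band is a 2D image f(x, y) at a fixed wavelength index.
band_image = cube[:, :, 5]

# One spatial point on the face carries a full spectrum d(lambda).
pixel_spectrum = cube[10, 12, :]

# Per-pixel algorithms usually work on a (pixels x bands) matrix.
pixel_matrix = cube.reshape(-1, bands)
```

The (pixels x bands) matrix is the form in which dimensionality reduction methods typically consume the cube.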

Pattern recognition (Tissue classification)

The detection of malignant cells can be viewed as a typical example of a pattern recognition problem. Pattern recognition in images consists of three independent steps, which can be applied to the tissue classification problem as follows:

1. Image segmentation: Objects (tissue cells) contained in the image scene are separated from the background. For colon tissue, this means separating the constituent parts of the tissue cells.
2. Feature extraction: The characteristics of each object are quantified. These features should contain enough discriminant information to distinguish normal tissue from malignant tissue.
3. Classification: Each tissue cell is assigned to exactly one target class, normal or malignant.


The rest of this overview follows these three steps.

Methods

Dimensionality Reduction

Dimensionality reduction sample

Before hyperspectral imagery is segmented, an intermediate dimensionality reduction step is often applied. The goal is to eliminate redundancy in the data while preserving the discriminant features needed by the segmentation, detection or classification algorithms that follow. Dimensionality reduction also lowers the high computational cost that comes with the large size of hyperspectral image data. A common approach to dimensionality reduction in data mining is Singular Value Decomposition (SVD), but here we discuss PCA and ICA instead.

Two categories of methods for reduction:
(1) Linear methods: principal component analysis, factor analysis and independent component analysis.
(2) Non-linear methods: curvilinear component analysis, curvilinear distance analysis and multi-dimensional scaling.

Principal component analysis (PCA)

Principal component analysis (PCA) is a statistical multivariate data analysis tool that attempts to find the natural coordinate axes of a multidimensional dataset. It represents the higher-dimensional data on a smaller set of orthogonal axes such that the new variables are decorrelated. This representation can be considered a transformation of the original data into a new vector space whose basis vectors are linear combinations of the original variables. Put briefly, PCA projects the multivariate data onto orthogonal axes that are the eigenvectors of the covariance matrix of the original data.
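
The projection onto the covariance eigenvectors can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the project's implementation; the function name `pca` and the synthetic dataset are assumptions made for the example.

```python
import numpy as np

def pca(pixel_matrix, n_components):
    """Project (pixels x bands) data onto the leading eigenvectors of its covariance."""
    centered = pixel_matrix - pixel_matrix.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    axes = eigvecs[:, :n_components]
    scores = centered @ axes                      # decorrelated lower-dimensional data
    explained = eigvals / eigvals.sum()           # fraction of variance per axis
    return scores, axes, explained

# Synthetic stand-in: 500 "pixels" with 10 bands driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])
mixing = rng.normal(size=(2, 10))
data = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

scores, axes, explained = pca(data, n_components=2)
```

The `explained` vector gives the share of the total variance captured by each principal direction, which is the quantity to inspect when deciding how many components to keep.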

Independent component analysis (ICA)

Independent component analysis (ICA) extends the concept of traditional multivariate data analysis techniques to determine the hidden components in the data. Unlike PCA, ICA does not merely attempt to find a decorrelated lower-dimensional representation for the data but also attempts to discover statistically independent components.

ICA relies on the following assumptions:
(1) The components should be mutually independent
(2) The number of data dimensions must be equal to or greater than the number of hidden independent components
(3) The independent components should have a nongaussian distribution

ICA can also be seen as an enhancement to PCA and factor analysis. Several algorithms are available for estimating the components, such as FastICA and FlexICA.
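
To make the idea concrete, here is a minimal sketch of a symmetric FastICA-style fixed-point iteration with a tanh nonlinearity, written in NumPy. It assumes the number of components equals the number of data dimensions and that the covariance matrix is full rank; the function names and the synthetic two-source mixture are illustrative assumptions, and a maintained library implementation would be preferable in practice.

```python
import numpy as np

def whiten(x):
    """Center and whiten (samples x dims) data so its covariance is the identity."""
    centered = x - x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    return centered @ eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def sym_decorrelate(w):
    """Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ w

def fast_ica(x, n_iter=200, tol=1e-6, seed=0):
    """Estimate independent components with the tanh fixed-point update."""
    z = whiten(x)
    n_samples, n_dims = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.normal(size=(n_dims, n_dims)))
    for _ in range(n_iter):
        y = z @ w.T                               # current component estimates
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Row i: E[z g(w_i . z)] - E[g'(w_i . z)] w_i
        w_new = (g.T @ z) / n_samples - np.diag(g_prime.mean(axis=0)) @ w
        w_new = sym_decorrelate(w_new)
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return z @ w.T

# Two independent nongaussian sources, linearly mixed (synthetic example).
rng = np.random.default_rng(0)
sources = np.column_stack([rng.uniform(-1, 1, 2000), rng.laplace(size=2000)])
mixing = np.array([[1.0, 0.5], [0.4, 1.0]])
mixed = sources @ mixing.T
recovered = fast_ica(mixed)
```

The recovered components match the sources only up to order, sign and scale, which is inherent to ICA.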

Segmentation

At a microscopic level, human colon tissue cells can be characterised as having four constituent parts: nuclei, cytoplasm, lamina propria, and lumen. According to the National Cancer Institute (NCI) dictionary of cancer-related medical terms, these constituent parts are defined as:
(i) Nuclei: the core central part of a cell, containing DNA, which controls its growth
(ii) Cytoplasm: the fluid inside a cell but outside the cell's nucleus. Most chemical reactions in a cell take place in the cytoplasm
(iii) Lamina propria: a type of connective tissue found under the thin layer of tissues covering a mucous membrane
(iv) Lumen: the cavity or channel within a tube or tubular organ such as a blood vessel or the intestine

Segmentation labels these parts before classification. There are two different methods of segmenting the hyperspectral image data:
(1) spatial analysis
(2) spectral analysis

Spatial analysis: Wavelet based segmentation

Wavelets are special mathematical functions used to represent a signal or image matched to its resolution and scale. Unlike a conventional Fourier transform, which uses sines and cosines of varying amplitude and frequency as its basis functions, the wavelet transform uses scalable wavelet functions. A variety of such functions exists, and a suitable wavelet can be selected for a specific application depending on the signal or image characteristics to be represented.

One of the most important applications of wavelet theory is multiresolution analysis, which allows us to exploit signal or image characteristics that are matched to a particular scale and might go undetected by other analysis techniques. The input to this method is the output of the PCA dimensionality reduction.
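
As an illustration of multiresolution analysis, the sketch below applies the orthonormal Haar wavelet, the simplest wavelet, to an image and collects the mean energy of the detail subbands at each scale as a texture descriptor. The choice of the Haar wavelet, the energy feature and the random stand-in image are assumptions made for this example; they are not taken from the referenced work.

```python
import numpy as np

def haar_level(image):
    """One level of the 2D orthonormal Haar transform (image sides must be even)."""
    a = image[0::2, 0::2]
    b = image[0::2, 1::2]
    c = image[1::2, 0::2]
    d = image[1::2, 1::2]
    approx = (a + b + c + d) / 2.0        # local average: coarser-scale image
    col_diff = (a - b + c - d) / 2.0      # differences between neighbouring columns
    row_diff = (a + b - c - d) / 2.0      # differences between neighbouring rows
    diag_diff = (a - b - c + d) / 2.0     # diagonal differences
    return approx, (col_diff, row_diff, diag_diff)

def haar_energies(image, levels):
    """Mean energy of each detail subband at each level, as simple texture features."""
    features = []
    current = np.asarray(image, dtype=float)
    for _ in range(levels):
        current, details = haar_level(current)
        features.extend(float(np.mean(band ** 2)) for band in details)
    return features

# Random stand-in for the image formed by the first principal component.
rng = np.random.default_rng(0)
pc1_image = rng.random((64, 64))
texture_features = haar_energies(pc1_image, levels=3)
```

Each level halves the resolution, so the three levels describe texture at three different scales.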

However, experiments show that the conventional wavelet texture analysis method can lose necessary discriminant information, since it works only on the data projected onto the first principal component and discards what the remaining components carry. The method is therefore suitable only when the data projected in the first principal component direction contains 80% or more of the total variance. Fortunately, it is not uncommon for hyperspectral colon tissue imagery to have 80% or more of its variance concentrated in the first principal component direction.

Spectral analysis: ICA based segmentation

An alternative way to segment hyperspectral data is spectral analysis. This approach works on the spectral signature of each point on the face of the data cube.

The image data cube is first transformed into a lower dimension, equal to the number of independent components or regions in the image scene. Since a colon cell image consists of four different types of regions, the new dimension of the image cube is 4. A K-means clustering algorithm can then be applied to the extracted components to group the pixels into regions.
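
The clustering step can be sketched as follows, assuming each pixel is already described by a vector of 4 extracted components. The data are synthetic, and the deterministic farthest-point initialisation is a choice made for this example so that the result is reproducible; it is not a detail from the source.

```python
import numpy as np

def init_farthest(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    for _ in range(k - 1):
        dist_to_nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dist_to_nearest.argmax()])
    return np.array(centers)

def assign(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100):
    """Plain K-means: alternate nearest-center assignment and center update."""
    centers = init_farthest(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Synthetic stand-in: a 20 x 20 image with 4 component values per pixel.
rng = np.random.default_rng(0)
true_labels = np.repeat(np.arange(4), 100)
components = 5.0 * np.eye(4)[true_labels] + 0.3 * rng.normal(size=(400, 4))

labels, centers = kmeans(components, k=4)
label_image = labels.reshape(20, 20)    # one region label per pixel
```

The cluster indices are arbitrary, so mapping them to nuclei, cytoplasm, lamina propria and lumen still requires inspection or reference labels.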

Segmentation with preprocessing:
One of the assumptions behind applying ICA to higher-dimensional data is that the components are nongaussian. A high-emphasis filtering step can therefore be considered during preprocessing. In image enhancement, high-emphasis filtering is normally used to restore the edges of a blurred image; here it is used to make the distribution of the data more supergaussian, which leads to better segmentation.
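
A minimal sketch of this preprocessing idea is given below. It assumes a spatial-domain form of high-emphasis filtering, a * image + b * highpass(image), with a 3 x 3 box blur as the low-pass reference, and uses excess kurtosis as the nongaussianity measure (positive values indicate a supergaussian distribution). The filter form, the parameters `a` and `b` and the kurtosis measure are assumptions for illustration.

```python
import numpy as np

def box_blur(image):
    """3 x 3 mean filter with edge replication (the low-pass reference)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def high_emphasis(image, a=0.5, b=2.0):
    """Keep a fraction of the original image and boost its high-frequency part."""
    image = np.asarray(image, dtype=float)
    high_pass = image - box_blur(image)
    return a * image + b * high_pass

def excess_kurtosis(values):
    """Fourth standardised moment minus 3; zero for a Gaussian distribution."""
    v = np.ravel(values).astype(float)
    v = v - v.mean()
    return float(np.mean(v ** 4) / np.mean(v ** 2) ** 2 - 3.0)

# Synthetic band image; compare the nongaussianity measure before and after.
rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
filtered = high_emphasis(image)
kurtosis_before = excess_kurtosis(image)
kurtosis_after = excess_kurtosis(filtered)
```

Comparing `kurtosis_before` and `kurtosis_after` on real band images is one way to check whether the filter moves the data toward a supergaussian distribution.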

Classification

The final phase of our hyperspectral colon tissue cell classification algorithm is the discrimination between normal and malignant sections in a tissue. The information from the segmentation labels can be used to achieve a reasonable accuracy for this discrimination. To do so, we use a machine learning method: a classifier is trained on known examples and then applied to unknown test samples to evaluate the discrimination performance.

Support Vector Machine:
Given n training pairs (x_i, y_i), where x_i is an input feature vector and y_i in {-1, 1} is the target label, the task of the discriminant function is to learn the patterns in the training pairs so that it can predict a reliable label y for an unknown input x. An SVM does this by choosing the decision boundary that maximizes the margin between the two classes.
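
The sketch below trains a linear soft-margin SVM with batch subgradient descent on the hinge loss and predicts with sign(w . x + b). It is a simplified illustration on synthetic two-dimensional features; the kernel, solver and parameters of the referenced work are not given on this page, so the linear form, `lam`, `lr` and `epochs` are all assumptions.

```python
import numpy as np

def train_linear_svm(x, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(hinge loss) by batch subgradient descent."""
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (x @ w + b)
        active = margins < 1.0                    # samples inside or beyond the margin
        grad_w = lam * w - (y[active, None] * x[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Label +1 on one side of the decision boundary, -1 on the other."""
    return np.where(x @ w + b >= 0.0, 1, -1)

# Synthetic feature vectors: label -1 for "normal", +1 for "malignant".
rng = np.random.default_rng(0)
normal = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(60, 2))
malignant = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(60, 2))
x = np.vstack([normal, malignant])
y = np.concatenate([-np.ones(60), np.ones(60)])

# Train on known examples, then evaluate on held-out samples.
order = rng.permutation(len(y))
train, test = order[:80], order[80:]
w, b = train_linear_svm(x[train], y[train])
accuracy = float(np.mean(predict(x[test], w, b) == y[test]))
```

Holding out part of the labelled data, as done here, mirrors the train-then-test evaluation described above.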

Statistical features:
(1) Measure of central tendency (location):
a) geometric mean
b) harmonic mean
c) arithmetic mean
d) median and trimmed mean
(2) Measure of dispersion (spread):
a) standard deviation
b) variance
c) coefficient of variation
d) second moment
e) mean absolute deviation
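
These measures can be computed with Python's standard `statistics` module, as in the sketch below. Some conventions are assumptions made here: the variance and standard deviation use the sample (n - 1) form, the second moment is taken as the second central moment, the trimmed mean drops 10% of the samples from each tail, and the input values are hypothetical positive intensities.

```python
import statistics

def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the samples from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)

def statistical_features(values):
    """Location and spread measures of a list of positive values."""
    values = list(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return {
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "arithmetic_mean": mean,
        "median": statistics.median(values),
        "trimmed_mean": trimmed_mean(values),
        "standard_deviation": std,
        "variance": statistics.variance(values),
        "coefficient_of_variation": std / mean,
        "second_moment": statistics.pvariance(values),
        "mean_absolute_deviation": statistics.fmean([abs(v - mean) for v in values]),
    }

# Hypothetical intensities of the pixels in one segmented region.
features = statistical_features([1.0, 2.0, 4.0, 8.0])
```

The geometric and harmonic means are only defined for positive values, so intensities should be offset or filtered accordingly before use.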

Morphological features:
The estimation of shape and location of regions in an image can be assisted by the use of morphological features. Out of many available attributes, we opted for nine diverse attributes which are:
(1) area: the number of on pixels in a region
(2) eccentricity: the eccentricity of the ellipse that has the same second-moment as the region. The eccentricity is the ratio of the distance between the foci of the ellipse and its major axis length
(3) equivalent diameter: diameter of a circle with the same area as the region
(4) Euler number: equal to the number of objects in the region minus the number of holes in those objects
(5) extent: the proportion of the pixels in the bounding box that are also in the region
(6) orientation: the angle (in degrees) between the x-axis and the major axis of the ellipse that has the same second-moment as the region
(7) solidity: the proportion of the pixels in the convex hull that are also in the region
(8) major axis length: the length (in pixels) of the major axis of the ellipse that has the same second-moment as the region
(9) minor axis length: the length (in pixels) of the minor axis of the ellipse that has the same second-moment as the region
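
A few of these attributes can be sketched directly from a binary mask, as below. The sketch assumes the mask contains a single region and treats each pixel as a point when computing second moments, so values may differ slightly from those of image-processing toolboxes; the function name `region_features` and the example mask are illustrative.

```python
import numpy as np

def region_features(mask):
    """Area, equivalent diameter, extent and eccentricity of the 'on' pixels in a mask."""
    rows, cols = np.nonzero(mask)
    area = rows.size
    height = rows.max() - rows.min() + 1          # bounding-box size in pixels
    width = cols.max() - cols.min() + 1
    # Second central moments of the pixel coordinates.
    cov = np.cov(np.vstack([cols, rows]), bias=True)
    lam_min, lam_max = np.linalg.eigvalsh(cov)
    # For the ellipse with the same second moments, (minor/major)^2 = lam_min/lam_max.
    eccentricity = np.sqrt(max(0.0, 1.0 - lam_min / lam_max)) if lam_max > 0 else 0.0
    return {
        "area": int(area),
        "equivalent_diameter": float(np.sqrt(4.0 * area / np.pi)),
        "extent": float(area / (height * width)),
        "eccentricity": float(eccentricity),
    }

# Example region: a 2 x 4 rectangle of 'on' pixels in a 6 x 6 mask.
mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 1:5] = True
features = region_features(mask)
```

An elongated region gives an eccentricity close to 1, while a compact, symmetric region gives a value close to 0.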

With these statistical and morphological features and a trained classifier, normal and malignant colon tissue cells can be classified.

Results & Conclusions

I learned how to classify hyperspectral colon tissue cells step by step, from dimensionality reduction to image segmentation, feature extraction and classification.

As stated previously, the wavelet technique is based on spatial pattern recognition, while the ICA method implements spectral pattern recognition. The wavelet based technique is limited, as it produces good results only when the derived variable covers more than 80% of the data variance. An ICA based approach, on the other hand, uses all of the extracted independent components and should be more consistent and reliable than the wavelet based technique.

If I keep working on this project, I will use the data to carry out all of these steps myself, to see the results and to check which method is better in practice.

Reference

1. Kashif M. Rajpoot, Nasir M. Rajpoot, Martin J. Turner, Hyperspectral Colon Tissue Cell Classification.
2. Slides from Jure Leskovec's CS 246 Mining Massive Data Sets.
3. Wikipedia: Hyperspectral imaging, http://en.wikipedia.org/wiki/Hyperspectral_imaging.
4. Robila S., Varshney P., “Target Detection in Hyperspectral Images Based on Independent Component Analysis,” SPIE AeroSense, Orlando, FL, Apr. 2002.
5. Gonzalez R., Woods R., Digital Image Processing, Prentice Hall, 2001.


Appendix

This page is additional work accompanying Ye Tian and Xiaoxiong Lu's project "Algorithms for calculating the Oxygen concentration in pig organs using Hyperspectral Images" (Ye Tian).