This book shows how to decompose high-dimensional microarrays into small subspaces (Small
Matryoshkas, SMs), statistically analyze them, and perform cancer gene diagnosis. The
information is useful for genetic experts, anyone who analyzes genetic data, and students, who
can use it as a practical textbook. Discriminant analysis is the best approach for microarrays
consisting of normal and cancer classes. Microarrays are linearly separable data (LSD; Fact 3).
However, because most linear discriminant functions (LDFs) theoretically cannot discriminate LSD
and their error rates are high, no one had discovered Fact 3 until now. Hard-margin SVM (H-SVM)
and Revised IP-OLDF (RIP) can find Fact 3 easily. LSD has a Matryoshka structure and is easily
decomposed into many SMs (Fact 4). Because all SMs are small samples and are LSD, statistical
methods can analyze them easily. However, these analyses do not yield useful results. On the
other hand, H-SVM and RIP can discriminate the two classes in an SM entirely. RatioSV is the
ratio of the SV distance
to the discriminant range. The maximum RatioSVs of six microarrays are all over 11.67%. This
fact shows that the SVs separate the two classes by a window at least 11.67% wide. Yet such an
easy discrimination problem has remained unresolved since 1970. The reason is revealed by the
facts presented here, so this book can be read and enjoyed like a mystery novel. Many studies
point out that it is difficult to separate
signal and noise in a high-dimensional gene space. However, the definition of the signal is not
clear. Convincing evidence is presented that LSD is a signal. Statistical analysis of the genes
contained in an SM cannot provide useful information, but it shows that the discriminant scores
(DSs) obtained by RIP or H-SVM are themselves easily shown to be LSD. For example, the Alon
microarray has 2,000 genes, which can be divided into 66 SMs. If the 66 DSs are used as
variables, the result is 66-dimensional data. These signal data can be analyzed by principal
component analysis and cluster analysis to find malignancy indicators.
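
As a rough illustration of the RatioSV idea described above, the following minimal Python
sketch computes the gap between the two classes as a percentage of the discriminant range.
It rests on an assumption not spelled out here: that the "SV distance" is the gap between
the highest score of one class and the lowest score of the other (for H-SVM, the distance
between scores of -1 and +1). The function name ratio_sv and the toy scores are
hypothetical and are not taken from any of the six microarrays.

```python
def ratio_sv(scores, labels):
    """Return the gap between the two classes as a percentage of the score range.

    scores: discriminant scores, one per sample.
    labels: class labels (+1 or -1) aligned with scores.
    Assumption: "SV distance" is taken to be the gap between the classes.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    gap = min(pos) - max(neg)  # window between the two classes
    if gap <= 0:
        raise ValueError("the scores do not separate the two classes")
    return 100.0 * gap / (max(scores) - min(scores))


# Toy discriminant scores (hypothetical, for illustration only).
toy_scores = [-4.0, -2.5, -1.0, 1.0, 3.0, 6.0]
toy_labels = [-1, -1, -1, 1, 1, 1]
toy_ratio = ratio_sv(toy_scores, toy_labels)
print(f"RatioSV = {toy_ratio:.1f}%")  # prints "RatioSV = 20.0%"
```

Here the gap is 2 and the range is 10, so the window occupies 20% of the discriminant
range. Under the same assumption, a RatioSV of 11.67% would mean the empty window between
the classes spans 11.67% of the full range of scores.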