Initial points selection for clustering gene expression data: A spatial contiguity analysis-based approach

Hui Yi; Cuimei Bo; Xiaofeng Song; Yuhao Yuan

doi:10.3233/BME-141199

COLLEGE OF ELECTRICAL ENGINEERING AND CONTROL SCIENCE

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Clustering is considered one of the most powerful tools for analyzing gene expression data. Although clustering has been extensively studied, a problem remains significant: iterative techniques like k-means clustering are especially sensitive to initial starting conditions. An unreasonable selection of initial points leads to problems including local minima and massive computation. In this paper, a spatial contiguity analysis-based approach is proposed, aiming to solve this problem. It employs principal component analysis (PCA) to identify data points that are likely extracted from different clusters as initial points. This helps to avoid local minima, and accelerates the computation. The effectiveness of the proposed approach was validated on several benchmark datasets.
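The abstract does not spell out the spatial contiguity analysis itself, so the following is only a minimal sketch of the general idea it describes: use PCA to pick starting points that probably belong to different clusters, then run k-means from them. The helper names (`pca_initial_points`, `kmeans`), the rule of splitting the points by their first principal component score, and the synthetic data are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def pca_initial_points(X, k):
    """Pick k row indices of X spread along the first principal component.

    Illustrative heuristic only (an assumption), not the paper's
    spatial contiguity analysis.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    order = np.argsort(scores)
    # Split the points, sorted by PC1 score, into k contiguous groups and
    # take the middle point of each group as an initial point.
    groups = np.array_split(order, k)
    return np.array([g[len(g) // 2] for g in groups])


def kmeans(X, init_idx, n_iter=100):
    """Plain Lloyd's k-means starting from the rows X[init_idx]."""
    X = np.asarray(X, dtype=float)
    centers = X[init_idx].copy()
    for _ in range(n_iter):
        # Distance from every point to every current center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for an expression matrix: 3 well-separated groups
# of 30 samples with 5 features each (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=m, scale=0.3, size=(30, 5))
    for m in (0.0, 3.0, 6.0)
])
init_idx = pca_initial_points(X, k=3)
labels, centers = kmeans(X, init_idx)
```

Because the starting points are chosen deterministically from the data rather than at random, repeated runs give the same clustering, which is one practical way to sidestep the sensitivity to initial conditions that the abstract describes.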

Original language	English
Pages (from-to)	3709-3717
Number of pages	9
Journal	Bio-Medical Materials and Engineering
Volume	24
Issue number	6
DOIs	https://doi.org/10.3233/BME-141199
State	Published - 2014

Keywords

Gene expression data
Initial points
K-means
Spatial contiguity analysis

Cite this

@article{067d6745b293457da3651954ec13f750,

title = "Initial points selection for clustering gene expression data: A spatial contiguity analysis-based approach",

abstract = "Clustering is considered one of the most powerful tools for analyzing gene expression data. Although clustering has been extensively studied, a problem remains significant: iterative techniques like k-means clustering are especially sensitive to initial starting conditions. An unreasonable selection of initial points leads to problems including local minima and massive computation. In this paper, a spatial contiguity analysis-based approach is proposed, aiming to solve this problem. It employs principal component analysis (PCA) to identify data points that are likely extracted from different clusters as initial points. This helps to avoid local minima, and accelerates the computation. The effectiveness of the proposed approach was validated on several benchmark datasets.",

keywords = "Gene expression data, Initial points, K-means, Spatial contiguity analysis",

author = "Hui Yi and Cuimei Bo and Xiaofeng Song and Yuhao Yuan",

note = "Publisher Copyright: {\textcopyright} 2014 - IOS Press and the authors.",

year = "2014",

doi = "10.3233/BME-141199",

language = "English",

volume = "24",

pages = "3709--3717",

journal = "Bio-Medical Materials and Engineering",

issn = "0959-2989",

publisher = "SAGE Publications Ltd",

number = "6",

}

TY - JOUR

T1 - Initial points selection for clustering gene expression data

T2 - A spatial contiguity analysis-based approach

AU - Yi, Hui

AU - Bo, Cuimei

AU - Song, Xiaofeng

AU - Yuan, Yuhao

PY - 2014

Y1 - 2014

N2 - Clustering is considered one of the most powerful tools for analyzing gene expression data. Although clustering has been extensively studied, a problem remains significant: iterative techniques like k-means clustering are especially sensitive to initial starting conditions. An unreasonable selection of initial points leads to problems including local minima and massive computation. In this paper, a spatial contiguity analysis-based approach is proposed, aiming to solve this problem. It employs principal component analysis (PCA) to identify data points that are likely extracted from different clusters as initial points. This helps to avoid local minima, and accelerates the computation. The effectiveness of the proposed approach was validated on several benchmark datasets.

AB - Clustering is considered one of the most powerful tools for analyzing gene expression data. Although clustering has been extensively studied, a problem remains significant: iterative techniques like k-means clustering are especially sensitive to initial starting conditions. An unreasonable selection of initial points leads to problems including local minima and massive computation. In this paper, a spatial contiguity analysis-based approach is proposed, aiming to solve this problem. It employs principal component analysis (PCA) to identify data points that are likely extracted from different clusters as initial points. This helps to avoid local minima, and accelerates the computation. The effectiveness of the proposed approach was validated on several benchmark datasets.

KW - Gene expression data

KW - Initial points

KW - K-means

KW - Spatial contiguity analysis

UR - http://www.scopus.com/inward/record.url?scp=84907280025&partnerID=8YFLogxK

U2 - 10.3233/BME-141199

DO - 10.3233/BME-141199

M3 - Article

C2 - 25227086

AN - SCOPUS:84907280025

SN - 0959-2989

VL - 24

SP - 3709

EP - 3717

JO - Bio-Medical Materials and Engineering

JF - Bio-Medical Materials and Engineering

IS - 6

ER -
