This data collection comprises five microarray gene expression datasets from human breast cancer patients, used in the analysis of Abraham et al. (2010), “Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context”, BMC Bioinformatics 11:277.
The five datasets were downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (accession numbers: GSE2034, GSE4922, GSE6532, GSE7390, GSE11121) prior to December 23rd 2009. To create a single input dataset for this study, the following were removed: (1) data arising from non-Affymetrix microarray platforms; (2) data arising from patients who received adjuvant therapy (from GSE6532 and GSE4922). The combined data include both lymph-node-negative and node-positive breast cancer patients, with both ER (Estrogen Receptor)-positive and ER-negative tumours.
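The two exclusion rules above amount to a simple filter over per-sample metadata. The sketch below is illustrative only: the field names (`platform_vendor`, `adjuvant_therapy`) and the sample records are hypothetical placeholders, not the actual GEO annotation columns or the code used in the study.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are hypothetical; real GEO annotations use different column names.
    sample_id: str
    series: str             # GEO series accession, e.g. "GSE6532"
    platform_vendor: str    # microarray manufacturer
    adjuvant_therapy: bool  # True if the patient received adjuvant therapy


def keep_sample(s: Sample) -> bool:
    """Apply both exclusion rules: Affymetrix arrays only, no adjuvant therapy."""
    return s.platform_vendor == "Affymetrix" and not s.adjuvant_therapy


def filter_samples(samples):
    """Return only the samples that pass both exclusion rules."""
    return [s for s in samples if keep_sample(s)]


# Made-up records for illustration only; they are not taken from the datasets.
samples = [
    Sample("S1", "GSE2034", "Affymetrix", False),
    Sample("S2", "GSE6532", "Affymetrix", True),
    Sample("S3", "GSE4922", "OtherVendor", False),
]
kept = filter_samples(samples)
print([s.sample_id for s in kept])
```

In practice the therapy status and platform would have to be read from each series' own annotation, which differs between datasets.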
Put these files in a subdirectory called ExprData of the directory where you unpacked GeneSetStats.tar.gz.
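As a minimal sketch of that layout step, the helper below copies downloaded files into an `ExprData` subdirectory, creating it if needed. The function name `place_in_exprdata` and the example paths in the comment are hypothetical; substitute the directory where you actually unpacked the archive.

```python
import shutil
from pathlib import Path


def place_in_exprdata(files, unpack_dir):
    """Copy each file into <unpack_dir>/ExprData, creating the directory if needed."""
    target = Path(unpack_dir) / "ExprData"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in files:
        dest = target / Path(f).name
        shutil.copy2(f, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied


# Example call (hypothetical paths):
# place_in_exprdata(["downloads/some_dataset_file"], "path/to/unpacked/dir")
```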
The datasets were downloaded from NCBI GEO, prior to December 23rd 2009, using the Bioconductor package GEOquery, and are in Bioconductor ExpressionSet format.
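The download itself was done in R with GEOquery, and the sketch below does not reproduce that step. It only shows how the GEO record page for each of the five accessions can be located, assuming GEO's accession query URL pattern (`acc.cgi?acc=...`); the function name `geo_series_url` is made up for this example.

```python
from urllib.parse import urlencode

# The five series accessions named in this data collection.
GEO_ACCESSIONS = ["GSE2034", "GSE4922", "GSE6532", "GSE7390", "GSE11121"]

# Assumed base URL of the GEO accession viewer.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_series_url(accession: str) -> str:
    """Build the GEO record URL for a series accession such as 'GSE2034'."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_QUERY_URL}?{urlencode({'acc': accession})}"


for acc in GEO_ACCESSIONS:
    print(geo_series_url(acc))
```

Note that these pages show the current GEO records, which may differ from the versions downloaded in 2009.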
For GSE11121 and GSE4922, we derived the annotation from the GEO data. For GSE2034, GSE6532, and GSE7390 we used annotation datasets by Benjamin Haibe-Kains:
Note: the GEO datasets are sometimes updated and their annotation format may change, potentially breaking our code. Here are the versions we used (link to current NCBI version in parentheses):
There are no restrictions on the use or distribution of this data (which was originally obtained from the NCBI GEO database). See: http://www.ncbi.nlm.nih.gov/About/disclaimer.html
Abraham, G; Kowalczyk, A; Loi, S; Haviv, I; Zobel, J. (2011) Five human breast cancer microarray gene expression datasets. Computer Science and Software Engineering, The University of Melbourne. doi:10.4225/02/4E9F695934393
Supplement to: Abraham, G; Kowalczyk, A; Loi, S; Haviv, I; Zobel, J. (2010) Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context. BMC Bioinformatics 11:277. doi:10.1186/1471-2105-11-277
This page and its content are licensed under a Creative Commons Attribution 3.0 Australia License.