SparSNP

Lasso-penalized linear models for SNP data

SparSNP fits lasso-penalized linear models to SNP data. Its main features are:

  • fits squared hinge loss models for classification (case/control) and linear regression models for quantitative phenotypes
  • takes PLINK BED/FAM files as input
  • uses a bounded amount of memory, so it can work with large datasets using very little memory (typically <100MB; more memory gives better performance)
  • fits a model over a grid of penalties and writes the estimated coefficients to disk
  • can also do cross-validation, using the estimated coefficients to predict outputs for other datasets
  • is efficient: it uses warm restarts plus an active-set approach. The model-fitting part of 3-fold cross-validation takes ~5min for a dataset of 2000 samples by 300,000 SNPs, and about 25min for ~6800 samples / ~516,000 SNPs
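
To illustrate how a penalty grid, warm restarts and an active set fit together, here is a minimal pure-Python sketch for the linear regression (squared loss) case only. It is not SparSNP's code: the function names, the decreasing penalty order, the convergence tolerance, and the assumption that the phenotype and SNP columns are already centred (so no intercept is fitted) are all choices made for this example.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def update_coordinate(X, resid, beta, j, lam):
    """Update beta[j] and the residuals in place; return the absolute change."""
    n = len(resid)
    col = [row[j] for row in X]
    sq = sum(v * v for v in col) / n
    if sq == 0.0:
        return 0.0  # constant (all-zero after centring) column: nothing to fit
    # Covariance of column j with the partial residual (residual + own fit).
    rho = sum(v * (r + v * beta[j]) for v, r in zip(col, resid)) / n
    new = soft_threshold(rho, lam) / sq
    delta = new - beta[j]
    if delta != 0.0:
        for i, v in enumerate(col):
            resid[i] -= v * delta
        beta[j] = new
    return abs(delta)


def lasso_path(X, y, penalties, tol=1e-8, max_sweeps=1000):
    """Fit the lasso for each penalty, largest first, reusing the last solution."""
    p = len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for beta = 0 (y assumed centred)
    path = []
    for lam in sorted(penalties, reverse=True):
        # Warm restart: beta and resid carry over from the previous penalty.
        for _ in range(max_sweeps):
            # Full sweep over all variables; this may grow the active set.
            full_change = max(
                update_coordinate(X, resid, beta, j, lam) for j in range(p)
            )
            if full_change < tol:
                break
            # Cheaper sweeps restricted to the currently non-zero coefficients.
            active = [j for j in range(p) if beta[j] != 0.0]
            for _ in range(max_sweeps):
                change = max(
                    (update_coordinate(X, resid, beta, j, lam) for j in active),
                    default=0.0,
                )
                if change < tol:
                    break
        path.append((lam, list(beta)))
    return path
```

Calling `lasso_path(X, y, [0.05, 3.0, 0.5])` returns one `(penalty, coefficients)` pair per penalty, ordered from the largest penalty to the smallest. In this sketch, as the penalty shrinks, more coefficients become non-zero, and each fit starts from the previous solution rather than from zero.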

Contact

Gad Abraham, gad.abraham@unimelb.edu.au

Citation

G. Abraham, A. Kowalczyk, J. Zobel, and M. Inouye, “Sparse Linear Models to Explain Phenotypic Variance and Predict Complex Disease”, submitted, 2011

License

Copyright (C), National ICT Australia (2011), All Rights Reserved.

Redistribution and use in binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Requirements

Linux or OSX, both 64-bit (x86-64)

To run R code: package ggplot2 (≤0.8.9)

Documentation

See the file workflow.pdf in the tarball

Release Notes

  • v0.89 2012-03-18
    • Added SNP tabulation in topsnps.csv.
    • Small fixes for when nfolds=1.
    • Fixed an inconsistency in the nonzero files: one is now subtracted from numnz to exclude the intercept.
  • v0.88 2011-11-08
    • Fixed bugs in reading missing phenotypes.
  • v0.87 2011-11-04
    • Removed dependency on glmnet.
    • Added plots of explained genetic variance.
    • Save all results to RData file.
  • v0.84 2011-10-19
    • Overhauled the helper scripts to make discovery and validation easier; see 00README.txt.
    • Sparse outputs: only non-zero model weights are written.
    • Note: this version is not compatible with outputs from previous versions of SparSNP, due to the use of a sparse format for the model. The easiest solution is to run SparSNP again.
  • v0.74 2011-10-16
    • Added missing utility split
    • Cleaned up coordinate descent code
  • v0.73 2011-10-08
    • Cleaned up documentation
    • Removed unnecessary dependencies in R code
  • v0.72 2011-10-05
    • Faster unpacking of PLINK data using precomputed mappings
    • Removed dependencies on Hmisc and ROCR packages
  • v0.71 2011-08-23
    • Fixed the tarball, which contained symbolic links instead of files
    • Documented that missing phenotypes are not allowed
    • Univariable method terminates on non-convergence
  • v0.7 2011-07-29
    • Fixed bug in reading 0/1 phenotypes from FAM files
  • v0.6 2011-06-13
    • Sped up the prediction phase by ignoring SNPs with zero weights
  • v0.5 2011-06-10
    • Removed more references to cache.h; now compiles cleanly on Linux
  • v0.4 2011-06-10
    • Fixed the missing cache.h bug from v0.3
    • Added an in-memory cache to store the active set, resulting in better performance
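
The v0.72 note above mentions unpacking PLINK data using precomputed mappings. As a rough illustration of that idea (not SparSNP's actual implementation), the sketch below builds a 256-entry lookup table once, so each packed byte of a SNP-major PLINK BED file is decoded into its four genotypes with a single table lookup. The function names and the choice to encode genotypes as the count of the first (A1) allele, with `None` for missing, are assumptions made for this example.

```python
MAGIC = b"\x6c\x1b\x01"  # PLINK 1 BED magic number followed by the SNP-major mode byte

# 2-bit genotype code -> copies of the first (A1) allele; None marks a missing call.
CODE_TO_DOSAGE = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}

# Precomputed mapping: the four genotypes packed into every possible byte value.
# Within a byte, genotypes are stored from the low-order bits upwards.
BYTE_TO_GENOTYPES = [
    tuple(CODE_TO_DOSAGE[(byte >> shift) & 0b11] for shift in (0, 2, 4, 6))
    for byte in range(256)
]


def unpack_snp(packed, n_samples):
    """Decode one SNP's packed bytes into a list of n_samples genotypes."""
    genotypes = []
    for byte in packed:
        genotypes.extend(BYTE_TO_GENOTYPES[byte])
    return genotypes[:n_samples]  # drop the padding in the final byte


def read_bed(data, n_samples):
    """Decode the full contents of a SNP-major BED file, one list per SNP."""
    if data[:3] != MAGIC:
        raise ValueError("not a SNP-major PLINK BED file")
    stride = (n_samples + 3) // 4  # bytes per SNP, rounded up
    body = data[3:]
    return [
        unpack_snp(body[k:k + stride], n_samples)
        for k in range(0, len(body), stride)
    ]
```

For real data, `data` would be the bytes read from the `.bed` file and `n_samples` the number of lines in the matching FAM file.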

Download

[v0.89 static binary for 64-bit Linux] [v0.89 binary for Mac OSX Intel x86-64]

Older Versions

[v0.88 static binary for 64-bit Linux] [v0.88 binary for Mac OSX Intel x86-64]
[v0.87 static binary for 64-bit Linux] [v0.87 binary for Mac OSX Intel x86-64]
[v0.84 static binary for 64-bit Linux] [v0.84 binary for Mac OSX Intel x86-64]
[v0.74 static binary for 64-bit Linux] [v0.74 binary for Mac OSX Intel x86-64]
[v0.73 static binary for 64-bit Linux] [v0.73 binary for Mac OSX Intel x86-64]
[v0.72 static binary for 64-bit Linux] [v0.72 binary for Mac OSX Intel x86-64]
[v0.71 static binary for 64-bit Linux] [v0.71 binary for Mac OSX Intel x86-64]
[v0.7 static binary for 64-bit Linux] [v0.7 binary for Mac OSX Intel x86-64]

Copyright © 2011 National ICT Australia Ltd