Master's Thesis

Comparison of statistical learning methods for prediction in Genome-Wide Association Studies

Results from Genome-Wide Association Studies (GWAS) show that the detected Single Nucleotide Polymorphisms (SNPs) explain only a small fraction of heritability. Although identifying the missing SNPs is important, it is sometimes more important to be able to predict whether a person will develop a certain disease. This thesis compares three variable selection methods (LASSO, χ² Test for Independence, Random Forest) and seven classification methods (Logistic Classification, Linear Discriminant Analysis, Random Forest, Support Vector Machines with Linear, Radial, and Polynomial kernels, k-Nearest Neighbor) on simulated GWAS datasets under different disease models. After a discussion of the methods, the best model for each scenario is chosen based on prediction error rate and area under the Receiver Operating Characteristic curve (AUC).
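
The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. It is not the thesis's own code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration. The AUC is computed here through its rank interpretation, the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Labels are assumed to be 1 (case) and 0 (control). Ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 cases, 3 controls, with classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(error_rate(y_true, y_pred))  # 2 of 6 misclassified
print(auc(y_true, scores))         # 8 of 9 case/control pairs ranked correctly
```

The error rate depends on the chosen threshold, while the AUC summarizes ranking quality across all thresholds, which is one reason the two criteria can favor different models.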

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.