Master's Thesis
Comparison of statistical learning methods for prediction in Genome-Wide Association Studies
Results from Genome-Wide Association Studies (GWAS) show that the detected Single Nucleotide Polymorphisms (SNPs) explain only a small fraction of heritability. Although identifying the missing SNPs is important, it is sometimes more important to be able to predict whether a person will develop a certain disease. In this thesis, a comparative analysis of three variable selection methods (LASSO, χ² Test for Independence, Random Forest) and seven classification methods (Logistic Classification, Linear Discriminant Analysis, Random Forest, Support Vector Machines with Linear, Radial, and Polynomial kernels, k-Nearest Neighbor) will be given on simulated GWAS datasets under different disease models. After a discussion of the methods, the best model for each scenario will be chosen based on prediction error rate and area under the Receiver Operating Characteristic curve (AUC).
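To make the screening and evaluation quantities named in the abstract concrete, the following is a minimal sketch in plain Python. It is not taken from the thesis: the function names, the toy data, and the coding of genotypes as minor-allele counts (0, 1, 2) with case/control labels (1, 0) are all assumptions made for illustration. It computes the Pearson χ² statistic for one SNP against disease status, the prediction error rate, and the AUC as the proportion of case/control pairs in which the case receives the higher score.

```python
from collections import Counter


def chi2_statistic(genotypes, labels):
    """Pearson chi-square statistic for the genotype-by-status table.

    Assumes every genotype and label value passed in occurs at least once,
    so all expected counts are positive.
    """
    n = len(genotypes)
    observed = Counter(zip(genotypes, labels))
    row_totals = Counter(genotypes)
    col_totals = Counter(labels)
    stat = 0.0
    for g, row in row_totals.items():
        for y, col in col_totals.items():
            expected = row * col / n
            stat += (observed[(g, y)] - expected) ** 2 / expected
    return stat


def error_rate(labels, predictions):
    """Fraction of individuals whose predicted status is wrong."""
    wrong = sum(1 for y, p in zip(labels, predictions) if y != p)
    return wrong / len(labels)


def auc(labels, scores):
    """AUC as the share of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 4 cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
associated_snp = [2, 2, 1, 1, 0, 0, 1, 0]   # genotype tracks disease status
unrelated_snp = [0, 1, 0, 1, 0, 1, 0, 1]    # genotype independent of status
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.5, 0.2, 0.1]  # made-up classifier scores
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(chi2_statistic(associated_snp, labels))
print(chi2_statistic(unrelated_snp, labels))
print(error_rate(labels, predictions))
print(auc(labels, scores))
```

In a screening step of this kind, SNPs would be ranked by the χ² statistic and only the top-ranked ones passed to a classifier. The number of SNPs to keep is a tuning choice that this sketch does not address.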
Title | Date Uploaded | Visibility |
---|---|---|
AS362017MATHH37.pdf | 2020-06-13 | Public |
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.