Author: Daan Leiva
Institution: UCLA
This project explores the use of Support Vector Machines (SVMs) to predict genotypes from allele intensity data collected with OpenArray™ SNP genotyping technology. The goal is to improve the accuracy of genotype predictions, especially in assays with low signal separation, thereby reducing costs and improving confidence in results.
Tools: Python (scikit-learn), R, SQLite.
After hyperparameter tuning, the best-performing model used an RBF kernel with C = 0.3 and γ = 300.
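The tuning step can be sketched with scikit-learn's GridSearchCV and SVC. This is a minimal sketch under stated assumptions, not the project's actual pipeline. The two-feature layout (one normalized intensity per allele), the synthetic clusters, the 5-fold cross-validation, and every grid value other than C = 0.3 and γ = 300 are illustrative assumptions; make_synthetic_intensities and tune_rbf_svm are hypothetical helpers. For context, SVC's RBF kernel is K(x, x') = exp(-γ‖x - x'‖²), so a large γ such as 300 makes each training sample's influence very local: two samples 0.1 apart have K = exp(-300 × 0.01) = exp(-3) ≈ 0.05.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


def make_synthetic_intensities(n_per_class=100, noise=0.05, seed=0):
    """Hypothetical 2-D allele-intensity clusters; NOT real OpenArray data.

    Column 0 / column 1: normalized intensity for allele 1 / allele 2 (assumed layout).
    Labels: 0 = homozygous allele 1, 1 = heterozygous, 2 = homozygous allele 2.
    """
    rng = np.random.default_rng(seed)
    centers = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
    X = np.vstack([c + noise * rng.standard_normal((n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


def tune_rbf_svm(X, y):
    """5-fold grid search over C and gamma for an RBF-kernel SVC (hypothetical helper)."""
    # The grid brackets the reported optimum (C = 0.3, gamma = 300);
    # the other grid points are illustrative choices, not the project's.
    param_grid = {"C": [0.03, 0.3, 3, 30], "gamma": [3, 30, 300, 3000]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)  # refits the best configuration on all of X, y
    return search


X, y = make_synthetic_intensities()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
search = tune_rbf_svm(X_train, y_train)
print("best params:", search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")
print(f"held-out accuracy: {search.score(X_test, y_test):.3f}")
```

Because γ multiplies squared distances, its best value depends on how the intensities are scaled; γ = 300 is only meaningful relative to the feature scaling used in the project, which this sketch does not reproduce.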
The RBF SVM outperformed the linear model in both accuracy and prediction consistency. While homozygous genotype predictions were reliable, predicting heterozygous samples remains a challenge. Future improvements could involve additional feature transformation or deeper models.
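One way to examine the homozygous/heterozygous gap is to report recall per genotype class alongside overall accuracy, so a weak heterozygous class is not averaged away. The sketch below fits a linear SVC and an RBF SVC (C = 0.3, γ = 300, the reported values) on hypothetical, deliberately overlapping synthetic clusters. The data generator make_low_separation_assay, the label encoding, the train/test split, and the linear model's C = 1.0 are assumptions for illustration, not the project's setup, so no particular numbers should be expected from it. The heterozygous cluster sits between the two homozygous clusters and borders both of them, which is one plausible reason it tends to be the hardest class to recall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical label encoding: 0 = homozygous allele 1, 1 = heterozygous, 2 = homozygous allele 2
GENOTYPE_NAMES = ["hom. allele 1", "heterozygous", "hom. allele 2"]


def make_low_separation_assay(n_per_class=150, noise=0.08, seed=1):
    """Synthetic 2-D intensities with closely spaced clusters; NOT real OpenArray data."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.8, 0.2], [0.55, 0.45], [0.2, 0.8]])
    X = np.vstack([c + noise * rng.standard_normal((n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit a model; return overall accuracy and recall for each genotype class."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    per_class = recall_score(y_test, y_pred, labels=[0, 1, 2], average=None)
    return accuracy_score(y_test, y_pred), per_class


X, y = make_low_separation_assay()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
models = {
    "linear": SVC(kernel="linear", C=1.0),        # C chosen arbitrarily for illustration
    "rbf": SVC(kernel="rbf", C=0.3, gamma=300),   # hyperparameters reported in this project
}
results = {name: evaluate(m, X_train, X_test, y_train, y_test) for name, m in models.items()}

for name, (acc, recalls) in results.items():
    print(f"{name}: accuracy={acc:.3f}")
    for label, r in zip(GENOTYPE_NAMES, recalls):
        print(f"  {label}: recall={r:.3f}")
```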
Special thanks to Professor Wei Wang for her mentorship and to Chelsea Ju for her support and collaboration throughout the project.