Master's Thesis

A multifaceted data mining approach to analyzing college students' persistence and graduation

This study describes a set of generalizable, data mining-based approaches for identifying factors that contribute to student persistence and graduation, using data from the Metro College Success Program, an academic program at San Francisco State University, California. These approaches include (1) a visual analysis to identify bivariate relationships and to understand the flow of students through an educational institution, (2) an ensemble feature selection method to recognize factors that have a significant impact on a student’s persistence and graduation, (3) classification and prediction algorithms to predict whether a student will persist in a given semester and ultimately graduate, and (4) a variety of association patterns to help education practitioners gain further insights into factors that affect persistence and graduation. Our analysis reveals the following main insights: (1) most students who drop out do so in the fourth and seventh terms, (2) the educational level of a student’s mother, the student’s ELM (Entry Level Mathematics) score, and race are the most influential factors in predicting third-term persistence, (3) Naive Bayes is the most suitable model for predicting graduation, while AdaBoost and SVM models are best suited for predicting persistence, and (4) a student’s low ELM score and Pell eligibility (an indicator of socioeconomic status) together predict a lower rate of graduation. By collaborating with practitioners and focusing on generating human-interpretable results, the study helped identify bottlenecks on a student’s path to graduation.

