UC Irvine
ML Repository

Predict Students' Dropout and Academic Success

Download (520.8 KB)

About

A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. The dataset includes information known at the time of student enrollment (academic path, demographics, and socioeconomic factors) and the students' academic performance at the end of the first and second semesters. The data is used to build classification models to predict students' dropout and academic success. The problem is formulated as a three-category classification task in which there is a strong imbalance towards one of the classes. Preprocessing description: We performed rigorous data preprocessing to handle anomalies, unexplainable outliers, and missing values.
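Because the task is a three-category classification with one class heavily over-represented, a model trained naively may favor the majority class. One common mitigation is to weight each class inversely to its frequency. The sketch below is only an illustration under stated assumptions: the helper name `balanced_class_weights` is hypothetical, the labels "A", "B", and "C" are placeholder toy values and not the dataset's actual target values, and the counts are invented for the example rather than taken from the data. The formula `n_samples / (n_classes * count)` is the usual "balanced" heuristic.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels (NOT drawn from the dataset): three classes,
# with class "A" dominating.
toy_labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1

weights = balanced_class_weights(toy_labels)
for cls in sorted(weights):
    print(f"{cls}: {weights[cls]:.3f}")
```

With these weights, every class contributes the same total weight to a weighted loss, so the minority class is not drowned out. Whether this helps on the real data would need to be checked empirically, for example with a stratified train/test split and per-class metrics.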
Subject Area
Social Science
Instances
4,424
Features
36
Data Types
Tabular
Tasks
Classification
Feature Types
Continuous, Categorical, Integer

Features

Name | Role | Type | Units | Missing Values | Description

Introductory Paper

Early prediction of student's performance in higher education: a case study
Mónica V. Martins, Daniel Tolledo, Jorge Machado, Luís M. T. Baptista, and Valentim Realinho. 2021.
Trends and Applications in Information Systems and Technologies

Additional Metadata

Authors
Valentim Realinho
Mónica Vieira Martins
Jorge Machado
Luís Baptista
Year Created
2021
License
CC BY 4.0