Predict Students' Dropout and Academic Success
About
A dataset created from a higher education institution (acquired from several disjoint databases) related to students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies.
The dataset includes information known at the time of student enrollment (academic path, demographics, and socioeconomic factors) and the students' academic performance at the end of the first and second semesters.
The data is used to build classification models to predict students' dropout and academic success. The problem is formulated as a three-category classification task, in which there is a strong imbalance towards one of the classes.
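Because one class dominates, a model trained without any correction may favor that class. One common mitigation is to weight each class inversely to its frequency. The sketch below is illustrative only: the helper `balanced_class_weights` is a hypothetical name, and the labels `"A"`, `"B"`, `"C"` and their counts are toy values, not the class names or distribution of this dataset. It uses the widely used heuristic `n_samples / (n_classes * class_count)`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses the heuristic n_samples / (n_classes * class_count), so that every
    class contributes the same total weight regardless of how many rows it has.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels for illustration only (NOT the real class names or proportions).
toy_labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
weights = balanced_class_weights(toy_labels)

# Rarer classes receive larger weights.
for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

A dictionary like this can typically be passed to a classifier that accepts per-class weights. Reporting per-class metrics alongside overall accuracy is also advisable when classes are imbalanced.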
Preprocessing description:
We performed rigorous data preprocessing to handle anomalies, unexplainable outliers, and missing values.
Subject Area
Social Science
Instances
4,424
Features
36
Data Types
Tabular
Tasks
Classification
Feature Types
Continuous, Categorical, Integer
Features
| Name | Role | Type | Units | Missing Values | Description |
|---|---|---|---|---|---|
| Marital Status | Feature | Integer | - | No | |
| Application mode | Feature | Integer | - | No | |
| Application order | Feature | Integer | - | No | |
| Course | Feature | Integer | - | No | |
| Daytime/evening attendance | Feature | Integer | - | No | |
| Previous qualification | Feature | Integer | - | No | |
| Previous qualification (grade) | Feature | Continuous | - | No | |
| Nacionality | Feature | Integer | - | No | |
| Mother's qualification | Feature | Integer | - | No | |
| Father's qualification | Feature | Integer | - | No | |
| Mother's occupation | Feature | Integer | - | No | |
| Father's occupation | Feature | Integer | - | No | |
| Admission grade | Feature | Continuous | - | No | |
| Displaced | Feature | Integer | - | No | |
| Educational special needs | Feature | Integer | - | No | |
| Debtor | Feature | Integer | - | No | |
| Tuition fees up to date | Feature | Integer | - | No | |
| Gender | Feature | Integer | - | No | |
| Scholarship holder | Feature | Integer | - | No | |
| Age at enrollment | Feature | Integer | - | No | |
| International | Feature | Integer | - | No | |
| Curricular units 1st sem (credited) | Feature | Integer | - | No | |
| Curricular units 1st sem (enrolled) | Feature | Integer | - | No | |
| Curricular units 1st sem (evaluations) | Feature | Integer | - | No | |
| Curricular units 1st sem (approved) | Feature | Integer | - | No | |
| Curricular units 1st sem (grade) | Feature | Integer | - | No | |
| Curricular units 1st sem (without evaluations) | Feature | Integer | - | No | |
| Curricular units 2nd sem (credited) | Feature | Integer | - | No | |
| Curricular units 2nd sem (enrolled) | Feature | Integer | - | No | |
| Curricular units 2nd sem (evaluations) | Feature | Integer | - | No | |
| Curricular units 2nd sem (approved) | Feature | Integer | - | No | |
| Curricular units 2nd sem (grade) | Feature | Integer | - | No | |
| Curricular units 2nd sem (without evaluations) | Feature | Integer | - | No | |
| Unemployment rate | Feature | Continuous | - | No | |
| Inflation rate | Feature | Continuous | - | No | |
| GDP | Feature | Continuous | - | No | |
| Target | Target | Categorical | - | No | |
Introductory Paper
Early prediction of student's performance in higher education: a case study
Mónica V. Martins, Daniel Tolledo, Jorge Machado, Luís M. T. Baptista, and Valentim Realinho. 2021.
Trends and Applications in Information Systems and Technologies
Additional Metadata
Authors
Valentim Realinho
Mónica Vieira Martins
Jorge Machado
Luís Baptista
Year Created
2021
License
CC BY 4.0