Harnessing Machine Learning for Predicting School Dropout

In an article published in the journal Scientific Reports, researchers from Mexico used several machine learning (ML) algorithms to build predictive models that identify students at risk of dropping out, so that those students can be offered appropriate support. They applied these techniques to predict school dropout at the secondary and higher education levels.

Study: Harnessing Machine Learning for Predicting School Dropout. Image credit: Elnur/Shutterstock

ML is a branch of artificial intelligence that enables computers to learn from data and perform tasks that normally require human intelligence. It is commonly divided into two main categories: supervised and unsupervised learning. In supervised learning, the computer is given a set of input-output pairs and learns to map new inputs to the desired outputs. In unsupervised learning, the computer is given input data without explicit labels and learns to uncover patterns or structures within the data.

ML has been widely used in fields such as medicine, engineering, finance, and education. In education, it can help improve the quality and effectiveness of teaching and learning and address challenges such as student retention, performance, and satisfaction. School dropout is a complex phenomenon with multiple causes and consequences, influenced by individual, family, school, and social factors. ML is well suited to modeling and predicting it because it can handle large, heterogeneous datasets and capture nonlinear relationships among these factors.

About the Research

In the present paper, the authors aimed to develop a model for predicting school dropout with 90% reliability. They used data from the 2010 and 2020 housing and population censuses and the 2015 intercensal survey conducted by the National Institute of Statistics and Geography (INEGI). These data sets included information about the residents and households in Mexico's 32 states and 2,457 municipalities, including factors such as ethnicity, birth, education, health services, economic issues, and other relevant characteristics.

The study selected 20 variables from the data sources based on their correlation with the target variable, which was the academic level of the individuals. The target variable indicated whether the individual had completed or dropped out of secondary or higher education. The selected variables included demographic, socioeconomic, and educational factors, such as age, gender, marital status, occupation, income, school attendance, school type, and school location. The researchers cleaned and homogenized the data, discarding incomplete, duplicate, and unspecified records and retaining only the records of people over 14 years old who entered secondary or higher education. The final dataset consisted of 1,080,782 records.
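The cleaning and correlation-based selection steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the field names (age, income, attends, level), the toy records, and the choice of Pearson correlation are all assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def clean(records, required, min_age=15):
    """Drop incomplete and duplicate records; keep people older than 14.

    `required` must include "age", since age is used for the filter.
    """
    seen, kept = set(), []
    for rec in records:
        if any(rec.get(k) is None for k in required):
            continue  # incomplete or unspecified record
        if rec["age"] < min_age:
            continue  # outside the studied population
        key = tuple(rec[k] for k in required)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        kept.append(rec)
    return kept

def rank_by_correlation(records, features, target, top_k):
    """Return the top_k features by absolute correlation with the target."""
    ys = [r[target] for r in records]
    scores = {f: abs(pearson([r[f] for r in records], ys)) for f in features}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical toy records; "level" stands in for the academic-level target.
records = [
    {"age": 16, "income": 2.0, "attends": 1, "level": 9},
    {"age": 16, "income": 2.0, "attends": 1, "level": 9},   # duplicate
    {"age": 13, "income": 1.0, "attends": 1, "level": 6},   # too young
    {"age": 18, "income": None, "attends": 0, "level": 9},  # incomplete
    {"age": 20, "income": 3.0, "attends": 1, "level": 12},
    {"age": 25, "income": 2.5, "attends": 0, "level": 9},
    {"age": 30, "income": 5.0, "attends": 1, "level": 16},
]
cleaned = clean(records, required=["age", "income", "attends", "level"])
selected = rank_by_correlation(cleaned, ["age", "income", "attends"], "level", top_k=2)
print(len(cleaned), selected)
```

On this toy data, four records survive cleaning and the two variables most correlated with the target are income and age. The study applied the same idea at a much larger scale, keeping 20 variables.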

The authors built predictive models using artificial neural networks (ANN), support vector machines (SVM), Bayesian optimization, random forest (RF), and linear ridge and Lasso regression. These techniques were chosen because they have proven effective and competitive in solving regression problems. The performance of each technique was compared in terms of reliability and processing time using several evaluation metrics, including the coefficient of determination, the mean squared error, and the root mean squared error. The study used 80% of the data for training and 20% for testing.
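As a rough sketch of the evaluation setup, the snippet below shows an 80/20 split and the three metrics named above, written with the Python standard library only. The helper names, the random seed, and the toy values are assumptions for illustration. The article does not say which of these metrics its "reliability" figures correspond to.

```python
import random
from math import sqrt

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination (undefined when y_true is constant)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# 100 placeholder row ids split into 80 training and 20 test rows.
rows = list(range(100))
train, test = train_test_split(rows)

# Hypothetical true and predicted academic levels for four test records.
y_true = [9, 12, 9, 16]
y_pred = [10, 12, 8, 15]
print(len(train), len(test))
print(mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Holding out the test rows before any fitting is what lets these metrics estimate performance on unseen records rather than on the data the model was trained on.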

Research Findings

The results showed that all of the ML techniques achieved high reliability, above 91%. The ANN was judged the best technique when reliability and processing time were considered together, obtaining a reliability of 99%. SVM and Bayesian optimization obtained reliabilities of 99.5% and 99.4%, respectively, while RF, linear ridge, and Lasso regression obtained reliabilities between 91.1% and 91.3%. The error rates of all techniques were below 10%, the convergence criterion established by the authors. The ANN also had the shortest processing time, while RF required the most computing power.

Several tests were also performed to optimize the parameters and structure of the ANN, such as the number of layers, the number of neurons, the activation function, and the optimization algorithm. The authors found that the best ANN configuration was a multilayer perceptron with four hidden layers of two neurons each, using the adaptive moment estimation (Adam) optimization algorithm and the rectified linear unit (ReLU) activation function. This network was able to learn from the data and predict the probability of school dropout for each individual based on the input variables.
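To give a sense of how small the reported architecture is, here is a minimal sketch of a multilayer perceptron with four hidden layers of two neurons each and ReLU activations. Several things are assumptions for illustration: an input width of 20 (one input per selected variable), a single linear output, and random untrained weights. Training with Adam is omitted, and the authors' actual implementation is not described in this article.

```python
import random

def relu(x):
    """Rectified linear unit: pass positives through, clamp negatives to 0."""
    return x if x > 0.0 else 0.0

def init_mlp(layer_sizes, seed=0):
    """Return one (weights, biases) pair per layer, with random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, inputs):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    activations = list(inputs)
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        activations = z if i == last else [relu(v) for v in z]
    return activations

def count_parameters(layers):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

N_INPUTS = 20  # assumption: one input per selected variable
LAYER_SIZES = [N_INPUTS, 2, 2, 2, 2, 1]  # four hidden layers of two neurons

model = init_mlp(LAYER_SIZES)
prediction = forward(model, [0.5] * N_INPUTS)
n_params = count_parameters(model)
print(n_params, prediction)
```

Under these assumptions the network has only 63 trainable parameters, which is consistent with the short processing time reported for the ANN. With scikit-learn, a comparable configuration could be written as `MLPRegressor(hidden_layer_sizes=(2, 2, 2, 2), activation="relu", solver="adam")`, although the article does not state which library the authors used.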

The study also identified the most influential variables in predicting school dropout using the feature importance method. These were school attendance, school type, school location, occupation, income, and marital status. They reflect the economic, social, and educational factors that affect students' decisions to continue or abandon their studies.
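The article does not specify which feature importance method was used. One common model-agnostic option is permutation importance, sketched below as an assumed stand-in: each input column is shuffled in turn, and the resulting increase in prediction error is taken as that column's importance. The toy model, its coefficients, and the data are hypothetical.

```python
import random

def mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical fitted model: column 0 matters most, column 2 is ignored.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]

data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling a column the model relies on degrades its predictions, while shuffling an ignored column changes nothing, so the scores rank variables by how much the model depends on them.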


In summary, the paper comprehensively demonstrated the feasibility and usefulness of applying ML to predict school dropout. The authors indicated that the best ML approach was the ANN. They also highlighted the most influential variables in predicting school dropout, which can aid in understanding the causes and consequences of this issue.

The research has several applications and implications for the educational sector, including providing timely support to at-risk students and evaluating the impact of various policies and programs. Additionally, the researchers proposed developing an open platform for institutions to access and utilize the data and predictions, facilitating ongoing model improvement with new data.

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Osama, Muhammad. (2024, February 22). Harnessing Machine Learning for Predicting School Dropout. AZoAi. Retrieved on April 16, 2024 from https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx.

  • MLA

    Osama, Muhammad. "Harnessing Machine Learning for Predicting School Dropout". AZoAi. 16 April 2024. <https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx>.

  • Chicago

    Osama, Muhammad. "Harnessing Machine Learning for Predicting School Dropout". AZoAi. https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx. (accessed April 16, 2024).

  • Harvard

    Osama, Muhammad. 2024. Harnessing Machine Learning for Predicting School Dropout. AZoAi, viewed 16 April 2024, https://www.azoai.com/news/20240222/Harnessing-Machine-Learning-for-Predicting-School-Dropout.aspx.

