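Prediction of Software Architects' Education Level Using a Machine Learning Classification Approach (Prediksi Tingkat Pendidikan Arsitek Perangkat Lunak Menggunakan Pendekatan Machine Learning Classification)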

Seni Agustira; Haffiyan Putra Pratama

doi:10.54650/jukomika.v9i1.664

Authors

Seni Agustira, Universitas Pendidikan Indonesia
Haffiyan Putra Pratama, Universitas Pendidikan Indonesia


Abstract

This study builds a classification model that predicts the highest education level of software architects from how often they use particular software architectural styles in real projects. The dataset is the Dataset of Software Architectural Styles from Kaggle, which covers 1,002 respondents and 18 numerical features, each recording the usage frequency of one architectural style. The target variable has three classes: BSC (CS or SE), MS (CS or SE), and PhD (CS or SE). The main challenge is the severely imbalanced class distribution: BSC accounts for 77.4% of samples. Preprocessing consists of IQR-based outlier handling, label encoding, standardization with StandardScaler, and a stratified 80:20 train-test split. Two models were trained and compared: Logistic Regression as a baseline and Random Forest as an ensemble model. Random Forest substantially outperforms Logistic Regression, reaching 78.1% accuracy versus 38.8% on the test set and a cross-validation mean of 76.1% versus 39.8%. Feature importance analysis indicates that the Blackboard, Data-Centric, and Layered architectural styles contribute most to distinguishing education levels. The study concludes that Random Forest is better suited to non-linear classification with imbalanced classes, and recommends SMOTE and hyperparameter tuning as future improvements.
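
The workflow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic stand-ins for the Kaggle dataset, the class labels are shortened to BSC, MS, and PhD, and the clipping rule (1.5 × IQR), the 5-fold cross-validation, the random seeds, and the model hyperparameters are all assumptions, since the abstract does not state them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset: 1,002 rows, 18 usage-frequency features.
# Feature values and minority-class proportions are made up for illustration.
n_samples, n_features = 1002, 18
X = rng.integers(0, 11, size=(n_samples, n_features)).astype(float)
labels = rng.choice(["BSC", "MS", "PhD"], size=n_samples, p=[0.774, 0.15, 0.076])


def clip_outliers_iqr(X, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (one common IQR treatment)."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return np.clip(X, q1 - k * iqr, q3 + k * iqr)


# Preprocessing in the order listed in the abstract: outliers, label encoding, split.
X_clipped = clip_outliers_iqr(X)
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Stratified 80:20 split keeps the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X_clipped, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside each pipeline so it is fitted on training folds only.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": make_pipeline(
        StandardScaler(), RandomForestClassifier(n_estimators=200, random_state=42)
    ),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    model.fit(X_train, y_train)
    results[name] = {
        "cv_mean": cv_scores.mean(),
        "test_accuracy": model.score(X_test, y_test),
    }
    print(name, results[name])

# Rank features by the Random Forest's impurity-based importances.
forest = models["random_forest"].named_steps["randomforestclassifier"]
top_features = np.argsort(forest.feature_importances_)[::-1][:3]
print("top feature indices:", top_features)
```

Because the features here are random noise, the scores printed by this sketch say nothing about the reported results; the point is only the shape of the pipeline. With the real dataset, the feature indices would map back to architectural style names.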
