Predicting the Education Level of Software Architects Using a Machine Learning Classification Approach
DOI:
https://doi.org/10.54650/jukomika.v9i1.664
Abstract
This study builds a classification model that predicts the highest education level of software architects from how often they use software architectural styles in real projects. The dataset is the Dataset of Software Architectural Styles from Kaggle, comprising 1,002 respondents and 18 numerical features, each recording the usage frequency of one of 18 architectural styles. The target variable has three classes: BSC (CS or SE), MS (CS or SE), and PhD (CS or SE). The main challenge is the severely imbalanced class distribution, with BSC accounting for 77.4% of samples. Preprocessing consists of IQR-based outlier handling, label encoding, StandardScaler normalization, and a stratified 80:20 train-test split. Two models were trained and compared: Logistic Regression as a baseline and Random Forest as an ensemble model. Random Forest substantially outperforms Logistic Regression, reaching 78.1% accuracy versus 38.8% on the test set and a mean cross-validation accuracy of 76.1% versus 39.8%. Feature importance analysis shows that the Blackboard, Data-Centric, and Layered architectural styles contribute most to distinguishing education levels. The study concludes that Random Forest is better suited to non-linear classification with imbalanced classes, and recommends SMOTE and hyperparameter tuning as future improvements.
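The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data is synthetic (random integers standing in for the 18 usage-frequency features), the MS and PhD class proportions are assumed since only the BSC share is reported, the column names `style_0` to `style_17` and the helper `clip_outliers_iqr` are hypothetical, and clipping is only one possible form of IQR-based outlier handling. All hyperparameters are placeholder defaults.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


def clip_outliers_iqr(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier treatment)."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    return df.clip(lower=q1 - k * iqr, upper=q3 + k * iqr, axis=1)


# Synthetic stand-in for the Kaggle dataset: 1,002 rows, 18 numeric features.
rng = np.random.default_rng(0)
n_samples, n_features = 1002, 18
X = pd.DataFrame(
    rng.integers(0, 11, size=(n_samples, n_features)).astype(float),
    columns=[f"style_{i}" for i in range(n_features)],  # hypothetical names
)
# Only the 77.4% BSC share comes from the abstract; the rest is assumed.
labels = rng.choice(["BSC", "MS", "PhD"], size=n_samples, p=[0.774, 0.15, 0.076])

X_clipped = clip_outliers_iqr(X)
y = LabelEncoder().fit_transform(labels)

# Stratified 80:20 split keeps the class ratios similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X_clipped, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside each pipeline so it is fit on training folds only.
models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": make_pipeline(
        StandardScaler(), RandomForestClassifier(n_estimators=100, random_state=42)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "test_accuracy": model.score(X_test, y_test),
        "cv_mean": cross_val_score(model, X_train, y_train, cv=5).mean(),
    }

# Impurity-based feature importances from the fitted Random Forest step.
importances = pd.Series(
    models["forest"][-1].feature_importances_, index=X.columns
).sort_values(ascending=False)

print(results)
print(importances.head(3))
```

Because the features here are random noise, the scores say nothing about the reported results; the sketch only shows how the pieces fit together. Placing `StandardScaler` inside the pipeline is a design choice of this sketch that avoids leaking test-fold statistics into cross-validation; the abstract does not say whether scaling was applied before or after the split.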
License
Copyright (c) 2026 Seni Agustira, Haffiyan Putra Pratama

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

