Nonlinear Machine Learning Models for Compositional Superconductivity Prediction
David Szczecina
CUCAI 2026 Proceedings - 2026
Abstract
Accurate prediction of superconducting critical temperature (Tc) from chemical composition is a central challenge in materials informatics and an important step toward accelerating superconductor discovery. In this work, we conduct a systematic benchmarking study of linear regression, regularized linear models, tree-based ensembles, gradient boosting methods, neural networks, and stacked ensembles on the UCI Superconductivity dataset comprising over 21,000 materials described by engineered compositional features. Our results demonstrate that nonlinear models substantially outperform linear baselines, indicating that Tc depends on complex, non-additive interactions among compositional descriptors. Tree-based ensemble methods consistently achieve the strongest predictive performance, while boosting approaches and stacked ensembling provide competitive alternatives. Neural networks also capture nonlinear structure effectively, though they do not surpass ensemble tree methods in this tabular setting. Cross-validation analysis reveals increased variability in flexible nonlinear models relative to linear regression, highlighting the bias-variance tradeoff inherent in high-capacity learners. Overall, this study establishes an updated empirical benchmark for compositional superconductivity prediction and underscores the importance of nonlinear modeling in materials property prediction.
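As a toy illustration of why a non-additive target favors nonlinear learners under cross-validation, the sketch below compares a linear fit with a simple nonlinear regressor on synthetic data. It is not the benchmark pipeline of this study: the data are generated rather than drawn from the UCI Superconductivity dataset, the target y = x1 * x2 is an assumed stand-in for an interaction effect, a k-nearest-neighbor regressor is substituted for the tree-based ensembles, and all function names (make_data, fit_linear, fit_knn, cv_rmse) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def make_data(n, seed=0):
    """Synthetic (assumed) data: target is a pure interaction x1 * x2 plus noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        X.append((x1, x2))
        y.append(x1 * x2 + rng.gauss(0, 0.05))
    return X, y


def fit_linear(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [(1.0, a, b) for a, b in X]
    # Augmented matrix [A^T A | A^T y] for the 3 coefficients.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    w = [M[i][3] / M[i][i] for i in range(3)]
    return lambda x: w[0] + w[1] * x[0] + w[2] * x[1]


def fit_knn(X, y, k=5):
    """k-nearest-neighbor regressor: a simple nonlinear stand-in model."""
    def predict(x):
        order = sorted(
            range(len(X)),
            key=lambda i: (X[i][0] - x[0]) ** 2 + (X[i][1] - x[1]) ** 2,
        )
        return sum(y[i] for i in order[:k]) / k
    return predict


def cv_rmse(fit, X, y, folds=5):
    """Per-fold RMSE from k-fold cross-validation with interleaved folds."""
    scores = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        model = fit(X_train, y_train)
        sq_err = [(model(X[i]) - y[i]) ** 2 for i in test_idx]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores


X, y = make_data(300)
linear_scores = cv_rmse(fit_linear, X, y)
knn_scores = cv_rmse(fit_knn, X, y)

print("linear RMSE: mean=%.3f, std across folds=%.3f"
      % (mean(linear_scores), pstdev(linear_scores)))
print("kNN RMSE:    mean=%.3f, std across folds=%.3f"
      % (mean(knn_scores), pstdev(knn_scores)))
```

Because the synthetic target has no additive component, the linear model cannot do much better than predicting the mean, while the local averaging in the nearest-neighbor model can follow the interaction. The per-fold standard deviation is printed alongside the mean so that fold-to-fold variability can be inspected in the same way as the cross-validation analysis described above.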