CUCAI 2025 Archive

Toxicity Prediction Based on Molecular Structure Using Machine Learning

Tristan William Tucker Tejal Simran Cheema

CUCAI 2025 Proceedings, 2025

Published 2025/03/26

Abstract

Our project aims to develop a machine learning model that predicts the toxicity of molecules from their molecular structure. We tested two model types, a Support Vector Machine (SVM) and a Neural Network, both trained on the Tox21 dataset [1], a catalog of over 10,000 molecules and their relative toxicities across twelve distinct biological factors. Our best results came from a Neural Network with CHEMBERT featurization of SMILES strings, LDA dimensionality reduction, and SMOTEENN resampling. Our results show a positive correlation between molecular structure and toxicity, but we found faults in attempting to build one general model that predicts all the biological factors at once. To demonstrate an application of our models, we built an app that lets users input the name of any molecule and returns the model's predicted toxicity. Link to GitHub here.
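The abstract names SMOTEENN resampling but does not say how it was implemented. The following is a minimal sketch of the idea using only the Python standard library, not the authors' code. It oversamples the minority class with SMOTE (interpolating between a minority sample and one of its nearest minority neighbours), then cleans the result with Edited Nearest Neighbours (ENN). The ENN step here uses a majority vote, which is one common variant and is an assumption. The function names, the 2-D toy points, and the labels (1 for a hypothetical "toxic" class) are made up for illustration and are not Tox21 data.

```python
import math
import random


def k_nearest(i, X, k, candidates):
    """Indices of the k candidates closest to X[i] (Euclidean), excluding i."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def smote(X, y, minority_label, n_new, k=3, seed=0):
    """SMOTE: build n_new synthetic minority points by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(minority)
        j = rng.choice(k_nearest(i, X, k, minority))
        t = rng.random()  # position along the segment from X[i] to X[j]
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


def edited_nearest_neighbours(X, y, k=3):
    """ENN (majority-vote variant, an assumption): drop every sample whose
    label disagrees with the majority of its k nearest neighbours."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = [y[j] for j in k_nearest(i, X, k, everyone)]
        if 2 * votes.count(y[i]) > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count, then clean with ENN."""
    n_minority = y.count(minority_label)
    n_new = (len(y) - n_minority) - n_minority
    synthetic = smote(X, y, minority_label, n_new, k, seed)
    X_bal = list(X) + synthetic
    y_bal = list(y) + [minority_label] * len(synthetic)
    return edited_nearest_neighbours(X_bal, y_bal, k)


# Made-up toy data: a majority cluster near the origin, a minority cluster
# near (5, 5), and one mislabeled majority point inside the minority cluster.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
     (0.1, 0.4), (0.4, 0.1), (0.2, 0.3), (0.3, 0.0),
     (5.1, 5.1),                                       # noisy label-0 point
     (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2)]   # minority class
y = [0] * 9 + [1] * 4

X_res, y_res = smote_enn(X, y)
print(y_res.count(0), y_res.count(1))  # prints "8 9"
```

On this toy set, SMOTE adds five synthetic minority points to match the nine majority samples, and ENN then removes the single majority point that sits among minority neighbours. With real, high-dimensional features the cleaning step typically removes samples from both classes near the decision boundary.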