Exploration of data science techniques to
predict fatigue strength of steel from
composition and processing parameters
Objective
The paper aims to use data science techniques, such as machine learning and
regression analysis, to predict the fatigue strength of various types of steel
based on:
Chemical composition: percentages of elements such as carbon, silicon, and manganese.
Thermal processing parameters: processing temperatures, treatment durations, and cooling rates.
Abstract
Problem Statement:
Fatigue strength is a critical property in the design of mechanical components. However,
measuring fatigue strength is both expensive and time-consuming.
Proposed Solution:
The paper proposes using data science methods (for example, neural networks, decision trees,
and multivariate polynomial regression) to build predictive models that forecast fatigue
strength.
Data Source:
The study uses data from the National Institute for Materials Science (NIMS) database, which
contains 437 samples with 25 features (covering both chemical composition and thermal
processing parameters).
Key Findings:
The best models achieved high prediction accuracy (R² up to about 0.98), a clear
improvement over previous studies, which reported R² < 0.94.
Dataset
Source: National Institute for Materials Science (NIMS) public database.
Details:
437 data instances covering carbon steels, low-alloy steels,
carburizing steels, and spring steels.
25 input features:
Chemical composition (e.g., %C, %Mn, %Cr).
Processing parameters (e.g., normalizing temperature, carburization
time, tempering conditions).
Microstructural features.
Predictive Modeling
Evaluated 12 techniques, including:
- Linear regression, robust regression, and polynomial regression.
- Instance-based methods (k-NN, KStar).
- Decision trees (REPTree, M5 Model Trees).
- Support Vector Machines (SVM).
- Artificial Neural Networks (ANN).
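As a rough illustration of how such techniques can be compared, here is a minimal sketch that scores one of the listed instance-based methods (k-NN) against a trivial mean predictor using cross-validated R². The data are synthetic stand-ins, not the NIMS samples, and the helper names (r_squared, knn_predict, cross_validated_r2), the choice of k = 3, and the 10-fold split are assumptions made for this example, not the paper's exact protocol.

```python
import math
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training rows closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

def mean_baseline(train_X, train_y, x):
    """Ignore the inputs and always predict the training mean."""
    return sum(train_y) / len(train_y)

def cross_validated_r2(X, y, predict, n_folds=10):
    """Collect out-of-fold predictions for every row, then score them once."""
    preds = [0.0] * len(X)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        for i in test_idx:
            preds[i] = predict(tr_X, tr_y, X[i])
    return r_squared(y, preds)

# Synthetic stand-in data (hypothetical; NOT the NIMS dataset):
# two scaled input features and a noisy linear "strength" target.
rng = random.Random(0)
X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)]
y = [300 + 400 * a + 200 * b + rng.gauss(0, 5) for a, b in X]

r2_knn = cross_validated_r2(X, y, knn_predict)
r2_base = cross_validated_r2(X, y, mean_baseline)
print(f"k-NN R2: {r2_knn:.3f}, mean-baseline R2: {r2_base:.3f}")
```

Scoring pooled out-of-fold predictions keeps every sample out of the model that predicts it, so the reported R² reflects performance on unseen data rather than fit to the training set.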
Results
Best Performing Models:
Multivariate Polynomial Regression (MPR) and linear
regression achieved the highest accuracy (R² ≈ 0.98).
M5 Model Trees also performed strongly, with R² ≈
0.978.
Artificial Neural Networks (ANN) followed with R² ≈ 0.97.
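Multivariate polynomial regression fits an ordinary linear model on polynomial terms of the inputs. The sketch that follows shows only the feature-expansion step, assuming degree 2; the degree, the function name polynomial_features, and the sample values are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_features(x, degree=2):
    """Return the bias term plus every monomial of x up to `degree`."""
    feats = [1.0]  # bias (degree-0) term
    for d in range(1, degree + 1):
        # Each index combination, e.g. (0, 1), stands for x[0] * x[1].
        for combo in combinations_with_replacement(range(len(x)), d):
            feats.append(prod(x[i] for i in combo))
    return feats

# Hypothetical two-feature sample: terms are 1, a, b, a*a, a*b, b*b.
sample = [2.0, 3.0]
expanded = polynomial_features(sample)
print(expanded)  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]

# With 25 inputs, a degree-2 expansion has 1 + 25 + 325 = 351 terms.
n_terms_25 = len(polynomial_features([0.0] * 25))
print(n_terms_25)  # 351
```

With 437 samples and 351 candidate terms at degree 2, an unregularized fit would have little room to spare, which is one reason such models are usually paired with term selection or cross-validation.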
Comparison with Previous Studies:
The current models reduce the unexplained variance (1 − R²) by
about 66% compared to earlier work (R² improved
from ≈ 0.94 to ≈ 0.98).
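The 66% figure is consistent with the relative reduction in unexplained variance, 1 − R², rather than with the change in R² itself. A quick check of that arithmetic, using the rounded R² values quoted here (the function names are made up for this example):

```python
def unexplained_variance_reduction(r2_old, r2_new):
    """Relative drop in the unexplained fraction (1 - R2)."""
    old_unexplained = 1 - r2_old
    new_unexplained = 1 - r2_new
    return (old_unexplained - new_unexplained) / old_unexplained

def relative_r2_gain(r2_old, r2_new):
    """Relative increase in R2 itself, for comparison."""
    return (r2_new - r2_old) / r2_old

reduction = unexplained_variance_reduction(0.94, 0.98)
gain = relative_r2_gain(0.94, 0.98)
print(f"Reduction in unexplained variance: {reduction:.1%}")  # 66.7%
print(f"Relative gain in R2: {gain:.1%}")  # 4.3%
```

The two measures differ by more than an order of magnitude, so it matters which one a reported improvement refers to.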