In this study, we examine how the choice of activation function affects
the performance of neural networks on the MNIST dataset. Activation
functions introduce non-linearity into a network, enabling it to learn
and model complex patterns. Our research
compares six activation functions: ReLU, Sigmoid, Tanh, Leaky ReLU, ELU, and
Swish. We evaluate them on accuracy and training time, and on the
histories of training loss, validation loss, and accuracy recorded over
the course of training.
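For reference, the six functions under comparison can be written as scalar functions in a few lines of Python. This is a minimal sketch, not the code used in the study: the abstract does not state the hyperparameters, so the Leaky ReLU slope of 0.01, the ELU `alpha` of 1.0, and the Swish `beta` of 1.0 are assumed common defaults, and the `ACTIVATIONS` dictionary is a hypothetical convenience for looping over them.

```python
import math


def relu(x):
    # max(0, x): passes positive inputs, zeroes out negative ones.
    return max(0.0, x)


def sigmoid(x):
    # 1 / (1 + e^-x), split by sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    return math.tanh(x)


def leaky_relu(x, negative_slope=0.01):
    # Like ReLU, but negative inputs keep a small slope.
    # negative_slope=0.01 is an assumed default, not taken from the study.
    return x if x > 0 else negative_slope * x


def elu(x, alpha=1.0):
    # alpha * (e^x - 1) for x <= 0; alpha=1.0 is an assumed default.
    return x if x > 0 else alpha * math.expm1(x)


def swish(x, beta=1.0):
    # x * sigmoid(beta * x); beta=1.0 is an assumed default.
    return x * sigmoid(beta * x)


# Hypothetical lookup table for iterating over the six candidates.
ACTIVATIONS = {
    "ReLU": relu,
    "Sigmoid": sigmoid,
    "Tanh": tanh,
    "Leaky ReLU": leaky_relu,
    "ELU": elu,
    "Swish": swish,
}
```

Looping over `ACTIVATIONS.items()` and calling each function on a few inputs such as -2.0, 0.0, and 2.0 shows the main qualitative difference: ReLU returns exactly zero for negative inputs, while Leaky ReLU, ELU, and Swish return small negative values.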
Experiments were conducted using a three-layer
fully connected neural network. The MNIST dataset, comprising 60,000 training
images and 10,000 test images of handwritten digits, was used for training
and evaluation. Weights were initialized with Kaiming normal
initialization, and the Adam optimizer was used with a learning rate of
0.001. Each model was trained for up to 20 epochs, with early stopping
based on validation accuracy.
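Two pieces of this setup can be illustrated with a short standard-library sketch: Kaiming normal initialization, which draws each weight from a zero-mean normal distribution with standard deviation `gain / sqrt(fan_in)`, and an early-stopping rule that watches validation accuracy. Several details are assumptions, since the abstract does not give them: the gain of `sqrt(2)` is the usual ReLU value (whether it was adjusted per activation is not stated), the patience of 3 epochs is hypothetical, and the names `kaiming_normal`, `EarlyStopping`, and `epochs_run` are illustrative only.

```python
import math
import random


def kaiming_normal(fan_in, fan_out, rng, gain=math.sqrt(2.0)):
    """Return a fan_out x fan_in weight matrix drawn from N(0, (gain / sqrt(fan_in))^2).

    gain=sqrt(2) is the usual value for ReLU; this is an assumption here.
    """
    std = gain / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


class EarlyStopping:
    """Signal a stop once validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=3):  # patience=3 is hypothetical
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_accuracy):
        # Returns True when training should stop after this epoch.
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_accuracies, max_epochs=20, patience=3):
    """Count how many epochs would run for a given validation-accuracy trace."""
    stopper = EarlyStopping(patience)
    trace = val_accuracies[:max_epochs]  # never train past max_epochs
    for epoch, accuracy in enumerate(trace, start=1):
        if stopper.step(accuracy):
            return epoch
    return len(trace)
```

As a usage example, `kaiming_normal(784, 128, random.Random(0))` builds the weights for a first layer that takes flattened 28x28 MNIST images; the hidden width of 128 is made up for illustration. Calling `epochs_run` on a validation-accuracy trace that plateaus returns the epoch at which training would halt, capped at 20.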
