International Journal of
Advanced Education and Research

VOL. 9, ISSUE 3 (2024)
Comparison of activation functions in neural networks
Authors
Mukund Agarwal
Abstract

This study examines how the choice of activation function affects neural network performance on the MNIST dataset. Activation functions introduce non-linearity into a network, which is what allows it to learn and model complex patterns. We compare six activation functions: ReLU, Sigmoid, Tanh, Leaky ReLU, ELU, and Swish. Each is evaluated on accuracy, training time, and the histories of training loss, validation loss, and accuracy over the course of training.
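
For reference, the six functions compared here can be written as scalar functions in a few lines. This is a minimal sketch using the standard textbook definitions, not the code used in the study. The abstract does not state the hyperparameters, so the values of `negative_slope`, `alpha`, and `beta` below are assumptions set to commonly used defaults.

```python
import math


def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, written so that exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return math.tanh(x)


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU; negative_slope=0.01 is an assumed default."""
    return x if x > 0 else negative_slope * x


def elu(x, alpha=1.0):
    """ELU; alpha=1.0 is an assumed default."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); beta=1.0 is an assumed default."""
    return x * sigmoid(beta * x)
```

ReLU, Leaky ReLU, ELU, and Swish are unbounded for positive inputs, whereas Sigmoid and Tanh saturate at both ends, which is one reason the two groups can behave differently during training.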

Experiments used a three-layer fully connected neural network trained and evaluated on MNIST, which comprises 60,000 training images and 10,000 test images of handwritten digits. Weights were initialized with Kaiming normal initialization, and the Adam optimizer was used with a learning rate of 0.001. Each model was trained for up to 20 epochs, with early stopping based on validation accuracy.
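
Two parts of this setup, Kaiming normal initialization and early stopping on validation accuracy, can be illustrated without a deep learning framework. The sketch below is not the study's implementation. The function names, the `patience` value, and the default gain of sqrt(2) (the value usually paired with ReLU) are assumptions introduced for illustration, since the abstract does not specify them.

```python
import math
import random


def kaiming_normal(fan_in, fan_out, rng, gain=math.sqrt(2.0)):
    """Draw a fan_out x fan_in weight matrix from N(0, gain^2 / fan_in).

    gain=sqrt(2) is the usual choice for ReLU-like activations (assumed here).
    """
    std = gain / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def run_early_stopping(val_accuracies, max_epochs=20, patience=3):
    """Replay per-epoch validation accuracies and apply early stopping.

    Stops after `patience` consecutive epochs without a new best accuracy,
    or after `max_epochs`. Returns (epochs_run, best_accuracy).
    patience=3 is a hypothetical value, not taken from the study.
    """
    best = -math.inf
    stale = 0
    epochs_run = 0
    for acc in val_accuracies[:max_epochs]:
        epochs_run += 1
        if acc > best:
            best, stale = acc, 0  # new best: reset the patience counter
        else:
            stale += 1
            if stale >= patience:
                break
    return epochs_run, best


# Example: the first layer of a network on flattened 28x28 MNIST images.
# The hidden width of 128 is a hypothetical choice.
rng = random.Random(0)
first_layer = kaiming_normal(fan_in=28 * 28, fan_out=128, rng=rng)
```

Scaling the standard deviation by 1/sqrt(fan_in) keeps the variance of the pre-activations roughly constant from layer to layer, which is the motivation for this initialization with ReLU-family activations.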

Our findings suggest that ReLU, ELU, and Swish are highly effective for image recognition tasks, but that the choice of activation function should still be tailored to the specific task and dataset. Future research should examine newer activation functions such as GELU and Mish, the combination of multiple activation functions within a single network, and the effect of these choices across different neural network architectures.
Pages: 10-16
How to cite this article:
Mukund Agarwal. "Comparison of activation functions in neural networks". International Journal of Advanced Education and Research, Vol 9, Issue 3, 2024, Pages 10-16.