VOL. 10, ISSUE 2 (2025)

Exploring FP8 floating-point format for computational efficiency in deep learning inference and training

Authors

Himanshu Sharma, Kamre Shriharsh, S Rishwanth Rao, Dr. Awwab Mohammad

Abstract

FP8 (8-bit floating point) is an emerging numerical format that aims to balance computational efficiency and precision in deep learning. Traditionally, FP32 and FP16 have been used for training because of their accuracy, while INT8 has been used for inference to save memory and compute. FP8 offers a different tradeoff: it has the same 8-bit storage cost as INT8, but because it remains a floating-point format it is more flexible, at the price of lower precision than FP16. This paper investigates FP8’s capabilities for inference and training, the structure of its two configurations (E4M3 and E5M2), and how they affect neural network performance. We compare FP8 with FP16 and INT8, demonstrating the practical benefits and challenges of implementing it.
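
To make the two configurations named in the abstract concrete, the following is a minimal sketch of how an FP8 byte could be decoded, not the authors' implementation. It assumes the commonly described layouts: E4M3 with 1 sign, 4 exponent, and 3 mantissa bits, bias 7, no infinities, and a single NaN pattern per sign; E5M2 with 1 sign, 5 exponent, and 2 mantissa bits, bias 15, and IEEE-style infinities and NaNs. The paper may use different conventions, and the function names `decode_fp8`, `decode_e4m3`, and `decode_e5m2` are hypothetical.

```python
import math


def decode_fp8(byte, exp_bits, man_bits, bias, ieee_specials):
    """Decode one FP8 byte (0..255) into a Python float.

    ieee_specials=True  -> all-ones exponent encodes inf/NaN (assumed E5M2 behaviour).
    ieee_specials=False -> only all-ones exponent AND all-ones mantissa is NaN,
                           and there are no infinities (assumed E4M3 behaviour).
    """
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    man_max = (1 << man_bits) - 1

    if ieee_specials and exp == exp_max:
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == man_max:
        return math.nan

    fraction = man / (1 << man_bits)
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of (1 - bias).
        return sign * fraction * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1.0 + fraction) * 2.0 ** (exp - bias)


def decode_e4m3(byte):
    # Assumed layout: 4 exponent bits, 3 mantissa bits, bias 7.
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, ieee_specials=False)


def decode_e5m2(byte):
    # Assumed layout: 5 exponent bits, 2 mantissa bits, bias 15.
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, ieee_specials=True)


if __name__ == "__main__":
    # Largest finite value in each format under the assumptions above.
    print("E4M3 max:", decode_e4m3(0x7E))
    print("E5M2 max:", decode_e5m2(0x7B))
```

Under these assumptions, E4M3 spends its bits on mantissa (finer steps, smaller range), while E5M2 spends them on exponent (coarser steps, larger range), which is the tradeoff between the two configurations.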

Pages: 39-43

How to cite this article:

Himanshu Sharma, Kamre Shriharsh, S Rishwanth Rao, Dr. Awwab Mohammad. "Exploring FP8 floating-point format for computational efficiency in deep learning inference and training". International Journal of Advanced Education and Research, Vol. 10, Issue 2, 2025, Pages 39-43.
