Proceedings of the 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024)

Research for Enhancing Processing and Computational Efficiency in LLM

Authors
Yu Cong1, *
1School of Data Science, Capital University of Economics and Business, Beijing, 100070, China
*Corresponding author. Email: 32021230004@cueb.edu.cn
Available Online 16 October 2024.
DOI
10.2991/978-94-6463-540-9_97
Keywords
Hybrid LLM inference; soft prompts; decoding optimization
Abstract

In the context of current technological development, large language models (LLMs) have become a core component of artificial intelligence. This report provides an in-depth discussion of advanced strategies and techniques for improving the processing and computational efficiency of LLMs. First, it presents a detailed analysis of automatic 4-bit integer quantization (INT4 quantization), explaining its principle and its positive impact on computational efficiency. It then discusses binarization with the Flexible Dual Binarization (FDB) fusion strategy, exploring the flexibility of this fusion approach and its applications, and examines how the Atom technique applies low-bit quantization to improve processing efficiency. Next, the report explores hybrid LLM inference strategies, focusing on their principles and their impact on efficiency. Finally, it introduces soft prompt and decoding optimization techniques, including the principles and advantages of the MEDUSA framework and the SARATHI technique, as well as the application of the Transferable Prompt technique. By synthesizing these strategies and techniques, the report offers practical guidance and a reference for the efficient deployment and application of LLMs.
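Of the techniques surveyed in the abstract, INT4 weight quantization is the most concrete. The following is a minimal, illustrative Python sketch of symmetric per-row 4-bit weight quantization; the function names and the per-row scaling choice are assumptions made for illustration and do not reflect the paper's own implementation.

# Minimal sketch of symmetric per-row INT4 weight quantization (illustrative
# assumption, not the paper's method). Signed 4-bit codes span -8..7; each
# row's largest absolute weight is mapped to 7.
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Quantize a 2-D FP32 weight matrix to signed 4-bit integer codes, per row."""
    scales = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero rows
    q = np.clip(np.round(weights / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate FP32 weights from INT4 codes and per-row scales."""
    return q.astype(np.float32) * scales

# Usage: once packed two codes per byte, INT4 storage is one eighth of FP32
# (this sketch keeps the codes unpacked in int8 for simplicity); the price is
# a small reconstruction error.
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print("max abs error:", np.abs(w - w_hat).max())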

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024)
Series
Advances in Computer Science Research
Publication Date
16 October 2024
ISBN
978-94-6463-540-9
ISSN
2352-538X
DOI
10.2991/978-94-6463-540-9_97

Cite this article

TY  - CONF
AU  - Yu Cong
PY  - 2024
DA  - 2024/10/16
TI  - Research for Enhancing Processing and Computational Efficiency in LLM
BT  - Proceedings of the 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024)
PB  - Atlantis Press
SP  - 970
EP  - 980
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-540-9_97
DO  - 10.2991/978-94-6463-540-9_97
ID  - Cong2024
ER  -