Proceedings of the 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024)

Research for Enhancing Processing and Computational Efficiency in LLM

Authors
Yu Cong1, *
1School of Data Science, Capital University of Economics and Business, Beijing, 100070, China
*Corresponding author. Email: 32021230004@cueb.edu.cn
Available Online 16 October 2024.
DOI
10.2991/978-94-6463-540-9_97
Keywords
Hybrid LLM inference; soft prompts; decoding optimization
Abstract

In the context of current technological development, large language models (LLMs) have become a core component of artificial intelligence. This report provides an in-depth discussion of advanced strategies and techniques for improving the processing and computational efficiency of LLMs. First, it presents a detailed analysis of automatic 4-bit integer quantization (INT4 quantization), explaining its principle and its positive impact on computational efficiency. It then discusses binarization with the Flexible Dual Binarization (FDB) fusion strategy, exploring the flexibility of this fusion approach and its applications, and examines how the Atom technique applies low-bit quantization to improve processing efficiency. Next, the report explores hybrid LLM inference strategies, focusing on their principles and their impact on efficiency. Finally, it introduces soft prompt and decoding optimization techniques, including the principles and advantages of the MEDUSA framework and the SARATHI technique, as well as the application of the Transferable Prompt technique. By synthesizing these strategies and techniques, the report offers practical guidance and a reference for the efficient deployment and application of LLMs.
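Of the techniques surveyed in the abstract, INT4 weight quantization is the most concrete. The following is a minimal, illustrative Python sketch of symmetric per-row 4-bit weight quantization; the function names and the per-row scaling choice are assumptions made for illustration and do not reflect the paper's own implementation.

# Minimal sketch of symmetric per-row INT4 weight quantization (illustrative
# assumption, not the paper's method). Signed 4-bit codes span -8..7; each
# row's largest absolute weight is mapped to 7.
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Quantize a 2-D FP32 weight matrix to signed 4-bit integer codes, per row."""
    scales = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero rows
    q = np.clip(np.round(weights / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate FP32 weights from INT4 codes and per-row scales."""
    return q.astype(np.float32) * scales

# Usage: once packed two codes per byte, INT4 storage is one eighth of FP32
# (this sketch keeps the codes unpacked in int8 for simplicity); the price is
# a small reconstruction error.
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print("max abs error:", np.abs(w - w_hat).max())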

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024)
Series
Advances in Computer Science Research
Publication Date
16 October 2024
ISBN
978-94-6463-540-9
ISSN
2352-538X
DOI
10.2991/978-94-6463-540-9_97

Cite this article

TY  - CONF
AU  - Yu Cong
PY  - 2024
DA  - 2024/10/16
TI  - Research for Enhancing Processing and Computational Efficiency in LLM
BT  - Proceedings of the 2024 2nd International Conference on Image, Algorithms and Artificial Intelligence (ICIAAI 2024)
PB  - Atlantis Press
SP  - 970
EP  - 980
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-540-9_97
DO  - 10.2991/978-94-6463-540-9_97
ID  - Cong2024
ER  -