Abstract: The demand for high-speed matrix multiplication continues to grow due to recent developments in images processing, graphics processing, digital signal processing and communication via ...
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...
Abstract: In intelligent connected vehicle applications, tasks, such as path planning and health management involve numerous matrix operations, particularly matrix multiplication. Due to limited ...