Abstract: The Transformer has been widely applied across various domains due to its outstanding performance, driving the development of numerous application services. However, in many real-world scenarios ...
This code accompanies the blog post Matrix Multiplication Faster Than Nvidia, Sometimes. It provides a CUDA kernel for single-precision matrix-matrix multiplication, with two notable features: use of ...
Abstract: Analog computing-in-memory accelerators promise ultra-low-power, on-device AI by reducing data transfer and energy usage. Yet inherent device variations and high energy consumption for ...
QiMeng-GEMM is an innovative approach to automatically generating high-performance general matrix multiplication (GEMM) code using LLMs. This codebase provides a comprehensive solution for efficiently computing ...