Empirical Study on Model Compression Techniques for Efficient Deployment of Large Language Models and Scalable Diffusion Models with Transformers

Supervisor: Prof. Luo Ping, Second examiner: Prof. Kong Lingpeng

Student: Li Zhiqian

Large models can be compressed into much smaller models with quantization!

Quantization is a model size reduction technique that converts model weights from high-precision floating-point representation to low-precision floating-point (FP) or integer (INT) representations, such as 8-bit or 4-bit.
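As a concrete illustration of the definition above, the following is a minimal sketch of symmetric per-tensor INT8 quantization (function names are ours, not from any specific library): a single scale maps float weights into the 8-bit integer range, and dequantization multiplies back by that scale.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization.

    One scale maps float weights to the integer range [-127, 127];
    each weight is stored in 8 bits instead of 32.
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the INT8 codes."""
    return q.astype(np.float32) * scale

# A small weight matrix round-tripped through 8-bit storage.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The reconstruction error per weight is bounded by half the scale, which is the rounding-vs-range trade-off that the techniques below try to optimize.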

Key Techniques in Model Compression

We apply these techniques to Large Language Models (LLMs) and Scalable Diffusion Models with Transformers (DiTs).


Learnable Weight Clipping

Learns an optimal dynamic clipping threshold for the weight distribution
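The idea above can be sketched as follows. This is a simplified, hypothetical version (parameter names `gamma` and `beta` are ours): two learnable factors in (0, 1) shrink the weight range before uniform quantization, trading a little clipping error for finer rounding resolution inside the clipped range.

```python
import numpy as np

def lwc_fake_quantize(w: np.ndarray, gamma: float, beta: float, n_bits: int = 4):
    """Sketch of learnable weight clipping (hypothetical parameterization).

    gamma/beta scale the max/min of the weight distribution; in training
    they would be learnable (e.g. sigmoid outputs) and optimized to
    minimize quantization error. Here they are fixed scalars.
    """
    w_max = gamma * w.max()
    w_min = beta * w.min()
    scale = (w_max - w_min) / (2 ** n_bits - 1)
    zero_point = np.round(-w_min / scale)
    # Uniform affine quantization into [0, 2^n_bits - 1], then dequantize.
    q = np.clip(np.round(w / scale) + zero_point, 0, 2 ** n_bits - 1)
    return (q - zero_point) * scale  # "fake-quantized" weights

w = np.random.randn(128)
w_q = lwc_fake_quantize(w, gamma=0.9, beta=0.9)
```

With `n_bits=4` the dequantized weights take at most 16 distinct values; the learned thresholds decide where those 16 levels are spent.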

Learnable Equivalent Transformation

Channel-wise scaling and shifting of the activation distribution to mitigate the outlier issue
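The transformation can be sketched for a linear layer as follows (shapes and variable names are our assumptions): per-channel scales and shifts flatten the activation distribution, and are absorbed into the weights and bias so the layer's output is mathematically unchanged.

```python
import numpy as np

def equivalent_transform(x, w, b, s, delta):
    """Sketch of a channel-wise equivalent transformation.

    Assumed shapes: x is (tokens, d_in), w is (d_in, d_out),
    s and delta are per-input-channel vectors of length d_in.
    Identity exploited:
        x @ w + b == ((x - delta) / s) @ (w * s[:, None]) + (b + delta @ w)
    so activation outliers are migrated into the (easier to quantize) weights.
    """
    x_t = (x - delta) / s        # flattened activations
    w_t = w * s[:, None]         # scales absorbed into weights
    b_t = b + delta @ w          # shifts absorbed into bias
    return x_t, w_t, b_t

x = np.random.randn(8, 16)
w = np.random.randn(16, 4)
b = np.random.randn(4)
s = np.abs(x).max(axis=0)        # one simple per-channel scale choice
delta = x.mean(axis=0)           # one simple per-channel shift choice
x_t, w_t, b_t = equivalent_transform(x, w, b, s, delta)
```

In a learnable variant, `s` and `delta` would be optimized rather than set from statistics; the identity guarantees the full-precision output is preserved either way.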



Piecewise Weight Quantization
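One illustrative piecewise scheme (a sketch under our own assumptions, not necessarily the exact scheme used in this project): split the weights at a threshold into a dense inner region and an outlier tail, and quantize each piece with its own scale, so the many small weights keep fine resolution while rare large weights are not clipped.

```python
import numpy as np

def piecewise_quantize(w: np.ndarray, threshold: float, n_bits: int = 4):
    """Sketch of piecewise weight quantization (illustrative two-piece scheme).

    Weights with |w| <= threshold use a fine scale derived from the
    threshold; outlier weights use a coarser scale derived from the
    full range. Each piece gets its own uniform quantizer.
    """
    levels = 2 ** (n_bits - 1) - 1            # signed levels per piece
    inner = np.abs(w) <= threshold
    out = np.empty_like(w)
    # Dense inner region: fine scale.
    s_in = threshold / levels
    out[inner] = np.round(w[inner] / s_in) * s_in
    # Outlier tail: coarse scale covering the full range.
    s_out = np.abs(w).max() / levels
    out[~inner] = np.round(w[~inner] / s_out) * s_out
    return out

w = np.random.randn(1000)
w_q = piecewise_quantize(w, threshold=1.0)
```

The benefit over a single uniform quantizer is that the inner rounding error is bounded by `threshold / levels / 2` regardless of how extreme the outliers are.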


Demo of the DiT architecture
