# 🧠 Phi-2 GPTQ (Quantized)
This repository provides a 4-bit GPTQ quantized version of the Phi-2 model by Microsoft, optimized for efficient inference using `gptqmodel`.
## 📌 Model Details
- Base Model: Microsoft Phi-2
- Quantization: GPTQ (4-bit)
- Quantizer: GPTQModel
- Framework: PyTorch + HuggingFace Transformers
- Device Support: CUDA (GPU)
- License: Apache 2.0
## 🚀 Features
- ✅ Lightweight: 4-bit quantization significantly reduces memory usage (see the footprint sketch after this list)
- ✅ Fast Inference: Ideal for deployment on consumer GPUs
- ✅ Compatible: Works with `transformers`, `optimum`, and `gptqmodel`
- ✅ CUDA-accelerated: Automatically uses the GPU for speed
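To get a rough sense of the memory savings, the loaded model's footprint can be inspected with the `get_memory_footprint()` helper from `transformers`. This is a minimal sketch, assuming the repository id is `STiFLeR7/Phi2-GPTQ` and that `transformers`, `accelerate`, and a GPTQ backend such as `gptqmodel` are installed:

```python
# Sketch: check how much memory the 4-bit GPTQ checkpoint occupies once loaded.
# Assumes a CUDA GPU and an installed GPTQ backend (e.g. `gptqmodel`).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "STiFLeR7/Phi2-GPTQ",  # assumed repo id for this quantized model
    device_map="auto",      # place the quantized weights on the GPU
)

# get_memory_footprint() returns the size of parameters and buffers in bytes.
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```

For comparison, the unquantized `microsoft/phi-2` checkpoint has roughly 2.7B parameters, so the 4-bit weights should occupy only a fraction of the FP16 footprint.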
## 📖 Usage
This model is ready to use with the Hugging Face `transformers` library.
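A minimal loading and generation sketch, again assuming the repository id `STiFLeR7/Phi2-GPTQ` and an environment with `transformers`, `accelerate`, and a GPTQ backend (`gptqmodel` or `optimum`) installed:

```python
# Sketch: load the 4-bit GPTQ checkpoint and run a short generation.
# Assumes a CUDA GPU is available for the quantized weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "STiFLeR7/Phi2-GPTQ"  # assumed repo id for this quantized model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the quantized weights on the GPU
)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```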
## 🧪 Intended Use
- Research and development
- Prototyping generative applications
- Fast inference environments with limited GPU memory
## 🔗 References
- Microsoft Phi-2: https://huggingface.co./microsoft/phi-2
- GPTQModel: https://github.com/ModelCloud/GPTQModel
- Transformers: https://github.com/huggingface/transformers
## ⚖️ License
This model is distributed under the Apache License 2.0.