🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ-quantized version of Microsoft's Phi-2 model, optimized for efficient inference using the gptqmodel library.

📌 Model Details

  • Base Model: Microsoft Phi-2
  • Quantization: GPTQ (4-bit)
  • Quantizer: GPTQModel
  • Framework: PyTorch + Hugging Face Transformers
  • Device Support: CUDA (GPU)
  • License: Apache 2.0
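
This checkpoint was produced with GPTQModel. As a rough illustration of how a comparable 4-bit GPTQ checkpoint can be created, the sketch below uses the Transformers GPTQConfig route; the bit width matches this card, but the calibration dataset and group size are assumptions, not necessarily the settings used here.

```python
# Illustrative sketch only: quantizing microsoft/phi-2 to 4-bit GPTQ.
# The calibration dataset ("c4") and group_size are assumed values.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_id)

gptq_config = GPTQConfig(
    bits=4,           # matches the 4-bit setting stated above
    dataset="c4",     # assumed calibration set
    group_size=128,   # assumed group size
    tokenizer=tokenizer,
)

# Quantization runs while the model loads; it needs a CUDA GPU and a GPTQ
# backend (e.g. the optimum / gptqmodel integration) installed.
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=gptq_config,
    device_map="auto",
)

model.save_pretrained("Phi2-GPTQ")
tokenizer.save_pretrained("Phi2-GPTQ")
```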

🚀 Features

  • ✅ Lightweight: 4-bit quantization significantly reduces memory usage (see the footprint check after this list)
  • ✅ Fast Inference: Ideal for deployment on consumer GPUs
  • ✅ Compatible: Works with transformers, optimum, and gptqmodel
  • ✅ CUDA-accelerated: Automatically uses the GPU for speed
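
As a quick sanity check on the memory claim above, this minimal sketch loads the quantized checkpoint and prints its in-memory footprint. It assumes this repository's id (STiFLeR7/Phi2-GPTQ) and that a GPTQ backend plus a CUDA GPU are available.

```python
# Minimal sketch: report how much memory the 4-bit checkpoint occupies once loaded.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("STiFLeR7/Phi2-GPTQ", device_map="auto")
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```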

📚 Usage

This model is ready to use with the Hugging Face transformers library; a minimal inference example follows.
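
The sketch below assumes this repository's id (STiFLeR7/Phi2-GPTQ), a CUDA-capable GPU, and an installed GPTQ backend (gptqmodel or optimum); the prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "STiFLeR7/Phi2-GPTQ"  # this repository

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the 4-bit weights on the GPU; a GPTQ backend
# must be installed for the quantized layers to be loaded.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain GPTQ quantization in one sentence."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```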

🧪 Intended Use

  • Research and development
  • Prototyping generative applications
  • Fast inference environments with limited GPU memory

📖 References

  • Microsoft Phi-2 model card: https://huggingface.co/microsoft/phi-2
  • GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Frantar et al., 2022): https://arxiv.org/abs/2210.17323
  • GPTQModel library (gptqmodel): https://github.com/ModelCloud/GPTQModel

βš–οΈ License

This model is distributed under the Apache License 2.0.
