Phi-4 outputs gibberish
#68, opened by surajd
I have deployed this model using vLLM on Kubernetes, and while using it I got a gibberish response like this: https://gist.github.com/surajssd/14b8d5e49cab08d5ecbbfada6cfebcd6#file-gibberish-txt. Here are the vLLM logs: https://gist.github.com/surajssd/14b8d5e49cab08d5ecbbfada6cfebcd6#file-vllm-sh
Here is the config to deploy the model:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: phi-4-multimodal
  namespace: default
  labels:
    app: phi-4-multimodal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phi-4-multimodal
  template:
    metadata:
      labels:
        app: phi-4-multimodal
    spec:
      volumes:
        - name: shm
          emptyDir:
            medium: Memory
      containers:
        - name: vllm
          image: vllm/vllm-openai:v0.8.2
          command:
            - /bin/bash
            - -c
            # Args: https://huggingface.co/microsoft/Phi-4-multimodal-instruct#vllm-inference
            - "pip install vllm[audio];
              python3 -m vllm.entrypoints.openai.api_server
              --model 'microsoft/Phi-4-multimodal-instruct'
              --tensor-parallel-size 1
              --pipeline-parallel-size 1
              --distributed-executor-backend mp
              --dtype auto
              --trust-remote-code
              --max-model-len 131072
              --enable-lora
              --max-lora-rank 320
              --lora-extra-vocab-size 256
              --limit-mm-per-prompt audio=3,image=3
              --max-loras 2
              --enable-prefix-caching"
          ports:
            - containerPort: 8000
          resources:
            limits:
              # The model needs only 12 GB of GPU memory.
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"
          volumeMounts:
            - name: shm
              mountPath: /dev/shm
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 600
```
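For reference, here is roughly the request pattern I use against the server (a minimal sketch, not the exact call that produced the gibberish; the address assumes a kubectl port-forward to the pod, and the prompt and image URL are placeholders):

```python
# Minimal sketch of a chat request against vLLM's OpenAI-compatible server.
# Assumptions: the pod is reachable on localhost:8000 (e.g. via
# `kubectl port-forward deploy/phi-4-multimodal 8000:8000`); the prompt and
# image URL below are placeholders, not the original repro inputs.
from openai import OpenAI

# vLLM does not check the API key by default, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-4-multimodal-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/test.png"}},  # placeholder
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```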
@nguyenbh I was using the vllm/vllm-openai:v0.8.2 Docker image, i.e. vLLM version v0.8.2, as shown in the YAML. Here is the screenshot: https://gist.github.com/surajssd/14b8d5e49cab08d5ecbbfada6cfebcd6#file-screenshot-png
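To double-check, the running server also reports its version over HTTP (a small sketch; localhost:8000 again assumes a port-forward to the pod):

```python
# Ask vLLM's OpenAI-compatible server which build is actually serving traffic.
import requests

resp = requests.get("http://localhost:8000/version", timeout=10)
print(resp.json())  # e.g. {"version": "0.8.2"}
```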
If you reduce the image resolution, do you still see gibberish? I am also wondering if you could try reducing max_num_crops for each frame.
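If it helps, here is a small sketch for testing the lower-resolution path: it shrinks the image client-side before it is sent, so fewer crops should be needed per frame (the file name and 448-pixel target are placeholder values, not settings from this thread):

```python
# Downscale an image before sending it, to test whether the gibberish is tied
# to high-resolution inputs. Requires only Pillow; the result can be passed as
# the image_url in a chat request (as a data URL).
import base64
import io

from PIL import Image

def downscaled_data_url(path: str, max_side: int = 448) -> str:
    """Resize so the longest side is at most max_side and return a data URL."""
    img = Image.open(path).convert("RGB")
    img.thumbnail((max_side, max_side))  # in-place, preserves aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

# "input.png" is a placeholder path for whatever image triggered the issue.
url = downscaled_data_url("input.png")
```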