Phi-4 outputs gibberish

#68
by surajd

I have deployed this model using vLLM on Kubernetes. While using it, I got a gibberish response like this: https://gist.github.com/surajssd/14b8d5e49cab08d5ecbbfada6cfebcd6#file-gibberish-txt and here are the vLLM logs: https://gist.github.com/surajssd/14b8d5e49cab08d5ecbbfada6cfebcd6#file-vllm-sh

(Screenshot of the gibberish output: Screenshot 2025-04-12 at 5.11.32 PM.png)

Here is the config to deploy the model:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: phi-4-multimodal
  namespace: default
  labels:
    app: phi-4-multimodal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phi-4-multimodal
  template:
    metadata:
      labels:
        app: phi-4-multimodal
    spec:
      volumes:
      - name: shm
        emptyDir:
          medium: Memory
      containers:
      - name: vllm
        image: vllm/vllm-openai:v0.8.2
        command:
        - /bin/bash
        - -c
        # Args: https://huggingface.co./microsoft/Phi-4-multimodal-instruct#vllm-inference
        - "pip install vllm[audio];
          python3 -m vllm.entrypoints.openai.api_server
          --model 'microsoft/Phi-4-multimodal-instruct'
          --tensor-parallel-size 1
          --pipeline-parallel-size 1
          --distributed-executor-backend mp
          --dtype auto
          --trust-remote-code
          --max-model-len 131072
          --enable-lora
          --max-lora-rank 320
          --lora-extra-vocab-size 256
          --limit-mm-per-prompt audio=3,image=3
          --max-loras 2
          --enable-prefix-caching"
        ports:
        - containerPort: 8000
        resources:
          limits:
            # The model needs only 12 GB of GPU memory.
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"
        volumeMounts:
        - name: shm
          mountPath: /dev/shm
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 600
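
For reference, once the pod is ready, the server can be queried through vLLM's OpenAI-compatible API. Here is a minimal sketch, assuming kubectl port-forward deploy/phi-4-multimodal 8000:8000 is running; the prompt and image URL are illustrative placeholders:

from openai import OpenAI

# Point the OpenAI client at the port-forwarded vLLM server.
# vLLM ignores the API key unless one was configured at startup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-4-multimodal-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            # Illustrative image URL; replace with a real one.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sample.png"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)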

@surajd Thanks for reporting the issue. What vLLM version are you using, and can you share the original image?

@nguyenbh I was using the vllm/vllm-openai:v0.8.2 Docker image, i.e. vLLM v0.8.2, as shown in the YAML. Here is the screenshot: https://gist.github.com/surajssd/14b8d5e49cab08d5ecbbfada6cfebcd6#file-screenshot-png

If you reduce the image resolution, do you still see gibberish? I am also wondering if you can try reducing max_num_crops for each frame.
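
For instance, something along these lines can cap the longer side of the image before sending it (a Pillow-based sketch; the 1344-pixel cap is an illustrative value, not a tuned one):

from PIL import Image

def downscale(path: str, out_path: str, max_side: int = 1344) -> None:
    # Shrink the image so its longer side is at most max_side pixels,
    # preserving the aspect ratio; leave smaller images untouched.
    img = Image.open(path)
    scale = max_side / max(img.size)
    if scale < 1.0:
        img = img.resize(
            (round(img.width * scale), round(img.height * scale)),
            Image.LANCZOS,
        )
    img.save(out_path)

downscale("screenshot.png", "screenshot_small.png")

If the Phi-4 processor exposes max_num_crops, it may also be adjustable at server start via vLLM's --mm-processor-kwargs flag; I have not verified the exact key name for this model.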
