Vision
Load model
As with the chat model, loading a vision model starts with a load request, but it requires two model files:
- the GGUF model
- the mmproj model
You can load the model using:
curl -X POST 'http://127.0.0.1:3928/inferences/llamacpp/loadmodel' \
  -H 'Content-Type: application/json' \
  -d '{
    "llama_model_path": "/path/to/gguf/model/",
    "mmproj": "/path/to/mmproj/model/",
    "ctx_len": 2048,
    "ngl": 100,
    "cont_batching": false,
    "embedding": false,
    "system_prompt": "",
    "user_prompt": "\n### Instruction:\n",
    "ai_prompt": "\n### Response:\n"
  }'
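The same load request can be sent from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_load_request` and `load_model` are hypothetical, and the endpoint and JSON fields are copied from the curl command above rather than from any other Nitro reference.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port if your server differs.
LOAD_MODEL_URL = "http://127.0.0.1:3928/inferences/llamacpp/loadmodel"


def build_load_request(llama_model_path, mmproj_path, ctx_len=2048, ngl=100):
    """Build (but do not send) the POST request that loads a vision model."""
    body = {
        "llama_model_path": llama_model_path,  # path to the GGUF model
        "mmproj": mmproj_path,                 # path to the mmproj model
        "ctx_len": ctx_len,
        "ngl": ngl,
        "cont_batching": False,
        "embedding": False,
        "system_prompt": "",
        "user_prompt": "\n### Instruction:\n",
        "ai_prompt": "\n### Response:\n",
    }
    return urllib.request.Request(
        LOAD_MODEL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_model(llama_model_path, mmproj_path):
    """Send the load request and return the raw response text."""
    request = build_load_request(llama_model_path, mmproj_path)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `load_model("/path/to/gguf/model/", "/path/to/mmproj/model/")` with real paths sends the request; building the request separately makes it easy to inspect the JSON body before contacting the server.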
The following models are available for download:
- LLaVA model: Large Language and Vision Assistant, which achieves SoTA on 11 benchmarks.
- BakLLaVA model: a Mistral 7B base augmented with the LLaVA architecture.
Inference
Nitro currently accepts only base64-encoded images. Use a base64 converter to prepare your images.
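If you prefer not to use an online converter, an image can be encoded locally. This is a small sketch using Python's standard `base64` module; the function name `image_to_base64` is hypothetical.

```python
import base64
from pathlib import Path


def image_to_base64(image_path):
    """Read an image file and return its contents as a base64 string."""
    raw = Path(image_path).read_bytes()
    # b64encode returns bytes without line breaks; decode to a plain str.
    return base64.b64encode(raw).decode("ascii")
```

The returned string contains no newlines, so it can be placed directly in a JSON field.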
To ask the model about an image, send a request like the following:
curl http://127.0.0.1:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-vision-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "<base64>"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
If the base64 string is too long and causes errors, consider using Postman as an alternative.
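Another option is to send the request from a script, which keeps the long base64 string off the command line. The sketch below mirrors the curl request using the Python standard library. The function names are hypothetical, and it assumes the server accepts the base64 string in the `url` field exactly as in the curl example. The `Authorization` header is added only if you pass an API key.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port if your server differs.
CHAT_URL = "http://127.0.0.1:3928/v1/chat/completions"


def build_vision_payload(image_b64, question="What's in this image?", max_tokens=300):
    """Build the chat-completions body with one text part and one image part."""
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    # Assumption: base64 string goes in "url", as in the curl example.
                    {"type": "image_url", "image_url": {"url": image_b64}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def ask_about_image(image_b64, question="What's in this image?", api_key=None):
    """POST the payload and return the parsed JSON response."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(build_vision_payload(image_b64, question)).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the payload is passed in the request body rather than as a shell argument, the length of the base64 string is not constrained by command-line limits.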