Text Generation (Vision)

Vision models behave similarly to text models, but they can accept and interpret images as well.

Model Spotlight: Qwen2.5-VL-7B-Instruct (Vision + Language)

Qwen2.5-VL-7B-Instruct is a powerful multimodal model designed to understand and process both visual and textual inputs. Whether you're building visual assistants, image-based chatbots, or AI tools that analyze pictures, this model works as a plug-and-play solution.


Getting Started with Vision-Language Inference

  1. Log in to your Sentry Block account.

  2. Make sure you have an active credit balance.

  3. Go to the "Serverless Endpoints" tab and select the Qwen2.5-VL-7B-Instruct model.

  4. Configure your parameters, upload your image (if applicable), enter your prompt, and hit Enter.


Additional Parameter (for Vision-Language models)

In addition to the standard generation parameters, vision-language models accept:

  • Image – The image URL or file to be analyzed alongside the text prompt. See the message-format sketch below.
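
For illustration, here is one common way vision-language endpoints accept an image alongside text: an OpenAI-style messages array whose content mixes text and image_url parts. The field names below are an assumption about Sentry Block's request schema, not taken from its docs; confirm the authoritative format in the [API Reference].

```json
{
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is shown in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/cat.jpg" } }
      ]
    }
  ]
}
```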


Using the API Directly

You can integrate the model directly into your own projects via the Sentry Block API. Below are some quick examples to get started.

Important: Don’t forget to include your API key. You can find authentication details in the [API Reference].


cURL Example
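
The sketch below assumes an OpenAI-compatible chat-completions endpoint; the base URL https://api.sentryblock.com/v1/chat/completions is a placeholder, and YOUR_API_KEY stands in for your real key (see the [API Reference]).

```bash
curl https://api.sentryblock.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}}
        ]
      }
    ],
    "max_tokens": 256
  }'
```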


Python Example
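
A minimal Python sketch using the requests library, under the same assumptions as the cURL example (placeholder base URL, OpenAI-style request schema).

```python
import requests

API_KEY = "YOUR_API_KEY"
# Placeholder base URL; confirm the real endpoint in the API Reference.
URL = "https://api.sentryblock.com/v1/chat/completions"

payload = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
    "max_tokens": 256,
}

response = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

# Print the model's reply from the first choice.
print(response.json()["choices"][0]["message"]["content"])
```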


JavaScript Example
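
An equivalent sketch in JavaScript using the built-in fetch API, again with the placeholder URL and assumed OpenAI-style request schema.

```javascript
// Placeholder base URL and assumed payload shape; confirm both in the API Reference.
const response = await fetch("https://api.sentryblock.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "Qwen/Qwen2.5-VL-7B-Instruct",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image in one sentence." },
          { type: "image_url", image_url: { url: "https://example.com/cat.jpg" } },
        ],
      },
    ],
    max_tokens: 256,
  }),
});

// Log the model's reply from the first choice.
const data = await response.json();
console.log(data.choices[0].message.content);
```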


Model Identifier

Use the following model ID in your API calls:

  • Qwen2.5-VL-7B-Instruct: Qwen/Qwen2.5-VL-7B-Instruct


Response Example (Non-Streaming)
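
The response shape below is typical of OpenAI-compatible endpoints; the ID, timestamps, token counts, and content are illustrative values, not output captured from Sentry Block.

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1719000000,
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image shows a tabby cat sitting on a windowsill."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1042,
    "completion_tokens": 14,
    "total_tokens": 1056
  }
}
```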


Response Example (Streaming Enabled)
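
When the request sets "stream": true, OpenAI-compatible endpoints typically return server-sent events, each carrying a small content delta. The chunk below is illustrative and pretty-printed for readability; in the actual stream, each data: line is a single line of JSON.

```json
data: {
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1719000000,
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "The image shows" },
      "finish_reason": null
    }
  ]
}
```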

Multiple chunks like this are streamed in real time; concatenating the delta content fields as they arrive yields the full generated response.
