Text Generation

Use these models to generate text for tasks such as reviewing code, drafting a story, or answering questions.

Available Models on Sentry Block

Sentry Block currently supports a wide range of cutting-edge generative AI models through our Serverless Endpoints. These models are pre-configured and ready for immediate use in your applications.


Available Text Models

  • DeepSeek-R1-Distill-Llama-70B – High-quality text generation for complex queries.

  • DeepSeek-R1-Distill-Qwen-32B – Efficient text generation with lower resource consumption.

  • Llama3.3-70B – Advanced conversational AI with expansive knowledge.

  • Llama3.1-8B – Lightweight, general-purpose model ideal for low-cost tasks.

  • Qwen2.5-Coder-32B – Specialized in code generation, completions, and developer tools.


Using the Models

  1. Visit the Sentry Block platform.

  2. Log into your account and ensure you have sufficient credit.

  3. Go to the "Serverless Endpoints" tab.

  4. Choose a model and configure your parameters.

  5. Enter your prompt (text or image, if applicable) and hit Enter.


Adjustable Parameters

  • Messages – The dialogue history between the user and the assistant.

  • System Prompt – Contextual instructions guiding the model's behavior.

  • Output Length – Max number of tokens the model will generate.

  • Temperature – Controls creativity (higher = more varied responses).

  • Top P – Controls probability sampling for more or less diverse outputs.

  • Stream – Toggle between streaming the response token by token and receiving it in full once generation completes.
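These parameters map directly onto fields of the API request body. A hypothetical example payload is shown below; the field names assume an OpenAI-compatible schema, so check the API Reference for the exact names:

```json
{
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
  ],
  "max_tokens": 256,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": false
}
```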


API Integration

You can also call the models through Sentry Block’s API endpoints to integrate AI directly into your applications. Include your API key with every request; see our [API Reference] for details on authentication.

Using cURL
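A sketch of a chat-completion request with cURL. The endpoint URL below is a placeholder (consult the API Reference for the real base URL), and the body follows the parameters described above:

```shell
curl https://api.sentryblock.example/v1/chat/completions \
  -H "Authorization: Bearer $SENTRY_BLOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128
      }'
```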

Using Python
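A minimal Python sketch using only the standard library. The endpoint URL, key placeholder, and response shape are assumptions (OpenAI-style); substitute the real values from the API Reference:

```python
import json
import urllib.request

# Placeholder endpoint and key -- replace with the real values
# from your Sentry Block dashboard / API Reference.
API_URL = "https://api.sentryblock.example/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_request(model, user_prompt,
                  system_prompt="You are a helpful assistant.",
                  max_tokens=512, temperature=0.7, top_p=0.9, stream=False):
    """Assemble the request body from the adjustable parameters above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }

payload = build_request("meta-llama/Llama-3.1-8B-Instruct",
                        "Write a haiku about code review.")
request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
)
# with urllib.request.urlopen(request) as response:  # requires a valid key
#     print(json.load(response)["choices"][0]["message"]["content"])
```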

Using JavaScript
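An equivalent JavaScript sketch using `fetch` (Node 18+ or a browser). Again, the endpoint URL and the OpenAI-style response shape are assumptions:

```javascript
// Placeholder endpoint -- substitute the real URL from the API Reference.
const API_URL = "https://api.sentryblock.example/v1/chat/completions";

// Build the JSON request body from the adjustable parameters above.
function buildBody(model, prompt, options = {}) {
  return JSON.stringify({
    model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: options.maxTokens ?? 512,
    temperature: options.temperature ?? 0.7,
    top_p: options.topP ?? 0.9,
    stream: options.stream ?? false,
  });
}

async function generate(model, prompt) {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.SENTRY_BLOCK_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: buildBody(model, prompt),
  });
  const data = await response.json();
  return data.choices[0].message.content; // assumes an OpenAI-style schema
}
```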


Model Name Mappings

Use these identifiers in the model field of your API request:

  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B

  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

  • meta-llama/Llama-3.3-70B-Instruct

  • meta-llama/Llama-3.1-8B-Instruct

  • Qwen/Qwen2.5-Coder-32B-Instruct


Example Response (Non-Streaming)
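A representative non-streaming response body. The exact schema and values here are illustrative assumptions (OpenAI-compatible); field names on the live API may differ:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18}
}
```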


Example Response (Streaming Enabled)

Each chunk of streamed data adds to the full response. This is useful for real-time applications or chat-like experiences.
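As a sketch of how a client assembles the chunks, assuming each one carries an OpenAI-style `delta` with an optional `content` field:

```python
# Simulated stream chunks (hypothetical OpenAI-style delta format).
chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", "}}]},
    {"choices": [{"delta": {"content": "world!"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]

# Concatenate the per-chunk deltas to reconstruct the full response.
full_response = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in chunks
)
print(full_response)  # Hello, world!
```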
