Text Generation

Use these models to generate text for tasks such as reviewing code, drafting a story, or answering questions.

Available Models on Sentry Block

Sentry Block currently supports a wide range of cutting-edge generative AI models through our Serverless Endpoints. These models are pre-configured and ready for immediate use in your applications.


Available Text Models

  • DeepSeek-R1-Distill-Llama-70B – High-quality text generation for complex queries.

  • DeepSeek-R1-Distill-Qwen-32B – Efficient text generation with lower resource consumption.

  • Llama3.3-70B – Advanced conversational AI with expansive knowledge.

  • Llama3.1-8B – Lightweight, general-purpose model ideal for low-cost tasks.

  • Qwen2.5-Coder-32B – Specialized in code generation, completions, and developer tools.


Using the Models

  1. Visit the Sentry Block platform.

  2. Log into your account and ensure you have sufficient credit.

  3. Go to the "Serverless Endpoints" tab.

  4. Choose a model and configure your parameters.

  5. Enter your prompt (text or image, if applicable) and hit Enter.


Adjustable Parameters

  • Messages – The dialogue history between the user and the assistant.

  • System Prompt – Contextual instructions guiding the model's behavior.

  • Output Length – Max number of tokens the model will generate.

  • Temperature – Controls creativity (higher = more varied responses).

  • Top P – Controls probability sampling for more or less diverse outputs.

  • Stream – Toggle between streaming the response token by token and receiving it in full once generation completes.
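These parameters map directly onto fields of the API request body. A hypothetical example payload is shown below; the field names assume an OpenAI-compatible schema, so check the API Reference for the exact names:

```json
{
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
  ],
  "max_tokens": 256,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": false
}
```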


API Integration

You can also call the models through Sentry Block’s API endpoints to integrate AI directly into your applications. Include your API key with every request; see our [API Reference] for details on authentication.

Using cURL
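A sketch of a chat-completion request with cURL. The endpoint URL below is a placeholder (consult the API Reference for the real base URL), and the body follows the parameters described above:

```shell
curl https://api.sentryblock.example/v1/chat/completions \
  -H "Authorization: Bearer $SENTRY_BLOCK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128
      }'
```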

Using Python
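A minimal Python sketch using only the standard library. The endpoint URL, key placeholder, and response shape are assumptions (OpenAI-style); substitute the real values from the API Reference:

```python
import json
import urllib.request

# Placeholder endpoint and key -- replace with the real values
# from your Sentry Block dashboard / API Reference.
API_URL = "https://api.sentryblock.example/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_request(model, user_prompt,
                  system_prompt="You are a helpful assistant.",
                  max_tokens=512, temperature=0.7, top_p=0.9, stream=False):
    """Assemble the request body from the adjustable parameters above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }

payload = build_request("meta-llama/Llama-3.1-8B-Instruct",
                        "Write a haiku about code review.")
request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
)
# with urllib.request.urlopen(request) as response:  # requires a valid key
#     print(json.load(response)["choices"][0]["message"]["content"])
```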

Using JavaScript
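An equivalent JavaScript sketch using `fetch` (Node 18+ or a browser). Again, the endpoint URL and the OpenAI-style response shape are assumptions:

```javascript
// Placeholder endpoint -- substitute the real URL from the API Reference.
const API_URL = "https://api.sentryblock.example/v1/chat/completions";

// Build the JSON request body from the adjustable parameters above.
function buildBody(model, prompt, options = {}) {
  return JSON.stringify({
    model,
    messages: [{ role: "user", content: prompt }],
    max_tokens: options.maxTokens ?? 512,
    temperature: options.temperature ?? 0.7,
    top_p: options.topP ?? 0.9,
    stream: options.stream ?? false,
  });
}

async function generate(model, prompt) {
  const response = await fetch(API_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.SENTRY_BLOCK_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: buildBody(model, prompt),
  });
  const data = await response.json();
  return data.choices[0].message.content; // assumes an OpenAI-style schema
}
```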


Model Name Mappings

Use these identifiers in the model field of your API request:

  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B

  • deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

  • meta-llama/Llama-3.3-70B-Instruct

  • meta-llama/Llama-3.1-8B-Instruct

  • Qwen/Qwen2.5-Coder-32B-Instruct


Example Response (Non-Streaming)
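A representative non-streaming response body. The exact schema and values here are illustrative assumptions (OpenAI-compatible); field names on the live API may differ:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18}
}
```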


Example Response (Streaming Enabled)

Each chunk of streamed data adds to the full response. This is useful for real-time applications or chat-like experiences.
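As a sketch of how a client assembles the chunks, assuming each one carries an OpenAI-style `delta` with an optional `content` field:

```python
# Simulated stream chunks (hypothetical OpenAI-style delta format).
chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", "}}]},
    {"choices": [{"delta": {"content": "world!"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]

# Concatenate the per-chunk deltas to reconstruct the full response.
full_response = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in chunks
)
print(full_response)  # Hello, world!
```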
