Overview

Welcome to Sentry Block’s Dedicated Endpoints for AI Inference — built for teams and developers who need full control over model deployments and infrastructure. With Dedicated Endpoints, you can deploy state-of-the-art models like Meta LLaMA, Qwen, or your own custom-trained models on hardware configurations tailored to your specific needs.

To request access or deployment, please contact us [here].


Key Features

  • Custom Hardware Options: Choose the GPU type, memory, and compute specs that align with your performance requirements.

  • Bring Your Own Model: Use Sentry Block’s pre-integrated models or upload and deploy your own.

  • Fine-Tuned Control: Customize deployment parameters, manage runtime behavior, and optimize throughput.

  • True Scalability: Run intensive workloads on dedicated, isolated resources built for high availability and scale.


Prerequisites

  • Sentry Block Account: You’ll need an active account on the Sentry Block platform.

  • Credit Balance: Ensure you have enough credits to launch and run GPU instances (a minimum of one hour of usage is recommended).


Pricing and Billing

  • Pay-As-You-Go: Charges are based on the type of hardware selected and the total usage time.
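As a rough illustration of how pay-as-you-go charges accrue, the sketch below multiplies an hourly hardware rate by total usage time. The rate shown is a placeholder for illustration only, not an actual Sentry Block price; see the Pricing Page for real rates.

```python
# Illustrative pay-as-you-go cost estimate.
# The hourly rate below is a placeholder, not an actual Sentry Block price.

def estimate_cost(hourly_rate_usd: float, hours: float) -> float:
    """Total charge = hardware hourly rate x total usage time."""
    return round(hourly_rate_usd * hours, 2)

# e.g. a hypothetical $2.50/hr GPU instance running for 8 hours:
print(estimate_cost(2.50, 8))  # 20.0
```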

For detailed pricing, visit our [Pricing Page].
