Overview
Welcome to Sentry Block’s Dedicated Endpoints for AI Inference — built for teams and developers who need full control over model deployments and infrastructure. With Dedicated Endpoints, you can deploy state-of-the-art models like Meta LLaMA, Qwen, or your own custom-trained models on hardware configurations tailored to your specific needs.
To request access or deployment, please contact us [here].
Key Features
Custom Hardware Options: Choose the GPU type, memory, and compute specs that align with your performance requirements.
Bring Your Own Model: Use Sentry Block’s pre-integrated models or upload and deploy your own.
Fine-Tuned Control: Customize deployment parameters, manage runtime behavior, and optimize throughput.
True Scalability: Run intensive workloads with dedicated, isolated resources built for high availability and scale.
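Once a dedicated endpoint is live, you call it over HTTP like any hosted model. The sketch below builds an OpenAI-style chat-completion payload, a common convention for inference endpoints; the endpoint URL, model name, and request schema here are illustrative assumptions, so substitute the values shown in your Sentry Block dashboard and confirm the exact schema with the Sentry Block docs.

```python
import json

# Hypothetical values: replace with the URL and key from your
# Sentry Block dashboard after your endpoint is deployed.
ENDPOINT_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload (a common
    convention; the actual Sentry Block schema may differ)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("meta-llama-3-8b", "Summarize this document.")
print(json.dumps(payload, indent=2))

# To actually send the request (requires the `requests` package):
# import requests
# resp = requests.post(
#     ENDPOINT_URL,
#     headers={"Authorization": f"Bearer {API_KEY}"},
#     json=payload,
#     timeout=30,
# )
# print(resp.json())
```

The payload builder is kept separate from the network call so you can unit-test request construction without hitting a live endpoint.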
Prerequisites
Sentry Block Account: You’ll need an active account on the Sentry Block platform.
Credit Balance: Ensure you have enough credits to launch and run GPU instances (minimum 1-hour usage recommended).
Pricing and Billing
Pay-As-You-Go: Charges are based on the type of hardware selected and the total usage time.
For detailed pricing, visit our [Pricing Page].
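Since billing is hardware rate multiplied by usage time, estimating a workload's cost is simple arithmetic. This is a minimal sketch; the $2.50/hour rate below is purely illustrative, not an actual Sentry Block price, so check the Pricing Page for real figures.

```python
def estimate_cost(hourly_rate_usd: float, hours: float) -> float:
    """Estimate pay-as-you-go cost: hourly hardware rate times usage time.
    Rates are illustrative; see the Pricing Page for actual figures."""
    return round(hourly_rate_usd * hours, 2)

# Illustrative example: a GPU billed at $2.50/hour running for 8 hours
print(estimate_cost(2.50, 8))  # 20.0
```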