DeepToken Inference Gateway

    Models

    The model id you pass in the request body is matched against the gateway's catalog. The list is dynamic: an admin can enable, disable, or remap a model without a client release.

    Discovering models

    Two surfaces expose the catalog:

    • Public catalog: the marketing-facing list, what an anonymous visitor would see if they signed up to the free tier.
    • GET /v1/models: authenticated, narrows to the models your API key may call.

    Both surfaces reflect live admin state: disable an upstream channel and the model disappears from the catalog within 30 seconds.
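A minimal sketch of calling the authenticated surface from Python's standard library. Only the GET /v1/models path comes from this page. The base URL is a placeholder, the Bearer authorization scheme is an assumption (check the Authentication page), and the `{"data": [{"id": ...}]}` response shape is assumed rather than documented here.

```python
import json
import urllib.request

# Placeholder host; substitute your gateway's real base URL.
BASE_URL = "https://gateway.example.com"


def build_models_request(api_key, base_url=BASE_URL):
    """Build the GET /v1/models request without sending it."""
    return urllib.request.Request(
        f"{base_url}/v1/models",
        # Assumption: the gateway accepts the API key as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload):
    """Extract model ids, assuming a {"data": [{"id": ...}, ...]} body."""
    return sorted(item["id"] for item in payload.get("data", []))


def list_models(api_key, base_url=BASE_URL):
    """Send the request and return the ids this API key may call."""
    request = build_models_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return model_ids(json.load(response))
```

Because the catalog follows live admin state, a client that caches this list may want to refresh it periodically rather than once at startup.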

    Routing

    When you call /v1/chat/completions with a model field, the gateway:

    1. Finds every enabled channel that serves that model.
    2. Filters by your tier's group (free, pro, team, or enterprise) plus the shared default group.
    3. Picks the highest-priority healthy channel. Ties break on weight, then id.
    4. On a transient upstream failure (5xx, timeout, connection reset), retries with the next channel in the chain. The gateway makes up to three fallback attempts per request.
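The four steps above can be sketched as a small selection-and-retry routine. This is an illustration under stated assumptions, not the gateway's implementation: the `Channel` fields, `TransientUpstreamError`, and the `send` callback are hypothetical names, a larger priority or weight is assumed to win, a lower id is assumed to win the final tie, and "three fallback attempts" is read as one initial attempt plus up to three more.

```python
from dataclasses import dataclass

MAX_FALLBACKS = 3  # "up to three fallback attempts per request"


class TransientUpstreamError(Exception):
    """Hypothetical marker for a 5xx, timeout, or connection reset."""


@dataclass(frozen=True)
class Channel:
    # Hypothetical shape; the real channel schema is not shown on this page.
    id: int
    models: frozenset
    groups: frozenset
    priority: int
    weight: int
    enabled: bool = True
    healthy: bool = True


def candidate_chain(channels, model, tier):
    """Steps 1-3: enabled, healthy channels for the model and tier, best first."""
    allowed = {tier, "default"}  # the tier's group plus the shared default group
    eligible = [
        c for c in channels
        if c.enabled and c.healthy and model in c.models and c.groups & allowed
    ]
    # Assumed ordering: higher priority, then higher weight, then lower id.
    return sorted(eligible, key=lambda c: (-c.priority, -c.weight, c.id))


def call_with_fallback(chain, send):
    """Step 4: try the best channel, then up to MAX_FALLBACKS further channels."""
    attempts = chain[: 1 + MAX_FALLBACKS]
    if not attempts:
        raise LookupError("no eligible channel for this model and tier")
    last_error = None
    for channel in attempts:
        try:
            return send(channel)
        except TransientUpstreamError as exc:
            last_error = exc  # transient failure: move to the next channel
    raise last_error
```

Only transient failures trigger a fallback in this sketch; any other exception raised by `send` propagates immediately.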

    Model remapping

    A channel can remap the user-facing model id to a vendor-specific deployment id (e.g. gpt-4o → my-azure-deployment-name). The remap is invisible to clients: you are still billed against the canonical model id, and the response carries the canonical id too.
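To make the round trip concrete, here is a rough sketch of how a remap could be applied on the way out and undone on the way back. The `to_upstream` and `to_client` helpers and the dictionary-based remap table are hypothetical; the page does not describe how the gateway stores or applies remaps.

```python
def to_upstream(request, remap):
    """Swap the canonical model id for the channel's deployment id, if any.

    Returns the canonical id (used for billing) and the rewritten request.
    """
    canonical = request["model"]
    upstream_request = dict(request, model=remap.get(canonical, canonical))
    return canonical, upstream_request


def to_client(response, canonical):
    """Restore the canonical model id before the response reaches the client."""
    return dict(response, model=canonical)


# Hypothetical per-channel remap table, using the example ids from the text.
remap = {"gpt-4o": "my-azure-deployment-name"}
canonical, upstream_request = to_upstream({"model": "gpt-4o", "messages": []}, remap)
client_response = to_client({"model": "my-azure-deployment-name"}, canonical)
```

Both helpers return new dictionaries, so the client's original request is never mutated.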
