DeepToken Inference Gateway

    Chat Completions

    The POST /v1/chat/completions endpoint is the primary way to generate text, code, or structured JSON responses from supported language models. It is fully compatible with the OpenAI Chat Completions API schema, so existing OpenAI SDK clients work once the base URL and API key are changed.

    [!NOTE] DeepToken acts as an intelligent proxy: you call this endpoint, and DeepToken handles routing, prioritization, provider failover, token counting, and credit deduction from your balance.


    Endpoint Details

    • URL: https://api.deeptoken.app/v1/chat/completions
    • Method: POST
    • Headers:
      • Authorization: Bearer <DEEPTOKEN_API_KEY> (Required)
      • Content-Type: application/json (Required)
      • X-DeepToken-Org: <ORG_SLUG> (Optional, to attribute costs to a specific organization wallet)
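The header rules above can be sketched with the Python standard library alone. This is a minimal illustration, not an official client: the helper names `build_headers` and `build_request` are hypothetical, and the request is only prepared, not sent.

```python
import json
import urllib.request

DEEPTOKEN_URL = "https://api.deeptoken.app/v1/chat/completions"


def build_headers(api_key, org_slug=None):
    """Return the headers listed above (hypothetical helper)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # X-DeepToken-Org is optional, so send it only when a slug is given.
    if org_slug:
        headers["X-DeepToken-Org"] = org_slug
    return headers


def build_request(api_key, body, org_slug=None):
    """Prepare, but do not send, a POST request for the endpoint."""
    return urllib.request.Request(
        DEEPTOKEN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=build_headers(api_key, org_slug),
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. Leaving out `org_slug` omits the organization header entirely.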

    Code Examples

    The same request is shown with cURL, the OpenAI Python SDK, and the OpenAI Node.js SDK.

    cURL:

    curl https://api.deeptoken.app/v1/chat/completions \
      -H "Authorization: Bearer $DEEPTOKEN_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o-mini",
        "messages": [
          {
            "role": "user",
            "content": "Why is the sky blue?"
          }
        ],
        "temperature": 0.7
      }'
    
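    Python:

    import os

    from openai import OpenAI

    # Read the key from the environment; a literal "$DEEPTOKEN_API_KEY" string is not expanded by Python.
    client = OpenAI(
        api_key=os.environ["DEEPTOKEN_API_KEY"],
        base_url="https://api.deeptoken.app/v1"
    )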
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": "Why is the sky blue?"}
        ],
        temperature=0.7
    )
    
    print(response.choices[0].message.content)
    
    Node.js:

    import OpenAI from "openai"
    
    const client = new OpenAI({
      apiKey: process.env.DEEPTOKEN_API_KEY,
      baseURL: "https://api.deeptoken.app/v1"
    })
    
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        { role: "user", content: "Why is the sky blue?" }
      ],
      temperature: 0.7
    })
    
    console.log(response.choices[0].message.content)
    

    Request Parameters

    The request body must be a JSON object containing the following parameters:

    | Parameter | Type | Required? | Description |
    |---|---|---|---|
    | model | string | Yes | The ID of the model to use. See the catalog for all supported model IDs. |
    | messages | array | Yes | A list of message objects representing the conversation history. See Message Object below. |
    | temperature | number | No (default: 1) | Sampling temperature between 0 and 2. Higher values make output more random; lower values make it more focused. |
    | top_p | number | No (default: 1) | Nucleus sampling factor. 0.1 means only the tokens comprising the top 10% of probability mass are considered. |
    | stream | boolean | No (default: false) | If true, tokens are sent as Server-Sent Events (SSE) as they become available. |
    | max_tokens | integer | No | The maximum number of tokens to generate in the completion. |
    | stop | string or array | No | Up to 4 sequences at which the API stops generating further tokens. |
    | response_format | object | No | Specify { "type": "json_object" } or a schema definition to enforce JSON output. |
    | tools | array | No | A list of tools (functions) the model may call. |
    | tool_choice | string or object | No | Controls which tool the model calls: none, auto, required, or an object selecting a specific tool. |
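As a rough sketch of how a client might check these parameters before sending, the hypothetical helper below assembles a request body and enforces the two limits the table states (temperature between 0 and 2, at most 4 stop sequences). Rejecting an empty `messages` list is an assumption of this sketch, not a documented rule.

```python
def build_chat_body(model, messages, temperature=None, top_p=None,
                    stream=None, max_tokens=None, stop=None):
    """Assemble a request body, checking the limits stated in the table."""
    if not model:
        raise ValueError("model is required")
    # Assumption: an empty conversation is treated as an error here.
    if not messages:
        raise ValueError("messages is required")
    body = {"model": model, "messages": list(messages)}

    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    if top_p is not None:
        body["top_p"] = top_p
    if stream is not None:
        body["stream"] = bool(stream)
    if max_tokens is not None:
        body["max_tokens"] = int(max_tokens)
    if stop is not None:
        # stop may be a single string or a list of strings.
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 4:
            raise ValueError("stop accepts at most 4 sequences")
        body["stop"] = stop
    return body
```

Optional parameters left as `None` are omitted from the body, so the defaults in the table apply.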

    Message Object

    Each object in the messages array has the following structure:

    | Field | Type | Required? | Description |
    |---|---|---|---|
    | role | string | Yes | The role of the message's author: system, user, assistant, or tool. |
    | content | string or array | Yes | The contents of the message (text, or an array of content parts for multimodal input). |
    | name | string | No | An optional name for the participant, useful for distinguishing multiple users. |
    | tool_call_id | string | No (tool role only) | The ID of the tool call this message responds to. |
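To illustrate the structure, here is a small sketch that builds message objects with the fields from the table. The `make_message` helper is hypothetical, and restricting `tool_call_id` to tool messages is this sketch's reading of the table rather than a documented validation rule.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def make_message(role, content, name=None, tool_call_id=None):
    """Build one entry for the messages array (hypothetical helper)."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    message = {"role": role, "content": content}
    if name is not None:
        message["name"] = name
    if tool_call_id is not None:
        # Assumption: tool_call_id is only meaningful on tool messages.
        if role != "tool":
            raise ValueError("tool_call_id only applies to the tool role")
        message["tool_call_id"] = tool_call_id
    return message


# A short conversation history, oldest message first.
conversation = [
    make_message("system", "You are a concise assistant."),
    make_message("user", "Why is the sky blue?", name="alice"),
]
```

A tool result would be appended as `make_message("tool", result_text, tool_call_id=call_id)`, where `call_id` is the ID of the tool call being answered.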

    Response Schema

    A successful non-streaming response returns a JSON object with the following fields:

    {
      "id": "chatcmpl-9A8b9C...",
      "object": "chat.completion",
      "created": 1718029562,
      "model": "gpt-4o-mini",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "The sky is blue because of Rayleigh scattering..."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 85,
        "total_tokens": 98
      }
    }
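A minimal sketch of reading such a response without an SDK follows. The function name `summarize_completion` is hypothetical, and the code assumes the response has at least one choice, as in the example above.

```python
import json


def summarize_completion(response):
    """Pull the commonly used fields out of a non-streaming response.

    Accepts either the raw JSON text or an already-decoded dict.
    """
    data = json.loads(response) if isinstance(response, (str, bytes)) else response
    choice = data["choices"][0]
    return {
        "id": data["id"],
        "model": data["model"],
        # .get() tolerates a message without text content (an assumption
        # that this can happen, for example when the model calls a tool).
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        "total_tokens": data.get("usage", {}).get("total_tokens"),
    }
```

Checking `finish_reason` before using `text` is a reasonable habit: a value other than `stop` suggests the message may be incomplete or may not be plain text.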
    

    Response Fields

    • id: A unique identifier for the chat completion.
    • object: The object type, always chat.completion.
    • created: The Unix timestamp (in seconds) of when the chat completion was created.
    • model: The model used for generating the completion.
    • choices: A list of completion choices. Each choice contains:
      • message: The generated message object.
      • finish_reason: Why the model stopped generating (stop, length, tool_calls, etc.).
    • usage: Token usage statistics for the request.
      [!IMPORTANT] DeepToken calculates billing based on the token counts returned in usage. Ensure your code handles this block if you are tracking usage client-side.
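For client-side tracking, one possible approach is to accumulate the usage block of each decoded response. The `UsageTotals` class below is a hypothetical sketch. It counts tokens only and does not attempt to compute cost, since pricing is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token totals across several responses (client-side only)."""
    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def add(self, response):
        """Add the usage block of one decoded response dict."""
        self.requests += 1
        # Tolerate a missing usage block by counting the request only.
        usage = response.get("usage") or {}
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)


totals = UsageTotals()
totals.add({"usage": {"prompt_tokens": 13, "completion_tokens": 85, "total_tokens": 98}})
print(totals.total_tokens)  # 98
```

Keeping the counters separate makes it easier to compare client-side numbers against whatever the billing dashboard reports.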
