Chat Completions

The POST /v1/chat/completions endpoint is the primary way to generate text, code, or structured JSON responses from supported language models. It is 100% compatible with the OpenAI Chat Completions API schema, so existing OpenAI client libraries work once they are pointed at the DeepToken base URL and API key.

[!NOTE] DeepToken acts as an intelligent proxy. You call this endpoint, and DeepToken handles routing, prioritization, provider failovers, token counting, and direct credit deductions from your balance.

Endpoint Details

URL: https://api.deeptoken.app/v1/chat/completions
Method: POST
Headers:
- Authorization: Bearer <DEEPTOKEN_API_KEY> (Required)
- Content-Type: application/json (Required)
- X-DeepToken-Org: <ORG_SLUG> (Optional, to attribute costs to a specific organization wallet)
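For clients that do not use an SDK, the headers listed above can be assembled with a small helper. This is a minimal sketch: `build_headers` and its argument names are hypothetical and not part of any DeepToken library; only the header names and values come from the list above.

```python
from typing import Dict, Optional


def build_headers(api_key: str, org_slug: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers for a chat completions request."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # X-DeepToken-Org is optional: send it only when costs should be
    # attributed to a specific organization wallet.
    if org_slug:
        headers["X-DeepToken-Org"] = org_slug
    return headers
```

Leaving `org_slug` unset omits the header entirely instead of sending an empty value.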

Code Examples

The same sample request is shown below using curl, the OpenAI Python SDK, and the OpenAI Node.js SDK:

curl https://api.deeptoken.app/v1/chat/completions \
  -H "Authorization: Bearer $DEEPTOKEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ],
    "temperature": 0.7
  }'

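import os

from openai import OpenAI

# Read the key from the environment; Python does not expand "$VAR" inside strings
client = OpenAI(
    api_key=os.environ["DEEPTOKEN_API_KEY"],
    base_url="https://api.deeptoken.app/v1"
)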

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"}
    ],
    temperature=0.7
)

print(response.choices[0].message.content)

import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.DEEPTOKEN_API_KEY,
  baseURL: "https://api.deeptoken.app/v1"
})

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "Why is the sky blue?" }
  ],
  temperature: 0.7
})

console.log(response.choices[0].message.content)

Request Parameters

The request body must be a JSON object containing the following parameters:

Parameter	Type	Required?	Description
`model`	`string`	Yes	The ID of the model to use. See the catalog for all supported model IDs.
`messages`	`array`	Yes	A list of message objects representing the conversation history. See Message Object below.
`temperature`	`number`	No (default: `1`)	Sampling temperature between `0` and `2`. Higher values make output more random, lower values make it more focused.
`top_p`	`number`	No (default: `1`)	Nucleus sampling factor. `0.1` means only tokens comprising the top 10% probability mass are considered.
`stream`	`boolean`	No (default: `false`)	If `true`, tokens are sent as Server-Sent Events (SSE) as they become available.
`max_tokens`	`integer`	No	The maximum number of tokens to generate in the completion.
`stop`	`string` or `array`	No	Up to 4 sequences where the API will stop generating further tokens.
`response_format`	`object`	No	Specify `{ "type": "json_object" }` or a schema definition to enforce JSON output.
`tools`	`array`	No	A list of tools (functions) the model may call.
`tool_choice`	`string` or `object`	No	Controls which tool is called by the model (`none`, `auto`, `required`, or object).
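The limits in the table above can be checked on the client before a request is sent. The following sketch builds a request body and enforces only the documented constraints (`temperature` between 0 and 2, at most 4 `stop` sequences). `build_chat_request` is a hypothetical helper, and treating an empty `messages` list as an error is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List, Optional, Union


def build_chat_request(
    model: str,
    messages: List[Dict[str, Any]],
    temperature: Optional[float] = None,
    stop: Optional[Union[str, List[str]]] = None,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a request body, validating the documented parameter limits."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty list is treated the same as a missing field.
        raise ValueError("messages is required")

    body: Dict[str, Any] = {"model": model, "messages": messages}

    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature

    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 4:
            raise ValueError("stop accepts at most 4 sequences")
        body["stop"] = stop

    if max_tokens is not None:
        body["max_tokens"] = max_tokens

    return body
```

Optional parameters that are left unset are omitted from the body, so the server-side defaults from the table apply.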

Message Object

Each object in the messages array has the following structure:

Field	Type	Required?	Description
`role`	`string`	Yes	The role of the message's author: `system`, `user`, `assistant`, or `tool`.
`content`	`string` or `array`	Yes	The contents of the message (text, or array of content parts for multimodal input).
`name`	`string`	No	An optional name for the participant, useful to distinguish multiple users.
`tool_call_id`	`string`	No (`tool` role only)	The ID of the tool call this message responds to. Applies only to messages whose role is `tool`.
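As an illustration of the fields above, the sketch below assembles a short conversation with a hypothetical `make_message` helper. Rejecting a `tool` message that has no `tool_call_id` is an assumption based on the OpenAI schema this endpoint mirrors, not a rule stated in the table.

```python
from typing import Any, Dict, List, Optional, Union

VALID_ROLES = {"system", "user", "assistant", "tool"}


def make_message(
    role: str,
    content: Union[str, List[Dict[str, Any]]],
    name: Optional[str] = None,
    tool_call_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one entry for the messages array."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if role == "tool" and not tool_call_id:
        # Assumption: tool results must reference the call they answer.
        raise ValueError("tool messages need a tool_call_id")

    message: Dict[str, Any] = {"role": role, "content": content}
    if name is not None:
        message["name"] = name
    if tool_call_id is not None:
        message["tool_call_id"] = tool_call_id
    return message


conversation = [
    make_message("system", "You are a concise assistant."),
    make_message("user", "Why is the sky blue?", name="alice"),
]
```

The resulting `conversation` list can be passed directly as the `messages` value of a request.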

Response Schema

A successful non-streaming response returns a JSON object with the following fields:

{
  "id": "chatcmpl-9A8b9C...",
  "object": "chat.completion",
  "created": 1718029562,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The sky is blue because of Rayleigh scattering..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 85,
    "total_tokens": 98
  }
}
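When the response is handled as plain JSON instead of through an SDK object, the reply text and stop reason can be read from the first choice. The sketch below works on a trimmed copy of the sample above; `extract_reply` is a hypothetical helper, and the warning on `length` reflects how that finish reason is commonly interpreted (output cut off by a token limit).

```python
from typing import Any, Dict, Tuple


def extract_reply(response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (content, finish_reason) for the first choice."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    return first["message"]["content"], first["finish_reason"]


# Trimmed copy of the sample response shown above.
sample = {
    "object": "chat.completion",
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The sky is blue because of Rayleigh scattering...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 85, "total_tokens": 98},
}

text, reason = extract_reply(sample)
if reason == "length":
    # The completion hit a token limit before finishing.
    print("warning: reply was truncated")
print(text)
```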

Response Fields

id: A unique identifier for the chat completion.
object: The object type, always chat.completion.
created: The Unix timestamp (in seconds) of when the chat completion was created.
model: The model used for generating the completion.
choices: A list of completion choices. Each choice contains:
- index: The position of the choice in the list.
- message: The generated message object.
- finish_reason: Why the model stopped generating (stop, length, tool_calls, etc.).
usage: Token usage statistics for the request.

[!IMPORTANT] DeepToken calculates billing based on the token counts returned in usage. Ensure your code reads this block if you are tracking usage client-side.
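Client-side tracking can be as simple as summing the `usage` block of each response. The following is a rough sketch: `UsageTracker` is a hypothetical class, it expects the response as a parsed JSON dictionary, and silently skipping responses without a `usage` block is an assumption. It counts tokens only and does not attempt to reproduce DeepToken's credit pricing.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class UsageTracker:
    """Running totals of the token counts reported in `usage`."""

    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def record(self, response: Dict[str, Any]) -> None:
        usage = response.get("usage")
        if not usage:
            # Assumption: skip responses without a usage block rather than fail.
            return
        self.requests += 1
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)


tracker = UsageTracker()
sample = {"usage": {"prompt_tokens": 13, "completion_tokens": 85, "total_tokens": 98}}
tracker.record(sample)
tracker.record(sample)
print(tracker.total_tokens)  # 196
```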