api docs

Connecting to fabryka

Connecting to `router.fabryka.ai`

router.fabryka.ai is an OpenAI-compatible LLM API. If a tool can talk to OpenAI or OpenRouter, it can talk to fabryka. You only ever change three things: the base URL, the API key, and the model name.

Pricing: $0.20 / 1M input tokens, $0.60 / 1M output tokens. Check your balance any time: GET /v1/credits (or paste your key at /account).
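As a rough illustration of the pricing above, the cost of a request can be estimated from its token counts. This is only a sketch: the helper name `estimate_cost_usd` is made up for this example, and the amount actually billed is whatever the gateway reports.

```python
# Prices quoted in this document, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 0.20
OUTPUT_USD_PER_MTOK = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper, not part of the API)."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    # Example: 2,000 prompt tokens and 500 completion tokens.
    print(f"${estimate_cost_usd(2_000, 500):.6f}")
```

Remember that reasoning tokens count as output tokens, so a "thinking" reply usually costs more than its visible answer suggests.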

1. Quick check — curl

bash

curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b-a3b",
    "messages": [{"role":"user","content":"Say hello in 5 words."}]
  }'

Balance:

bash

curl https://router.fabryka.ai/v1/credits -H "Authorization: Bearer $FABRYKA_KEY"
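The same balance check can be sketched in Python with only the standard library. The function names here are illustrative, and the example assumes the endpoint returns JSON; the response fields are not documented on this page, so the body is printed as-is.

```python
import json
import os
import urllib.request

BASE_URL = "https://router.fabryka.ai/v1"


def credits_request(api_key: str) -> urllib.request.Request:
    """Build the GET /v1/credits request (mirrors the curl call above)."""
    return urllib.request.Request(
        f"{BASE_URL}/credits",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_credits(api_key: str) -> dict:
    """Send the request and return the decoded JSON body unchanged."""
    with urllib.request.urlopen(credits_request(api_key), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__" and os.environ.get("FABRYKA_KEY"):
    # Only runs when a key is configured; prints the raw JSON response.
    print(json.dumps(fetch_credits(os.environ["FABRYKA_KEY"]), indent=2))
```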

2. OpenAI SDKs (drop-in)

Python

python

from openai import OpenAI
client = OpenAI(base_url="https://router.fabryka.ai/v1",
                api_key="sk-fab-...")
r = client.chat.completions.create(
    model="qwen3.6-35b-a3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(r.choices[0].message.content)

JavaScript / TypeScript

javascript

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://router.fabryka.ai/v1",
  apiKey: process.env.FABRYKA_KEY,
});
const r = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(r.choices[0].message.content);

3. Hermes

Hermes reads its model provider from ~/.hermes/config.yaml. Point the model: block at fabryka; this is the whole integration:

yaml

# ~/.hermes/config.yaml
model:
  default: qwen3.6-35b-a3b
  provider: custom:fabryka
  base_url: https://router.fabryka.ai/v1
  api_mode: chat_completions
  api_key: sk-fab-YOUR_KEY        # get one free at https://router.fabryka.ai

Then restart Hermes so it picks up the change:

bash

# if Hermes runs as a systemd service:
sudo systemctl restart hermes-sdk-server
# otherwise restart however you launched `hermes`

Verify it's wired (call your local Hermes SDK; it should answer via fabryka):

bash

curl http://127.0.0.1:8800/v1/chat/completions \
  -H "Authorization: Bearer $YOUR_LOCAL_HERMES_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","messages":[{"role":"user","content":"reply: online"}]}'

Notes

Change only the LLM api_key (the upstream model provider) to your sk-fab-.... The key your *app* uses to call the local Hermes SDK is separate and stays the same.

qwen3.6-35b-a3b is a reasoning model. For fast tool-loop steps you can have Hermes send chat_template_kwargs: {enable_thinking: false} (see §6).

The backend serves one model on a single GPU, so keep concurrency at 1 (the gateway returns 503 if a second request lands mid-generation).
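If you call fabryka from your own code rather than through Hermes, one way to respect the concurrency limit is to serialize requests behind a lock and back off on 503. The sketch below is an assumption about how a client might do this, not something the gateway requires; `send` is a hypothetical callable that performs one request and returns `(status_code, body)`.

```python
import threading
import time
from typing import Callable, Tuple

# One lock per process, so only one request is in flight at a time.
_gateway_lock = threading.Lock()


def call_serialized(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Run `send` under a lock, retrying with exponential backoff on HTTP 503.

    Any status other than 503 is returned immediately. If every attempt
    gets a 503, the last (503, body) pair is returned to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    with _gateway_lock:
        for attempt in range(max_attempts):
            status, body = send()
            if status != 503:
                break
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status, body
```

The lock only covers a single process. Another client sharing the backend can still trigger a 503, which is why the retry is kept alongside it.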

4. OpenRouter-style usage

router.fabryka.ai speaks the same protocol as OpenRouter, so anywhere a tool expects OpenRouter you can swap the base URL:

diff

- OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
+ OPENROUTER_BASE_URL=https://router.fabryka.ai/v1
  model: qwen3.6-35b-a3b

Add fabryka as a provider in a self-hosted aggregator (LiteLLM example):

yaml

# litellm config.yaml
model_list:
  - model_name: fabryka/qwen3.6-35b-a3b
    litellm_params:
      model: openai/qwen3.6-35b-a3b
      api_base: https://router.fabryka.ai/v1
      api_key: os.environ/FABRYKA_KEY

> Note: OpenRouter's *hosted* service does not let end users register arbitrary upstreams. "As a provider" here means: use fabryka wherever you'd point at OpenRouter, or register it as an OpenAI-compatible upstream in your own router (LiteLLM, a custom gateway, etc.).

5. Streaming

Standard OpenAI SSE streaming is supported: set "stream": true. Usage is reported in a final chunk and billed automatically:

bash

curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","stream":true,
       "messages":[{"role":"user","content":"Count to 5."}]}'

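To show what a client does with the stream, here is a minimal sketch of a parser for the SSE lines. It assumes the usual OpenAI event framing (`data: <json>` per chunk, terminated by `data: [DONE]`) and that the usage block arrives as a `usage` field on a late chunk; the function name and the sample lines are illustrative, not captured gateway output.

```python
import json
from typing import Iterable, Optional, Tuple


def read_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Collect the streamed answer text and the usage block from SSE lines."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
    return "".join(text_parts), usage


if __name__ == "__main__":
    # Hand-written sample in the assumed format.
    sample = [
        'data: {"choices":[{"delta":{"content":"1, 2"}}]}',
        "",
        'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":4}}',
        "data: [DONE]",
    ]
    print(read_stream(sample))
```

This sketch keeps only `content` deltas; any reasoning deltas in the stream are ignored.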
6. Reasoning mode (important for this model)

qwen3.6-35b-a3b is a reasoning model. By default it "thinks" first:

The thinking trace comes back in message.reasoning_content, and the final answer in message.content.

Give it enough room: use a generous max_tokens (e.g. 1024+) so it can finish thinking *and* answer. With a small budget, all tokens go to reasoning and content will be empty.
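A small helper can separate the two fields and flag the "budget ran out" case. This is a sketch operating on the decoded JSON of a non-streaming response; the helper name is hypothetical, and it assumes the gateway follows the OpenAI convention of `finish_reason: "length"` when max_tokens is hit.

```python
from typing import Optional, Tuple


def split_message(response: dict) -> Tuple[Optional[str], str, bool]:
    """Return (reasoning, answer, ran_out_of_tokens) for the first choice."""
    choice = response["choices"][0]
    message = choice["message"]
    reasoning = message.get("reasoning_content")
    answer = message.get("content") or ""
    # Empty answer plus finish_reason "length" suggests the whole budget
    # was spent on reasoning (assumed OpenAI-style finish_reason).
    ran_out = choice.get("finish_reason") == "length" and not answer
    return reasoning, answer, ran_out
```

When `ran_out` is true, retrying with a larger max_tokens or with thinking disabled is usually the fix.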

Want fast, direct answers (no thinking)? Disable it:

json

{
  "model": "qwen3.6-35b-a3b",
  "messages": [{"role":"user","content":"Say hello in 5 words."}],
  "chat_template_kwargs": {"enable_thinking": false}
}
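With the OpenAI Python SDK, non-standard fields such as chat_template_kwargs are not named parameters of `create()`, so they are normally passed through the SDK's `extra_body` argument. The sketch below builds the keyword arguments; the helper name `chat_kwargs` is made up for this example.

```python
def chat_kwargs(prompt: str, thinking: bool = True, max_tokens: int = 1024) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    kwargs = {
        "model": "qwen3.6-35b-a3b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if not thinking:
        # extra_body is merged into the request JSON by the SDK, so the
        # gateway receives the same payload as the JSON example above.
        kwargs["extra_body"] = {"chat_template_kwargs": {"enable_thinking": False}}
    return kwargs
```

Usage, given a `client` configured as in section 2: `client.chat.completions.create(**chat_kwargs("Say hello in 5 words.", thinking=False))`.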

7. Limits & errors

Endpoints: POST /v1/chat/completions, GET /v1/models, GET /v1/credits.

