Connecting to fabryka
Connecting to router.fabryka.ai
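router.fabryka.ai is an OpenAI-compatible LLM API. If a tool can talk to
OpenAI or OpenRouter, it can talk to fabryka. You only ever change three things:
the base URL (https://router.fabryka.ai/v1), the API key (sk-fab-...), and the
model name (qwen3.6-35b-a3b).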
Pricing: $0.20 / 1M input tokens, $0.60 / 1M output tokens. Check your balance
any time: GET /v1/credits (or paste your key at /account).
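As a rough illustration of the pricing arithmetic, the sketch below estimates the cost of a single request from its token counts. It assumes both rates apply linearly with no minimum charge or rounding, which this page does not state; the function name `estimate_cost_usd` and the example token counts are hypothetical.

```python
# Prices quoted above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.20
OUTPUT_USD_PER_MILLION = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Assumption: linear pricing, no minimum charge, no rounding.
    """
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 1,500 prompt tokens and 1,000 completion tokens.
    print(f"${estimate_cost_usd(1_500, 1_000):.6f}")
```

For the authoritative figure, compare against the balance reported by GET /v1/credits before and after a request.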
1. Quick check — curl
curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b-a3b",
    "messages": [{"role":"user","content":"Say hello in 5 words."}]
  }'

Balance:

curl https://router.fabryka.ai/v1/credits -H "Authorization: Bearer $FABRYKA_KEY"

2. OpenAI SDKs (drop-in)
Python
from openai import OpenAI

client = OpenAI(base_url="https://router.fabryka.ai/v1",
                api_key="sk-fab-...")
r = client.chat.completions.create(
    model="qwen3.6-35b-a3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(r.choices[0].message.content)

JavaScript / TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.fabryka.ai/v1",
  apiKey: process.env.FABRYKA_KEY,
});
const r = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(r.choices[0].message.content);

3. Hermes
Hermes reads its model provider from ~/.hermes/config.yaml. Point the
model: block at fabryka — this is the whole integration:
# ~/.hermes/config.yaml
model:
  default: qwen3.6-35b-a3b
  provider: custom:fabryka
  base_url: https://router.fabryka.ai/v1
  api_mode: chat_completions
  api_key: sk-fab-YOUR_KEY   # get one free at https://router.fabryka.ai

Then restart Hermes so it picks up the change:

# if Hermes runs as a systemd service:
sudo systemctl restart hermes-sdk-server
# otherwise restart however you launched `hermes`

Verify it's wired (call your local Hermes SDK; it should answer via fabryka):
curl http://127.0.0.1:8800/v1/chat/completions \
  -H "Authorization: Bearer $YOUR_LOCAL_HERMES_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","messages":[{"role":"user","content":"reply: online"}]}'

Notes
- Change only the LLM api_key (the upstream model provider) to your
sk-fab-.... The key your *app* uses to call the local Hermes SDK is separate
and stays the same.
- qwen3.6-35b-a3b is a reasoning model. For fast tool-loop steps you can have
Hermes send chat_template_kwargs: {enable_thinking: false} (see §6).
- One model, single-GPU backend → keep concurrency at 1 (the gateway returns
503 if a second request lands mid-generation).
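
One way to respect the concurrency-of-1 note from client code is to serialize calls behind a lock and back off when the gateway answers 503. The sketch below is a minimal illustration, not part of Hermes or fabryka: `call_serialized` and its parameters are hypothetical names, the delays are arbitrary, and the lock only coordinates threads inside one process.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# One lock shared by every caller in this process, so at most one request
# is in flight at a time.
_gateway_lock = threading.Lock()


def _status_of(exc: Exception):
    """Best-effort HTTP status: openai errors expose .status_code, urllib's HTTPError exposes .code."""
    return getattr(exc, "status_code", None) or getattr(exc, "code", None)


def call_serialized(
    send: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run send() under the lock, retrying with exponential backoff on HTTP 503."""
    attempt = 0
    while True:
        try:
            with _gateway_lock:
                return send()
        except Exception as exc:
            attempt += 1
            # Re-raise anything that is not a 503, or a 503 on the last attempt.
            if _status_of(exc) != 503 or attempt >= max_attempts:
                raise
        # Wait outside the lock: base_delay, then 2x, 4x, ... (illustrative values).
        sleep(base_delay * 2 ** (attempt - 1))
```

With the OpenAI Python SDK this could wrap a call such as `call_serialized(lambda: client.chat.completions.create(...))`. Other processes or machines sharing the same key are not covered by the lock, so the 503 retry path still matters.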
4. OpenRouter-style usage
router.fabryka.ai speaks the same protocol as OpenRouter, so anywhere a tool
expects OpenRouter you can swap the base URL:
- OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
+ OPENROUTER_BASE_URL=https://router.fabryka.ai/v1
model: qwen3.6-35b-a3b

Add fabryka as a provider in a self-hosted aggregator (LiteLLM example):
# litellm config.yaml
model_list:
  - model_name: fabryka/qwen3.6-35b-a3b
    litellm_params:
      model: openai/qwen3.6-35b-a3b
      api_base: https://router.fabryka.ai/v1
      api_key: os.environ/FABRYKA_KEY

> Note: OpenRouter's *hosted* service does not let end users register arbitrary
> upstreams. "As a provider" here means: use fabryka wherever you'd point at
> OpenRouter, or register it as an OpenAI-compatible upstream in your own router
> (LiteLLM, a custom gateway, etc.).
5. Streaming
Standard OpenAI SSE streaming is supported — set "stream": true. Usage is
reported in a final chunk and billed automatically:
curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","stream":true,
       "messages":[{"role":"user","content":"Count to 5."}]}'

6. Reasoning mode (important for this model)
qwen3.6-35b-a3b is a reasoning model. By default it "thinks" first:
- The thinking trace comes back in message.reasoning_content, and the final
answer in message.content.
- Give it enough room: use a generous max_tokens (e.g. 1024+) so it can finish
thinking *and* answer. With a small budget, all tokens go to reasoning and
content will be empty.
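
The two points above can be checked in client code. The sketch below splits a chat completion (as a plain dict) into its thinking trace and final answer, and flags the case where the budget ran out before any answer was written. `parse_reply` and `ParsedReply` are hypothetical names, and the check assumes the gateway follows the OpenAI convention of setting finish_reason to "length" when max_tokens is hit.

```python
from typing import NamedTuple


class ParsedReply(NamedTuple):
    reasoning: str   # thinking trace (message.reasoning_content)
    answer: str      # final answer (message.content)
    truncated: bool  # True when the budget ran out before any answer


def parse_reply(response: dict) -> ParsedReply:
    """Split the first choice of a chat completion into reasoning and answer."""
    choice = response["choices"][0]
    message = choice["message"]
    # Either field may be missing or null, so normalise both to strings.
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    # Assumption: finish_reason == "length" means max_tokens was reached.
    truncated = choice.get("finish_reason") == "length" and not answer.strip()
    return ParsedReply(reasoning, answer, truncated)
```

With the OpenAI Python SDK, `parse_reply(r.model_dump())` should produce the dict form. If `truncated` is true, retrying with a larger max_tokens is the likely fix.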
Want fast, direct answers (no thinking)? Disable it:
{
  "model": "qwen3.6-35b-a3b",
  "messages": [{"role":"user","content":"Say hello in 5 words."}],
  "chat_template_kwargs": {"enable_thinking": false}
}

7. Limits & errors
Endpoints: POST /v1/chat/completions, GET /v1/models, GET /v1/credits.
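
For tools that cannot use an SDK, the three endpoints can be reached with the Python standard library alone. The sketch below is one possible shape: `build_request` and `call` are hypothetical helper names, and because the response bodies of /v1/models and /v1/credits are not described here, the code returns the decoded JSON without assuming any fields.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://router.fabryka.ai/v1"


def build_request(
    path: str, api_key: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request (with payload)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    # urllib sends POST when data is set and GET otherwise.
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def call(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `call(build_request("/models", key))` lists models, `call(build_request("/credits", key))` checks the balance, and passing a payload with "model" and "messages" to `build_request("/chat/completions", key, payload)` sends a chat request.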