# Connecting to `router.fabryka.ai`

`router.fabryka.ai` is an **OpenAI-compatible** LLM API. If a tool can talk to OpenAI or OpenRouter, it can talk to fabryka; you only ever change three things:

| Setting | Value |
|---|---|
| **Base URL** | `https://router.fabryka.ai/v1` |
| **API key** | `sk-fab-...` (get one free at https://router.fabryka.ai, $100 credit) |
| **Model** | `qwen3.6-35b-a3b` |

Pricing: **$0.20 / 1M input tokens, $0.60 / 1M output tokens.**
Check your balance any time: `GET /v1/credits` (or paste your key at `/account`).

---

## 1. Quick check: curl

```bash
curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b-a3b",
    "messages": [{"role":"user","content":"Say hello in 5 words."}]
  }'
```

Balance:

```bash
curl https://router.fabryka.ai/v1/credits -H "Authorization: Bearer $FABRYKA_KEY"
```

---

## 2. OpenAI SDKs (drop-in)

**Python**

```python
from openai import OpenAI

# Only the base URL, API key, and model name differ from a stock OpenAI setup.
client = OpenAI(base_url="https://router.fabryka.ai/v1", api_key="sk-fab-...")

r = client.chat.completions.create(
    model="qwen3.6-35b-a3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(r.choices[0].message.content)
```

**JavaScript / TypeScript**

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.fabryka.ai/v1",
  apiKey: process.env.FABRYKA_KEY,
});

const r = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(r.choices[0].message.content);
```

---

## 3. Hermes

Hermes reads its model provider from **`~/.hermes/config.yaml`**. Point the `model:` block at fabryka; this is the whole integration:

```yaml
# ~/.hermes/config.yaml
model:
  default: qwen3.6-35b-a3b
  provider: custom:fabryka
  base_url: https://router.fabryka.ai/v1
  api_mode: chat_completions
  api_key: sk-fab-YOUR_KEY   # get one free at https://router.fabryka.ai
```

Then restart Hermes so it picks up the change:

```bash
# if Hermes runs as a systemd service:
sudo systemctl restart hermes-sdk-server
# otherwise restart however you launched `hermes`
```

Verify it's wired (call your local Hermes SDK; it should answer via fabryka):

```bash
curl http://127.0.0.1:8800/v1/chat/completions \
  -H "Authorization: Bearer $YOUR_LOCAL_HERMES_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","messages":[{"role":"user","content":"reply: online"}]}'
```

**Notes**

- Change only the **LLM** `api_key` (the upstream model provider) to your `sk-fab-...`. The key your *app* uses to call the local Hermes SDK is separate and stays the same.
- `qwen3.6-35b-a3b` is a reasoning model. For fast tool-loop steps you can have Hermes send `chat_template_kwargs: {enable_thinking: false}` (see §6).
- One model, single-GPU backend → keep concurrency at **1** (the gateway returns `503` if a second request lands mid-generation).

---

## 4. OpenRouter-style usage

`router.fabryka.ai` speaks the same protocol as OpenRouter, so anywhere a tool expects OpenRouter you can swap the base URL:

```diff
- OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
+ OPENROUTER_BASE_URL=https://router.fabryka.ai/v1
  model: qwen3.6-35b-a3b
```

**Add fabryka as a provider in a self-hosted aggregator (LiteLLM example):**

```yaml
# litellm config.yaml
model_list:
  - model_name: fabryka/qwen3.6-35b-a3b
    litellm_params:
      model: openai/qwen3.6-35b-a3b
      api_base: https://router.fabryka.ai/v1
      api_key: os.environ/FABRYKA_KEY
```

> Note: OpenRouter's *hosted* service does not let end users register arbitrary
> upstreams. "As a provider" here means: use fabryka wherever you'd point at
> OpenRouter, or register it as an OpenAI-compatible upstream in your own router
> (LiteLLM, a custom gateway, etc.).

---

## 5. Streaming

Standard OpenAI SSE streaming is supported: set `"stream": true`. Usage is reported in a final chunk and billed automatically:

```bash
curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","stream":true,
       "messages":[{"role":"user","content":"Count to 5."}]}'
```

---

## 6. Reasoning mode (important for this model)

`qwen3.6-35b-a3b` is a **reasoning model**. By default it "thinks" first:

- The thinking trace comes back in **`message.reasoning_content`**, and the final answer in **`message.content`**.
- Give it enough room: use a generous `max_tokens` (e.g. 1024+) so it can finish thinking *and* answer. With a small budget, all tokens go to reasoning and `content` will be empty.

**Want fast, direct answers (no thinking)?** Disable it:

```json
{
  "model": "qwen3.6-35b-a3b",
  "messages": [{"role":"user","content":"Say hello in 5 words."}],
  "chat_template_kwargs": {"enable_thinking": false}
}
```

---

## 7. Limits & errors

| Code | Meaning |
|---|---|
| `401` | Missing/invalid API key |
| `402` | Credit exhausted (your $100 ran out) |
| `429` | Rate limit (max 30 requests/min per key) |
| `503` | Backend busy: another request is in flight (single GPU). Retry shortly. |

Endpoints: `POST /v1/chat/completions`, `GET /v1/models`, `GET /v1/credits`.