# Connecting to `router.fabryka.ai`

`router.fabryka.ai` is an **OpenAI-compatible** LLM API. If a tool can talk to
OpenAI or OpenRouter, it can talk to fabryka; you only change three settings:

| Setting | Value |
|---|---|
| **Base URL** | `https://router.fabryka.ai/v1` |
| **API key** | `sk-fab-...` (get one free at <https://router.fabryka.ai>; it comes with $100 credit) |
| **Model** | `qwen3.6-35b-a3b` |

Pricing: **$0.20 / 1M input tokens, $0.60 / 1M output tokens.** Check your balance
any time: `GET /v1/credits` (or paste your key at `/account`).
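
To make the pricing concrete, here is a minimal sketch that estimates the cost of one request from its token counts. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes billing is strictly linear in tokens with no rounding or minimum charge, which this guide does not specify.

```python
# Rates from the pricing line above, in USD per one million tokens.
INPUT_USD_PER_MTOK = 0.20
OUTPUT_USD_PER_MTOK = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge for one request from its token counts."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Example: a 2,000-token prompt with a 1,000-token reply.
print(f"${estimate_cost_usd(2_000, 1_000):.6f}")
```

Treat the result as an estimate only; `GET /v1/credits` remains the source of truth for your balance.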

---

## 1. Quick check — curl

```bash
curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.6-35b-a3b",
    "messages": [{"role":"user","content":"Say hello in 5 words."}]
  }'
```

Balance:
```bash
curl https://router.fabryka.ai/v1/credits -H "Authorization: Bearer $FABRYKA_KEY"
```
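
The same balance check can be sketched in Python with only the standard library. The helper names `credits_request` and `fetch_credits` are hypothetical, and because the response fields are not documented here, the sketch returns the decoded JSON body untouched rather than guessing at its shape.

```python
import json
import urllib.request

BASE_URL = "https://router.fabryka.ai/v1"


def credits_request(api_key: str) -> urllib.request.Request:
    """Build the GET /v1/credits request with the bearer token attached."""
    return urllib.request.Request(
        f"{BASE_URL}/credits",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_credits(api_key: str):
    """Send the request and return the decoded JSON body as-is."""
    with urllib.request.urlopen(credits_request(api_key), timeout=30) as resp:
        return json.load(resp)


# Usage (performs a real network call):
#   import os
#   print(fetch_credits(os.environ["FABRYKA_KEY"]))
```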

---

## 2. OpenAI SDKs (drop-in)

**Python**
```python
import os

from openai import OpenAI

# Read the key from the environment rather than hard-coding it.
client = OpenAI(
    base_url="https://router.fabryka.ai/v1",
    api_key=os.environ["FABRYKA_KEY"],
)
r = client.chat.completions.create(
    model="qwen3.6-35b-a3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(r.choices[0].message.content)
```

**JavaScript / TypeScript**
```ts
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://router.fabryka.ai/v1",
  apiKey: process.env.FABRYKA_KEY,
});
const r = await client.chat.completions.create({
  model: "qwen3.6-35b-a3b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(r.choices[0].message.content);
```

---

## 3. Hermes

Hermes reads its model provider from **`~/.hermes/config.yaml`**. Point the
`model:` block at fabryka — this is the whole integration:

```yaml
# ~/.hermes/config.yaml
model:
  default: qwen3.6-35b-a3b
  provider: custom:fabryka
  base_url: https://router.fabryka.ai/v1
  api_mode: chat_completions
  api_key: sk-fab-YOUR_KEY        # get one free at https://router.fabryka.ai
```

Then restart Hermes so it picks up the change:

```bash
# if Hermes runs as a systemd service:
sudo systemctl restart hermes-sdk-server
# otherwise restart however you launched `hermes`
```

Verify the wiring by calling your local Hermes SDK; the reply should come via fabryka:

```bash
curl http://127.0.0.1:8800/v1/chat/completions \
  -H "Authorization: Bearer $YOUR_LOCAL_HERMES_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","messages":[{"role":"user","content":"reply: online"}]}'
```

**Notes**
- Change only the **LLM** `api_key` (the upstream model provider) to your
  `sk-fab-...`. The key your *app* uses to call the local Hermes SDK is separate
  and stays the same.
- `qwen3.6-35b-a3b` is a reasoning model. For fast tool-loop steps you can have
  Hermes send `chat_template_kwargs: {enable_thinking: false}` (see §6).
- The backend serves one model on a single GPU, so keep concurrency at **1**:
  the gateway returns `503` if a second request arrives mid-generation.
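
If your own application code calls fabryka from several threads, one way to honor the concurrency limit of 1 is to serialize requests behind a lock. This is a minimal sketch, not part of Hermes: `call_serialized` is a hypothetical helper, and `send` stands for whatever function performs your actual request.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# One process-wide lock: at most one upstream request is in flight at a time.
_upstream_lock = threading.Lock()


def call_serialized(send: Callable[[], T]) -> T:
    """Run `send` while holding the lock so calls never overlap."""
    with _upstream_lock:
        return send()
```

The lock only covers a single process. Separate processes or machines sharing one key can still collide, so handling `503` with a retry remains necessary.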

---

## 4. OpenRouter-style usage

`router.fabryka.ai` speaks the same protocol as OpenRouter, so anywhere a tool
expects OpenRouter you can swap the base URL:

```diff
- OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
+ OPENROUTER_BASE_URL=https://router.fabryka.ai/v1
  model: qwen3.6-35b-a3b
```

**Add fabryka as a provider in a self-hosted aggregator (LiteLLM example):**
```yaml
# litellm config.yaml
model_list:
  - model_name: fabryka/qwen3.6-35b-a3b
    litellm_params:
      model: openai/qwen3.6-35b-a3b
      api_base: https://router.fabryka.ai/v1
      api_key: os.environ/FABRYKA_KEY
```

> Note: OpenRouter's *hosted* service does not let end users register arbitrary
> upstreams. "As a provider" here means: use fabryka wherever you'd point at
> OpenRouter, or register it as an OpenAI-compatible upstream in your own router
> (LiteLLM, a custom gateway, etc.).

---

## 5. Streaming

Standard OpenAI SSE streaming is supported — set `"stream": true`. Usage is
reported in a final chunk and billed automatically:

```bash
curl https://router.fabryka.ai/v1/chat/completions \
  -H "Authorization: Bearer $FABRYKA_KEY" -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-35b-a3b","stream":true,
       "messages":[{"role":"user","content":"Count to 5."}]}'
```
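
On the client side, the stream can be consumed by reading `data:` lines until the `[DONE]` sentinel. The sketch below assumes the chunks follow OpenAI's `chat.completion.chunk` shape (`choices[].delta.content`, with `usage` on the final chunk); the helper name `collect_stream` is hypothetical, and it ignores any reasoning deltas since their field name in streamed chunks is not stated in this guide.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join content deltas from SSE lines; return (text, usage or None)."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]  # reported on the final chunk
        for choice in chunk.get("choices") or []:
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts), usage
```

Feed it the decoded lines of the HTTP response body; any iterable of strings works, which also makes the parser easy to test offline.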

---

## 6. Reasoning mode (important for this model)

`qwen3.6-35b-a3b` is a **reasoning model**. By default it "thinks" first:

- The thinking trace comes back in **`message.reasoning_content`**, and the final
  answer in **`message.content`**.
- Give it enough room — use a generous `max_tokens` (e.g. 1024+) so it can finish
  thinking *and* answer. With a small budget, all tokens go to reasoning and
  `content` will be empty.
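
The two fields above can be separated in client code, and the empty-`content` case detected, roughly as follows. This is a sketch over the plain JSON response; the helper names `split_message` and `ran_out_of_budget` are hypothetical.

```python
from typing import Tuple


def split_message(response: dict) -> Tuple[str, str]:
    """Return (reasoning, answer) from a chat completion response dict."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


def ran_out_of_budget(response: dict) -> bool:
    """True when the model reasoned but produced no final answer."""
    reasoning, answer = split_message(response)
    return bool(reasoning) and not answer
```

When `ran_out_of_budget` returns `True`, retrying with a larger `max_tokens` is the usual remedy.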

**Want fast, direct answers (no thinking)?** Disable it:
```json
{
  "model": "qwen3.6-35b-a3b",
  "messages": [{"role":"user","content":"Say hello in 5 words."}],
  "chat_template_kwargs": {"enable_thinking": false}
}
```
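
In Python, the same request body can be assembled with a small helper that toggles thinking per call. `build_payload` is a hypothetical name, and the default `max_tokens` of 1024 simply mirrors the suggestion above.

```python
def build_payload(prompt: str, thinking: bool = True, max_tokens: int = 1024) -> dict:
    """Build a /v1/chat/completions body, optionally disabling thinking."""
    payload = {
        "model": "qwen3.6-35b-a3b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if not thinking:
        # Non-standard field from this guide; omitted when thinking stays on.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


# With the OpenAI Python SDK, pass the non-standard field through `extra_body`:
#   client.chat.completions.create(
#       model="qwen3.6-35b-a3b",
#       messages=[{"role": "user", "content": "Say hello in 5 words."}],
#       extra_body={"chat_template_kwargs": {"enable_thinking": False}},
#   )
```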

---

## 7. Limits & errors

| Code | Meaning |
|---|---|
| `401` | Missing/invalid API key |
| `402` | Credit exhausted (your $100 ran out) |
| `429` | Rate limit (max 30 requests/min per key) |
| `503` | Backend busy — another request is in flight (single GPU). Retry shortly. |
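
Of the codes above, `429` and `503` are transient, so a client can retry them with a growing delay, while `401` and `402` need a fix on your side. Below is a minimal sketch of that policy. `send_with_retry` is a hypothetical helper, `send` is assumed to be any callable returning `(status_code, body)`, and the delay values are illustrative rather than recommended by the service.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429, 503}  # rate limit and busy backend, per the table above


def send_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
        status, body = send()
    return status, body
```

The `sleep` parameter is injectable so the backoff can be tested without waiting.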

Endpoints: `POST /v1/chat/completions`, `GET /v1/models`, `GET /v1/credits`.