HTTP API Reference

Base URL

All endpoints are served from the Arfniia Router instance, typically at `http://<host>:5525`.
The API is split into control-plane endpoints for router configuration and runtime endpoints for inference, learning, and observability.

Control Plane

`POST /v1/routers`

Creates a router definition. The payload must satisfy the ArfniiaRouter schema: a router name, at least one entry in base_models or provisioned_throughputs, an embedding model, a feedback configuration, and feedback_cost_weights that sum to 1. Training overrides are optional.
Returns the stored router definition, or 409 if the name already exists. Example payload:

{
  "name": "advanced-reasoning",
  "base_models": [
    "anthropic.claude-3-5-haiku-20241022-v1:0",
    "us.anthropic.claude-sonnet-4-20250514-v1:0"
  ],
  "embedding": "amazon.titan-embed-text-v2:0",
  "training": {
    "num_of_steps": 5,
    "batch_size": 16,
    "context_cache_similarity": 0.95,
    "exploration_level": "low"
  },
  "feedback": {
    "goal": "max",
    "min_value": 0,
    "max_value": 1
  },
  "feedback_cost_weights": [1.0, 0.0]
}
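
The constraints above can be checked on the client before the request is sent. The following is a minimal sketch using only the Python standard library; `BASE_URL`, `validate_router_payload`, and `build_create_request` are illustrative names rather than part of Arfniia Router, and the checks only approximate the server-side schema validation.

```python
import json
import math
import urllib.request

BASE_URL = "http://localhost:5525"  # assumption: replace with your router host


def validate_router_payload(payload: dict) -> None:
    """Client-side pre-check of the documented constraints (not the server's validator)."""
    if not payload.get("name"):
        raise ValueError("name is required")
    if not (payload.get("base_models") or payload.get("provisioned_throughputs")):
        raise ValueError("need at least one base_models or provisioned_throughputs entry")
    if not payload.get("embedding"):
        raise ValueError("embedding is required")
    if "feedback" not in payload:
        raise ValueError("feedback is required")
    weights = payload.get("feedback_cost_weights", [])
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("feedback_cost_weights must sum to 1")


def build_create_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/routers request."""
    validate_router_payload(payload)
    return urllib.request.Request(
        url=f"{base_url}/v1/routers",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; a 409 then surfaces as `urllib.error.HTTPError`.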

`GET /v1/routers/{name}`

Fetches the stored router definition. Returns 404 if the router is missing.

`PATCH /v1/routers/{name}`

Partially updates an existing router. Any supplied fields are merged with the current definition; unspecified fields keep their existing values. Name changes are rejected with 400.
Returns the updated definition and refreshes in-memory caches.
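
A partial update only needs to carry the fields that change. The sketch below builds such a request with the standard library; `build_patch_request` and the default base URL are illustrative assumptions. Whether nested objects such as training are merged field by field or replaced whole is not stated here, so the example sends a single nested field only as an illustration.

```python
import json
import urllib.request
from urllib.parse import quote


def build_patch_request(
    name: str, changes: dict, base_url: str = "http://localhost:5525"
) -> urllib.request.Request:
    """Build (but do not send) a PATCH /v1/routers/{name} request."""
    # The server rejects renames with 400, so fail early on the client.
    if "name" in changes and changes["name"] != name:
        raise ValueError("router name cannot be changed")
    return urllib.request.Request(
        url=f"{base_url}/v1/routers/{quote(name, safe='')}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Only the supplied field is sent; everything else keeps its stored value.
request = build_patch_request(
    "advanced-reasoning", {"training": {"exploration_level": "low"}}
)
```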

`DELETE /v1/routers/{name}`

Removes a router and clears associated caches. Returns 204 on success or 404 if not found.

Runtime Inference

`POST /v1/chat/completions`

Accepts the standard OpenAI-compatible chat completions payload. When the model field equals a router name, the router embeds the request, selects a base model (or provisioned throughput), invokes Bedrock, and schedules learning.
When the model field matches a Bedrock model ID instead of a router name, the call is proxied to Bedrock without learning.
Optional request headers:
- X-Arfniia-Disable-Learning: truthy value (true, 1, yes, on) skips learning for this call.
- X-Arfniia-Episode-Id, X-Arfniia-Episode-Start, X-Arfniia-Episode-End: mark episodic rollouts; omit when each prompt is independent.
- X-Arfniia-Feature-*: attach runtime features (see custom features). Headers are case-insensitive; values are parsed as floats, booleans, or categorical tokens.
The response mirrors the upstream LLM payload (id, choices, usage, etc.). The endpoint returns 500 if routing or downstream inference fails.
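
The optional headers can be assembled as in the following sketch. The helper names, the default base URL, and the feature name `tier` are hypothetical; encoding booleans as `true`/`false` is an assumption about what the router accepts as a boolean. Episode headers are left out because their value formats are not specified here.

```python
import json
import urllib.request


def feature_headers(features: dict) -> dict:
    """Map feature names to X-Arfniia-Feature-* headers with string values."""
    headers = {}
    for key, value in features.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            text = "true" if value else "false"
        else:
            text = str(value)  # floats and categorical tokens are sent as-is
        headers[f"X-Arfniia-Feature-{key}"] = text
    return headers


def build_chat_request(
    model: str,
    messages: list,
    features: dict = None,
    disable_learning: bool = False,
    base_url: str = "http://localhost:5525",
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    headers = {"Content-Type": "application/json"}
    headers.update(feature_headers(features or {}))
    if disable_learning:
        headers["X-Arfniia-Disable-Learning"] = "true"
    body = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Setting `model` to a router name triggers routing and learning, while a Bedrock model ID is proxied directly, so the same helper covers both cases.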

`GET /v1/routers/{router_name}/explanations/{response_id}`

Retrieves the explanation blob saved for the given router response (chosen_model, Q-value deltas, cache membership). Returns 404 when the response has no stored explanation.

Feedback APIs

`PUT /v1/feedbacks/{router_name}/sparse/{feedback_value}`

Stores delayed or aggregated KPI feedback (e.g., conversion rate). The latest value is mapped to router_name/sparse and mirrored under an aggregate key.

`PUT /v1/feedbacks/{router_name}/{feedback_name}/{feedback_value}`

Records immediate feedback keyed by feedback_name (commonly the response id). Use this endpoint to reward or penalize individual responses.
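
Both PUT endpoints carry the feedback value in the URL path, so a client mostly needs to build the right URL. The sketch below covers the sparse and per-response forms; `build_feedback_request` and the default base URL are illustrative, and the response id `chatcmpl-123` in the usage lines is a made-up placeholder.

```python
import urllib.request
from urllib.parse import quote


def build_feedback_request(
    router_name: str,
    value: float,
    feedback_name: str = None,
    base_url: str = "http://localhost:5525",
) -> urllib.request.Request:
    """Build (but do not send) a PUT feedback request.

    With feedback_name=None the sparse endpoint is used; otherwise the
    feedback is keyed by feedback_name (commonly the response id).
    """
    key = "sparse" if feedback_name is None else quote(feedback_name, safe="")
    url = f"{base_url}/v1/feedbacks/{quote(router_name, safe='')}/{key}/{value}"
    return urllib.request.Request(url, method="PUT")


# Delayed KPI feedback for the whole router.
sparse = build_feedback_request("advanced-reasoning", 0.8)
# Immediate feedback for one response (hypothetical response id).
immediate = build_feedback_request("advanced-reasoning", 1, feedback_name="chatcmpl-123")
```

A feedback_name of `sparse` would produce the same path as the sparse endpoint, so it is best avoided as a per-response key.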

`GET /v1/feedbacks/{router_name}`

Returns the most recent feedback bundle for the router, combining sparse and per-response entries. Responds with 404 if no feedback has been recorded.

Observability

`GET /metrics`

Exposes Prometheus metrics for router latency, downstream LLM usage, token counts, exploration rates, and learning statistics.
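
For quick inspection without a Prometheus server, the text exposition format can be read directly. The parser below is a minimal sketch: it handles `name value` and `name{labels} value` lines and does not support trailing timestamps. The metric names in the sample are hypothetical and are not the router's actual series names.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format samples into {series: value}."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # Split on the last space so label values containing spaces stay intact.
        series, _, value = line.rpartition(" ")
        if not series:
            continue  # malformed line with no value
        samples[series] = float(value)
    return samples


# Hypothetical sample output; real metric names may differ.
sample = """\
# HELP router_requests_total Hypothetical request counter
# TYPE router_requests_total counter
router_requests_total{router="advanced-reasoning"} 42
router_latency_seconds_sum 1.5
"""
metrics = parse_metrics(sample)
```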

Error Handling

- 400: validation error (missing model, unsupported Bedrock identifier, weight sum mismatch).
- 404: router or feedback record not found.
- 409: router name conflict on create.
- 500: unexpected runtime or downstream provider failure.
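
A client can translate these status codes into typed exceptions so callers can react to each case separately. The class and function names below are illustrative and are not part of any Arfniia client library.

```python
class RouterAPIError(Exception):
    """Base error for non-2xx responses from the router."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status


class ValidationError(RouterAPIError):
    """400: the request failed validation."""


class NotFoundError(RouterAPIError):
    """404: router or feedback record not found."""


class ConflictError(RouterAPIError):
    """409: router name already exists."""


class UpstreamError(RouterAPIError):
    """500: runtime or downstream provider failure."""


_STATUS_TO_ERROR = {
    400: ValidationError,
    404: NotFoundError,
    409: ConflictError,
    500: UpstreamError,
}


def raise_for_status(status: int, body: str = "") -> None:
    """Raise the matching exception for a non-2xx status; do nothing on success."""
    if 200 <= status < 300:
        return
    error_cls = _STATUS_TO_ERROR.get(status, RouterAPIError)
    raise error_cls(status, body or "no response body")
```

Statuses outside the documented set fall back to the base `RouterAPIError`, so unexpected responses are still surfaced.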

Authentication

Arfniia Router does not ship with built-in authentication and relies on the surrounding network perimeter. If you need authentication, terminate TLS and enforce authentication headers at your ingress.