HTTP API Reference
Base URL
All endpoints are served from the Arfniia Router instance, typically `http://<host>:5525`.
The API is split into control plane endpoints for router configuration and runtime endpoints for inference, learning, and observability.
Control Plane
POST /v1/routers
- Creates a router definition. Payload must satisfy the `ArfniiaRouter` schema: a router `name`, at least one entry in `base_models` or `provisioned_throughputs`, an `embedding` model, a `feedback` configuration, `feedback_cost_weights` that sum to 1, and optional `training` overrides.
- Returns the stored router definition, or `409` if the name already exists.
{ "name": "advanced-reasoning", "base_models": [ "anthropic.claude-3-5-haiku-20241022-v1:0", "us.anthropic.claude-sonnet-4-20250514-v1:0" ], "embedding": "amazon.titan-embed-text-v2:0", "training": { "num_of_steps": 5, "batch_size": 16, "context_cache_similarity": 0.95, "exploration_level": "low" }, "feedback": { "goal": "max", "min_value": 0, "max_value": 1 }, "feedback_cost_weights": [1.0, 0.0]}GET /v1/routers/{name}
GET /v1/routers/{name}
- Fetches the stored router definition. Returns `404` if the router is missing.
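A quick retrieval sketch, under the same local-instance assumption:

```python
import requests

resp = requests.get("http://localhost:5525/v1/routers/advanced-reasoning")
if resp.status_code == 404:
    print("router not found")
else:
    print(resp.json())
```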
PATCH /v1/routers/{name}
- Partially updates an existing router. Any supplied fields are merged with the current definition; unspecified fields keep their existing values. Name changes are rejected with `400`.
- Returns the updated definition and refreshes in-memory caches.
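For example, a hedged sketch that merges a single `training` override into the stored definition (assuming the `requests` library and a local instance):

```python
import requests

# Only the supplied fields are merged; all other fields keep their current values.
patch = {"training": {"batch_size": 32}}  # assumed valid training override

resp = requests.patch(
    "http://localhost:5525/v1/routers/advanced-reasoning", json=patch
)
resp.raise_for_status()
print(resp.json())  # the updated definition
```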
DELETE /v1/routers/{name}
- Removes a router and clears associated caches. Returns `204` on success or `404` if not found.
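A short deletion sketch under the same assumptions:

```python
import requests

resp = requests.delete("http://localhost:5525/v1/routers/advanced-reasoning")
if resp.status_code == 204:
    print("router deleted")
elif resp.status_code == 404:
    print("router not found")
```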
Runtime Inference
POST /v1/chat/completions
- Accepts the standard OpenAI-compatible chat completions payload. When `model` equals a router name, the router embeds the request, selects a base model (or provisioned throughput), invokes Bedrock, and schedules learning.
- When `model` matches a Bedrock model ID instead of a router name, the call is proxied to Bedrock without learning.
- Optional request headers:
  - `X-Arfniia-Disable-Learning`: a truthy value (`true`, `1`, `yes`, `on`) skips learning for this call.
  - `X-Arfniia-Episode-Id`, `X-Arfniia-Episode-Start`, `X-Arfniia-Episode-End`: mark episodic rollouts; omit when each prompt is independent.
  - `X-Arfniia-Feature-*`: attach runtime features (see custom features). Headers are case-insensitive; values are parsed as floats, booleans, or categorical tokens.
- Response mirrors the upstream LLM payload (`id`, `choices`, `usage`, etc.). `500` is returned if routing or downstream inference fails.
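Putting this together, a minimal Python sketch (assuming the `requests` library, a local instance at `http://localhost:5525`, and a hypothetical `X-Arfniia-Feature-Priority` feature header) might look like:

```python
import requests

payload = {
    "model": "advanced-reasoning",  # router name; a Bedrock model ID would bypass learning
    "messages": [{"role": "user", "content": "Summarize the Q3 incident report."}],
}

headers = {
    # Optional: skip learning for this call.
    # "X-Arfniia-Disable-Learning": "true",
    # Optional runtime feature; the feature name is hypothetical, for illustration only.
    "X-Arfniia-Feature-Priority": "0.8",
}

resp = requests.post(
    "http://localhost:5525/v1/chat/completions", json=payload, headers=headers
)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])
print("response id:", body["id"])  # keep this id for feedback and explanations
```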
GET /v1/routers/{router_name}/explanations/{response_id}
- Retrieves the explanation blob saved for the given router response (`chosen_model`, Q-value deltas, cache membership). Returns `404` when the response has no stored explanation.
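A brief retrieval sketch; the response id below is a placeholder and should be replaced with an `id` returned by `/v1/chat/completions`:

```python
import requests

router = "advanced-reasoning"
response_id = "chatcmpl-example"  # placeholder: use the "id" from a chat completion

resp = requests.get(
    f"http://localhost:5525/v1/routers/{router}/explanations/{response_id}"
)
if resp.status_code == 404:
    print("no stored explanation for this response")
else:
    explanation = resp.json()
    print(explanation["chosen_model"])
```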
Feedback APIs
PUT /v1/feedbacks/{router_name}/sparse/{feedback_value}
- Stores delayed or aggregated KPI feedback (e.g., conversion rate). The latest value is mapped to `router_name/sparse` and mirrored under an aggregate key.
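For example, a sketch reporting a daily conversion rate as sparse feedback (the 0.42 value is illustrative):

```python
import requests

router = "advanced-reasoning"
conversion_rate = 0.42  # illustrative aggregated KPI value

resp = requests.put(
    f"http://localhost:5525/v1/feedbacks/{router}/sparse/{conversion_rate}"
)
resp.raise_for_status()
```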
PUT /v1/feedbacks/{router_name}/{feedback_name}/{feedback_value}
- Records immediate feedback keyed by `feedback_name` (commonly the response `id`). Use this endpoint to reward or penalize individual responses.
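A sketch rewarding a single response; the response id and reward value below are placeholders:

```python
import requests

router = "advanced-reasoning"
response_id = "chatcmpl-example"  # placeholder: use the "id" from /v1/chat/completions
reward = 1.0                      # value within the router's feedback min/max range

resp = requests.put(
    f"http://localhost:5525/v1/feedbacks/{router}/{response_id}/{reward}"
)
resp.raise_for_status()
```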
GET /v1/feedbacks/{router_name}
- Returns the most recent feedback bundle for the router, combining sparse and per-response entries. Responds with `404` if no feedback has been recorded.
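A short sketch for inspecting the recorded feedback:

```python
import requests

resp = requests.get("http://localhost:5525/v1/feedbacks/advanced-reasoning")
if resp.status_code == 404:
    print("no feedback recorded yet")
else:
    print(resp.json())  # combined sparse and per-response feedback
```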
Observability
GET /metrics
- Exposes Prometheus metrics for router latency, downstream LLM usage, token counts, exploration rates, and learning statistics.
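If useful, a rough sketch of scraping the endpoint directly from Python (Prometheus itself would normally scrape it):

```python
import requests

# Fetch the Prometheus exposition text and print the metric samples, skipping comments.
metrics = requests.get("http://localhost:5525/metrics").text
for line in metrics.splitlines():
    if line and not line.startswith("#"):
        print(line)
```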
Error Handling
- `400`: validation error (missing `model`, unsupported Bedrock identifier, weight sum mismatch).
- `404`: router or feedback record not found.
- `409`: router name conflict on create.
- `500`: unexpected runtime or downstream provider failure.
Authentication
Arfniia Router relies on the surrounding network perimeter. If you need authentication, terminate TLS and enforce headers at your ingress; the router itself does not ship with built-in auth.