HTTP API Reference
Base URL
All endpoints are served from the Arfniia Router instance, typically http://<host>:5525.
The API is split into control plane endpoints for router configuration and runtime endpoints for inference, learning, and observability.
Control Plane
POST /v1/routers
- Creates a router definition. The payload must satisfy the ArfniiaRouter schema: a router name, at least one entry in base_models or provisioned_throughputs, an embedding model, a feedback configuration, feedback_cost_weights that sum to 1, and optional training overrides.
- Returns the stored router definition, or 409 if the name already exists.
{
  "name": "advanced-reasoning",
  "base_models": [
    "anthropic.claude-3-5-haiku-20241022-v1:0",
    "us.anthropic.claude-sonnet-4-20250514-v1:0"
  ],
  "embedding": "amazon.titan-embed-text-v2:0",
  "training": {
    "num_of_steps": 5,
    "batch_size": 16,
    "context_cache_similarity": 0.95,
    "exploration_level": "low"
  },
  "feedback": {
    "goal": "max",
    "min_value": 0,
    "max_value": 1
  },
  "feedback_cost_weights": [1.0, 0.0]
}
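As a rough client-side sketch, the request for the payload above could be prepared as follows. The base URL `http://localhost:5525`, the `Content-Type: application/json` header, and the helper names `check_weights` and `build_create_router_request` are assumptions for illustration, not part of the router's documented interface.

```python
import json
import urllib.request

# Assumption: the router is reachable locally on the typical port.
BASE_URL = "http://localhost:5525"


def check_weights(weights, tol=1e-9):
    """Client-side check mirroring the documented rule that the weights sum to 1."""
    return abs(sum(weights) - 1.0) <= tol


def build_create_router_request(definition, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/routers request (hypothetical helper)."""
    if not check_weights(definition["feedback_cost_weights"]):
        raise ValueError("feedback_cost_weights must sum to 1")
    return urllib.request.Request(
        f"{base_url}/v1/routers",
        data=json.dumps(definition).encode("utf-8"),
        # Assumption: the router expects a JSON body with this content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


definition = {
    "name": "advanced-reasoning",
    "base_models": [
        "anthropic.claude-3-5-haiku-20241022-v1:0",
        "us.anthropic.claude-sonnet-4-20250514-v1:0",
    ],
    "embedding": "amazon.titan-embed-text-v2:0",
    "feedback": {"goal": "max", "min_value": 0, "max_value": 1},
    "feedback_cost_weights": [1.0, 0.0],
}
request = build_create_router_request(definition)
# To send it: urllib.request.urlopen(request). With urllib, a 409 name
# conflict surfaces as urllib.error.HTTPError.
```

Checking the weights before sending only saves a round trip; the router remains the authority on validation.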
GET /v1/routers/{name}
- Fetches the stored router definition. Returns 404 if the router is missing.
PATCH /v1/routers/{name}
- Partially updates an existing router. Any supplied fields are merged with the current definition; unspecified fields keep their existing values. Name changes are rejected with 400.
- Returns the updated definition and refreshes in-memory caches.
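The merge behavior described above can be pictured with a small local model. This is only a sketch: `merge_router_patch` is a hypothetical helper, and it assumes a shallow, top-level merge. Whether the router merges nested objects such as training field by field is not stated here.

```python
def merge_router_patch(current, patch):
    """Model PATCH semantics locally (assumption: shallow, top-level merge)."""
    # Documented rule: name changes are rejected (the API answers with 400).
    if "name" in patch and patch["name"] != current["name"]:
        raise ValueError("router name cannot be changed")
    merged = dict(current)   # unspecified fields keep their existing values
    merged.update(patch)     # supplied fields replace the stored ones
    return merged


current = {
    "name": "advanced-reasoning",
    "embedding": "amazon.titan-embed-text-v2:0",
    "feedback_cost_weights": [1.0, 0.0],
}
updated = merge_router_patch(current, {"feedback_cost_weights": [0.7, 0.3]})
```

In this example `updated` keeps the original name and embedding and carries the new weights, while `current` is left unmodified.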
DELETE /v1/routers/{name}
- Removes a router and clears associated caches. Returns 204 on success or 404 if not found.
Runtime Inference
POST /v1/chat/completions
- Accepts the standard OpenAI-compatible chat completions payload. When model equals a router name, the router embeds the request, selects a base model (or provisioned throughput), invokes Bedrock, and schedules learning.
- When model matches a Bedrock model ID instead of a router name, the call is proxied to Bedrock without learning.
- Optional request headers:
  - X-Arfniia-Disable-Learning: a truthy value (true, 1, yes, on) skips learning for this call.
  - X-Arfniia-Episode-Id, X-Arfniia-Episode-Start, X-Arfniia-Episode-End: mark episodic rollouts; omit when each prompt is independent.
  - X-Arfniia-Feature-*: attach runtime features (see custom features). Headers are case-insensitive; values are parsed as floats, booleans, or categorical tokens.
- The response mirrors the upstream LLM payload (id, choices, usage, etc.). 500 is returned if routing or downstream inference fails.
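The optional headers above might be assembled on the client as in the sketch below. The helper names are hypothetical, and `parse_feature_value` is only a guess at how "floats, booleans, or categorical tokens" could be told apart. The router's actual parsing order and its treatment of letter case in values are assumptions. The Episode-Start and Episode-End headers are left out because their expected values are not described in this section.

```python
def parse_feature_value(raw):
    """Illustrative reading of a feature header value (assumed order: bool, float, token)."""
    text = raw.strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    try:
        return float(text)
    except ValueError:
        return text  # anything else is treated as a categorical token


def build_chat_headers(disable_learning=False, episode_id=None, features=None):
    """Assemble request headers for POST /v1/chat/completions (hypothetical helper)."""
    # Assumption: the request body is sent as JSON.
    headers = {"Content-Type": "application/json"}
    if disable_learning:
        headers["X-Arfniia-Disable-Learning"] = "true"  # one of the documented truthy values
    if episode_id is not None:
        headers["X-Arfniia-Episode-Id"] = str(episode_id)
    for name, value in (features or {}).items():
        # Booleans are lowercased so they match the documented spelling "true".
        text = str(value).lower() if isinstance(value, bool) else str(value)
        headers[f"X-Arfniia-Feature-{name}"] = text
    return headers


headers = build_chat_headers(
    disable_learning=True,
    features={"tier": "premium", "priority": 2, "beta": True},
)
```

The resulting dictionary can be passed as the headers of any HTTP client call that posts the chat completions payload.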
GET /v1/routers/{router_name}/explanations/{response_id}
- Retrieves the explanation blob saved for the given router response (chosen_model, Q-value deltas, cache membership). Returns 404 when the response has no stored explanation.
Feedback APIs
PUT /v1/feedbacks/{router_name}/sparse/{feedback_value}
- Stores delayed or aggregated KPI feedback (e.g., conversion rate). The latest value is mapped to router_name/sparse and mirrored under an aggregate key.
PUT /v1/feedbacks/{router_name}/{feedback_name}/{feedback_value}
- Records immediate feedback keyed by feedback_name (commonly the response id). Use this endpoint to reward or penalize individual responses.
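Because both feedback PUT endpoints carry the value in the URL path, a client mostly needs to build the right path. The sketch below assumes a local base URL and percent-encodes path segments. Whether the router decodes percent-encoded names is an assumption, and the response id `chatcmpl-123` is made up for the example.

```python
import urllib.request
from urllib.parse import quote

# Assumption: the router is reachable locally on the typical port.
BASE_URL = "http://localhost:5525"


def feedback_url(router_name, value, feedback_name=None, base_url=BASE_URL):
    """Return the sparse feedback URL, or the per-response one when feedback_name is given."""
    key = "sparse" if feedback_name is None else quote(feedback_name, safe="")
    return f"{base_url}/v1/feedbacks/{quote(router_name, safe='')}/{key}/{value}"


def build_feedback_request(router_name, value, feedback_name=None):
    """Build (but do not send) the PUT request for one feedback value."""
    return urllib.request.Request(
        feedback_url(router_name, value, feedback_name),
        method="PUT",
    )


# Delayed KPI feedback for the whole router.
sparse = build_feedback_request("advanced-reasoning", 0.12)
# Immediate feedback for one response, keyed by a hypothetical response id.
immediate = build_feedback_request("advanced-reasoning", 1, "chatcmpl-123")
```

Either request can then be sent with `urllib.request.urlopen`.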
GET /v1/feedbacks/{router_name}
- Returns the most recent feedback bundle for the router, combining sparse and per-response entries. Responds with 404 if no feedback has been recorded.
Observability
GET /metrics
- Exposes Prometheus metrics for router latency, downstream LLM usage, token counts, exploration rates, and learning statistics.
Error Handling
- 400: validation error (missing model, unsupported Bedrock identifier, weight sum mismatch).
- 404: router or feedback record not found.
- 409: router name conflict on create.
- 500: unexpected runtime or downstream provider failure.
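One way a client could act on these status codes is sketched below using only the Python standard library. The helper names `describe_status` and `send` are hypothetical, and the structure of error response bodies is not specified here, so the body is returned as raw bytes.

```python
import urllib.error
import urllib.request

# Meanings copied from the status list above.
STATUS_MEANINGS = {
    400: "validation error",
    404: "router or feedback record not found",
    409: "router name conflict on create",
    500: "unexpected runtime or downstream provider failure",
}


def describe_status(code):
    """Map a status code to its documented meaning."""
    return STATUS_MEANINGS.get(code, "undocumented status")


def send(request):
    """Send a prepared request and return (status, body) instead of raising on HTTP errors."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the error object still exposes code and body.
        return err.code, err.read()
```

A caller might retry on 500 and treat 400, 404, and 409 as final, though that policy is a design choice rather than something the router prescribes.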
Authentication
Arfniia Router relies on the surrounding network perimeter. If you need authentication, terminate TLS and enforce headers at your ingress; the router itself does not ship with built-in auth.