Nolma Guard

Guard gives you complete visibility into what your AI agents cost — and lets you stop runaway spend before it becomes a bill.

Key features

Real-time tracking

Costs update within 500ms of every LLM call. Not daily. Real-time.

Hard enforcement

When a limit is hit, the gateway returns HTTP 429 before the LLM call is made, so the blocked request incurs no cost.

Multi-provider

OpenAI, Anthropic, Gemini, Groq, and Mistral — all in one dashboard.

Per-agent tracking

See cost by agent, model, user, environment, and session.
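Because hard enforcement rejects a request with HTTP 429 before it reaches the provider, client code may want to treat that status differently from other failures. The following is a minimal sketch, not Nolma's client API: `BudgetExceeded`, `call_llm`, and the `send` callable are hypothetical names, and it assumes a 429 from the gateway signals a budget limit.

```python
from typing import Callable, Tuple


class BudgetExceeded(Exception):
    """Hypothetical error for a request the gateway rejected with HTTP 429."""


def call_llm(send: Callable[[], Tuple[int, str]]) -> str:
    """Run one gateway request and surface a budget block as an exception.

    `send` is a hypothetical callable that performs the HTTP request
    and returns (status_code, body).
    """
    status, body = send()
    if status == 429:
        # Assumption: 429 means a budget limit was hit, so the call never
        # reached the provider and an immediate retry is unlikely to help.
        raise BudgetExceeded(body)
    if status >= 400:
        raise RuntimeError(f"gateway error {status}: {body}")
    return body
```

Raising a dedicated exception lets an agent loop stop or degrade gracefully on a budget block instead of retrying it like a transient error.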

What gets tracked

Every LLM call through the gateway creates a span with:
| Field | Description |
| --- | --- |
| Agent name | From NM-Agent header |
| Model | The LLM model used |
| Provider | openai / anthropic / gemini / groq / mistral |
| Input tokens | Prompt token count |
| Output tokens | Completion token count |
| Cost (USD) | Calculated from token counts |
| Latency | End-to-end response time |
| Session ID | Groups related calls |
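To make the span fields concrete, here is a rough sketch of how a span record and its cost could be modeled. This is an illustration only: the `Span` class, the `PRICES_PER_MILLION` table, the model name, and the per-token prices are all hypothetical, and Guard's actual schema and pricing logic are not described in this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical prices in USD per 1M tokens; real rates vary by provider and model.
PRICES_PER_MILLION = {"example-model": {"input": 1.00, "output": 2.00}}


@dataclass
class Span:
    """Illustrative record mirroring the tracked fields listed above."""
    agent: str
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    session_id: str

    @property
    def cost_usd(self) -> float:
        # Cost is derived from the token counts and the per-model price.
        price = PRICES_PER_MILLION[self.model]
        total = (self.input_tokens * price["input"]
                 + self.output_tokens * price["output"])
        return total / 1_000_000


def cost_by_session(spans: Iterable[Span]) -> Dict[str, float]:
    """Sum span costs per session ID, which groups related calls."""
    totals: Dict[str, float] = defaultdict(float)
    for span in spans:
        totals[span.session_id] += span.cost_usd
    return dict(totals)
```

With the assumed prices, a span with 1,000 input tokens and 500 output tokens would cost (1000 × 1.00 + 500 × 2.00) / 1,000,000 USD.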

Dashboard pages

Guard → Overview: Total cost today, burn rate, projected month end, top agents, spend over time chart.

Guard → Sessions: Every session with cost, token count, span count, and status. Click any session to see the full span tree.

Guard → Budgets: Create and manage budget rules. See real-time utilization from Redis.

Guard → Alerts: Alert history with severity, type, and reviewed status.