Nolma Guard
Guard gives you complete visibility into what your AI agents cost, and lets you stop runaway spend before it becomes a bill.
Key features
Real-time tracking
Costs update within 500ms of every LLM call. Not daily. Real-time.
Hard enforcement
HTTP 429 fires before the LLM call. Zero cost is incurred when a limit is hit.
Multi-provider
OpenAI, Anthropic, Gemini, Groq, and Mistral — all in one dashboard.
Per-agent tracking
See cost by agent, model, user, environment, and session.
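To make the hard-enforcement behavior concrete, here is a minimal sketch of a pre-call budget check. It is not Guard's actual implementation: the `Budget` class, the `guarded_call` function, and the idea of passing an estimated cost are all assumptions made for illustration. The only detail taken from this page is that a 429 is returned before the LLM call runs, so a rejected request costs nothing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class Budget:
    """Hypothetical spend limit; not part of the Nolma API."""
    limit_usd: float
    spent_usd: float = 0.0


def guarded_call(
    budget: Budget,
    estimated_cost_usd: float,
    call_llm: Callable[[], Any],
) -> Tuple[int, Optional[Any]]:
    """Check the budget first; only invoke the LLM if the limit is not reached."""
    if budget.spent_usd >= budget.limit_usd:
        # Rejected before the provider is contacted, so no cost is incurred.
        return 429, None
    result = call_llm()
    budget.spent_usd += estimated_cost_usd
    return 200, result
```

The key design point is ordering: the check runs before `call_llm`, so a request that hits the limit never reaches the provider.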
What gets tracked
Every LLM call through the gateway creates a span with:

| Field | Description |
|---|---|
| Agent name | From NM-Agent header |
| Model | The LLM model used |
| Provider | openai / anthropic / gemini / groq / mistral |
| Input tokens | Prompt token count |
| Output tokens | Completion token count |
| Cost (USD) | Calculated from token counts |
| Latency | End-to-end response time |
| Session ID | Groups related calls |
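
As a rough illustration of the fields above, the sketch below models a span as a Python dataclass and derives cost from token counts. The `Span` class, the `span_from_call` helper, the `PRICES` table, and its per-million-token rates are hypothetical and do not reflect Nolma's schema or any provider's real pricing. The only details taken from this page are the field list and the `NM-Agent` header.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical (input, output) prices in USD per million tokens.
# These are placeholder values, not real provider pricing.
PRICES = {("openai", "example-model"): (1.00, 2.00)}


@dataclass
class Span:
    agent_name: str
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    session_id: str

    @property
    def cost_usd(self) -> float:
        """Cost calculated from token counts and the assumed price table."""
        in_price, out_price = PRICES[(self.provider, self.model)]
        total = self.input_tokens * in_price + self.output_tokens * out_price
        return total / 1_000_000


def span_from_call(headers: Mapping[str, str], **fields) -> Span:
    """Build a span, reading the agent name from the NM-Agent header."""
    # Falling back to "unknown" is an assumption for this sketch.
    return Span(agent_name=headers.get("NM-Agent", "unknown"), **fields)
```

For example, a call with 1,000 input tokens and 500 output tokens at the placeholder rates would cost (1000 × 1.00 + 500 × 2.00) / 1,000,000 = 0.002 USD.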