Nolma Guard

Guard gives you complete visibility into what your AI agents cost — and lets you stop runaway spend before it becomes a bill.

Key features

Real-time tracking

Costs update within 500ms of every LLM call. Not daily. Real-time.

Hard enforcement

When a budget limit is hit, the gateway returns HTTP 429 before the LLM call is made, so the blocked request incurs zero cost.
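The check-before-forward behaviour can be sketched as follows. This is a minimal illustration, not Guard's actual implementation: `guarded_call`, `GatewayResponse`, and `fake_provider` are hypothetical names, and the 429 response body is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GatewayResponse:
    status: int
    body: str


def guarded_call(spent_usd: float, limit_usd: float,
                 forward: Callable[[], str]) -> GatewayResponse:
    """Reject with 429 before forwarding if the budget is already used up."""
    if spent_usd >= limit_usd:
        # The provider is never contacted, so this request costs nothing.
        return GatewayResponse(429, "budget limit reached")
    return GatewayResponse(200, forward())


calls = []  # records every time the stand-in provider is contacted


def fake_provider() -> str:
    # Stand-in for a real LLM call (assumption, for illustration only).
    calls.append("called")
    return "completion text"


blocked = guarded_call(spent_usd=10.0, limit_usd=10.0, forward=fake_provider)
allowed = guarded_call(spent_usd=2.5, limit_usd=10.0, forward=fake_provider)
```

Because the budget check runs first, the blocked request never reaches `fake_provider`; only the allowed request does.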

Multi-provider

OpenAI, Anthropic, Gemini, Groq, and Mistral — all in one dashboard.

Per-agent tracking

See cost by agent, model, user, environment, and session.
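As a rough sketch of what a per-dimension breakdown involves, the snippet below sums cost over a list of span records by any one field. The dictionary keys (`agent`, `model`, `cost_usd`), the agent and model names, and the helper `cost_by` are assumptions made for illustration, not Guard's data model.

```python
from collections import defaultdict


def cost_by(spans, dimension):
    """Sum cost_usd over spans, grouped by the given field."""
    totals = defaultdict(float)
    for span in spans:
        totals[span[dimension]] += span["cost_usd"]
    return dict(totals)


# Hypothetical span records.
spans = [
    {"agent": "support-bot", "model": "model-a", "cost_usd": 0.25},
    {"agent": "support-bot", "model": "model-b", "cost_usd": 0.5},
    {"agent": "research-bot", "model": "model-a", "cost_usd": 1.0},
]

by_agent = cost_by(spans, "agent")
by_model = cost_by(spans, "model")
```

The same function works for user, environment, or session, provided each span carries that field.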

What gets tracked

Every LLM call through the gateway creates a span with:

| Field | Description |
| --- | --- |
| Agent name | From `NM-Agent` header |
| Model | The LLM model used |
| Provider | openai / anthropic / gemini / groq / mistral |
| Input tokens | Prompt token count |
| Output tokens | Completion token count |
| Cost (USD) | Calculated from token counts |
| Latency | End-to-end response time |
| Session ID | Groups related calls |
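One way to picture a span record and the cost calculation is sketched below. The field names, the price table, the model name `example-model`, and the `"unknown"` fallback for a missing header are all assumptions; only the `NM-Agent` header and the list of tracked fields come from this page.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; real rates vary by provider and model.
PRICES_PER_MILLION = {
    ("openai", "example-model"): {"input": 2.0, "output": 8.0},
}


@dataclass
class Span:
    agent_name: str
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    session_id: str


def build_span(headers, provider, model, input_tokens, output_tokens,
               latency_ms, session_id):
    """Assemble a span, deriving cost from token counts and the price table."""
    price = PRICES_PER_MILLION[(provider, model)]
    cost = (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000
    return Span(
        agent_name=headers.get("NM-Agent", "unknown"),  # fallback is assumed
        model=model,
        provider=provider,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        cost_usd=cost,
        latency_ms=latency_ms,
        session_id=session_id,
    )


span = build_span({"NM-Agent": "support-bot"}, "openai", "example-model",
                  input_tokens=1000, output_tokens=500,
                  latency_ms=420.0, session_id="sess-1")
```

With these made-up prices, 1,000 input tokens and 500 output tokens cost (1000 × 2.0 + 500 × 8.0) / 1,000,000 = 0.006 USD.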

Dashboard pages

Guard → Overview: Total cost today, burn rate, projected month end, top agents, spend over time chart.

Guard → Sessions: Every session with cost, token count, span count, and status. Click any session to see the full span tree.

Guard → Budgets: Create and manage budget rules. See real-time utilization from Redis.

Guard → Alerts: Alert history with severity, type, and reviewed status.
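This page does not say how the Overview figures are computed. As one plausible reading, a month-end projection can be derived by extending the average daily burn so far across the whole month. The function `project_month_end` and the linear extrapolation are assumptions, not Guard's documented formula.

```python
import calendar
from datetime import date


def project_month_end(spend_to_date_usd: float, today: date) -> float:
    """Extrapolate month-to-date spend linearly to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_burn = spend_to_date_usd / today.day  # average USD per elapsed day
    return daily_burn * days_in_month


# 45 USD spent by 15 April, in a 30-day month.
projected = project_month_end(45.0, date(2024, 4, 15))
```

Here the average burn is 3 USD per day, so the linear projection for April is 90 USD.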
