What is Nolma?

Nolma is an AI gateway platform that sits between your agents and every LLM provider. One URL change gives you:
  • Real-time cost tracking per agent, model, user, and session
  • Hard budget enforcement that stops spend before the LLM call fires
  • Prompt intelligence: track what users do with AI outputs and get recommendations to cut costs

Quickstart

Integrate in 5 minutes

Python SDK

pip install nolma

Node.js SDK

npm install @nolma/node

API Reference

Full API docs

How it works

Your agent code
      |
Change base_url to gateway.nolma.ai
      |
Every LLM call tracked automatically
      |
Dashboard shows real-time costs
Budget enforcement fires if needed

Supported providers

Provider        URL prefix    Models
OpenAI          /openai       gpt-4o, gpt-4o-mini
Anthropic       /anthropic    claude-sonnet-4-6, claude-haiku-4-5
Google Gemini   /gemini       gemini-2.5-pro, gemini-2.0-flash
Groq            /groq         llama-3.3-70b
Mistral         /mistral      mistral-small-latest
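
As a rough illustration of how the URL prefixes might be used, the sketch below builds a per-provider base URL from the gateway host and the prefixes listed above. The helper name gateway_base_url, the https scheme, and the assumption that the prefix is appended directly to the host are all hypothetical; check the API reference for the exact URL format.

```python
# Hypothetical helper, not part of the Nolma SDK.
# The host and prefixes come from this page; the https scheme and the
# "host + prefix" layout are assumptions made for illustration.
GATEWAY_HOST = "gateway.nolma.ai"

PROVIDER_PREFIXES = {
    "openai": "/openai",
    "anthropic": "/anthropic",
    "gemini": "/gemini",
    "groq": "/groq",
    "mistral": "/mistral",
}


def gateway_base_url(provider: str) -> str:
    """Return the assumed gateway base URL for a supported provider."""
    key = provider.strip().lower()
    if key not in PROVIDER_PREFIXES:
        raise ValueError(f"unsupported provider: {provider!r}")
    return f"https://{GATEWAY_HOST}{PROVIDER_PREFIXES[key]}"


if __name__ == "__main__":
    for name in PROVIDER_PREFIXES:
        print(name, gateway_base_url(name))
```

The resulting string is what you would pass as the base URL of your existing provider client in place of the provider's own endpoint.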

The three products

Guard — Cost control

Real-time cost dashboard with hard budget enforcement that blocks a request before the LLM call fires, so no cost is incurred when a limit is hit.
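
The idea of blocking before the call fires can be sketched conceptually as a pre-call check. This is not Nolma's implementation: BudgetGuard, BudgetExceeded, and the up-front cost estimate are hypothetical names and assumptions chosen only to show why a rejected request costs nothing.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetExceeded(Exception):
    """Raised when a request would push spend past the limit."""


@dataclass
class BudgetGuard:
    # Hypothetical in-memory guard; a real gateway would track spend server-side.
    limit_usd: float
    spent_usd: float = 0.0

    def call(self, estimated_cost_usd: float, send: Callable[[], T]) -> T:
        # Check the budget first: if the request does not fit, the provider
        # is never contacted, so nothing is billed.
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"limit {self.limit_usd:.2f} USD would be exceeded"
            )
        result = send()  # the LLM call happens only after the check passes
        self.spent_usd += estimated_cost_usd
        return result


if __name__ == "__main__":
    guard = BudgetGuard(limit_usd=1.00)
    print(guard.call(0.60, lambda: "first response"))
    try:
        guard.call(0.60, lambda: "second response")
    except BudgetExceeded as exc:
        print("blocked:", exc)
```

In this sketch the second request is rejected without invoking the send callback, which mirrors the behaviour described above.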

Lens — Prompt intelligence

Track what users do with AI outputs: acceptance rates, edit distances, and retry patterns. Get evidence-backed recommendations to cut costs.
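
To make two of these metrics concrete, here is a minimal sketch of how an acceptance rate and an edit distance could be computed. It uses the standard Levenshtein distance as a stand-in; how Lens actually defines and measures these signals is not specified here, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(original: str, edited: str) -> int:
    """Levenshtein distance between an AI output and the user's final text."""
    prev = list(range(len(edited) + 1))
    for i, ch_a in enumerate(original, start=1):
        curr = [i]
        for j, ch_b in enumerate(edited, start=1):
            substitution = prev[j - 1] + (0 if ch_a == ch_b else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def acceptance_rate(accepted: Sequence[bool]) -> float:
    """Fraction of AI outputs the user accepted (0.0 if there are none)."""
    return sum(accepted) / len(accepted) if accepted else 0.0


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(acceptance_rate([True, False, True, True]))
```

A small edit distance alongside a high acceptance rate would suggest users keep outputs largely as generated.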

Shield — Security (coming soon)

Injection detection, PII redaction, and a compliance audit trail. Blocks threats before they reach the LLM.