Gateway
The Hippocortex Gateway is an OpenAI-compatible proxy that adds persistent memory to any LLM call. Instead of changing your code, you change one URL. This is the recommended integration path for most users.
The Gateway runs the full memory pipeline on every request:
- Synthesize — retrieves relevant context using semantic search, graph retrieval, collective brain, and behavioral context
- Inject — adds structured knowledge to the system prompt
- Proxy — forwards the request to your LLM provider
- Capture — records the conversation for future learning
- Vault — automatically detects and encrypts secrets in captured events
- Learn — auto-compiles knowledge artifacts in the background
Two minutes, no SDK, no code changes.
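The pipeline stages above can be sketched as a single request handler. This is an illustrative sketch only, with stand-in memory and provider objects whose method names are assumptions, not the Gateway's actual implementation:

```python
def inject(request, context):
    # Inject: prepend retrieved knowledge as a system message.
    system = {"role": "system", "content": f"Relevant context: {context}"}
    return {**request, "messages": [system] + request["messages"]}

def handle_request(request, memory, provider):
    # Illustrative pipeline order; memory/provider interfaces are hypothetical.
    context = memory.synthesize(request)        # Synthesize: retrieve relevant context
    enriched = inject(request, context)         # Inject: add knowledge to the prompt
    response = provider.complete(enriched)      # Proxy: forward to the LLM provider
    event = memory.capture(enriched, response)  # Capture: record the conversation
    memory.vault(event)                         # Vault: encrypt detected secrets
    memory.schedule_compile()                   # Learn: compile in the background
    return response
```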
Reliability
~99% availability, with graceful fallback. If the memory layer is temporarily unavailable, the Gateway falls back to proxying the request directly to your LLM provider without memory enrichment. Your application never breaks.
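A minimal sketch of this degradation pattern, where the enrichment step is a stand-in callable rather than the Gateway's actual code:

```python
def with_fallback(enrich, request):
    # If memory enrichment fails for any reason, proxy the original request unchanged.
    try:
        return enrich(request)
    except Exception:
        return request  # memory layer unavailable; degrade gracefully
```

Only the enrichment step is skipped on failure; the proxied LLM call itself proceeds as normal.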
Endpoint
POST https://api.hippocortex.dev/v1/chat/completions
Supports both streaming (stream: true) and non-streaming responses.
Authentication
Pass your Hippocortex API key as the Bearer token. Use X-LLM-* headers for your LLM provider credentials.
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer hx_live_... (your Hippocortex API key) |
| X-LLM-API-Key | Yes | Your LLM provider's API key (e.g., sk-... for OpenAI) |
| X-LLM-Base-URL | No | The LLM provider's base URL. Defaults to https://api.openai.com |
| X-LLM-Model | No | Override the model in the request body |
| X-Hippocortex-Session | No | Custom session ID for grouping related conversations |
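If you are not using an SDK, the same headers can be set on a raw HTTP request. A minimal stdlib sketch (the key values are placeholders; the request is built but not sent):

```python
import json
import urllib.request

body = json.dumps({
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Deploy payments to staging"}],
}).encode()

req = urllib.request.Request(
    "https://api.hippocortex.dev/v1/chat/completions",
    data=body,
    method="POST",
    headers={
        "Authorization": "Bearer hx_live_...",  # Hippocortex API key (placeholder)
        "X-LLM-API-Key": "sk-...",              # LLM provider key (placeholder)
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment to send the request
```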
Quick Start
Change your base_url and add headers. That is the entire integration.
Python (OpenAI SDK):
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",  # Your OpenAI key
    },
)

# Use exactly as before. Every call now has memory.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy payments to staging"}],
)
```
TypeScript (OpenAI SDK):
```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://api.hippocortex.dev/v1',
  apiKey: 'hx_live_...',
  defaultHeaders: {
    'X-LLM-API-Key': 'sk-...',
  },
})

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Deploy payments to staging' }],
})
```
Supported Providers
The Gateway works with any LLM provider that exposes an OpenAI-compatible chat completions endpoint. Set the X-LLM-Base-URL header to point at your provider.
| Provider | X-LLM-Base-URL | X-LLM-API-Key |
|---|---|---|
| OpenAI | https://api.openai.com (default) | sk-... |
| Anthropic | https://api.anthropic.com | sk-ant-... |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | Your Google AI key |
| Groq | https://api.groq.com/openai | gsk_... |
| Together | https://api.together.xyz | tog_... |
| Mistral | https://api.mistral.ai/v1 | Your Mistral key |
| Fireworks | https://api.fireworks.ai/inference/v1 | Your Fireworks key |
| Ollama (local) | http://localhost:11434 | unused |
| Any OpenAI-compatible | Your provider's base URL | Your provider's key |
Provider Examples
Anthropic:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-ant-...",
        "X-LLM-Base-URL": "https://api.anthropic.com",
    },
)
```
Google Gemini:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-google-ai-key",
        "X-LLM-Base-URL": "https://generativelanguage.googleapis.com/v1beta/openai",
    },
)
```
Groq:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "gsk_...",
        "X-LLM-Base-URL": "https://api.groq.com/openai",
    },
)
```
Together:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "tog_...",
        "X-LLM-Base-URL": "https://api.together.xyz",
    },
)
```
Mistral:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-mistral-key",
        "X-LLM-Base-URL": "https://api.mistral.ai/v1",
    },
)
```
Fireworks:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-fireworks-key",
        "X-LLM-Base-URL": "https://api.fireworks.ai/inference/v1",
    },
)
```
Ollama (local):
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "unused",
        "X-LLM-Base-URL": "http://localhost:11434",
    },
)
```
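The examples above differ only in two header values, so a small helper can build the default_headers dict for any provider. This is a hypothetical convenience function, not part of any SDK; the header names come from the table above:

```python
def gateway_headers(llm_api_key, base_url=None, session=None):
    # Build the default_headers dict for a Gateway-backed OpenAI client.
    headers = {"X-LLM-API-Key": llm_api_key}
    if base_url is not None:
        headers["X-LLM-Base-URL"] = base_url
    if session is not None:
        headers["X-Hippocortex-Session"] = session
    return headers
```

For example, a Groq-backed client would pass default_headers=gateway_headers("gsk_...", "https://api.groq.com/openai").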
Sessions
By default, the Gateway generates a session ID per request. To group related conversations, pass the X-Hippocortex-Session header:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",
        "X-Hippocortex-Session": "chat-thread-42",
    },
)
```
Events within the same session are grouped during compilation, which helps the Memory Compiler find patterns across related interactions.
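Any stable string works as a session ID. One common pattern is deriving it from a user and thread identifier, so every turn of a thread lands in the same session. This helper and its naming scheme are hypothetical, not required by the Gateway:

```python
import hashlib

def session_id(user_id, thread_id):
    # Stable, collision-resistant ID: the same thread maps to the
    # same session across turns, different threads to different sessions.
    digest = hashlib.sha256(f"{user_id}:{thread_id}".encode()).hexdigest()
    return f"chat-{digest[:12]}"
```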
Streaming
The Gateway fully supports streaming responses. Pass stream: true in your request body and tokens stream back as they arrive from the upstream provider.
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain the deploy process"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
The full response is captured after the stream completes.
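Conceptually, the streamed deltas are reassembled into the full text once the stream ends, and only the complete response is captured. A minimal sketch of that reassembly, with plain strings standing in for chunk objects:

```python
def reassemble(deltas):
    # Each delta is forwarded to the client as it arrives; the joined
    # text is what gets captured after the stream completes.
    parts = []
    for delta in deltas:
        parts.append(delta or "")  # final chunks may carry no content
    return "".join(parts)
```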
Auto-Compile
You do not need to trigger compilation manually. The pipeline automatically compiles after every 10 captured events, with a 5-minute sweep to catch stragglers. Knowledge artifacts are continuously updated as new experience accumulates.
Self-Hosted
If you run a self-hosted Hippocortex server, the Gateway is available at your server's address:
```python
client = OpenAI(
    base_url="http://localhost:3100/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",
    },
)
```
When to Use the Gateway vs the SDK
Use the Gateway when:
- You want the simplest possible integration (change one URL)
- You want automatic capture and compile with no code changes
- You are using a language or framework without a Hippocortex SDK
- You want to add memory to an existing application without modifying it
Use the SDK when:
- You need fine-grained control over what gets captured
- You want to capture non-LLM events (file edits, commands, custom actions)
- You want to call synthesize() separately from LLM calls
- You need to work offline or in an air-gapped environment
Both approaches produce the same memory artifacts and can be used together.