Gateway
The Hippocortex Gateway is an OpenAI-compatible proxy that adds persistent memory to any LLM call. Instead of changing your code, you change one URL. This is the recommended integration path for most users.
The Gateway runs the full memory pipeline on every request:
- Synthesize — retrieves relevant context using semantic search, graph retrieval, collective brain, and behavioral context
- Inject — adds structured knowledge to the system prompt
- Proxy — forwards the request to your LLM provider
- Capture — records the conversation for future learning
- Vault — automatically detects and encrypts secrets in captured events
- Learn — auto-compiles knowledge artifacts in the background
Two minutes, no SDK, no code changes.
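The pipeline stages above can be sketched as a single request handler. This is an illustrative sketch only, with stand-in memory and provider objects whose method names are assumptions, not the Gateway's actual implementation:

```python
def inject(request, context):
    # Inject: prepend retrieved knowledge as a system message.
    system = {"role": "system", "content": f"Relevant context: {context}"}
    return {**request, "messages": [system] + request["messages"]}

def handle_request(request, memory, provider):
    # Illustrative pipeline order; memory/provider interfaces are hypothetical.
    context = memory.synthesize(request)        # Synthesize: retrieve relevant context
    enriched = inject(request, context)         # Inject: add knowledge to the prompt
    response = provider.complete(enriched)      # Proxy: forward to the LLM provider
    event = memory.capture(enriched, response)  # Capture: record the conversation
    memory.vault(event)                         # Vault: encrypt detected secrets
    memory.schedule_compile()                   # Learn: compile in the background
    return response
```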
Reliability
~99% availability, with graceful fallback. If the memory layer is temporarily unavailable, the Gateway falls back to proxying the request directly to your LLM provider without memory enrichment. Your application never breaks.
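A minimal sketch of this degradation pattern, where the enrichment step is a stand-in callable rather than the Gateway's actual code:

```python
def with_fallback(enrich, request):
    # If memory enrichment fails for any reason, proxy the original request unchanged.
    try:
        return enrich(request)
    except Exception:
        return request  # memory layer unavailable; degrade gracefully
```

Only the enrichment step is skipped on failure; the proxied LLM call itself proceeds as normal.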
Endpoint
POST https://api.hippocortex.dev/v1/chat/completions
Supports both streaming (stream: true) and non-streaming responses.
Authentication
Pass your Hippocortex API key as the Bearer token. Use X-LLM-* headers for your LLM provider credentials.
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer hx_live_... (your Hippocortex API key) |
| X-LLM-API-Key | Yes | Your LLM provider's API key (e.g., sk-... for OpenAI) |
| X-LLM-Base-URL | No | The LLM provider's base URL. Defaults to https://api.openai.com |
| X-LLM-Model | No | Override the model in the request body |
| X-Hippocortex-Session | No | Custom session ID for grouping related conversations |
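If you are not using an SDK, the same headers can be set on a raw HTTP request. A minimal stdlib sketch (the key values are placeholders; the request is built but not sent):

```python
import json
import urllib.request

body = json.dumps({
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Deploy payments to staging"}],
}).encode()

req = urllib.request.Request(
    "https://api.hippocortex.dev/v1/chat/completions",
    data=body,
    method="POST",
    headers={
        "Authorization": "Bearer hx_live_...",  # Hippocortex API key (placeholder)
        "X-LLM-API-Key": "sk-...",              # LLM provider key (placeholder)
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment to send the request
```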
Quick Start
Change your base_url and add headers. That is the entire integration.
Python (OpenAI SDK):
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",  # Your OpenAI key
    },
)

# Use exactly as before. Every call now has memory.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy payments to staging"}],
)
```
TypeScript (OpenAI SDK):
```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://api.hippocortex.dev/v1',
  apiKey: 'hx_live_...',
  defaultHeaders: {
    'X-LLM-API-Key': 'sk-...',
  },
})

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Deploy payments to staging' }],
})
```
Supported Providers
The Gateway works with any LLM provider that exposes an OpenAI-compatible chat completions endpoint. Set the X-LLM-Base-URL header to point at your provider.
| Provider | X-LLM-Base-URL | X-LLM-API-Key |
|---|---|---|
| OpenAI | https://api.openai.com (default) | sk-... |
| Anthropic | https://api.anthropic.com | sk-ant-... |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | Your Google AI key |
| Groq | https://api.groq.com/openai | gsk_... |
| Together | https://api.together.xyz | tog_... |
| Mistral | https://api.mistral.ai/v1 | Your Mistral key |
| Fireworks | https://api.fireworks.ai/inference/v1 | Your Fireworks key |
| Ollama (local) | http://localhost:11434 | unused |
| Any OpenAI-compatible | Your provider's base URL | Your provider's key |
Provider Examples
Anthropic:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-ant-...",
        "X-LLM-Base-URL": "https://api.anthropic.com",
    },
)
```
Google Gemini:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-google-ai-key",
        "X-LLM-Base-URL": "https://generativelanguage.googleapis.com/v1beta/openai",
    },
)
```
Groq:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "gsk_...",
        "X-LLM-Base-URL": "https://api.groq.com/openai",
    },
)
```
Together:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "tog_...",
        "X-LLM-Base-URL": "https://api.together.xyz",
    },
)
```
Mistral:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-mistral-key",
        "X-LLM-Base-URL": "https://api.mistral.ai/v1",
    },
)
```
Fireworks:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-fireworks-key",
        "X-LLM-Base-URL": "https://api.fireworks.ai/inference/v1",
    },
)
```
Ollama (local):
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "unused",
        "X-LLM-Base-URL": "http://localhost:11434",
    },
)
```
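The examples above differ only in two header values, so a small helper can build the default_headers dict for any provider. This is a hypothetical convenience function, not part of any SDK; the header names come from the table above:

```python
def gateway_headers(llm_api_key, base_url=None, session=None):
    # Build the default_headers dict for a Gateway-backed OpenAI client.
    headers = {"X-LLM-API-Key": llm_api_key}
    if base_url is not None:
        headers["X-LLM-Base-URL"] = base_url
    if session is not None:
        headers["X-Hippocortex-Session"] = session
    return headers
```

For example, a Groq-backed client would pass default_headers=gateway_headers("gsk_...", "https://api.groq.com/openai").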
Sessions
By default, the Gateway generates a session ID per request. To group related conversations, pass the X-Hippocortex-Session header:
```python
client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",
        "X-Hippocortex-Session": "chat-thread-42",
    },
)
```
Events within the same session are grouped during compilation, which helps the Memory Compiler find patterns across related interactions.
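Any stable string works as a session ID. One common pattern is deriving it from a user and thread identifier, so every turn of a thread lands in the same session. This helper and its naming scheme are hypothetical, not required by the Gateway:

```python
import hashlib

def session_id(user_id, thread_id):
    # Stable, collision-resistant ID: the same thread maps to the
    # same session across turns, different threads to different sessions.
    digest = hashlib.sha256(f"{user_id}:{thread_id}".encode()).hexdigest()
    return f"chat-{digest[:12]}"
```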
Streaming
The Gateway fully supports streaming responses. Pass stream: true in your request body and tokens stream back as they arrive from the upstream provider.
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain the deploy process"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
The full response is captured after the stream completes.
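Conceptually, the streamed deltas are reassembled into the full text once the stream ends, and only the complete response is captured. A minimal sketch of that reassembly, with plain strings standing in for chunk objects:

```python
def reassemble(deltas):
    # Each delta is forwarded to the client as it arrives; the joined
    # text is what gets captured after the stream completes.
    parts = []
    for delta in deltas:
        parts.append(delta or "")  # final chunks may carry no content
    return "".join(parts)
```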
Auto-Compile
You do not need to trigger compilation manually. The pipeline automatically compiles after every 10 captured events, with a 5-minute sweep to catch stragglers. Knowledge artifacts are continuously updated as new experience accumulates.
Self-Hosted
If you run a self-hosted Hippocortex server, the Gateway is available at your server's address:
```python
client = OpenAI(
    base_url="http://localhost:3100/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",
    },
)
```
When to Use the Gateway vs the SDK
Use the Gateway when:
- You want the simplest possible integration (change one URL)
- You want automatic capture and compile with no code changes
- You are using a language or framework without a Hippocortex SDK
- You want to add memory to an existing application without modifying it
Use the SDK when:
- You need fine-grained control over what gets captured
- You want to capture non-LLM events (file edits, commands, custom actions)
- You want to call synthesize() separately from LLM calls
- You need to work offline or in an air-gapped environment
Both approaches produce the same memory artifacts and can be used together.