Gateway

The Hippocortex Gateway is an OpenAI-compatible proxy that adds persistent memory to any LLM call. Instead of changing your code, you change one URL. This is the recommended integration path for most users.

The Gateway runs the full memory pipeline on every request:

  1. Synthesize — retrieves relevant context using semantic search, graph retrieval, collective brain, and behavioral context
  2. Inject — adds structured knowledge to the system prompt
  3. Proxy — forwards the request to your LLM provider
  4. Capture — records the conversation for future learning
  5. Vault — automatically detects and encrypts secrets in captured events
  6. Learn — auto-compiles knowledge artifacts in the background
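The six stages can be pictured as a single request handler. The sketch below is purely illustrative: every function name is hypothetical, and the real Gateway runs these steps server-side.

```python
# Illustrative sketch of the Gateway pipeline. All names are hypothetical;
# the actual pipeline runs inside the Gateway, not in your application.

def synthesize(request):
    # 1. Synthesize: retrieve relevant context for the request (stubbed)
    return {"facts": ["staging deploys require approval"]}

def inject(request, context):
    # 2. Inject: prepend structured knowledge to the system prompt
    system = {"role": "system", "content": f"Known context: {context['facts']}"}
    return {**request, "messages": [system] + request["messages"]}

def proxy(request):
    # 3. Proxy: forward to the upstream LLM provider (stubbed)
    return {"role": "assistant", "content": "ok"}

def handle(request):
    enriched = inject(request, synthesize(request))
    response = proxy(enriched)
    # 4-6. Capture, Vault, and Learn run after the response is returned,
    # so they add no latency to the call itself.
    return response
```

The key point of the shape: retrieval and injection happen before the upstream call, while capture, encryption, and compilation happen after it.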

Two minutes, no SDK, no code changes.

Reliability

The Gateway targets roughly 99% availability, with graceful fallback. If the memory layer is temporarily unavailable, the Gateway falls back to proxying the request directly to your LLM provider without memory enrichment. Your application never breaks.
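The fallback behavior amounts to a try/except around the enrichment step. A minimal sketch, with hypothetical stand-ins for the memory layer and the upstream proxy:

```python
# Sketch of graceful fallback: if memory enrichment fails, the request
# is proxied upstream unchanged. All names here are hypothetical.

class MemoryUnavailable(Exception):
    pass

def enrich(request):
    # Stand-in for the memory pipeline; raises when the layer is down.
    raise MemoryUnavailable()

def forward_to_provider(request):
    # Stand-in for proxying the request to the upstream LLM provider.
    return {"content": "response without memory enrichment"}

def handle(request):
    try:
        request = enrich(request)
    except MemoryUnavailable:
        pass  # Fall back: forward the original request, no enrichment
    return forward_to_provider(request)
```

Either way the upstream call is made, so a memory outage degrades enrichment rather than availability.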

Endpoint

POST https://api.hippocortex.dev/v1/chat/completions

Supports both streaming (stream: true) and non-streaming responses.

Authentication

Pass your Hippocortex API key as the Bearer token. Use X-LLM-* headers for your LLM provider credentials.

Header                  Required  Description
Authorization           Yes       Bearer hx_live_... (your Hippocortex API key)
X-LLM-API-Key           Yes       Your LLM provider's API key (e.g., sk-... for OpenAI)
X-LLM-Base-URL          No        The LLM provider's base URL. Defaults to https://api.openai.com
X-LLM-Model             No        Override the model in the request body
X-Hippocortex-Session   No        Custom session ID for grouping related conversations
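The same headers work for raw HTTP calls, which is useful from languages without an OpenAI SDK. A minimal curl invocation against the endpoint above (both keys are placeholders):

```shell
curl https://api.hippocortex.dev/v1/chat/completions \
  -H "Authorization: Bearer hx_live_..." \
  -H "X-LLM-API-Key: sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Deploy payments to staging"}]
  }'
```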

Quick Start

Change your base_url and add headers. That is the entire integration.

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",   # Your OpenAI key
    },
)

# Use exactly as before. Every call now has memory.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy payments to staging"}],
)

TypeScript (OpenAI SDK):

import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://api.hippocortex.dev/v1',
  apiKey: 'hx_live_...',
  defaultHeaders: {
    'X-LLM-API-Key': 'sk-...',
  },
})

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Deploy payments to staging' }],
})

Supported Providers

The Gateway works with any LLM provider that exposes an OpenAI-compatible chat completions endpoint. Set the X-LLM-Base-URL header to point at your provider.

Provider                X-LLM-Base-URL                                            X-LLM-API-Key
OpenAI                  https://api.openai.com (default)                          sk-...
Anthropic               https://api.anthropic.com                                 sk-ant-...
Google Gemini           https://generativelanguage.googleapis.com/v1beta/openai   Your Google AI key
Groq                    https://api.groq.com/openai                               gsk_...
Together                https://api.together.xyz                                  tog_...
Mistral                 https://api.mistral.ai/v1                                 Your Mistral key
Fireworks               https://api.fireworks.ai/inference/v1                     Your Fireworks key
Ollama (local)          http://localhost:11434                                    unused
Any OpenAI-compatible   Your provider's base URL                                  Your provider's key

Provider Examples

Anthropic:

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-ant-...",
        "X-LLM-Base-URL": "https://api.anthropic.com",
    },
)

Google Gemini:

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-google-ai-key",
        "X-LLM-Base-URL": "https://generativelanguage.googleapis.com/v1beta/openai",
    },
)

Groq:

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "gsk_...",
        "X-LLM-Base-URL": "https://api.groq.com/openai",
    },
)

Together:

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "tog_...",
        "X-LLM-Base-URL": "https://api.together.xyz",
    },
)

Mistral:

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-mistral-key",
        "X-LLM-Base-URL": "https://api.mistral.ai/v1",
    },
)

Fireworks:

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "your-fireworks-key",
        "X-LLM-Base-URL": "https://api.fireworks.ai/inference/v1",
    },
)

Ollama (local):

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "unused",
        "X-LLM-Base-URL": "http://localhost:11434",
    },
)

Sessions

By default, the Gateway generates a session ID per request. To group related conversations, pass the X-Hippocortex-Session header:

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",
        "X-Hippocortex-Session": "chat-thread-42",
    },
)

Events within the same session are grouped during compilation, which helps the Memory Compiler find patterns across related interactions.
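A session can also be set per request rather than per client. With the OpenAI Python SDK this can be done via the per-call `extra_headers` option; the session IDs below are illustrative:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hippocortex.dev/v1",
    api_key="hx_live_...",
    default_headers={"X-LLM-API-Key": "sk-..."},
)

# Override the session for a single call, e.g. one ID per chat thread.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Deploy payments to staging"}],
    extra_headers={"X-Hippocortex-Session": "chat-thread-42"},
)
```

This keeps one shared client while still grouping each conversation thread under its own session.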

Streaming

The Gateway fully supports streaming responses. Pass stream: true in your request body and tokens stream back as they arrive from the upstream provider.

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain the deploy process"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

The full response is captured after the stream completes.

Auto-Compile

You do not need to trigger compilation manually. The pipeline automatically compiles after every 10 captured events, with a 5-minute sweep to catch stragglers. Knowledge artifacts are continuously updated as new experience accumulates.
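The compile policy described above, batch every 10 events plus a periodic sweep for stragglers, can be sketched as a small counter. This is an illustrative model only, not the Gateway's actual implementation:

```python
# Illustrative sketch of the auto-compile policy: compile after every
# 10 captured events, with a periodic sweep catching leftovers.

COMPILE_EVERY = 10  # batch size stated in the docs above

class AutoCompiler:
    def __init__(self):
        self.pending = 0   # events captured since the last compile
        self.compiles = 0  # number of compile runs so far

    def on_capture(self, event):
        self.pending += 1
        if self.pending >= COMPILE_EVERY:
            self.compile()

    def sweep(self):
        # Would run on a timer (every 5 minutes) to catch stragglers.
        if self.pending:
            self.compile()

    def compile(self):
        self.compiles += 1
        self.pending = 0
```

For example, 25 captured events would trigger two batch compiles, and the next sweep would pick up the remaining five.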

Self-Hosted

If you run a self-hosted Hippocortex server, the Gateway is available at your server's address:

client = OpenAI(
    base_url="http://localhost:3100/v1",
    api_key="hx_live_...",
    default_headers={
        "X-LLM-API-Key": "sk-...",
    },
)

When to Use the Gateway vs the SDK

Use the Gateway when:

  • You want the simplest possible integration (change one URL)
  • You want automatic capture and compile with no code changes
  • You are using a language or framework without a Hippocortex SDK
  • You want to add memory to an existing application without modifying it

Use the SDK when:

  • You need fine-grained control over what gets captured
  • You want to capture non-LLM events (file edits, commands, custom actions)
  • You want to call synthesize() separately from LLM calls
  • You need to work offline or in an air-gapped environment

Both approaches produce the same memory artifacts and can be used together.