Synthesize API

The Synthesize endpoint retrieves compressed, relevant context from all memory layers for a given query. This is how your agent recalls what it has learned.

When you call Synthesize, the retrieval engine searches across recent events, compiled artifacts, and entity knowledge. It ranks results using an 8-signal scoring system (semantic similarity, temporal recency, frequency, salience, provenance quality, entity overlap, session relevance, and confidence). Results are then packed into a context pack that fits within your specified token budget.

The output is organized into sections (procedures, failures, decisions, facts, causal patterns) so your agent receives structured, actionable context rather than a raw dump of memories.

POST /v1/synthesize

curl -X POST https://api.hippocortex.dev/v1/synthesize \
  -H "Authorization: Bearer hx_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": "deploy payment service to production",
    "options": {
      "maxTokens": 8000,
      "sections": ["procedures", "failures", "decisions"],
      "minConfidence": 0.5,
      "includeProvenance": true
    }
  }'
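The same request can be built and sent from Python. A minimal sketch using only the standard library; the helper name and the way defaults are applied are illustrative, but the field names and defaults mirror the Request Body table below (the actual HTTP call is omitted so the example stands alone):

```python
import json

def build_synthesize_request(query, max_tokens=4000, sections=None,
                             min_confidence=0.3, include_provenance=True):
    """Build the JSON body for POST /v1/synthesize (defaults mirror the docs)."""
    options = {
        "maxTokens": max_tokens,
        "minConfidence": min_confidence,
        "includeProvenance": include_provenance,
    }
    if sections is not None:
        options["sections"] = sections  # omit to include all sections
    return {"query": query, "options": options}

body = build_synthesize_request(
    "deploy payment service to production",
    max_tokens=8000,
    sections=["procedures", "failures", "decisions"],
    min_confidence=0.5,
)
print(json.dumps(body, indent=2))
```

Send the resulting body with your HTTP client of choice, with the `Authorization: Bearer` header shown in the curl example.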

Request Body

Field                      Type      Default     Description
query                      string    (required)  The query to synthesize context for
options.maxTokens          number    4000        Token budget for the response
options.sections           string[]  all         Which sections to include
options.minConfidence      number    0.3         Minimum confidence threshold (0-1)
options.includeProvenance  boolean   true        Attach source references

Available Sections

Section     What It Contains
procedures  Relevant task schemas and step sequences
failures    Failure playbooks matching the query
decisions   Decision policies and conditional rules
facts       Known facts and entity information
causal      Causal patterns and relationships
context     General background context

Response

{
  "ok": true,
  "data": {
    "packId": "pack-abc123",
    "entries": [
      {
        "section": "procedures",
        "content": "To deploy the payment service to production:\n1. Run test suite\n2. Build Docker image\n3. Push to registry\n4. Update deployment\n5. Verify health check",
        "confidence": 0.87,
        "provenance": [
          {
            "sourceType": "artifact",
            "sourceId": "art-deploy-001",
            "artifactType": "task_schema",
            "evidenceCount": 12
          }
        ]
      },
      {
        "section": "failures",
        "content": "Known issue: deployment can fail if Redis connection pool is exhausted. Recovery: restart Redis and increase pool size.",
        "confidence": 0.92,
        "provenance": [...]
      }
    ],
    "budget": {
      "limit": 8000,
      "used": 3200,
      "compressionRatio": 12.5,
      "entriesIncluded": 5,
      "entriesDropped": 2
    }
  }
}
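Given a decoded response like the one above, a small sketch that folds the `entries` array into a prompt-ready text block, grouped by section. The field names come from the example response; the output format and the helper name are illustrative:

```python
def format_context_pack(data, min_confidence=0.0):
    """Render pack entries, grouped by section, as a text block for a prompt."""
    by_section = {}
    for entry in data["entries"]:
        if entry["confidence"] >= min_confidence:
            by_section.setdefault(entry["section"], []).append(entry)
    lines = []
    for section, entries in by_section.items():
        lines.append(f"## {section}")
        for e in entries:
            lines.append(f"- ({e['confidence']:.2f}) {e['content']}")
    return "\n".join(lines)

# Trimmed-down version of the response above, for illustration.
data = {
    "entries": [
        {"section": "procedures", "content": "Run tests, build, deploy.", "confidence": 0.87},
        {"section": "failures", "content": "Redis pool exhaustion on deploy.", "confidence": 0.92},
    ]
}
print(format_context_pack(data))
```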

The budget object shows how the token budget was spent: limit is the requested maximum, used is the token count of the returned pack, compressionRatio indicates how much raw knowledge was compressed to fit, and entriesDropped shows how many lower-ranked entries were excluded to stay within budget.

How Retrieval Works

The retrieval engine evaluates each candidate memory against 8 signals:

  1. Semantic similarity to the query
  2. Temporal recency (recent memories score higher)
  3. Frequency of the pattern across events
  4. Salience of the source events
  5. Provenance quality (how well-supported the knowledge is)
  6. Entity overlap with the query
  7. Session relevance (same session context scores higher)
  8. Confidence of the compiled artifact

Results are sorted by combined score, then packed into sections until the token budget is exhausted. Higher-scoring entries are included first; an entry that would exceed the remaining budget is dropped in favor of smaller, lower-ranked entries that still fit.
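The ranking-and-packing loop can be sketched as a weighted sum over the eight signals followed by a greedy fill of the token budget. The weights, the equal weighting, and the per-entry token counts are illustrative assumptions, not the engine's actual values:

```python
SIGNALS = ["semantic", "recency", "frequency", "salience",
           "provenance", "entity_overlap", "session", "confidence"]

def score(candidate, weights):
    """Combined score: weighted sum of the eight retrieval signals (each 0-1)."""
    return sum(weights[s] * candidate[s] for s in SIGNALS)

def pack(candidates, weights, max_tokens):
    """Sort by score, then greedily include entries until the budget is spent."""
    ranked = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
    included, used = [], 0
    for c in ranked:
        if used + c["tokens"] <= max_tokens:
            included.append(c)
            used += c["tokens"]
    return included, used

weights = {s: 1.0 / len(SIGNALS) for s in SIGNALS}  # assumed equal weighting
candidates = [
    {**{s: 0.9 for s in SIGNALS}, "id": "a", "tokens": 300},
    {**{s: 0.4 for s in SIGNALS}, "id": "b", "tokens": 500},
    {**{s: 0.8 for s in SIGNALS}, "id": "c", "tokens": 4000},
]
chosen, used = pack(candidates, weights, max_tokens=4000)
```

Note that "c" ranks second but is skipped because it would blow the budget, while the lower-ranked "b" still fits; the skipped entry would count toward entriesDropped.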

Best Practices

  1. Set appropriate token budgets. Account for your model's context window minus your prompt and expected output.
  2. Use section filters when you only need specific types of context (e.g., only procedures and failures for a deployment task).
  3. Check the budget response. If entriesDropped is high, consider increasing maxTokens or narrowing the query.
  4. Use provenance for debugging. The sourceId links back to the artifact that produced each entry, which traces further to source events.
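Best practice 3 can be automated. A hedged sketch that inspects the budget object and retries with a larger maxTokens while too many entries were dropped; the threshold, growth factor, and attempt count are arbitrary, and synthesize_fn stands in for the actual HTTP call (stubbed here so the example is self-contained):

```python
def synthesize_with_budget_retry(synthesize_fn, query, max_tokens=4000,
                                 max_dropped=2, growth=2, attempts=3):
    """Call Synthesize, widening the token budget while entriesDropped
    stays above max_dropped."""
    for _ in range(attempts):
        data = synthesize_fn(query, max_tokens)
        if data["budget"]["entriesDropped"] <= max_dropped:
            return data
        max_tokens *= growth
    return data  # best effort after the final attempt

# Stub for illustration: drops fewer entries as the budget grows.
def fake_synthesize(query, max_tokens):
    return {"budget": {"limit": max_tokens,
                       "entriesDropped": max(0, 5 - max_tokens // 4000)}}

result = synthesize_with_budget_retry(fake_synthesize, "deploy payment service")
```

With the stub above, the budget doubles twice (4000 → 8000 → 16000) before entriesDropped falls within the threshold.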