Memanto On-Prem

Memanto supports two backends:
  • Cloud (default) — Memanto talks to Moorcheh Cloud over the network with a MOORCHEH_API_KEY.
  • On-Prem — Memanto talks to a local Moorcheh server running in Docker on your own machine or private network. No Moorcheh API key required.
Both backends expose the same CLI, REST API, and SDK behavior. Switching between them is a single command (memanto config backend on-prem). Service code never branches on backend — everything in this documentation works identically once you are configured.
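
The switch can also be scripted. The following is a minimal Python sketch, assuming the memanto binary is installed and on your PATH. Only the command memanto config backend <name> comes from this page; the helper names backend_switch_command and switch_backend are illustrative and not part of Memanto.

```python
import shutil
import subprocess

# Backend names as they appear in this page's CLI examples.
VALID_BACKENDS = ("cloud", "on-prem")


def backend_switch_command(backend: str) -> list[str]:
    """Build the argv for `memanto config backend <backend>`."""
    if backend not in VALID_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {VALID_BACKENDS}"
        )
    return ["memanto", "config", "backend", backend]


def switch_backend(backend: str) -> int:
    """Run the switch command and return its exit code.

    Hypothetical wrapper: it assumes `memanto` is installed and on PATH.
    """
    argv = backend_switch_command(backend)
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError("memanto binary not found on PATH")
    return subprocess.run(argv, check=False).returncode
```

Calling switch_backend("on-prem") runs the same command you would type by hand. Keeping the argv builder separate makes the validation testable without the CLI installed.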

When to Choose On-Prem

| Reason | What on-prem gives you |
|---|---|
| Data residency | Memories and embeddings stay on your hardware. Moorcheh runs in a local Docker container. |
| Air-gapped environments | With the ollama provider, no outbound calls to OpenAI, Cohere, or Moorcheh are required after initial model pulls. |
| Cost control | Zero per-request cost for embeddings, LLM answers, and search when using Ollama. |
| Compliance | Useful for HIPAA, SOC 2, and similar regimes that restrict third-party data processors. |
| Offline development | Run Memanto locally without internet connectivity. |

Pick Cloud instead if you want zero-install, sub-90ms hosted retrieval, and don’t need to keep data on-prem. See Moorcheh Setup & Integration for the cloud path.

Architecture

[Figure: on-prem architecture diagram]

Three components run side by side:
  1. Memanto CLI/Server — the same memanto binary you’d use against the cloud. With MEMANTO_BACKEND=on-prem, it routes Moorcheh calls to the local server at http://localhost:8080 instead of the cloud API.
  2. Moorcheh on-prem server — a containerized build of Moorcheh started by the moorcheh up command (shipped with the moorcheh-client Python package). Exposes the same namespaces / documents / similarity_search / answer / files / vectors resource shape as the cloud SDK.
  3. Embedding + LLM providers — Ollama (default, runs in a sibling container, zero API keys), or your own OpenAI / Cohere account.
Memanto’s service layer never branches on backend, so every feature in the Guides, CLI, and API Reference sections works the same way on-prem.
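
The routing in component 1 can be pictured with a short Python sketch. It is an illustration only, not Memanto's actual code: the function names are hypothetical, the cloud URL is a placeholder because this page does not give the real endpoint, and treating an unset MEMANTO_BACKEND as cloud is an assumption based on cloud being the default backend.

```python
import os
import socket
from urllib.parse import urlparse

# From this page: with MEMANTO_BACKEND=on-prem, calls go to the local server.
ON_PREM_URL = "http://localhost:8080"
# Placeholder only: the real cloud endpoint is not given on this page.
CLOUD_URL_PLACEHOLDER = "https://moorcheh-cloud.invalid"


def resolve_base_url(env=None) -> str:
    """Pick the Moorcheh base URL from MEMANTO_BACKEND (assumed default: cloud)."""
    env = os.environ if env is None else env
    backend = env.get("MEMANTO_BACKEND", "cloud")
    if backend == "on-prem":
        return ON_PREM_URL
    if backend == "cloud":
        return CLOUD_URL_PLACEHOLDER
    raise ValueError(f"unsupported MEMANTO_BACKEND value: {backend!r}")


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A quick is_listening(ON_PREM_URL) check tells you whether anything is accepting connections on port 8080, which is a reasonable first step when on-prem calls fail. It does not confirm that the listener is actually Moorcheh.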

What’s Included

Everything available in cloud mode is available on-prem:
  • All 13 memory types: instruction, fact, decision, goal, commitment, preference, relationship, context, event, learning, observation, artifact, error. See Memory Types.
  • remember / recall / answer: the same three primitives. answer uses your locally configured LLM provider instead of the cloud’s hosted model.
  • Temporal queries: --as-of, --changed-since, --recent. See Temporal Memory.
  • Batch ingestion: up to 100 memories per request.
  • File upload: .pdf, .docx, .xlsx, .json, .txt, .csv, .md. Files are copied into ~/.moorcheh/uploads and made searchable by the on-prem indexer.
  • Sessions: same 6-hour JWT session model.
  • Agent management: create, list, activate, deactivate, bootstrap.
  • Daily summaries, conflict detection, scheduled runs: memanto daily-summary, memanto conflicts, memanto schedule enable.
  • Web UI: memanto ui works against on-prem the same as cloud.
  • Integrations: Claude Code, Cursor, Codex, Windsurf, Gemini CLI, Cline, Continue, OpenCode, Goose, Roo, GitHub Copilot, Augment (via memanto connect).
  • REST API: every endpoint under /api/v2/agents/... works identically.
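The only things that differ between backends are the underlying retrieval engine and the model provider behind answer: on-prem stores data in Moorcheh’s local container and uses your locally configured LLM, while cloud stores data in Moorcheh’s hosted service and uses its hosted model.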
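
Several of the limits listed above can be checked on the client before a request is sent. The sketch below is a hedged Python illustration: the 13 type names, the 100-memory batch limit, and the file extensions come from this page, while the helper names, the dict shape of a memory, and the case-insensitive extension match are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterator, Sequence

# The 13 memory types listed on this page.
MEMORY_TYPES = frozenset({
    "instruction", "fact", "decision", "goal", "commitment", "preference",
    "relationship", "context", "event", "learning", "observation",
    "artifact", "error",
})
# Batch ingestion accepts up to 100 memories per request.
MAX_BATCH_SIZE = 100
# File types accepted for upload.
UPLOAD_EXTENSIONS = frozenset({".pdf", ".docx", ".xlsx", ".json", ".txt", ".csv", ".md"})


def check_memory_type(memory_type: str) -> str:
    """Return the type unchanged, or raise if it is not one of the 13."""
    if memory_type not in MEMORY_TYPES:
        raise ValueError(f"unknown memory type: {memory_type!r}")
    return memory_type


def batches(items: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items each."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def is_uploadable(filename: str) -> bool:
    """Check the extension only (case-insensitive match is an assumption)."""
    return Path(filename).suffix.lower() in UPLOAD_EXTENSIONS
```

For example, 250 memories split into three requests of 100, 100, and 50. Because both backends behave the same, the same pre-checks apply to cloud and on-prem.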

Isolation Guarantees

When you switch to on-prem, Memanto isolates your configuration so cloud and on-prem state never cross-contaminate:
  • Per-backend data directory: cloud uses ~/.memanto/, on-prem uses ~/.memanto/on-prem/. Agents, sessions, and registry entries are separate.
  • Per-backend connection state: the on-prem server URL, embedding provider, and LLM model live in ~/.memanto/on-prem/state.json. The cloud’s ~/.memanto/config.yaml is never touched by on-prem onboarding.
  • Per-backend session tokens: switching backends clears the active session so you don’t accidentally send a cloud token to the on-prem server (or vice versa).
This means you can keep both backends configured at once and switch between them with memanto config backend cloud / memanto config backend on-prem. Your agents on each side remain intact.
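
The directory layout above can be expressed as a small Python sketch, useful for inspecting which backend has been set up on a machine. The paths come from this page; the helper names are hypothetical, and the sketch makes no assumption about which keys state.json contains beyond it being a JSON file.

```python
import json
from pathlib import Path

# Cloud state lives directly under ~/.memanto/ per this page.
MEMANTO_HOME = Path.home() / ".memanto"


def data_dir(backend: str) -> Path:
    """Return the per-backend data directory described on this page."""
    if backend == "cloud":
        return MEMANTO_HOME
    if backend == "on-prem":
        return MEMANTO_HOME / "on-prem"
    raise ValueError(f"unknown backend: {backend!r}")


def on_prem_state_path() -> Path:
    """Location of the on-prem connection state file."""
    return data_dir("on-prem") / "state.json"


def cloud_config_path() -> Path:
    """Location of the cloud config, which on-prem onboarding never touches."""
    return data_dir("cloud") / "config.yaml"


def load_on_prem_state():
    """Return the parsed state.json, or {} if on-prem has not been set up."""
    path = on_prem_state_path()
    if not path.is_file():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Because the on-prem directory is nested inside ~/.memanto/, any cleanup script should delete specific files rather than the whole parent directory, or it will remove both backends' state.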

Next Steps