On-Prem Configuration

The on-prem wizard wires sensible defaults, but every value is editable. This page is the reference for where each setting lives, what controls it, and how to change it after install.

Configuration Surfaces

On-prem reads from three places. They are listed here from highest to lowest precedence, so a source higher in the list overrides the ones below it:
  1. Environment variables (or a project .env): highest precedence.
  2. ~/.memanto/on-prem/state.json, written by the on-prem onboarding wizard.
  3. Built-in defaults in Memanto’s Settings model.
The shared ~/.memanto/config.yaml is owned by the cloud backend. The on-prem wizard writes exactly one key there, backend: on-prem, so that subsequent CLI runs know which backend to dispatch to; it leaves every other key untouched.
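The lookup order above can be sketched in a few lines of Python. This is an illustration only, not Memanto's actual Settings code: the resolve and load_state helpers, the DEFAULTS subset, and the STATE_KEYS mapping are hypothetical names introduced here, and project .env loading is left out for brevity.

```python
import json
from pathlib import Path

# Hypothetical subset of the built-in defaults documented on this page.
DEFAULTS = {
    "MOORCHEH_ONPREM_URL": "http://localhost:8080",
    "MOORCHEH_ONPREM_TIMEOUT": "300",
}

# Assumed mapping from an env var name to the state.json key that backs it.
STATE_KEYS = {"MOORCHEH_ONPREM_URL": "url"}


def load_state(path):
    """Return the parsed state file, or {} if it is missing or unreadable."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}


def resolve(name, env, state):
    """Resolve one setting: environment first, then state.json, then defaults."""
    if env.get(name):
        return env[name]
    key = STATE_KEYS.get(name)
    if key and state.get(key):
        return state[key]
    return DEFAULTS.get(name)
```

Calling resolve("MOORCHEH_ONPREM_URL", os.environ, load_state(...)) with the state file path would return the environment value when one is set, then the url from state.json, and only then the built-in default.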

Environment Variables

These are the on-prem-relevant variables in Settings. Defaults shown.
| Variable | Default | Purpose |
|---|---|---|
| MEMANTO_BACKEND | cloud | Set to on-prem to route all Moorcheh calls to the local server. The wizard sets this for you; you can also set it inline (MEMANTO_BACKEND=on-prem memanto status). |
| MOORCHEH_ONPREM_URL | http://localhost:8080 | Base URL of the Moorcheh on-prem server. Override if you’ve remapped the port or are running Memanto in a different container. |
| MOORCHEH_ONPREM_EMBEDDING_PROVIDER | (empty) | Surfaced in memanto status so you can see at a glance what provider is in use. Auto-populated from state.json. |
| MOORCHEH_ONPREM_TIMEOUT | 300 | HTTP read timeout in seconds for the on-prem Moorcheh client. Default is high because first-call LLM cold-starts on Ollama can take 1–2 minutes (model load). |
| MOORCHEH_API_KEY | (empty) | Not required on-prem. The on-prem stack does not consult this. |
| HOST | 0.0.0.0 | Bind host for Memanto’s own REST server (memanto serve). |
| PORT | 8000 | Bind port for Memanto’s own REST server. |
| ANSWER_MODEL | anthropic.claude-sonnet-4-6 | Cloud default. On-prem, the active LLM is sourced from state.json (llm_model); this env var is ignored unless state.json is empty. |
| ANSWER_TEMPERATURE | 0.7 | LLM temperature for answer.generate. Honored on both backends. |
| ANSWER_LIMIT | 15 | Number of context memories passed to the LLM for answer. |
| ANSWER_THRESHOLD | 0.01 | Confidence threshold for memory relevance during answer. |
| RECALL_LIMIT | 10 | Default Top-N results returned by recall. |
| SUMMARY_MODEL | anthropic.claude-sonnet-4-6 | Same backend-awareness rule as ANSWER_MODEL. |
| ALLOWED_ORIGINS | * | CORS origins for Memanto’s REST API. Restrict in production. |
| LOG_LEVEL | INFO | One of DEBUG, INFO, WARNING, ERROR. |

Setting Env Vars

For a single command:

    MEMANTO_BACKEND=on-prem memanto status

In a project .env (loaded automatically):

    MEMANTO_BACKEND=on-prem
    MOORCHEH_ONPREM_URL=http://moorcheh.internal:8080
    MOORCHEH_ONPREM_TIMEOUT=600
    LOG_LEVEL=DEBUG

Globally for your shell (Linux/macOS):

    export MEMANTO_BACKEND=on-prem

On Windows PowerShell:

    $env:MEMANTO_BACKEND = "on-prem"

On-Prem State File

~/.memanto/on-prem/state.json is the source of truth for on-prem configuration. It is written by the wizard and read by both the CLI and the embedded server. Example contents:
{
  "installed_at": "2026-06-09T14:32:11Z",
  "embedding_provider": "ollama",
  "embedding_model": "nomic-embed-text",
  "llm_provider": "ollama",
  "llm_model": "qwen2.5",
  "url": "http://localhost:8080"
}
| Key | Used for |
|---|---|
| url | Exported as MOORCHEH_ONPREM_URL at process startup. |
| embedding_provider, embedding_model | Re-used when re-onboarding on-prem (lets you switch cloud↔on-prem without re-picking a provider). |
| llm_provider, llm_model | Sent as ai_model to the on-prem server on every answer.generate call. If empty/missing, the on-prem server falls back to whatever LLM is configured in ~/.moorcheh/config.json. |
| installed_at | Metadata, useful for support diagnostics. |
You can edit this file by hand. After saving, restart memanto serve (or run any CLI command) to reload.
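If you would rather script the change than hand-edit, a small helper along these lines could do it. This is a sketch, not part of Memanto: update_state is a hypothetical function, and "another-model" in the usage note is a placeholder model id. It writes to a temporary file and renames it so an interrupted write cannot leave a truncated state.json.

```python
import json
import os
import tempfile
from pathlib import Path


def update_state(path, **changes):
    """Merge `changes` into the JSON file at `path` and write it back atomically."""
    path = Path(path)
    state = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    state.update(changes)
    # Write to a temp file in the same directory, then rename it over the
    # original, so a crash mid-write cannot leave a half-written file behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
        fh.write("\n")
    os.replace(tmp, path)
    return state
```

For example, update_state(Path.home() / ".memanto" / "on-prem" / "state.json", llm_model="another-model") changes only llm_model and keeps the other keys. The restart step described above still applies afterwards.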

Moorcheh Server Config

~/.moorcheh/config.json is owned by the moorcheh-client package. The Memanto wizard writes the full embedding + LLM block there before calling moorcheh up, so the on-prem server has both ready on first boot. Schema:
{
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "api_key": null,
    "base_url": "http://ollama:11434"
  },
  "llm": {
    "provider": "ollama",
    "model": "qwen2.5",
    "api_key": null,
    "base_url": "http://ollama:11434"
  }
}
To switch the on-prem server to a different provider after install:
  1. Stop the stack: moorcheh down.
  2. Edit ~/.moorcheh/config.json (or use moorcheh configure interactively).
  3. Restart: moorcheh up.
  4. Update ~/.memanto/on-prem/state.json to match (embedding_provider, llm_model, etc.) — Memanto reads its model id from there.
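Step 4 is easy to forget, so a quick consistency check between the two files can help. The sketch below assumes the key layouts shown in the examples on this page; find_mismatches and the PAIRS table are hypothetical names, not part of Memanto or moorcheh-client.

```python
# Each entry: (state.json key, config.json section, config.json field).
# The layout is taken from the example files on this page.
PAIRS = [
    ("embedding_provider", "embedding", "provider"),
    ("embedding_model", "embedding", "model"),
    ("llm_provider", "llm", "provider"),
    ("llm_model", "llm", "model"),
]


def find_mismatches(state, config):
    """Return (state_key, state_value, config_value) for every pair that differs."""
    mismatches = []
    for state_key, section, field in PAIRS:
        state_value = state.get(state_key)
        config_value = config.get(section, {}).get(field)
        if state_value != config_value:
            mismatches.append((state_key, state_value, config_value))
    return mismatches
```

Load both files with json.loads and pass the resulting dictionaries in; an empty list means the two files agree on provider and model.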

Provider Reference

Ollama

  • Embedding model: nomic-embed-text (default; ~270 MB).
  • LLM model: qwen2.5 (default; ~4.7 GB). You can swap in any model that ollama pull can fetch.
  • API key: none.
  • Where it runs: sibling Docker container started by moorcheh up.

OpenAI

  • Embedding model: text-embedding-3-small (default; cheaper) or text-embedding-3-large.
  • LLM model: gpt-4o-mini (default), gpt-4o, etc.
  • API key: required; stored in ~/.moorcheh/config.json under embedding.api_key / llm.api_key.

Cohere

  • Embedding model: embed-english-v3.0 (default) or embed-multilingual-v3.0.
  • LLM model: command-r-plus-08-2024 (default).
  • API key: required.

Answer & Recall Tuning

These knobs work identically on cloud and on-prem, except for the answer model, which on-prem reads from state.json instead of the environment.

| Setting | Env var | Default | What it does |
|---|---|---|---|
| Answer model | ANSWER_MODEL (cloud) / state.json: llm_model (on-prem) | anthropic.claude-sonnet-4-6 (cloud) | Which LLM answer calls use. |
| Answer temperature | ANSWER_TEMPERATURE | 0.7 | Higher = more creative, lower = more deterministic. |
| Answer context size | ANSWER_LIMIT | 15 | How many memories to pass as context. Lower for faster answers, higher for better grounding. |
| Answer threshold | ANSWER_THRESHOLD | 0.01 | Memories below this similarity are dropped. |
| Recall top-N | RECALL_LIMIT | 10 | Default page size for recall. Override per-call with --limit. |
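To make the interaction between the threshold and the context size concrete, here is a minimal sketch of how the two could combine. It is an assumption, not Memanto's actual selection code: select_context is a hypothetical function, and both the order of operations (filter, then truncate) and the inclusive comparison are guesses.

```python
def select_context(scored_memories, threshold=0.01, limit=15):
    """Drop memories scoring below `threshold`, then keep the `limit` best.

    `scored_memories` is an iterable of (text, score) pairs. Filtering before
    truncating is an assumption made for this illustration.
    """
    kept = [(text, score) for text, score in scored_memories if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]
```

Under these assumptions, raising the threshold shrinks the candidate pool before the limit applies, so a high threshold can yield fewer than ANSWER_LIMIT memories.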

Timeouts

Ollama cold-starts can be slow on first call after moorcheh up. Memanto sets the on-prem client’s read timeout to 300 seconds by default so an initial answer.generate doesn’t fail with a ReadTimeout. Override:
export MOORCHEH_ONPREM_TIMEOUT=600
After the first call, the model stays loaded in memory in Ollama and subsequent calls return much faster.
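A client that honors this variable needs to turn the string into a number and fall back to the default when it is missing. The sketch below shows one way to do that; read_timeout is a hypothetical helper, and treating malformed or non-positive values as "use the default" is an assumption, since this page does not say how Memanto handles them.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 300.0  # default documented on this page


def read_timeout(env=os.environ):
    """Return the on-prem read timeout in seconds.

    Falls back to the default when the variable is unset, empty, not a
    number, or not positive (the last two cases are assumptions).
    """
    raw = env.get("MOORCHEH_ONPREM_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

The returned value can then be passed as the read timeout of whichever HTTP client is in use.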

Disk Locations Recap

| Path | Owner | Editable? |
|---|---|---|
| ~/.memanto/on-prem/state.json | Memanto CLI | Yes; hand-edit, then restart the CLI/server. |
| ~/.memanto/.env | Memanto CLI | Yes, but on-prem does not need a MOORCHEH_API_KEY. |
| ~/.moorcheh/config.json | moorcheh-client | Yes, via moorcheh configure or by hand. |
| ~/.moorcheh/uploads/ | moorcheh-client | Append-only; staging for memanto upload files. |

Next Steps