On-Prem Configuration
The on-prem wizard wires sensible defaults, but every value is editable. This page is the reference for where each setting lives, what controls it, and how to change it after install.
Configuration Surfaces
On-prem reads from three places, listed here from highest to lowest precedence. A source higher in the list overrides those below it:
- Environment variables (or a project .env): highest precedence.
- ~/.memanto/on-prem/state.json: set by the on-prem onboarding wizard.
- Built-in defaults in Memanto’s Settings model.
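
The lookup order can be sketched as follows. This is a minimal illustration, not Memanto's actual code: it assumes environment variables win over state.json, which wins over built-in defaults. The `resolve` helper and the `DEFAULTS` dict are hypothetical names. Individual settings may deviate from this rule (see the ANSWER_MODEL note in the table under Environment Variables).

```python
import os

# Hypothetical stand-in for the built-in defaults; the real values live in
# Memanto's Settings model.
DEFAULTS = {
    "MOORCHEH_ONPREM_URL": "http://localhost:8080",
    "MOORCHEH_ONPREM_TIMEOUT": "300",
}


def resolve(name, state_key, env=None, state=None, defaults=DEFAULTS):
    """Return (value, source) for one setting.

    Checks the environment first, then the parsed state.json dict, then the
    defaults. Empty strings are treated as "not set".
    """
    env = os.environ if env is None else env
    state = {} if state is None else state
    if env.get(name):
        return env[name], "env"
    if state.get(state_key):
        return state[state_key], "state.json"
    return defaults.get(name), "default"
```

Returning the source alongside the value makes it easy to see which of the three surfaces a setting came from when debugging.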
~/.memanto/config.yaml is owned by the cloud backend; the on-prem wizard does not write its settings there, with one exception: it sets backend: on-prem so that subsequent CLI runs know which backend to dispatch to.
Environment Variables
These are the on-prem-relevant variables in Settings. Defaults are shown.
| Variable | Default | Purpose |
|---|---|---|
| MEMANTO_BACKEND | cloud | Set to on-prem to route all Moorcheh calls to the local server. The wizard sets this for you; you can also set it inline (MEMANTO_BACKEND=on-prem memanto status). |
| MOORCHEH_ONPREM_URL | http://localhost:8080 | Base URL of the Moorcheh on-prem server. Override if you’ve remapped the port or are running Memanto in a different container. |
| MOORCHEH_ONPREM_EMBEDDING_PROVIDER | (empty) | Surfaced in memanto status so you can see at a glance what provider is in use. Auto-populated from state.json. |
| MOORCHEH_ONPREM_TIMEOUT | 300 | HTTP read timeout in seconds for the on-prem Moorcheh client. Default is high because first-call LLM cold-starts on Ollama can take 1–2 minutes (model load). |
| MOORCHEH_API_KEY | (empty) | Not required on-prem. The on-prem stack does not consult this. |
| HOST | 0.0.0.0 | Bind host for Memanto’s own REST server (memanto serve). |
| PORT | 8000 | Bind port for Memanto’s own REST server. |
| ANSWER_MODEL | anthropic.claude-sonnet-4-6 | Cloud default. On-prem, the active LLM is sourced from state.json (llm_model); this env var is ignored unless state.json is empty. |
| ANSWER_TEMPERATURE | 0.7 | LLM temperature for answer.generate. Honored on both backends. |
| ANSWER_LIMIT | 15 | Number of context memories passed to the LLM for answer. |
| ANSWER_THRESHOLD | 0.01 | Confidence threshold for memory relevance during answer. |
| RECALL_LIMIT | 10 | Default Top-N results returned by recall. |
| SUMMARY_MODEL | anthropic.claude-sonnet-4-6 | Same backend-awareness rule as ANSWER_MODEL. |
| ALLOWED_ORIGINS | * | CORS origins for Memanto’s REST API. Restrict in production. |
| LOG_LEVEL | INFO | DEBUG, INFO, WARNING, ERROR. |
Setting Env Vars
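For a single command, prefix the command with the variable assignment, for example MEMANTO_BACKEND=on-prem memanto status.
For persistent settings, put the variables in a project .env, which is loaded automatically.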
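
When a script rather than a shell launches the CLI, the same per-command override can be sketched in Python. This is an illustrative helper, not part of Memanto; `run_with_env` is a hypothetical name, and it assumes the command you pass is available on PATH.

```python
import os
import subprocess


def run_with_env(cmd, overrides):
    """Run cmd with the current environment plus the given overrides.

    Only the child process sees the overrides; os.environ in the calling
    process is left unchanged, which mirrors the VAR=value prefix in a shell.
    """
    env = {**os.environ, **overrides}
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


# Example (assumes the memanto CLI is installed and on PATH):
#   result = run_with_env(["memanto", "status"], {"MEMANTO_BACKEND": "on-prem"})
#   print(result.stdout)
```

If the executable is not found, `subprocess.run` raises `FileNotFoundError`, so callers may want to catch that when the CLI is optional.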
On-Prem State File
~/.memanto/on-prem/state.json is the source of truth for on-prem configuration. It is written by the wizard and read by both the CLI and the embedded server.
The file contains the following keys:
| Key | Used for |
|---|---|
| url | Exported as MOORCHEH_ONPREM_URL at process startup. |
| embedding_provider, embedding_model | Re-used when re-onboarding on-prem (lets you switch cloud↔on-prem without re-picking a provider). |
| llm_provider, llm_model | Sent as ai_model to the on-prem server on every answer.generate call. If empty/missing, the on-prem server falls back to whatever LLM is configured in ~/.moorcheh/config.json. |
| installed_at | Metadata, useful for support diagnostics. |
After hand-editing the file, restart memanto serve (or run any CLI command) to reload.
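
To inspect the state file from a script, something like the following sketch could be used. The key names come from the table above; the values in `EXAMPLE_STATE` are illustrative assumptions (in particular the provider spelling and the timestamp format are guesses), and `read_state` and `missing_keys` are hypothetical helpers rather than Memanto functions.

```python
import json
from pathlib import Path

STATE_PATH = Path.home() / ".memanto" / "on-prem" / "state.json"

EXPECTED_KEYS = (
    "url",
    "embedding_provider",
    "embedding_model",
    "llm_provider",
    "llm_model",
    "installed_at",
)

# Illustrative values only; the real file written by the wizard may differ.
EXAMPLE_STATE = {
    "url": "http://localhost:8080",
    "embedding_provider": "ollama",
    "embedding_model": "nomic-embed-text",
    "llm_provider": "ollama",
    "llm_model": "qwen2.5",
    "installed_at": "2025-01-01T00:00:00Z",  # placeholder; format is an assumption
}


def read_state(path=STATE_PATH):
    """Return the parsed state file, or {} if it does not exist."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}
    return data if isinstance(data, dict) else {}


def missing_keys(state):
    """List expected keys that are absent or empty in the given state dict."""
    return [key for key in EXPECTED_KEYS if not state.get(key)]
```

An empty llm_model is worth flagging in particular, since in that case the on-prem server falls back to the LLM configured in ~/.moorcheh/config.json.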
Moorcheh Server Config
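~/.moorcheh/config.json is owned by the moorcheh-client package. The Memanto wizard writes the full embedding + LLM block there before calling moorcheh up, so the on-prem server has both ready on first boot. The file has an embedding block and an llm block; API keys, where a provider requires one, are stored under embedding.api_key and llm.api_key.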
To change the embedding or LLM settings after install:
- Stop the stack: moorcheh down.
- Edit ~/.moorcheh/config.json (or use moorcheh configure interactively).
- Restart: moorcheh up.
- Update ~/.memanto/on-prem/state.json to match (embedding_provider, llm_model, etc.); Memanto reads its model id from there.
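
The final step, bringing state.json in line with the new server config, can be scripted instead of hand-edited. The sketch below is one way to do it, assuming the file is a flat JSON object with the keys described under On-Prem State File; `update_state` is a hypothetical helper, not a Memanto API. It writes to a temporary file and renames it so a crash cannot leave a half-written state file behind.

```python
import json
import os
import tempfile
from pathlib import Path


def update_state(path, **changes):
    """Merge changes into the JSON object at path and return the new state.

    Existing keys that are not mentioned in changes are preserved. The write
    goes to a temporary file in the same directory and is then moved into
    place with os.replace.
    """
    path = Path(path)
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        state = {}
    state.update(changes)

    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return state
```

For example, `update_state(Path.home() / ".memanto/on-prem/state.json", llm_provider="openai", llm_model="gpt-4o-mini")` would record a switch to OpenAI; the exact provider spelling Memanto expects is an assumption here.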
Provider Reference
Ollama (Local, Recommended for Air-Gap)
- Embedding model: nomic-embed-text (default; ~270 MB).
- LLM model: qwen2.5 (default; ~4.7 GB); any model available through ollama pull can be used instead.
- API key: none.
- Where it runs: sibling Docker container started by moorcheh up.
OpenAI
- Embedding model: text-embedding-3-small (default; cheaper) or text-embedding-3-large.
- LLM model: gpt-4o-mini (default), gpt-4o, etc.
- API key: required; stored in ~/.moorcheh/config.json under embedding.api_key / llm.api_key.
Cohere
- Embedding model: embed-english-v3.0 (default) or embed-multilingual-v3.0.
- LLM model: command-r-plus-08-2024 (default).
- API key: required.
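
For scripts that validate a configuration before starting the stack, the provider defaults above can be captured in a small lookup table. This is a sketch: the model names are taken from this page, while the lowercase provider keys, the `ProviderDefaults` class, and `check_provider` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderDefaults:
    embedding_model: str
    llm_model: str
    needs_api_key: bool


# Defaults as listed in the Provider Reference; keys are assumed spellings.
PROVIDERS = {
    "ollama": ProviderDefaults("nomic-embed-text", "qwen2.5", False),
    "openai": ProviderDefaults("text-embedding-3-small", "gpt-4o-mini", True),
    "cohere": ProviderDefaults("embed-english-v3.0", "command-r-plus-08-2024", True),
}


def check_provider(name, api_key=""):
    """Return a list of human-readable problems; an empty list means OK."""
    defaults = PROVIDERS.get(name)
    if defaults is None:
        return [f"unknown provider: {name}"]
    if defaults.needs_api_key and not api_key:
        return [f"{name} requires an API key"]
    return []
```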
Answer & Recall Tuning
These knobs work identically on cloud and on-prem.
| Setting | Env var | Default | What it does |
|---|---|---|---|
| Answer model | ANSWER_MODEL (cloud) / state.json: llm_model (on-prem) | — | Which LLM answer calls. |
| Answer temperature | ANSWER_TEMPERATURE | 0.7 | Higher = more creative, lower = more deterministic. |
| Answer context size | ANSWER_LIMIT | 15 | How many memories to pass as context. Lower for faster answers, higher for better grounding. |
| Answer threshold | ANSWER_THRESHOLD | 0.01 | Memories below this similarity are dropped. |
| Recall top-N | RECALL_LIMIT | 10 | Default page size for recall. Override per-call with --limit. |
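
How the threshold and the context size interact can be pictured with a short sketch. It assumes memories arrive as (text, score) pairs, that the threshold is applied before the limit, and that higher scores are more relevant; none of this is confirmed as Memanto's internal implementation, and `select_context` is a hypothetical name.

```python
def select_context(scored_memories, threshold=0.01, limit=15):
    """Pick the memories that would be passed to the LLM as context.

    Drops every memory whose score is below threshold, then keeps the
    highest-scoring `limit` of the remainder.
    """
    kept = [(text, score) for text, score in scored_memories if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

With the defaults, the threshold of 0.01 removes only near-irrelevant memories, so ANSWER_LIMIT is usually the setting that determines how much context the model sees.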
Timeouts
Ollama cold-starts can be slow on the first call after moorcheh up. Memanto sets the on-prem client’s read timeout to 300 seconds by default so an initial answer.generate doesn’t fail with a ReadTimeout. To override it, set MOORCHEH_ONPREM_TIMEOUT (in seconds).
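
A client that honors the same variable might read it as in the sketch below. The variable name and the 300-second default come from this page; falling back to the default on empty, non-numeric, or non-positive values is an assumption about sensible behavior, and `onprem_timeout` is a hypothetical helper.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 300.0


def onprem_timeout(env=None):
    """Return the read timeout in seconds from MOORCHEH_ONPREM_TIMEOUT.

    Falls back to the 300 second default when the variable is unset, empty,
    not a number, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get("MOORCHEH_ONPREM_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

The returned value can be passed as the read timeout to whatever HTTP client talks to the on-prem server.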
Disk Locations Recap
| Path | Owner | Editable? |
|---|---|---|
| ~/.memanto/on-prem/state.json | Memanto CLI | Yes; hand-edit, then restart the CLI/server. |
| ~/.memanto/.env | Memanto CLI | Yes, but on-prem does not need a MOORCHEH_API_KEY. |
| ~/.moorcheh/config.json | moorcheh-client | Yes, via moorcheh configure or by hand. |
| ~/.moorcheh/uploads/ | moorcheh-client | Append-only; staging for memanto upload files. |
Next Steps
- Backend Switching: toggle between cloud and on-prem without losing state.
- Self-Hosting Memanto Server: run memanto serve under Docker/Compose/systemd.
- Kubernetes Deployment: manifests for a clustered on-prem deployment.
- Security & Operations: production hardening checklist.