
On-Prem Quickstart

This guide walks through the full on-prem install end-to-end. It uses the built-in memanto first-run wizard, which provisions the Moorcheh on-prem server, configures the embedding and LLM providers, and writes all state for you.

Time to complete: 5–10 minutes on first run (most of which is the initial Ollama model pull).
Result: A fully working on-prem Memanto stack with remember, recall, and answer operating against a local Moorcheh container.

Before you start, confirm that Docker is running and that Python 3.10+ is on your PATH. See Requirements for details.
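The two prerequisites can be checked from a script before launching the wizard. The following is a minimal sketch, not part of Memanto: the function names are made up for illustration, and it assumes that `docker info` exiting with status 0 is a good enough signal that the daemon is reachable.

```python
import shutil
import subprocess
import sys


def python_ok(version=sys.version_info[:2], minimum=(3, 10)):
    """Return True if the given (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def docker_running(timeout=10):
    """Return True if the docker CLI is on PATH and `docker info` succeeds."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `docker info` exits non-zero when the client cannot reach the daemon.
    return result.returncode == 0


if __name__ == "__main__":
    print("Python 3.10+:", "ok" if python_ok() else "too old")
    print("Docker daemon:", "running" if docker_running() else "not reachable")
```

If either line reports a problem, fix it before running the wizard; the wizard performs its own Docker check as well.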

1. Install the CLI

pip install memanto
Or with uv:
pip install uv
uv tool install memanto
Verify:
memanto --version

2. Run the Wizard

Launch the interactive setup:
memanto
The first time you run the CLI with no subcommand, it prints the welcome banner and asks you to choose a backend:
Choose your backend
  1  Moorcheh Cloud      (instant, needs API key, all features)
  2  Moorcheh On-Prem    (~5-10 min install, Docker required, no API key)
  Enter 1 or 2 [1]:
Type 2 and press Enter. The wizard then:
  1. Verifies Docker is installed and the daemon is running.
  2. Installs moorcheh-client>=0.1.3 if it isn’t already installed.
  3. Prompts you for an embedding provider.
  4. Prompts you for an LLM provider (for answer.generate).
  5. Writes both into ~/.moorcheh/config.json.
  6. Runs moorcheh up --embedding-provider … --embedding-model … to start the on-prem stack.
  7. Waits for http://localhost:8080/health to return 200.
  8. If you chose Ollama, pulls the embedding and LLM models inside the Ollama container via docker exec ollama pull ….
  9. Saves the final state to ~/.memanto/on-prem/state.json.
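Step 7 is a readiness poll, and the same pattern is useful in your own scripts (for example, in CI before running tests against the stack). The sketch below is not the wizard's implementation: the function names, the 120-second deadline, and the 2-second interval are assumptions chosen for illustration. Only the health URL comes from this guide.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None  # connection refused, timeout, malformed reply, ...


def wait_for_health(url="http://localhost:8080/health", deadline_s=120.0,
                    interval_s=2.0, probe=http_status, sleep=time.sleep):
    """Poll url until it returns 200 or roughly deadline_s seconds of waiting pass.

    The deadline counts sleep intervals only, not time spent inside the probe.
    """
    waited = 0.0
    while True:
        if probe(url) == 200:
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Calling `wait_for_health()` with no arguments blocks until the local Moorcheh server reports healthy or the deadline passes. The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.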

3. Pick an Embedding Provider

The wizard asks:
Embedding provider
  1  Ollama (local, zero API keys)        - we'll pull the embedding model for you
  2  Bring your own (OpenAI or Cohere)    - cloud-hosted embeddings, requires an API key
  Enter 1 or 2 [1]:
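Choice | Default model          | API key | Where embeddings run
-------|------------------------|---------|----------------------------------------
Ollama | nomic-embed-text       | None    | Local container started by moorcheh up.
OpenAI | text-embedding-3-small | Yes     | OpenAI API.
Cohere | embed-english-v3.0     | Yes     | Cohere API.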
If you pick OpenAI or Cohere, the wizard prompts (with hidden input) for your provider API key and validates it is non-empty.
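A hidden-input prompt with a non-empty check can be written with the standard library's `getpass`. This is only a sketch of the behavior described above, not Memanto's code: the function name, the retry limit, and the messages are assumptions.

```python
import getpass


def prompt_api_key(provider, read_secret=getpass.getpass, max_attempts=3):
    """Ask for an API key without echoing it; reject empty or blank input.

    Returns the key, or None if every attempt was empty.
    """
    for _ in range(max_attempts):
        key = read_secret(f"{provider} API key: ").strip()
        if key:
            return key
        print("API key must not be empty.")
    return None
```

`getpass.getpass` keeps the key out of the terminal scrollback. The check only confirms that something was typed; it does not verify the key against the provider.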

4. Pick an LLM Provider

Next, the wizard asks for the LLM used by memanto answer:
Answer LLM provider
  1  Ollama (local, zero API keys)   - model: qwen2.5
  2  OpenAI                           - model: gpt-4o-mini, requires an API key
  3  Cohere                           - model: command-r-plus-08-2024, requires an API key
  Enter 1, 2, or 3 [1]:
The default mirrors your embedding choice: if you picked OpenAI for embeddings, OpenAI is suggested for the LLM, and the wizard reuses your API key so you don’t have to enter it twice. You can still mix providers freely (e.g., Ollama embeddings + OpenAI LLM), because Memanto stores the two choices independently.

To change the model later, edit answer.model in ~/.memanto/on-prem/config.yaml.
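The defaulting and key-reuse rule can be summarized in a few lines. The sketch below is an illustration of the rule as described here, not the wizard's source: the function names and the returned dictionary shape are hypothetical, while the model names are the ones listed in the prompt above.

```python
# Default answer models per provider, as shown in the wizard prompt.
LLM_MODELS = {
    "ollama": "qwen2.5",
    "openai": "gpt-4o-mini",
    "cohere": "command-r-plus-08-2024",
}


def suggest_llm_provider(embedding_provider):
    """Suggest the LLM provider that mirrors the embedding provider."""
    return embedding_provider if embedding_provider in LLM_MODELS else "ollama"


def llm_settings(llm_provider, embedding_provider, embedding_api_key=None):
    """Resolve the LLM settings; reuse the key only when providers match."""
    needs_key = llm_provider != "ollama"
    reused_key = embedding_api_key if (needs_key and llm_provider == embedding_provider) else None
    return {
        "provider": llm_provider,
        "model": LLM_MODELS[llm_provider],
        "api_key": reused_key,
        # True when the user still has to type a key for this provider.
        "must_prompt_for_key": needs_key and reused_key is None,
    }
```

For example, `llm_settings("openai", "openai", "sk-example")` reuses the key, while `llm_settings("openai", "ollama")` signals that a key still has to be entered.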

5. Wait for the Server

After provider selection, the wizard prints:
  Starting Moorcheh server (`moorcheh up`)...
  ✓ Docker is running
  ✓ moorcheh-client installed
  ✓ LLM config saved to ~/.moorcheh/config.json
  Waiting for http://localhost:8080/...
  ✓ Moorcheh server online
If you picked Ollama, you’ll also see:
  Pulling nomic-embed-text inside Ollama container abc123def456...
  ✓ Embedding model ready in container
  Pulling qwen2.5 inside Ollama container abc123def456...
  ✓ Embedding model ready in container
The initial Ollama pull downloads ~5 GB and can take several minutes on a slow connection. Subsequent runs reuse the cached models.
When the wizard finishes, you’ll see:
Setup complete!
  Backend:    On-Prem
  Config:     /Users/<you>/.memanto
  Server:     http://localhost:8080
  Embedding:  ollama

6. Verify the Install

Run the status dashboard:
memanto status
You should see something like:
Configuration
  Config Dir       /Users/<you>/.memanto
  Backend          on-prem
  On-Prem URL      http://localhost:8080
  Embedding        ollama
  On-Prem Server   ● online
If the On-Prem Server line says ● offline, see Troubleshooting.

7. Try It End-to-End

The on-prem CLI is identical to the cloud CLI — no --on-prem flags anywhere. Create an agent, store a memory, then ask Memanto a question:
# Create an agent (auto-activates a 6-hour session)
memanto agent create on-prem-demo

# Store a memory — instantly searchable, no indexing wait
memanto remember "The user prefers dark mode for the dashboard" --type preference

# Recall it semantically
memanto recall "What theme does the user want?"

# Generate a grounded answer (uses your chosen LLM provider)
memanto answer "Based on memory, what theme should I set?"
Expected behavior:
  • remember returns a memory_id immediately — no indexing delay.
  • recall returns the stored memory by semantic similarity, even with no keyword overlap.
  • answer calls your chosen LLM (Ollama / OpenAI / Cohere) with the recalled memories as context and prints the grounded answer.
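The same four commands can be run as a scripted smoke test. The following is a rough sketch that shells out to the CLI exactly as shown above. The helper name is made up, and it assumes that memanto exits with a non-zero status when a command fails, which this guide does not state.

```python
import subprocess

# The commands from this section, in order, as argument lists.
COMMANDS = [
    ["memanto", "agent", "create", "on-prem-demo"],
    ["memanto", "remember", "The user prefers dark mode for the dashboard",
     "--type", "preference"],
    ["memanto", "recall", "What theme does the user want?"],
    ["memanto", "answer", "Based on memory, what theme should I set?"],
]


def run_all(commands=COMMANDS, run=subprocess.run):
    """Run each command in order and stop at the first non-zero exit code.

    Returns a list of (argv, returncode) pairs for the commands that ran.
    """
    results = []
    for argv in commands:
        completed = run(argv, capture_output=True, text=True)
        results.append((argv, completed.returncode))
        if completed.returncode != 0:  # assumption: non-zero means failure
            break
    return results
```

Passing argument lists rather than a single shell string avoids quoting problems with the memory text.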
Open the web UI to browse everything in a browser:
memanto ui
This starts the embedded Memanto server (on port 8000 by default) and opens the dashboard in your default browser.

8. Start the REST API (Optional)

If you want to drive Memanto from your own application code or external tools, start the local REST server:
memanto serve
The server listens on http://localhost:8000. It auto-detects your on-prem backend choice and routes all Moorcheh calls to http://localhost:8080. Interactive API docs are at http://localhost:8000/docs. All endpoints documented in the API Reference work identically on-prem — including /api/v2/agents/{id}/remember, /recall, /answer, /upload-file, and /batch-remember.
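As an illustration of calling the local REST server from application code, the sketch below builds a request for the remember endpoint using only the standard library. Several details are assumptions, not documented behavior: the POST method, the JSON field names `content` and `type`, the use of the agent name as `{id}`, and a JSON response body. Check the interactive docs at http://localhost:8000/docs for the actual schema and any authentication requirements.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # default address given in this guide


def build_remember_request(agent_id, text, memory_type, base_url=BASE_URL):
    """Build (but do not send) a request for /api/v2/agents/{id}/remember."""
    # ASSUMPTION: "content" and "type" are placeholder field names, and POST
    # is a guess at the method; confirm both in the interactive API docs.
    body = json.dumps({"content": text, "type": memory_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v2/agents/{quote(agent_id, safe='')}/remember",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and decode the reply, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `memanto serve` running, `send(build_remember_request("on-prem-demo", "The user prefers dark mode for the dashboard", "preference"))` would issue the call, subject to the assumptions above.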

What Just Got Installed

Component                          | Where                         | Started by
-----------------------------------|-------------------------------|------------------------------------
Memanto CLI + server               | pip site-packages             | memanto, memanto serve
Moorcheh on-prem container         | Docker                        | moorcheh up (called by the wizard)
Ollama container (if chosen)       | Docker, sibling to Moorcheh   | moorcheh up
Embedding + LLM models (if Ollama) | Inside Ollama container       | docker exec ollama pull …
On-prem state                      | ~/.memanto/on-prem/state.json | Wizard
Moorcheh provider config           | ~/.moorcheh/config.json       | Wizard (via moorcheh-client)
Nothing else is installed system-wide. To remove the on-prem stack later, run moorcheh down (or docker compose down against the moorcheh project), uninstall moorcheh-client, and delete ~/.memanto/on-prem/.

Next Steps