On-Prem Quickstart

This guide walks through the full on-prem install end-to-end. It uses the built-in memanto first-run wizard, which provisions the Moorcheh on-prem server, configures embedding + LLM providers, and writes all state for you.

Time to complete: 5–10 minutes on first run (most of which is the initial Ollama model pull).

Result: A fully working on-prem Memanto stack with remember, recall, and answer operating against a local Moorcheh container.

Before you start, confirm you have Docker running and Python 3.10+ on PATH. See Requirements for details.
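
If you want to script the two prerequisites, the following is a minimal sketch rather than anything shipped with Memanto. The helper names python_ok and docker_ok are hypothetical. It assumes that a successful `docker info` is a good enough signal that the daemon is running.

```python
import shutil
import subprocess
import sys


def python_ok(version=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter version meets the minimum (major, minor)."""
    return tuple(version[:2]) >= tuple(minimum)


def docker_ok(which=shutil.which, run=subprocess.run):
    """Return True if the docker CLI is on PATH and `docker info` exits with 0."""
    if which("docker") is None:
        return False  # CLI not installed or not on PATH
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # CLI present but could not be executed in time
    return result.returncode == 0  # non-zero usually means the daemon is down
```

Calling python_ok() and docker_ok() with no arguments checks the current machine; the injectable parameters exist only so the logic can be exercised without Docker installed.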

1. Install the CLI

pip install memanto

Or with uv:

pip install uv
uv tool install memanto

Verify:

memanto --version

2. Run the Wizard

Launch the interactive setup:

memanto

The first time you run the CLI with no subcommand, it prints the welcome banner and asks you to choose a backend:

Choose your backend
  1  Moorcheh Cloud      (instant, needs API key, all features)
  2  Moorcheh On-Prem    (~5-10 min install, Docker required, no API key)
  Enter 1 or 2 [1]:

Type 2 and press Enter. The wizard then:

Verifies Docker is installed and the daemon is running.
Installs moorcheh-client>=0.1.3 if it isn’t already.
Prompts you for an embedding provider.
Prompts you for an LLM provider (for answer.generate).
Writes both into ~/.moorcheh/config.json.
Runs moorcheh up --embedding-provider … --embedding-model … to start the on-prem stack.
Waits for http://localhost:8080/health to return 200.
If you chose Ollama, pulls the embedding and LLM models inside the Ollama container via docker exec ollama pull ….
Saves the final state to ~/.memanto/on-prem/state.json.
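
The health-check step in the list above can be reproduced in your own scripts, for example after restarting the stack by hand. This is a sketch of one way to poll the endpoint, not the wizard's actual code. The function names, the attempt count, and the delay are assumptions; only the URL and the expected 200 status come from this guide.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def wait_for_health(url="http://localhost:8080/health", attempts=60, delay=2.0,
                    fetch=http_status, sleep=time.sleep):
    """Poll `url` until it returns 200. Return False after `attempts` tries."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

With the defaults this waits roughly two minutes before giving up. The first run may need longer if images are still being pulled, so raise attempts if needed.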

3. Pick an Embedding Provider

The wizard asks:

Embedding provider
  1  Ollama (local, zero API keys)        - we'll pull the embedding model for you
  2  Bring your own (OpenAI or Cohere)    - cloud-hosted embeddings, requires an API key
  Enter 1 or 2 [1]:

| Choice | Default model | API key | Where embeddings run |
|---|---|---|---|
| Ollama | `nomic-embed-text` | None | Local container started by `moorcheh up`. |
| OpenAI | `text-embedding-3-small` | Yes | OpenAI API. |
| Cohere | `embed-english-v3.0` | Yes | Cohere API. |

If you pick OpenAI or Cohere, the wizard prompts (with hidden input) for your provider API key and validates it is non-empty.
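
If you automate a similar prompt in your own tooling, the same behavior (hidden input plus a non-empty check) can be sketched with the standard library. The function name prompt_api_key and the retry limit are hypothetical and not part of Memanto.

```python
import getpass


def prompt_api_key(provider, ask=getpass.getpass, max_tries=3):
    """Ask for an API key with hidden input; reject empty or whitespace-only values."""
    for _ in range(max_tries):
        key = ask(f"{provider} API key: ").strip()
        if key:
            return key
        print("API key must not be empty.")
    raise ValueError(f"No {provider} API key provided")
```

getpass.getpass keeps the key out of the terminal scrollback, which is the point of prompting with hidden input.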

4. Pick an LLM Provider

Next, the wizard asks for the LLM used by memanto answer:

Answer LLM provider
  1  Ollama (local, zero API keys)   - model: qwen2.5
  2  OpenAI                           - model: gpt-4o-mini, requires an API key
  3  Cohere                           - model: command-r-plus-08-2024, requires an API key
  Enter 1, 2, or 3 [1]:

The default mirrors your embedding choice: if you picked OpenAI for embeddings, OpenAI is suggested for the LLM, and the wizard reuses your API key so you don’t have to enter it twice. You can still mix providers freely (e.g., Ollama embeddings + OpenAI LLM), because Memanto stores both choices independently. To change the model later, edit answer.model in ~/.memanto/on-prem/config.yaml.
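
The mirroring rule can be pictured as a small lookup. The sketch below only illustrates the behavior described above and is not the wizard's implementation. The model names are taken from the menu, while suggest_llm and the returned dictionary keys are hypothetical.

```python
# Default answer models, as listed in the wizard menu above.
LLM_DEFAULTS = {
    "ollama": "qwen2.5",
    "openai": "gpt-4o-mini",
    "cohere": "command-r-plus-08-2024",
}


def suggest_llm(embedding_provider, embedding_api_key=None):
    """Suggest an LLM provider that mirrors the embedding provider.

    The embedding API key is carried over for cloud providers so it does not
    have to be entered twice. Ollama needs no key.
    """
    provider = embedding_provider.lower()
    if provider not in LLM_DEFAULTS:
        raise ValueError(f"Unknown provider: {embedding_provider!r}")
    needs_key = provider != "ollama"
    return {
        "provider": provider,
        "model": LLM_DEFAULTS[provider],
        "api_key": embedding_api_key if needs_key else None,
    }
```

The suggestion is only a default. Nothing in this sketch prevents pairing, say, Ollama embeddings with an OpenAI LLM, since the two choices are stored independently.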

5. Wait for the Server

After provider selection, the wizard prints:

  Starting Moorcheh server (`moorcheh up`)...
  ✓ Docker is running
  ✓ moorcheh-client installed
  ✓ LLM config saved to ~/.moorcheh/config.json
  Waiting for http://localhost:8080/...
  ✓ Moorcheh server online

If you picked Ollama, you’ll also see:

  Pulling nomic-embed-text inside Ollama container abc123def456...
  ✓ Embedding model ready in container
  Pulling qwen2.5 inside Ollama container abc123def456...
  ✓ Embedding model ready in container

The initial Ollama pull downloads ~5 GB and can take several minutes on a slow connection. Subsequent runs reuse the cached models. When the wizard finishes you’ll see:

Setup complete!
  Backend:    On-Prem
  Config:     /Users/<you>/.memanto
  Server:     http://localhost:8080
  Embedding:  ollama

6. Verify the Install

Run the status dashboard:

memanto status

You should see something like:

Configuration
  Config Dir       /Users/<you>/.memanto
  Backend          on-prem
  On-Prem URL      http://localhost:8080
  Embedding        ollama
  On-Prem Server   ● online

If the On-Prem Server line says ● offline, see Troubleshooting.

7. Try It End-to-End

The on-prem CLI is identical to the cloud CLI — no --on-prem flags anywhere. Create an agent, store a memory, then ask Memanto a question:

# Create an agent (auto-activates a 6-hour session)
memanto agent create on-prem-demo

# Store a memory — instantly searchable, no indexing wait
memanto remember "The user prefers dark mode for the dashboard" --type preference

# Recall it semantically
memanto recall "What theme does the user want?"

# Generate a grounded answer (uses your chosen LLM provider)
memanto answer "Based on memory, what theme should I set?"

Expected behavior:

remember returns a memory_id immediately — no indexing delay.
recall returns the stored memory by semantic similarity, even with no keyword overlap.
answer calls your chosen LLM (Ollama / OpenAI / Cohere) with the recalled memories as context and prints the grounded answer.
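
To run the same three commands from a script, one option is to shell out to the CLI. This is a minimal sketch that assumes memanto is on PATH and an agent session is already active. The helper names are hypothetical, and because the CLI's output format is not specified here, the raw stdout is returned unparsed.

```python
import subprocess

COMMANDS = {"remember", "recall", "answer"}


def memanto_argv(command, text, memory_type=None):
    """Build the argument list for `memanto remember|recall|answer`."""
    if command not in COMMANDS:
        raise ValueError(f"Unsupported command: {command!r}")
    argv = ["memanto", command, text]
    if memory_type is not None:
        if command != "remember":
            raise ValueError("--type is only used with `remember` in this guide")
        argv += ["--type", memory_type]
    return argv


def run_memanto(command, text, memory_type=None, run=subprocess.run):
    """Run the CLI and return stdout; raises CalledProcessError on a non-zero exit."""
    result = run(
        memanto_argv(command, text, memory_type),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the text as a single list element avoids shell quoting problems with memories that contain quotes or spaces.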

Open the web UI to browse everything in a browser:

memanto ui

This starts the embedded Memanto server (on port 8000 by default) and opens the dashboard in your default browser.

8. Start the REST API (Optional)

If you want to drive Memanto from your own application code or external tools, start the local REST server:

memanto serve

The server listens on http://localhost:8000. It auto-detects your on-prem backend choice and routes all Moorcheh calls to http://localhost:8080. Interactive API docs are at http://localhost:8000/docs. All endpoints documented in the API Reference work identically on-prem — including /api/v2/agents/{id}/remember, /recall, /answer, /upload-file, and /batch-remember.
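
As a starting point for calling these endpoints from Python, the sketch below builds a request for an agent endpoint using only the standard library. Several things here are assumptions: that the endpoints accept a JSON POST, that no auth header is needed locally, and the helper names themselves. The payload is passed through untouched because the request schema is not given in this guide; check http://localhost:8000/docs for the real field names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default `memanto serve` address from this guide


def agent_request(agent_id, operation, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /api/v2/agents/{id}/{operation}."""
    quoted_id = urllib.parse.quote(agent_id, safe="")
    url = f"{base_url}/api/v2/agents/{quoted_id}/{operation}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: confirm the HTTP method in the API docs
    )


def send(request, timeout=30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like send(agent_request("<agent-id>", "recall", {...})), with the payload keys taken from the interactive docs rather than from this sketch.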

What Just Got Installed

| Component | Where | Started by |
|---|---|---|
| Memanto CLI + server | `pip` site-packages | `memanto`, `memanto serve` |
| Moorcheh on-prem container | Docker | `moorcheh up` (called by the wizard) |
| Ollama container (if chosen) | Docker, sibling to Moorcheh | `moorcheh up` |
| Embedding + LLM models (if Ollama) | Inside Ollama container | `docker exec ollama pull …` |
| On-prem state | `~/.memanto/on-prem/state.json` | Wizard |
| Moorcheh provider config | `~/.moorcheh/config.json` | Wizard (via `moorcheh-client`) |

Nothing else is installed system-wide. To remove the on-prem stack later, run moorcheh down (or docker compose down against the moorcheh project), uninstall moorcheh-client, and delete ~/.memanto/on-prem/.

Next Steps

Configuration — tune providers, model overrides, timeouts, ports.
Backend Switching — swap between on-prem and cloud without losing either side’s state.
Self-Hosting Memanto Server — run the Memanto REST API as a long-lived service.
Troubleshooting — common errors and what to check.
