On-Prem Quickstart
This guide walks through the full on-prem install end-to-end. It uses the built-in memanto first-run wizard, which provisions the Moorcheh on-prem server, configures embedding + LLM providers, and writes all state for you.
Time to complete: 5–10 minutes on first run (most of which is the initial Ollama model pull).
Result: A fully working on-prem Memanto stack with remember, recall, and answer operating against a local Moorcheh container.
Before you start, confirm you have Docker running and Python 3.10+ on PATH. See Requirements for details.
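The two prerequisites above can be checked with a short script before the wizard is run. This is a minimal sketch, not part of Memanto: the function names are hypothetical, and it assumes that a successful `docker info` is a good enough signal that the daemon is reachable.

```python
import shutil
import subprocess
import sys


def python_ok(version=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter meets the guide's Python 3.10+ requirement."""
    return tuple(version[:2]) >= minimum


def docker_ok(which=shutil.which, run=subprocess.run):
    """Return True if the docker CLI is on PATH and `docker info` exits with 0.

    `docker info` exits non-zero when it cannot reach the daemon, so it
    doubles as a "daemon is running" probe (an assumption of this sketch).
    """
    if which("docker") is None:
        return False
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `python_ok()` and `docker_ok()` with no arguments checks the current machine; the injectable `which` and `run` parameters exist only so the logic can be exercised without Docker installed.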
1. Install the CLI
uv:
2. Run the Wizard
Launch the interactive setup, type 2, and press Enter. The wizard then:
- Verifies Docker is installed and the daemon is running.
- Installs moorcheh-client>=0.1.3 if it isn’t already.
- Prompts you for an embedding provider.
- Prompts you for an LLM provider (for answer.generate).
- Writes both into ~/.moorcheh/config.json.
- Runs moorcheh up --embedding-provider … --embedding-model … to start the on-prem stack.
- Waits for http://localhost:8080/health to return 200.
- If you chose Ollama, pulls the embedding and LLM models inside the Ollama container via docker exec ollama pull ….
- Saves the final state to ~/.memanto/on-prem/state.json.
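The health-check step in the list above is a plain polling loop. The sketch below shows one way to reproduce it from your own scripts. It is not the wizard's actual implementation: the retry count, the delay, and the function names are assumptions, and only the URL and the expected 200 status come from this guide.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # endpoint named in this guide


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (for example 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def wait_for_health(url=HEALTH_URL, attempts=60, delay=2.0,
                    fetch=http_status, sleep=time.sleep):
    """Poll url until it returns 200; give up after `attempts` tries.

    attempts and delay are illustrative defaults, not the wizard's values.
    """
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

`wait_for_health()` returns True as soon as the server reports healthy and False if it never does within the allotted attempts.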
3. Pick an Embedding Provider
The wizard asks you to pick one of the following embedding providers:

| Choice | Default model | API key | Where embeddings run |
|---|---|---|---|
| Ollama | nomic-embed-text | None | Local container started by moorcheh up. |
| OpenAI | text-embedding-3-small | Yes | OpenAI API. |
| Cohere | embed-english-v3.0 | Yes | Cohere API. |
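If you script around the wizard, the table above can be captured as a small lookup. This is an illustrative sketch only: the class, the dictionary keys, and the helper names are hypothetical, and the exact provider identifiers that moorcheh up accepts are not stated in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingProvider:
    default_model: str
    needs_api_key: bool
    runs_locally: bool


# Values copied from the table above; the keys are display names,
# not necessarily the identifiers the CLI expects.
EMBEDDING_PROVIDERS = {
    "Ollama": EmbeddingProvider("nomic-embed-text", False, True),
    "OpenAI": EmbeddingProvider("text-embedding-3-small", True, False),
    "Cohere": EmbeddingProvider("embed-english-v3.0", True, False),
}


def resolve_model(provider, override=None):
    """Return the override if given, otherwise the provider's default model."""
    if provider not in EMBEDDING_PROVIDERS:
        raise ValueError(f"unknown embedding provider: {provider!r}")
    return override or EMBEDDING_PROVIDERS[provider].default_model


def providers_needing_api_key():
    """List the providers that require an API key, in table order."""
    return [name for name, p in EMBEDDING_PROVIDERS.items() if p.needs_api_key]
```

Only Ollama runs embeddings locally, so it is the only choice in this lookup that needs no API key.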
4. Pick an LLM Provider
Next, the wizard asks for the LLM used by memanto answer:
Your choice is stored as answer.model in ~/.memanto/on-prem/config.yaml.
5. Wait for the Server
After provider selection, the wizard prints its progress while it waits for the server.
6. Verify the Install
Run the status dashboard. If a component shows ● offline, see Troubleshooting.
7. Try It End-to-End
The on-prem CLI is identical to the cloud CLI, with no --on-prem flags anywhere. Create an agent, store a memory, then ask Memanto a question:
- remember returns a memory_id immediately, with no indexing delay.
- recall returns the stored memory by semantic similarity, even with no keyword overlap.
- answer calls your chosen LLM (Ollama / OpenAI / Cohere) with the recalled memories as context and prints the grounded answer.
The dashboard command starts a local server (port 8000 by default) and opens the dashboard in your default browser.
8. Start the REST API (Optional)
If you want to drive Memanto from your own application code or external tools, start the local REST server with memanto serve. It listens on http://localhost:8000, auto-detects your on-prem backend choice, and routes all Moorcheh calls to http://localhost:8080. Interactive API docs are at http://localhost:8000/docs.
All endpoints documented in the API Reference work identically on-prem — including /api/v2/agents/{id}/remember, /recall, /answer, /upload-file, and /batch-remember.
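As a rough illustration of calling these endpoints from application code, the sketch below builds JSON requests with the standard library. Several things here are assumptions rather than documented behavior: that the short forms expand to /api/v2/agents/{id}/recall and /api/v2/agents/{id}/answer, that the endpoints accept POST with a JSON body, and the helper names themselves. The payload field names are not given in this guide, so check the API Reference or http://localhost:8000/docs for the real schema.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # local REST server address from this guide


def agent_endpoint(agent_id, action, base_url=BASE_URL):
    """Build the URL for an agent action such as remember, recall, or answer."""
    safe_id = urllib.parse.quote(str(agent_id), safe="")
    return f"{base_url}/api/v2/agents/{safe_id}/{action}"


def build_request(agent_id, action, payload, base_url=BASE_URL):
    """Prepare a JSON POST request.

    POST and the JSON body are assumptions of this sketch; the payload
    keys must match the schema in the API Reference.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        agent_endpoint(agent_id, action, base_url),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from `send` lets you inspect or log the exact URL and body before anything reaches the server.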
What Just Got Installed
| Component | Where | Started by |
|---|---|---|
| Memanto CLI + server | pip site-packages | memanto, memanto serve |
| Moorcheh on-prem container | Docker | moorcheh up (called by the wizard) |
| Ollama container (if chosen) | Docker, sibling to Moorcheh | moorcheh up |
| Embedding + LLM models (if Ollama) | Inside Ollama container | docker exec ollama pull … |
| On-prem state | ~/.memanto/on-prem/state.json | Wizard |
| Moorcheh provider config | ~/.moorcheh/config.json | Wizard (via moorcheh-client) |
To remove everything, run moorcheh down (or docker compose down against the moorcheh project), uninstall moorcheh-client, and delete ~/.memanto/on-prem/.
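To confirm which of the files listed in the table are present, for example right after the wizard finishes or after a teardown, a small check like the following can help. It is a hypothetical helper that only tests for file existence; it does not read the files, since their schema is not described in this guide.

```python
from pathlib import Path

# Paths from the table, relative to the user's home directory.
STATE_FILES = {
    "on-prem state": Path(".memanto") / "on-prem" / "state.json",
    "Moorcheh provider config": Path(".moorcheh") / "config.json",
}


def check_state_files(home=None):
    """Map each label to True if its file exists under the given home directory."""
    base = Path.home() if home is None else Path(home)
    return {label: (base / rel).is_file() for label, rel in STATE_FILES.items()}
```

Calling `check_state_files()` with no argument inspects the current user's home directory.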
Next Steps
- Configuration — tune providers, model overrides, timeouts, ports.
- Backend Switching — swap between on-prem and cloud without losing either side’s state.
- Self-Hosting Memanto Server — run the Memanto REST API as a long-lived service.
- Troubleshooting — common errors and what to check.