On-Prem Requirements
The on-prem stack runs Memanto plus the Moorcheh server (and optionally Ollama) in Docker on a single host. Everything is automated by the `memanto` CLI; you only need to make sure the prerequisites are in place before you start.
Operating Systems
The on-prem stack is supported on:
- Windows 10/11 with Docker Desktop (WSL2 backend)
- macOS 12+ (Apple Silicon and Intel) with Docker Desktop
- Linux (Ubuntu 20.04+, Debian 11+, RHEL 8+, Amazon Linux 2) with Docker Engine 20.10+
Hardware
| Component | Minimum | Recommended | Notes |
|---|---|---|---|
| CPU | 4 cores | 8+ cores | More cores noticeably speed up Ollama inference. |
| RAM | 8 GB | 16 GB+ | Ollama with qwen2.5 needs ~6 GB resident; embeddings add ~1 GB. With OpenAI/Cohere as providers, 4–6 GB total is enough. |
| Disk | 10 GB free | 30 GB+ free | Ollama model images are 1–7 GB each. Moorcheh storage grows with your memory volume. |
| GPU | Not required | NVIDIA GPU with 8 GB+ VRAM | Optional — speeds up Ollama. CPU-only inference works on any modern machine. |
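The CPU and disk minimums from the table can be checked with a few lines of standard-library Python. This is a minimal sketch, not part of Memanto: `check_hardware` is a hypothetical helper, and RAM is left out because the standard library has no portable call for it.

```python
import os
import shutil

# Minimums copied from the hardware table above.
MIN_CPU_CORES = 4
MIN_FREE_DISK_GB = 10


def check_hardware(path=".", cpu_count=None, free_bytes=None):
    """Return a list of human-readable problems; an empty list means OK.

    cpu_count and free_bytes can be passed explicitly (useful for testing);
    otherwise they are read from the current machine.
    """
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 0)
    free = free_bytes if free_bytes is not None else shutil.disk_usage(path).free

    problems = []
    if cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cores} cores found, {MIN_CPU_CORES} required")
    free_gb = free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"Disk: {free_gb:.1f} GB free, {MIN_FREE_DISK_GB} GB required")
    return problems
```

Calling `check_hardware(os.path.expanduser("~"))` checks the disk that holds your home directory, which is where the on-prem data directories live.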
Software
Required
- Docker Engine 20.10+ or Docker Desktop 4.0+ with the daemon running.
  - The Memanto onboarding wizard fails fast with a clear error if `docker info` does not succeed.
  - Verify with: `docker info`
- Python 3.10+ for the Memanto CLI itself.
  - Verify with: `python --version` (or `python3 --version`)
- `memanto` Python package.
- `moorcheh-client>=0.1.3`, the Python package that ships the `moorcheh up` command and exposes the on-prem SDK shape Memanto talks to. The onboarding wizard installs this automatically the first time you choose On-Prem at the prompt; you can also install it explicitly with `pip install "moorcheh-client>=0.1.3"`.
Optional
- `uvicorn[standard]` if you plan to run `memanto serve` directly. Installed automatically as a dependency of `memanto` in most cases.
- NVIDIA Container Toolkit if you want Ollama to use a GPU inside Docker.
Network & Ports
| Port | Bound to | Used by |
|---|---|---|
| 8000 | localhost (default) | Memanto’s REST API (memanto serve, memanto ui). |
| 8080 | localhost (default) | Moorcheh on-prem server, started by moorcheh up. |
| 11434 | inside the Docker network | Ollama, when used as the embedding/LLM provider. Started as a sibling container by moorcheh up. |
Outbound internet access is needed:
- Once, to `pip install memanto` and `moorcheh-client`.
- Once per Ollama model, to pull the image from the Ollama registry.
- For every `answer.generate` call if your LLM provider is OpenAI or Cohere.
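Before starting the stack, it can help to confirm that nothing else is already listening on the two host ports from the table. The sketch below is illustrative only; `port_is_free` is a hypothetical helper, and it treats a port as free when a TCP connection to it on localhost is refused.

```python
import socket

# Host-bound ports from the table above (11434 stays inside the Docker network).
PORTS = {8000: "Memanto REST API", 8080: "Moorcheh on-prem server"}


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success, i.e. something is listening.
        return sock.connect_ex((host, port)) != 0


def busy_ports():
    """Return the subset of PORTS that is already in use."""
    return {port: name for port, name in PORTS.items() if not port_is_free(port)}
```

An empty result from `busy_ports()` means both default ports are available.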
Provider Choices
You will be prompted to choose providers during onboarding. The choices and what they imply:

| Provider | Embedding | LLM (Answer) | API key | Cost | Best for |
|---|---|---|---|---|---|
| Ollama | nomic-embed-text | qwen2.5 | None | $0 | True air-gap; local development; demos; cost-sensitive workloads. |
| OpenAI | text-embedding-3-small | gpt-4o-mini | Required | Per-token | High-quality embeddings; existing OpenAI relationships. |
| Cohere | embed-english-v3.0 | command-r-plus-08-2024 | Required | Per-token | High-quality long-context answers; multilingual embeddings. |
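The table can also be read as a small lookup: each provider implies a default embedding model, a default LLM, and whether an API key is required. The sketch below encodes that in plain Python. `PROVIDERS` and `validate_provider` are hypothetical names used for illustration, not part of the Memanto CLI.

```python
# Defaults copied from the provider table above.
PROVIDERS = {
    "ollama": {"embedding": "nomic-embed-text", "llm": "qwen2.5", "needs_api_key": False},
    "openai": {"embedding": "text-embedding-3-small", "llm": "gpt-4o-mini", "needs_api_key": True},
    "cohere": {"embedding": "embed-english-v3.0", "llm": "command-r-plus-08-2024", "needs_api_key": True},
}


def validate_provider(name, api_key=""):
    """Return the defaults for a provider, or raise ValueError if the choice is unusable."""
    key = name.strip().lower()
    if key not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    spec = PROVIDERS[key]
    # OpenAI and Cohere need a key; Ollama runs locally and does not.
    if spec["needs_api_key"] and not api_key.strip():
        raise ValueError(f"{name} requires a non-empty API key")
    return spec
```

For example, `validate_provider("Ollama")` succeeds without a key, while `validate_provider("openai")` raises until a key is supplied.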
Disk Layout
Once onboarding finishes, the on-prem stack uses these locations on the host:

| Path | Owner | Purpose |
|---|---|---|
| `~/.memanto/` | Memanto CLI | Top-level config dir. The cloud backend stores everything here. |
| `~/.memanto/on-prem/` | Memanto CLI (on-prem only) | Isolated data dir: agents, sessions, registry, and `state.json` for the on-prem backend. |
| `~/.memanto/on-prem/state.json` | Memanto CLI | Source of truth for `url`, `embedding_provider`, `embedding_model`, `llm_provider`, `llm_model`. |
| `~/.moorcheh/config.json` | moorcheh-client | Embedding and LLM provider config consumed by the on-prem server. |
| `~/.moorcheh/uploads/` | moorcheh-client | Staging area for files uploaded via `memanto upload`; paths inside this dir are mapped into the container. |
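To see which backend settings are in effect, you can read `state.json` directly. The sketch below assumes the file is a flat JSON object whose top-level keys match the five field names listed in the table. That layout is an assumption, and `load_state` is a hypothetical helper.

```python
import json
from pathlib import Path

# Path and field names from the table above; the flat-JSON layout is assumed.
STATE_PATH = Path.home() / ".memanto" / "on-prem" / "state.json"
FIELDS = ("url", "embedding_provider", "embedding_model", "llm_provider", "llm_model")


def load_state(path=STATE_PATH):
    """Read state.json and return only the documented fields (None if absent)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {name: data.get(name) for name in FIELDS}
```

Calling `load_state()` before onboarding has run raises `FileNotFoundError`, which is itself a quick signal that the on-prem backend has not been set up yet.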
Verifying Prerequisites
Before running the on-prem setup, the wizard performs these checks for you:
- `docker` is on `PATH`.
- `docker info` returns successfully (daemon is up).
- `moorcheh-client>=0.1.3` is importable (installs it if not).
- Provider API keys (if you chose OpenAI or Cohere) are non-empty.
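If you want to run similar checks yourself before invoking the wizard, they can be approximated in standard-library Python. This is a sketch, not the wizard's actual implementation, and all function names are hypothetical. Instead of importing the package, it looks up the installed version of the `moorcheh-client` distribution, and it ignores pre-release suffixes when comparing versions.

```python
import re
import shutil
import subprocess
import sys
from importlib import metadata


def version_tuple(text):
    """Turn '0.1.3' into (0, 1, 3); pre-release suffixes are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def docker_on_path():
    """True if a `docker` executable can be found on PATH."""
    return shutil.which("docker") is not None


def docker_daemon_up(timeout=15):
    """True if `docker info` exits with status 0 within the timeout."""
    try:
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def package_at_least(dist_name, minimum):
    """True if the distribution is installed at or above the minimum version."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


def preflight():
    """Run each check and return (label, passed) pairs."""
    return [
        ("Python 3.10+", sys.version_info >= (3, 10)),
        ("docker on PATH", docker_on_path()),
        ("docker info succeeds", docker_daemon_up()),
        ("moorcheh-client>=0.1.3 installed", package_at_least("moorcheh-client", "0.1.3")),
    ]
```

Iterating over `preflight()` and printing each pair gives a quick pass/fail summary; provider API keys are not covered here because where they are stored depends on your setup.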