Skip to main content

On-Prem Requirements

The on-prem stack runs Memanto plus the Moorcheh server (and optionally Ollama) in Docker on a single host. Everything is automated by the memanto CLI — you only need to make sure the prerequisites are in place before you start.

Operating Systems

The on-prem stack is supported on:
  • Windows 10/11 with Docker Desktop (WSL2 backend)
  • macOS 12+ (Apple Silicon and Intel) with Docker Desktop
  • Linux (Ubuntu 20.04+, Debian 11+, RHEL 8+, Amazon Linux 2) with Docker Engine 20.10+

Hardware

ComponentMinimumRecommendedNotes
CPU4 cores8+ coresMore cores noticeably speed up Ollama inference.
RAM8 GB16 GB+Ollama with qwen2.5 needs ~6 GB resident; embeddings add ~1 GB. With OpenAI/Cohere as providers, 4–6 GB total is enough.
Disk10 GB free30 GB+ freeOllama model images are 1–7 GB each. Moorcheh storage grows with your memory volume.
GPUNot requiredNVIDIA GPU with 8 GB+ VRAMOptional — speeds up Ollama. CPU-only inference works on any modern machine.
For air-gapped deployments using OpenAI or Cohere as the embedding/LLM provider, the hardware footprint is much smaller (no Ollama container needed).

Software

Required

  • Docker Engine 20.10+ or Docker Desktop 4.0+ with the daemon running.
    • The Memanto onboarding wizard fails fast with a clear error if docker info does not succeed.
    • Verify with:
      docker --version
      docker info
      
  • Python 3.10+ for the Memanto CLI itself.
    • Verify with:
      python --version
      
  • memanto Python package.
    pip install memanto
    
  • moorcheh-client>=0.1.3 — the Python package that ships the moorcheh up command and exposes the on-prem SDK shape Memanto talks to. The onboarding wizard installs this automatically the first time you choose On-Prem at the prompt; you can also install it explicitly:
    pip install "moorcheh-client>=0.1.3"
    

Optional

  • uvicorn[standard] if you plan to run memanto serve directly. Installed automatically as a dependency of memanto in most cases.
  • NVIDIA Container Toolkit if you want Ollama to use a GPU inside Docker.

Network & Ports

PortBound toUsed by
8000localhost (default)Memanto’s REST API (memanto serve, memanto ui).
8080localhost (default)Moorcheh on-prem server, started by moorcheh up.
11434inside the Docker networkOllama, when used as the embedding/LLM provider. Started as a sibling container by moorcheh up.
You do not need any inbound internet access for the runtime path. Internet is required only:
  • Once, to pip install memanto and moorcheh-client.
  • Once per Ollama model, to pull the image from the Ollama registry.
  • For every answer.generate call if your LLM provider is OpenAI or Cohere.

Provider Choices

You will be prompted to choose providers during onboarding. The choices and what they imply:
ProviderEmbeddingLLM (Answer)API keyCostBest for
Ollamanomic-embed-textqwen2.5None$0True air-gap; local development; demos; cost-sensitive workloads.
OpenAItext-embedding-3-smallgpt-4o-miniRequiredPer-tokenHigh-quality embeddings; existing OpenAI relationships.
Cohereembed-english-v3.0command-r-plus-08-2024RequiredPer-tokenHigh-quality long-context answers; multilingual embeddings.
You can mix providers — e.g., Ollama embeddings with an OpenAI LLM for answers. The onboarding wizard prompts for each independently.

Disk Layout

Once onboarding finishes, the on-prem stack uses these locations on the host:
PathOwnerPurpose
~/.memanto/Memanto CLITop-level config dir. The cloud backend stores everything here.
~/.memanto/on-prem/Memanto CLI (on-prem only)Isolated data dir: agents, sessions, registry, and state.json for the on-prem backend.
~/.memanto/on-prem/state.jsonMemanto CLISource of truth for url, embedding_provider, embedding_model, llm_provider, llm_model.
~/.moorcheh/config.jsonmoorcheh-clientEmbedding and LLM provider config consumed by the on-prem server.
~/.moorcheh/uploads/moorcheh-clientStaging area for files uploaded via memanto upload — paths inside this dir are mapped into the container.
Cloud and on-prem state are deliberately kept in separate directories so you can switch backends without one polluting the other.

Verifying Prerequisites

Before running the on-prem setup, the wizard performs these checks for you:
  1. docker is on PATH.
  2. docker info returns successfully (daemon is up).
  3. moorcheh-client>=0.1.3 is importable (installs it if not).
  4. Provider API keys (if you chose OpenAI or Cohere) are non-empty.
If any check fails, the wizard prints a one-line error with a hint and exits with a non-zero status — no partial state is left behind.

Next Step

On-Prem Quickstart — install the stack end-to-end in 5–10 minutes.