LangChain + Memanto

Add persistent, cross-session memory to your LangChain agents and chains using Memanto. LangChain's built-in memory classes, such as ConversationBufferMemory, keep their state in process memory, so it is lost when the process exits. Memanto plugs in as a custom memory backend that stores context and retrieves it semantically, so your chains can recall relevant details even days later.

How It Works

LangChain Chain / Agent -> MemantoMemory -> Memanto Server -> Moorcheh.ai

You drop MemantoMemory in wherever LangChain expects a BaseMemory. It handles session activation, storing new messages, and injecting recalled context into your prompts.

The Moorcheh API key (MOORCHEH_API_KEY) is configured on the Memanto server, not in your LangChain code. The only credential the client sends is X-Session-Token.
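The request flow described above can be sketched without any LangChain code. The endpoint paths and the X-Session-Token header are the ones used later in this guide; the helper names (activate_spec, recall_spec) are hypothetical, and the snippet only builds request descriptions rather than sending them.

```python
from typing import Any, Dict

BASE_URL = "http://localhost:8000/api/v2"  # local server address used throughout this guide


def activate_spec(agent_id: str) -> Dict[str, Any]:
    """Describe the activation call: it carries no credential at all."""
    return {"method": "POST", "url": f"{BASE_URL}/agents/{agent_id}/activate"}


def recall_spec(agent_id: str, session_token: str, query: str, limit: int = 5) -> Dict[str, Any]:
    """Describe a recall call: the session token is the only credential sent."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/agents/{agent_id}/recall",
        "headers": {
            "X-Session-Token": session_token,
            "Content-Type": "application/json",
        },
        "json": {"query": query, "limit": limit},
    }


spec = recall_spec("my-assistant", "token-from-activate", "UI preferences")
print(spec["url"])
print(sorted(spec["headers"]))
```

Each dictionary uses the keyword names that httpx.request accepts, so a spec can be sent with httpx.request(**spec) once the server is running.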

Prerequisites

Python 3.9+ (the memory class below uses the built-in generic annotation list[str])
Moorcheh API key configured on the Memanto server
Memanto server running locally

Install

pip install memanto langchain langchain-openai httpx

Step 1: Start Memanto Server

export MOORCHEH_API_KEY=your_moorcheh_key
memanto serve
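Before moving on, it can help to confirm that the server is accepting connections. The following is a minimal sketch using only the standard library, assuming the server listens on localhost:8000 (the address the client code in this guide connects to); wait_for_port is a hypothetical helper, not a Memanto command.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.2)
    return False
```

Calling wait_for_port() before constructing MemantoMemory turns a connection error during activation into an explicit readiness check. It only verifies that the port is open, not that the Memanto API itself is healthy.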

Step 2: Create the Memory Class

Create memanto_memory.py:

import httpx
from langchain_core.memory import BaseMemory  # BaseMemory lives in langchain_core.memory

class MemantoMemory(BaseMemory):
    """LangChain-compatible memory backend powered by Memanto."""

    agent_id: str = "langchain-agent"
    memanto_url: str = "http://localhost:8000"
    memory_key: str = "memory"
    session_token: str = ""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._client = httpx.Client()
        self._activate()

    def _activate(self):
        response = self._client.post(
            f"{self.memanto_url}/api/v2/agents/{self.agent_id}/activate"
        )
        response.raise_for_status()
        self.session_token = response.json()["session_token"]

    @property
    def _headers(self) -> dict:
        return {
            "X-Session-Token": self.session_token,
            "Content-Type": "application/json",
        }

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict) -> dict:
        """Called before each LLM call - recalls relevant memories."""
        query = inputs.get("input", inputs.get("human_input", ""))
        if not query:
            return {self.memory_key: ""}

        response = self._client.post(
            f"{self.memanto_url}/api/v2/agents/{self.agent_id}/recall",
            headers=self._headers,
            json={"query": query, "limit": 5},
        )
        response.raise_for_status()
        memories = response.json().get("memories", [])
        if not memories:
            return {self.memory_key: ""}

        context = "\n".join(f"- {m['content']}" for m in memories)
        return {self.memory_key: f"Relevant memory:\n{context}"}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        """Called after each LLM call - stores the conversation turn."""
        human = inputs.get("input", inputs.get("human_input", ""))
        ai = outputs.get("output", outputs.get("response", ""))

        # Store each side of the turn as its own memory and fail loudly on HTTP errors.
        for content in (
            f"User said: {human}" if human else "",
            f"Assistant replied: {ai}" if ai else "",
        ):
            if not content:
                continue
            response = self._client.post(
                f"{self.memanto_url}/api/v2/agents/{self.agent_id}/remember",
                headers=self._headers,
                json={"content": content, "type": "fact"},
            )
            response.raise_for_status()

    def clear(self) -> None:
        pass  # Memories persist in Memanto - clear via CLI if needed
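The formatting step inside load_memory_variables can be pulled out into a pure function, which makes it easy to check without a running server. This is a sketch rather than part of Memanto: it assumes the same response shape the class reads (a list of objects with a content key), and both format_memories and its max_chars budget are hypothetical additions intended to keep the injected context bounded.

```python
from typing import Any, Dict, List


def format_memories(memories: List[Dict[str, Any]], max_chars: int = 1000) -> str:
    """Render recalled memories as a bullet list.

    Entries without usable content are skipped, and rendering stops before
    the bullet text would exceed max_chars.
    """
    lines: List[str] = []
    used = 0
    for memory in memories:
        content = str(memory.get("content", "")).strip()
        if not content:
            continue
        line = f"- {content}"
        if used + len(line) + 1 > max_chars:
            break
        lines.append(line)
        used += len(line) + 1  # +1 accounts for the newline separator
    if not lines:
        return ""
    return "Relevant memory:\n" + "\n".join(lines)


sample = [
    {"content": "User said: My name is Alice and I prefer dark mode."},
    {"content": ""},  # empty entry, skipped
    {},               # entry without a content key, skipped
]
print(format_memories(sample))
```

Returning an empty string when nothing usable comes back matches the class above, so the {memory} slot in the prompt stays blank instead of showing a header with no entries.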

Step 3: Use in a Chain

Create agent.py:

from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate
from memanto_memory import MemantoMemory

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
memory = MemantoMemory(agent_id="my-assistant")

prompt = PromptTemplate(
    input_variables=["memory", "input"],
    template=(
        "You are a helpful assistant with long-term memory.\n\n"
        "{memory}\n\n"
        "Human: {input}\n"
        "Assistant:"
    ),
)

chain = ConversationChain(llm=llm, memory=memory, prompt=prompt, verbose=True)

# First turn - Alice introduces herself
response = chain.invoke({"input": "My name is Alice and I prefer dark mode."})
print(response["response"])  # ConversationChain returns its reply under the "response" key

# Second turn - Memanto recalls that Alice prefers dark mode
response = chain.invoke({"input": "What UI settings should I use?"})
print(response["response"])

Step 4: Run

export OPENAI_API_KEY=sk-your_openai_key
# MOORCHEH_API_KEY is read by the Memanto server, not by this script.
python agent.py

Using with LCEL (LangChain Expression Language)

Inject recalled memory directly into an LCEL pipeline:

import httpx
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

AGENT_ID = "lcel-agent"
BASE_URL = "http://localhost:8000/api/v2"

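activation = httpx.post(f"{BASE_URL}/agents/{AGENT_ID}/activate")
activation.raise_for_status()  # surface an HTTP error instead of a confusing KeyError
token = activation.json()["session_token"]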

HEADERS = {"X-Session-Token": token, "Content-Type": "application/json"}

def recall_context(inputs: dict) -> dict:
    """Fetch memories relevant to the question and add them under `context`."""
    resp = httpx.post(
        f"{BASE_URL}/agents/{AGENT_ID}/recall",
        headers=HEADERS,
        json={"query": inputs["question"], "limit": 5},
    )
    resp.raise_for_status()
    memories = resp.json().get("memories", [])
    context = "\n".join(f"- {m['content']}" for m in memories) or "No prior context."
    return {**inputs, "context": context}

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant.\n\nMemory:\n{context}"),
    ("human", "{question}"),
])

chain = RunnableLambda(recall_context) | prompt | ChatOpenAI(model="gpt-4o-mini")

result = chain.invoke({"question": "What are my UI preferences?"})
print(result.content)
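The pipeline above only reads from memory; nothing it produces is written back. One way to close the loop is to post each turn to the same remember endpoint that MemantoMemory.save_context uses. The sketch below takes the HTTP function as a parameter (pass httpx.post in real use) so it can be exercised without a server. store_turn is a hypothetical helper, not part of Memanto or LangChain, and the fake objects exist only for the demonstration.

```python
from typing import Any, Callable, Dict, List


def store_turn(
    post: Callable[..., Any],
    base_url: str,
    agent_id: str,
    headers: Dict[str, str],
    question: str,
    answer: str,
) -> int:
    """Store one conversation turn; returns how many memories were sent."""
    url = f"{base_url}/agents/{agent_id}/remember"
    contents: List[str] = []
    if question:
        contents.append(f"User said: {question}")
    if answer:
        contents.append(f"Assistant replied: {answer}")
    for content in contents:
        response = post(url, headers=headers, json={"content": content, "type": "fact"})
        response.raise_for_status()
    return len(contents)


# Stand-ins so the example runs offline; replace fake_post with httpx.post.
class FakeResponse:
    def raise_for_status(self) -> None:
        pass


sent: List[Dict[str, Any]] = []


def fake_post(url: str, headers: Any = None, json: Any = None) -> FakeResponse:
    sent.append({"url": url, "json": json})
    return FakeResponse()


count = store_turn(
    fake_post,
    "http://localhost:8000/api/v2",
    "lcel-agent",
    {"X-Session-Token": "demo"},
    "What are my UI preferences?",
    "You prefer dark mode.",
)
print(count, sent[0]["json"]["content"])
```

In the LCEL example, the equivalent call after chain.invoke would be store_turn(httpx.post, BASE_URL, AGENT_ID, HEADERS, question, result.content).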

Using Memanto’s Built-in Answer (Optional)

When you want a direct, grounded response from memory without routing through your chain, Memanto exposes an answer endpoint that uses its native RAG model. No external LLM call is made on your side. This is useful as a quick lookup tool, for example to answer a simple factual question about a user before deciding whether to invoke the full chain.

import httpx

AGENT_ID = "my-assistant"
BASE_URL = "http://localhost:8000/api/v2"

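activation = httpx.post(f"{BASE_URL}/agents/{AGENT_ID}/activate")
activation.raise_for_status()  # surface an HTTP error instead of a confusing KeyError
token = activation.json()["session_token"]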

HEADERS = {"X-Session-Token": token, "Content-Type": "application/json"}

def memanto_answer(question: str) -> str:
    """Get a synthesized answer from stored memories using Memanto's native RAG."""
    response = httpx.post(
        f"{BASE_URL}/agents/{AGENT_ID}/answer",
        headers=HEADERS,
        json={"question": question},
    )
    response.raise_for_status()
    return response.json().get("answer", "")

answer = memanto_answer("What UI preferences does Alice have?")
print(answer)
# Example output (the actual answer depends on what has been stored for this agent):
# -> "Alice prefers dark mode and concise responses."

You can also use this inside an LCEL chain as a fallback step: call memanto_answer first and pass its answer to the prompt as context, and fall back to raw recall only if the memory answer is empty:

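from langchain_core.runnables import RunnableLambda

# Reuses httpx, BASE_URL, AGENT_ID, HEADERS, and memanto_answer from the snippet above.
def answer_or_recall(inputs: dict) -> dict:
    """Prefer Memanto's synthesized answer; fall back to raw recalled memories."""
    quick = memanto_answer(inputs["question"])
    if quick:
        return {**inputs, "context": f"Memory answer: {quick}"}
    resp = httpx.post(
        f"{BASE_URL}/agents/{AGENT_ID}/recall",
        headers=HEADERS,
        json={"query": inputs["question"], "limit": 5},
    )
    resp.raise_for_status()
    memories = resp.json().get("memories", [])
    context = "\n".join(f"- {m['content']}" for m in memories) or "No prior context."
    return {**inputs, "context": context}

# Can take the place of RunnableLambda(recall_context) in the LCEL pipeline.
memory_step = RunnableLambda(answer_or_recall)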

When to use answer vs recall

Use recall (via load_memory_variables) when your LLM should reason over the raw memories itself.

Use answer when you want a ready-made response from memory, or to short-circuit the chain for simple factual lookups.
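The short-circuit case can be sketched as a plain function that calls the chain only when memory has no answer. Both callables are injected, so the stand-ins below take the place of memanto_answer and the LCEL chain; answer_with_fallback is a hypothetical name used for illustration.

```python
from typing import Callable, Tuple


def answer_with_fallback(
    question: str,
    quick_answer: Callable[[str], str],
    full_chain: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, text): the memory answer if non-empty, else the chain's reply."""
    quick = quick_answer(question).strip()
    if quick:
        return "memory", quick  # the chain is never invoked on this path
    return "chain", full_chain(question)


# Stand-ins for memanto_answer and the LCEL chain, for illustration only.
def fake_memory(question: str) -> str:
    return "Alice prefers dark mode." if "UI" in question else ""


def fake_chain(question: str) -> str:
    return f"(LLM answer to: {question})"


print(answer_with_fallback("What UI settings should I use?", fake_memory, fake_chain))
print(answer_with_fallback("Summarize our last meeting.", fake_memory, fake_chain))
```

With the definitions from the earlier snippets, the real call would look like answer_with_fallback(q, memanto_answer, lambda q: chain.invoke({"question": q}).content). Returning the source alongside the text makes it easy to log how often the LLM call is skipped.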

Persistent Memory Across Sessions

Memories stored via save_context survive process restarts and are available in future sessions for the same agent_id:

# View stored memories
memanto recall "all context" --agent my-assistant

# Export to file
memanto memory export --agent my-assistant
