LangChain + Memanto
Add persistent, cross-session memory to your LangChain agents and chains using Memanto.
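By default, LangChain's built-in memory classes keep conversation history in process memory, so it is lost between runs. Memanto plugs in as a custom memory backend that stores context on the Memanto server and retrieves it semantically, so your chains can recall what matters across sessions, even days later.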
How It Works
LangChain Chain / Agent -> MemantoMemory -> Memanto Server -> Moorcheh.ai
You drop MemantoMemory in wherever LangChain expects a BaseMemory. It handles session activation, storing new messages, and injecting recalled context into your prompts.
The Moorcheh API key (MOORCHEH_API_KEY) is configured on the Memanto server, not in your LangChain code. The only credential the client sends is X-Session-Token.
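From the client's point of view, the protocol is small: one activate call returns a session token, and every later recall, remember, or answer call carries that token in X-Session-Token. The sketch below describes each call as plain data, using the endpoint paths that appear in the code later in this guide. build_request is a hypothetical helper (it performs no I/O), and the base URL assumes the default local server on port 8000.

```python
from typing import Optional

# Default local server address used throughout this guide (assumption: port 8000).
BASE_URL = "http://localhost:8000/api/v2"


def build_request(action: str, agent_id: str, token: Optional[str] = None,
                  payload: Optional[dict] = None) -> dict:
    """Describe one Memanto call as plain data (hypothetical helper, no I/O)."""
    url = f"{BASE_URL}/agents/{agent_id}/{action}"
    if action == "activate":
        # No token exists yet, so activation sends no session header and no body.
        return {"method": "POST", "url": url, "headers": {}, "json": None}
    if not token:
        raise ValueError("call 'activate' first to obtain a session token")
    # Later calls carry only the session token; the Moorcheh key stays on the server.
    headers = {"X-Session-Token": token, "Content-Type": "application/json"}
    return {"method": "POST", "url": url, "headers": headers, "json": payload or {}}


recall = build_request("recall", "my-assistant", token="example-token",
                       payload={"query": "UI preferences", "limit": 5})
print(recall["url"])  # http://localhost:8000/api/v2/agents/my-assistant/recall
```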
Prerequisites
- Python 3.9+ (the code below uses built-in generic annotations such as list[str])
- Moorcheh API key configured on the Memanto server
- Memanto server running locally
Install
```bash
pip install memanto langchain langchain-openai httpx
```
Step 1: Start Memanto Server
```bash
export MOORCHEH_API_KEY=your_moorcheh_key
memanto serve
```
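If you start memanto serve from a script (for example in CI) rather than an interactive terminal, it can help to wait until the server accepts connections before the first activate call. The following is a minimal standard-library sketch, assuming the default localhost:8000 address used in this guide. wait_for_port is a hypothetical helper, and it only checks that the TCP port is open; this guide documents no health endpoint.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 10.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect proves the port is open, not that Memanto is healthy.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # server not listening yet; retry shortly
    return False


# Example: stop a setup script early if the local server never comes up.
# if not wait_for_port():
#     raise SystemExit("Memanto server is not reachable on localhost:8000")
```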
Step 2: Create the Memory Class
Create memanto_memory.py:
```python
import httpx
from langchain_core.memory import BaseMemory
from pydantic import PrivateAttr


class MemantoMemory(BaseMemory):
    """LangChain-compatible memory backend powered by Memanto."""

    agent_id: str = "langchain-agent"
    memanto_url: str = "http://localhost:8000"
    memory_key: str = "memory"
    session_token: str = ""

    # Declared as a private attribute so pydantic does not treat the client as a field.
    _client: httpx.Client = PrivateAttr(default_factory=httpx.Client)

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._activate()

    def _activate(self) -> None:
        """Open a session for this agent and keep the returned token."""
        response = self._client.post(
            f"{self.memanto_url}/api/v2/agents/{self.agent_id}/activate"
        )
        response.raise_for_status()
        self.session_token = response.json()["session_token"]

    @property
    def _headers(self) -> dict:
        return {
            "X-Session-Token": self.session_token,
            "Content-Type": "application/json",
        }

    @property
    def memory_variables(self) -> list[str]:
        return [self.memory_key]

    def load_memory_variables(self, inputs: dict) -> dict:
        """Called before each LLM call: recalls memories relevant to the input."""
        query = inputs.get("input", inputs.get("human_input", ""))
        if not query:
            return {self.memory_key: ""}
        response = self._client.post(
            f"{self.memanto_url}/api/v2/agents/{self.agent_id}/recall",
            headers=self._headers,
            json={"query": query, "limit": 5},
        )
        response.raise_for_status()
        memories = response.json().get("memories", [])
        if not memories:
            return {self.memory_key: ""}
        context = "\n".join(f"- {m['content']}" for m in memories)
        return {self.memory_key: f"Relevant memory:\n{context}"}

    def _remember(self, content: str) -> None:
        """Store one memory and fail loudly if the server rejects it."""
        response = self._client.post(
            f"{self.memanto_url}/api/v2/agents/{self.agent_id}/remember",
            headers=self._headers,
            json={"content": content, "type": "fact"},
        )
        response.raise_for_status()

    def save_context(self, inputs: dict, outputs: dict) -> None:
        """Called after each LLM call: stores both sides of the conversation turn."""
        human = inputs.get("input", inputs.get("human_input", ""))
        ai = outputs.get("output", outputs.get("response", ""))
        if human:
            self._remember(f"User said: {human}")
        if ai:
            self._remember(f"Assistant replied: {ai}")

    def clear(self) -> None:
        # Memories persist in Memanto by design; clear them via the CLI if needed.
        pass
```
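The class relies on LangChain's legacy Chain calling load_memory_variables while preparing inputs (before the model runs) and save_context while preparing outputs (after it runs). The stand-in below mimics that order without LangChain or a Memanto server, which can be handy for reasoning about the flow or for unit tests. FakeMemory, run_turn, and count_llm are hypothetical names; unlike Memanto, FakeMemory returns every stored turn rather than a semantic top-k, and the real Chain also validates keys and runs callbacks.

```python
class FakeMemory:
    """Hypothetical in-process stand-in exposing the same hooks as MemantoMemory."""

    memory_key = "memory"

    def __init__(self):
        self.turns = []  # stored "memories", newest last
        self.calls = []  # records hook order for inspection

    def load_memory_variables(self, inputs: dict) -> dict:
        self.calls.append("load")
        return {self.memory_key: "\n".join(f"- {t}" for t in self.turns)}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        self.calls.append("save")
        self.turns.append(f"User said: {inputs['input']}")
        self.turns.append(f"Assistant replied: {outputs['response']}")


def run_turn(memory, llm, user_input: str) -> dict:
    """Mimic the legacy Chain flow: load memory, call the model, save the turn."""
    inputs = {"input": user_input}
    inputs.update(memory.load_memory_variables(inputs))  # roughly Chain.prep_inputs
    outputs = {"response": llm(inputs)}
    memory.save_context(inputs, outputs)                  # roughly Chain.prep_outputs
    return outputs


def count_llm(inputs: dict) -> str:
    """Fake "LLM" that reports how many memory lines it was given."""
    return f"seen {len(inputs['memory'].splitlines())} memories"


mem = FakeMemory()
run_turn(mem, count_llm, "My name is Alice.")
out = run_turn(mem, count_llm, "What is my name?")
print(out["response"])  # seen 2 memories
print(mem.calls)        # ['load', 'save', 'load', 'save']
```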
Step 3: Use in a Chain
Create agent.py:
```python
from langchain.chains import ConversationChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

from memanto_memory import MemantoMemory

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
memory = MemantoMemory(agent_id="my-assistant")

# The prompt variables must be exactly the memory key ("memory") plus "input".
prompt = PromptTemplate(
    input_variables=["memory", "input"],
    template=(
        "You are a helpful assistant with long-term memory.\n\n"
        "{memory}\n\n"
        "Human: {input}\n"
        "Assistant:"
    ),
)

chain = ConversationChain(llm=llm, memory=memory, prompt=prompt, verbose=True)

# First turn: Alice introduces herself; save_context stores the exchange in Memanto.
response = chain.invoke({"input": "My name is Alice and I prefer dark mode."})
print(response["response"])  # ConversationChain's output key is "response"

# Second turn: load_memory_variables recalls that Alice prefers dark mode.
response = chain.invoke({"input": "What UI settings should I use?"})
print(response["response"])
```
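ConversationChain checks its prompt when it is constructed: the prompt's input variables must be exactly the memory's memory_variables plus the chain's input key ("input"), and the input key may not also be a memory key. If you change memory_key on MemantoMemory, the template placeholder has to change with it. The hypothetical helper below reproduces that rule as we understand it so a mismatch is easy to diagnose; it is not LangChain's own code.

```python
def check_conversation_prompt(prompt_vars: list, memory_vars: list,
                              input_key: str = "input") -> None:
    """Raise ValueError if the prompt would not fit ConversationChain's expectations."""
    if input_key in memory_vars:
        raise ValueError(f"input key {input_key!r} collides with a memory key")
    expected = set(memory_vars) | {input_key}
    actual = set(prompt_vars)
    if expected != actual:
        missing = sorted(expected - actual)
        extra = sorted(actual - expected)
        raise ValueError(f"prompt variables mismatch: missing={missing}, unexpected={extra}")


# MemantoMemory's default memory_key is "memory", so this prompt passes:
check_conversation_prompt(["memory", "input"], ["memory"])

# Renaming memory_key to "history" without updating the template fails:
try:
    check_conversation_prompt(["memory", "input"], ["history"])
except ValueError as err:
    print(err)  # prompt variables mismatch: missing=['history'], unexpected=['memory']
```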
Step 4: Run
```bash
export OPENAI_API_KEY=sk-your-openai-key
# MOORCHEH_API_KEY is read by the Memanto server, not by this script.
python agent.py
```
Using with LCEL (LangChain Expression Language)
Inject recalled memory directly into an LCEL pipeline:
```python
import httpx
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

# Memories are scoped per agent_id: use "my-assistant" to read what the Step 3 chain stored.
AGENT_ID = "lcel-agent"
BASE_URL = "http://localhost:8000/api/v2"

# Activate once per process and reuse the session token for every call.
activation = httpx.post(f"{BASE_URL}/agents/{AGENT_ID}/activate")
activation.raise_for_status()
token = activation.json()["session_token"]
HEADERS = {"X-Session-Token": token, "Content-Type": "application/json"}


def recall_context(inputs: dict) -> dict:
    """Add a 'context' key holding memories relevant to the question."""
    resp = httpx.post(
        f"{BASE_URL}/agents/{AGENT_ID}/recall",
        headers=HEADERS,
        json={"query": inputs["question"], "limit": 5},
    )
    resp.raise_for_status()
    memories = resp.json().get("memories", [])
    context = "\n".join(f"- {m['content']}" for m in memories) or "No prior context."
    return {**inputs, "context": context}


prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant.\n\nMemory:\n{context}"),
    ("human", "{question}"),
])

chain = RunnableLambda(recall_context) | prompt | ChatOpenAI(model="gpt-4o-mini")
result = chain.invoke({"question": "What are my UI preferences?"})
print(result.content)
```
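The LCEL pipeline above only reads from Memanto; nothing is written back, so later questions will not see this exchange. One way to close the loop is to remember the turn after invoking the chain, reusing the payload shape from MemantoMemory.save_context. save_turn is a hypothetical helper, and the HTTP call is injected as a post(url, payload) callable so the sketch runs without a server.

```python
from typing import Callable

# post(url, payload) performs the HTTP call; injecting it keeps this sketch testable offline.
PostFn = Callable[[str, dict], None]


def save_turn(post: PostFn, base_url: str, agent_id: str,
              question: str, answer: str) -> int:
    """Store one question/answer turn; returns how many memories were sent."""
    url = f"{base_url}/agents/{agent_id}/remember"
    payloads = []
    if question:
        payloads.append({"content": f"User said: {question}", "type": "fact"})
    if answer:
        payloads.append({"content": f"Assistant replied: {answer}", "type": "fact"})
    for payload in payloads:
        post(url, payload)
    return len(payloads)


# Offline demo: record the calls instead of sending them.
sent = []
count = save_turn(lambda url, payload: sent.append((url, payload)),
                  "http://localhost:8000/api/v2", "lcel-agent",
                  "What are my UI preferences?", "You prefer dark mode.")
print(count)       # 2
print(sent[0][0])  # http://localhost:8000/api/v2/agents/lcel-agent/remember
```

Against a running server, post could be `lambda url, payload: httpx.post(url, headers=HEADERS, json=payload).raise_for_status()`, with `save_turn(post, BASE_URL, AGENT_ID, question, result.content)` called right after `chain.invoke(...)`.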
Using Memanto’s Built-in Answer (Optional)
When you want a direct answer grounded in stored memories without routing through your chain, call Memanto's answer endpoint. It synthesizes the response with Memanto's native RAG model, so your code makes no external LLM call.
This works well as a quick lookup tool, for example to answer a simple factual question about a user before deciding whether to invoke the full chain.
```python
import httpx

AGENT_ID = "my-assistant"
BASE_URL = "http://localhost:8000/api/v2"

activation = httpx.post(f"{BASE_URL}/agents/{AGENT_ID}/activate")
activation.raise_for_status()
token = activation.json()["session_token"]
HEADERS = {"X-Session-Token": token, "Content-Type": "application/json"}


def memanto_answer(question: str) -> str:
    """Get a synthesized answer from stored memories using Memanto's native RAG."""
    response = httpx.post(
        f"{BASE_URL}/agents/{AGENT_ID}/answer",
        headers=HEADERS,
        json={"question": question},
    )
    response.raise_for_status()
    return response.json().get("answer", "")


answer = memanto_answer("What UI preferences does Alice have?")
print(answer)
# Example output (the wording depends on what has been stored):
# "Alice prefers dark mode and concise responses."
```
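You can also use it inside an LCEL chain as a preprocessing step: call memanto_answer first and, if it returns an answer, hand that answer to the LLM as context; otherwise fall back to raw recall. The function below reuses memanto_answer, AGENT_ID, BASE_URL, and HEADERS from the previous snippet: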
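```python
import httpx
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI


def answer_or_recall(inputs: dict) -> dict:
    """Prefer Memanto's synthesized answer; fall back to raw recall if it is empty."""
    quick = memanto_answer(inputs["question"])
    if quick:
        return {**inputs, "context": f"Memory answer: {quick}"}
    resp = httpx.post(
        f"{BASE_URL}/agents/{AGENT_ID}/recall",
        headers=HEADERS,
        json={"query": inputs["question"], "limit": 5},
    )
    resp.raise_for_status()
    memories = resp.json().get("memories", [])
    context = "\n".join(f"- {m['content']}" for m in memories) or "No prior context."
    return {**inputs, "context": context}


# `prompt` is the ChatPromptTemplate from the LCEL example ("context" and "question").
chain = RunnableLambda(answer_or_recall) | prompt | ChatOpenAI(model="gpt-4o-mini")
```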
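The step above still sends every question through the LLM; the memory answer only becomes context. To skip the LLM entirely when Memanto already has an answer, route before the chain runs. A minimal sketch, where answer_first is a hypothetical helper and both callables are injected so it runs offline:

```python
from typing import Callable, Tuple


def answer_first(question: str,
                 answer_fn: Callable[[str], str],
                 chain_fn: Callable[[str], str]) -> Tuple[str, str]:
    """Return ("memory", text) if the answer endpoint has something, else ("llm", text)."""
    quick = answer_fn(question).strip()
    if quick:
        # Short-circuit: the LLM chain is never invoked.
        return "memory", quick
    return "llm", chain_fn(question)


# Offline demo with stand-ins that record whether the "chain" was called.
calls = []


def fake_chain(q: str) -> str:
    calls.append(q)
    return "LLM reply"


print(answer_first("Preferred theme?", lambda q: "Dark mode.", fake_chain))  # ('memory', 'Dark mode.')
print(answer_first("Plan my week", lambda q: "", fake_chain))                # ('llm', 'LLM reply')
print(calls)                                                                  # ['Plan my week']
```

Wired to the real pieces, this would be `answer_first(q, memanto_answer, lambda q: chain.invoke({"question": q}).content)`. This guide does not say whether the answer endpoint returns an empty string or a fallback message when nothing relevant is stored; if it returns a non-empty placeholder, a stricter check is needed. LangChain's RunnableBranch can express the same condition inside an LCEL pipeline.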
When to use answer vs recall
- Use recall (via load_memory_variables) when your LLM should reason over the raw memories itself.
- Use answer when you want a ready-made response from memory, or to short-circuit the chain for simple factual lookups.
Persistent Memory Across Sessions
Memories stored via save_context survive process restarts and are available in future sessions for the same agent_id:
```bash
# View stored memories
memanto recall "all context" --agent my-assistant
# Export to file
memanto memory export --agent my-assistant
```
Next Steps