Documentation Index
Fetch the complete documentation index at: https://docs.memanto.ai/llms.txt
Use this file to discover all available pages before exploring further.
LlamaIndex + Memanto
Give your LlamaIndex agents and query engines persistent memory across sessions using Memanto.
LlamaIndex excels at querying documents and data, but context resets between runs. Memanto adds a semantic memory layer so your agents can store insights, user preferences, and decisions - and recall them later.
How It Works
LlamaIndex Agent -> Memanto FunctionTools (remember / recall / answer) -> Memanto Server -> Moorcheh.ai
Memanto is wired in as three FunctionTool instances (remember, recall, answer) that your LlamaIndex agent can call during reasoning. The agent decides when to store something, when to search raw memories, and when to get a synthesized answer directly from memory.
Prerequisites
Install
pip install memanto llama-index llama-index-llms-openai httpx
Step 1: Start Memanto Server
Create memanto_tools.py:
import httpx
from llama_index.core.tools import FunctionTool
MEMANTO_URL = "http://localhost:8000"
AGENT_ID = "llamaindex-agent"
# Activate session once at startup
_token = httpx.post(
f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/activate"
).json()["session_token"]
_HEADERS = {
"X-Session-Token": _token,
"Content-Type": "application/json",
}
def remember(content: str, memory_type: str = "fact") -> str:
"""
Store important information in long-term memory.
Args:
content: The information to store.
memory_type: Category of memory. Options: fact, preference,
decision, goal, commitment, event, error.
"""
response = httpx.post(
f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/remember",
json={"content": content, "type": memory_type},
headers=_HEADERS,
)
response.raise_for_status()
return f"Stored memory: {response.json()['memory_id']}"
def recall(query: str) -> str:
"""
Search long-term memory for relevant information.
Args:
query: A natural language question or topic to search for.
"""
response = httpx.post(
f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/recall",
json={"query": query, "limit": 5},
headers=_HEADERS,
)
response.raise_for_status()
memories = response.json().get("memories", [])
if not memories:
return "No relevant memories found."
return "\n".join(f"- [{m['type']}] {m['content']}" for m in memories)
def answer(question: str) -> str:
"""
Get a synthesized answer from long-term memory using Memanto's built-in RAG.
Args:
question: A natural language question to answer from stored memories.
Use this when you want a ready-to-use response instead of raw memory items.
Memanto answers using its native model - no extra LLM call needed.
"""
response = httpx.post(
f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/answer",
json={"question": question},
headers=_HEADERS,
)
response.raise_for_status()
return response.json().get("answer", "No answer found.")
remember_tool = FunctionTool.from_defaults(fn=remember)
recall_tool = FunctionTool.from_defaults(fn=recall)
answer_tool = FunctionTool.from_defaults(fn=answer)
Set MOORCHEH_API_KEY on the Memanto server — clients only send X-Session-Token.
Step 3: Build the Agent
Create agent.py:
import os
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from memanto_tools import remember_tool, recall_tool, answer_tool
llm = OpenAI(model="gpt-4o-mini", temperature=0)
agent = ReActAgent.from_tools(
tools=[remember_tool, recall_tool, answer_tool],
llm=llm,
verbose=True,
system_prompt=(
"You are a helpful assistant with long-term memory. "
"When you learn something important about the user, store it with the remember tool. "
"Use recall to search raw memories, or answer to get a synthesized response from memory."
)
)
# Agent stores user preferences to memory
response = agent.chat("I prefer dark mode and concise answers. Please remember this.")
print(response.response)
# Agent recalls preferences before answering
response = agent.chat("How should I configure my editor?")
print(response.response)
Step 4: Run
export OPENAI_API_KEY=sk_your_openai_key
# MOORCHEH_API_KEY is read by the Memanto server, not by this script.
python agent.py
Getting Synthesized Answers from Memory
The answer_tool calls Memanto’s built-in RAG - it synthesizes a direct response from stored memories using Memanto’s native model. No extra LLM token usage on your side.
# Agent picks the right tool automatically based on the question
response = agent.chat("What are my editor preferences?")
# -> Agent calls answer_tool, returns: "You prefer dark mode and concise answers."
response = agent.chat("List everything you know about my setup.")
# -> Agent calls recall_tool, returns raw memory items for full reasoning
When to use answer_tool vs recall_tool
- Use
recall_tool when the agent needs to reason over multiple raw memory items.
- Use
answer_tool when the agent (or user) needs a clean, direct response from memory.
Using with a Query Engine
Combine Memanto memory with LlamaIndex document retrieval:
import os, httpx
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from memanto_tools import remember_tool, recall_tool, answer_tool
# Load your documents
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# Wrap the query engine as a tool
doc_tool = QueryEngineTool.from_defaults(
query_engine=query_engine,
name="document_search",
description="Search the project documentation for specific information."
)
# Agent now has both: document search + persistent memory
agent = ReActAgent.from_tools(
tools=[doc_tool, remember_tool, recall_tool, answer_tool],
llm=OpenAI(model="gpt-4o-mini"),
verbose=True
)
# Agent searches docs and stores key findings in memory
response = agent.chat("What is the deployment process? Remember the key steps.")
print(response.response)
# Later: agent recalls the steps without re-reading docs
response = agent.chat("Walk me through the deployment steps again.")
print(response.response)
Persistent Memory Across Sessions
Because memories live in Memanto and not in-process, they persist across agent restarts:
# Check what the agent has remembered
memanto recall "user preferences" --agent llamaindex-agent
# Export all memories
memanto memory export --agent llamaindex-agent
Next Steps