LlamaIndex + MEMANTO

Give your LlamaIndex agents and query engines persistent memory across sessions using MEMANTO. LlamaIndex excels at querying documents and data, but context resets between runs. MEMANTO adds a semantic memory layer so your agents can store insights, user preferences, and decisions, then recall them later.

How It Works

LlamaIndex Agent -> MEMANTO FunctionTools (remember / recall / answer) -> MEMANTO Server -> Moorcheh.ai
MEMANTO is wired in as three FunctionTool instances (remember, recall, answer) that your LlamaIndex agent can call during reasoning. The agent decides when to store something, when to search raw memories, and when to get a synthesized answer directly from memory.
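
LlamaIndex builds each tool's schema from the wrapped function: the function name becomes the tool name, and the docstring becomes the description the LLM reads when deciding which tool to call. Roughly speaking, it works like this simplified stand-in (an illustration only, not LlamaIndex's actual internals):

```python
import inspect

def recall(query: str) -> str:
    """Search long-term memory for relevant information."""
    ...

def tool_schema(fn):
    """Illustrative sketch of how a tool schema is derived from a function."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,                  # tool name the agent calls
        "description": inspect.getdoc(fn),    # shown to the LLM for tool selection
        "parameters": list(sig.parameters),   # argument names from the signature
    }

print(tool_schema(recall))
```

This is why the docstrings in the next step matter: they are the only guidance the agent gets about when to use each memory tool.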

Prerequisites

Install

pip install memanto llama-index llama-index-llms-openai httpx

Step 1: Start MEMANTO Server

memanto serve

Step 2: Create the Memory Tools

Create memanto_tools.py:
import os
import httpx
from llama_index.core.tools import FunctionTool

MEMANTO_URL = "http://localhost:8000"
API_KEY = os.environ["MOORCHEH_API_KEY"]
AGENT_ID = "llamaindex-agent"

# Activate session once at startup
_token = httpx.post(
    f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/activate",
    headers={"Authorization": f"Bearer {API_KEY}"}
).json()["session_token"]

_HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Session-Token": _token
}

def remember(content: str, memory_type: str = "fact") -> str:
    """
    Store important information in long-term memory.

    Args:
        content: The information to store.
        memory_type: Category of memory. Options: fact, preference,
                     decision, goal, commitment, event, error.
    """
    response = httpx.post(
        f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/remember",
        params={"memory_type": memory_type, "content": content},
        headers=_HEADERS
    )
    response.raise_for_status()
    return f"Stored memory: {response.json()['memory_id']}"

def recall(query: str) -> str:
    """
    Search long-term memory for relevant information.

    Args:
        query: A natural language question or topic to search for.
    """
    response = httpx.get(
        f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/recall",
        params={"query": query, "limit": 5},
        headers=_HEADERS
    )
    response.raise_for_status()
    memories = response.json().get("memories", [])
    if not memories:
        return "No relevant memories found."
    return "\n".join(f"- [{m['type']}] {m['content']}" for m in memories)

def answer(question: str) -> str:
    """
    Get a synthesized answer from long-term memory using MEMANTO's built-in RAG.

    Args:
        question: A natural language question to answer from stored memories.

    Use this when you want a ready-to-use response instead of raw memory items.
    MEMANTO answers using its native model — no extra LLM call needed.
    """
    response = httpx.post(
        f"{MEMANTO_URL}/api/v2/agents/{AGENT_ID}/answer",
        params={"question": question},
        headers=_HEADERS
    )
    response.raise_for_status()
    return response.json().get("answer", "No answer found.")

remember_tool = FunctionTool.from_defaults(fn=remember)
recall_tool = FunctionTool.from_defaults(fn=recall)
answer_tool = FunctionTool.from_defaults(fn=answer)
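
The `remember` docstring lists a fixed set of `memory_type` values. If you want to catch a mistyped category client-side before the HTTP call, a small guard like this can help (a hypothetical helper, not part of MEMANTO):

```python
# Valid categories as listed in the remember tool's docstring.
MEMORY_TYPES = {"fact", "preference", "decision", "goal",
                "commitment", "event", "error"}

def validate_memory_type(memory_type: str) -> str:
    """Return the normalized memory_type, or raise before hitting the API."""
    normalized = memory_type.strip().lower()
    if normalized not in MEMORY_TYPES:
        raise ValueError(
            f"unknown memory_type {memory_type!r}; "
            f"expected one of {sorted(MEMORY_TYPES)}"
        )
    return normalized
```

Calling `validate_memory_type(memory_type)` at the top of `remember` turns a silent server-side rejection (or a miscategorized memory) into an immediate, descriptive error the agent can see and correct.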

Step 3: Build the Agent

Create agent.py:
import os
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from memanto_tools import remember_tool, recall_tool, answer_tool

llm = OpenAI(model="gpt-4o-mini", temperature=0)

agent = ReActAgent.from_tools(
    tools=[remember_tool, recall_tool, answer_tool],
    llm=llm,
    verbose=True,
    context=(
        "You are a helpful assistant with long-term memory. "
        "When you learn something important about the user, store it with the remember tool. "
        "Use recall to search raw memories, or answer to get a synthesized response from memory."
    )
)

# Agent stores user preferences to memory
response = agent.chat("I prefer dark mode and concise answers. Please remember this.")
print(response.response)

# Agent recalls preferences before answering
response = agent.chat("How should I configure my editor?")
print(response.response)

Step 4: Run

export MOORCHEH_API_KEY=mk_your_api_key
export OPENAI_API_KEY=sk_your_openai_key
python agent.py

Getting Synthesized Answers from Memory

The answer_tool calls MEMANTO’s built-in RAG — it synthesizes a direct response from stored memories using MEMANTO’s native model. No extra LLM token usage on your side.
# Agent picks the right tool automatically based on the question
response = agent.chat("What are my editor preferences?")
# -> Agent calls answer_tool, returns: "You prefer dark mode and concise answers."

response = agent.chat("List everything you know about my setup.")
# -> Agent calls recall_tool, returns raw memory items for full reasoning
When to use answer_tool vs recall_tool
  • Use recall_tool when the agent needs to reason over multiple raw memory items.
  • Use answer_tool when the agent (or user) needs a clean, direct response from memory.
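
When you post-process `recall_tool` output yourself (for logging, or feeding it into another pipeline), the one-memory-per-line `- [type] content` format defined above is easy to parse back into structured records. A minimal sketch, assuming exactly that format:

```python
import re

def parse_recall_output(text: str) -> list[dict]:
    """Parse recall_tool's '- [type] content' lines back into dicts."""
    memories = []
    for line in text.splitlines():
        match = re.match(r"- \[(\w+)\] (.+)", line)
        if match:
            memories.append({"type": match.group(1), "content": match.group(2)})
    return memories
```

The "No relevant memories found." sentinel contains no `- [...]` line, so it parses to an empty list, which makes the empty case easy to branch on.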

Using with a Query Engine

Combine MEMANTO memory with LlamaIndex document retrieval:
import os, httpx
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from memanto_tools import remember_tool, recall_tool, answer_tool

# Load your documents
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Wrap the query engine as a tool
doc_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="document_search",
    description="Search the project documentation for specific information."
)

# Agent now has both: document search + persistent memory
agent = ReActAgent.from_tools(
    tools=[doc_tool, remember_tool, recall_tool, answer_tool],
    llm=OpenAI(model="gpt-4o-mini"),
    verbose=True
)

# Agent searches docs and stores key findings in memory
response = agent.chat("What is the deployment process? Remember the key steps.")
print(response.response)

# Later: agent recalls the steps without re-reading docs
response = agent.chat("Walk me through the deployment steps again.")
print(response.response)

Persistent Memory Across Sessions

Because memories live in MEMANTO and not in-process, they persist across agent restarts:
# Check what the agent has remembered
memanto recall "user preferences" --agent llamaindex-agent

# Export all memories
memanto memory export --agent llamaindex-agent

Next Steps