Hi, I am building a setup similar to this example: LangGraph Agentic RAG. The retriever_tool always returns the page content of the retrieved documents as one large string, with the pages separated by '\n\n'. I want to know the source of the retrieved documents, so I need access to their metadata. How can I extract the retrieved Document objects and their metadata in this setup? Please let me know how I should modify the code referenced above to implement this. Thanks!
-
This is how one can reproduce this issue. I am wondering why LangChain tools do not support backward compatibility. It feels disappointing to have to build a custom retriever tool for the new LangChain version to do exactly what the previous version was doing nicely.
langchain 0.0.324
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.agents.agent_toolkits import create_retriever_tool
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)
vectorstore = FAISS.load_local("data/faiss_index", embeddings)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2},
)
tool = create_retriever_tool(
    retriever,
    name="search_knowledge_base",
    description="Submit a question about the food review process, ...",
)
tool.invoke("what is food?")
[Document(page_content='# Food 1\n\n## Subsection\n\n ...', metadata={'source': 'data/food/food_1.md'}),
 Document(page_content='# Food 2\n\n## Best Practices:\n\n- ...', metadata={'source': 'data/food/food_2.md'})]
langchain 0.2.10
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.agents.agent_toolkits import create_retriever_tool
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", chunk_size=1)
vectorstore = FAISS.load_local("data/faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2},
)
tool = create_retriever_tool(
    retriever,
    name="search_knowledge_base",
    description="Submit a question about the food review process, ...",
)
tool.invoke("what is food?")
'# Food 1\n\n## Subsection\n\n ... \n\n# Food 2\n\n## Best Practices:\n\n- ...'
-
The retriever tool is a pretty simple wrapper around a retriever; the retriever is turned into a tool by create_retriever_tool. It is in the wrapper's formatting step (step 3) that the retrieved documents are collapsed into the string the tool returns. The simplest solution is probably NOT to use the off-the-shelf create_retriever_tool and instead write your own tool: call the raw retriever inside it, get back the raw documents with their metadata, and do whatever you want with them.
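A minimal sketch of that idea, written without LangChain imports so it runs on its own. It assumes `retrieve` behaves like `retriever.invoke`, that is, it takes a query string and returns objects with `page_content` and `metadata` attributes. The names `Doc`, `retrieve_with_sources`, and `fake_retrieve` are hypothetical, and the 'source' key is taken from the example output above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def retrieve_with_sources(
    retrieve: Callable[[str], List[Any]], query: str
) -> Tuple[str, List[Any]]:
    """Call the raw retriever and return (formatted text, raw documents).

    `retrieve` is assumed to behave like `retriever.invoke`.
    """
    docs = retrieve(query)
    # Prefix each chunk with its source so the metadata survives formatting.
    parts = [
        f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
        for doc in docs
    ]
    return "\n\n".join(parts), docs


def fake_retrieve(query: str) -> List[Doc]:
    """Hypothetical in-memory retriever standing in for vectorstore.as_retriever()."""
    return [
        Doc("# Food 1", {"source": "data/food/food_1.md"}),
        Doc("# Food 2", {"source": "data/food/food_2.md"}),
    ]


text, docs = retrieve_with_sources(fake_retrieve, "what is food?")
print(text)
print([d.metadata["source"] for d in docs])
```

With a real retriever you would pass `retriever.invoke` instead of `fake_retrieve`, and wrap the function with the `tool` decorator from `langchain_core.tools` before handing it to the agent. Newer langchain-core releases also document a `response_format="content_and_artifact"` option for tools that return a (content, artifact) pair like this one; check whether your installed version supports it.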