I am having trouble using MultiQueryRetriever and PromptTemplate.

My goal is to take a list of allegations against a police officer and, using the MultiQueryRetriever, have the LLM generate one query per allegation + description combination, in order to fetch the most relevant rule broken for each allegation. I use Chroma as my vector store; it contains a police department officer rule book. To do this, I am using a custom prompt that instructs the LLM to generate one Chroma query for each allegation. To generate each query, the LLM must look at the allegation, extract potential violations relevant to that allegation from the description, and then form a query that can be used to fetch relevant rules from Chroma. (See the prompt itself for more detail.)

This is the LineListOutputParser that I've defined:

from typing import List

from langchain_core.output_parsers import BaseOutputParser


class LineListOutputParser(BaseOutputParser[List[str]]):
    """Output parser for a list of lines."""

    def parse(self, text: str) -> List[str]:
        lines = text.strip().split("\n")
        return list(filter(None, lines))  # Remove empty lines
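
One detail of the parser above: `filter(None, lines)` drops empty strings only, so a line containing just spaces would survive and be sent to the vector store as a query. Here is a minimal standalone sketch of a stricter variant. `parse_lines` is a hypothetical helper name, not part of LangChain, and the sketch mirrors only the body of `parse`:

```python
from typing import List


def parse_lines(text: str) -> List[str]:
    """Split LLM output into one query per line, dropping blank lines."""
    # Strip each line so whitespace-only lines are removed as well.
    stripped = (line.strip() for line in text.strip().split("\n"))
    return [line for line in stripped if line]


print(parse_lines("query one\n\n   \nquery two  "))
# prints ['query one', 'query two']
```

The same two lines could replace the body of `LineListOutputParser.parse` if stricter filtering is wanted.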

This is the custom prompt I've designed:

from langchain_core.prompts import PromptTemplate

chroma_prompt = PromptTemplate(
    input_variables=["allegations", "description", "num_allegations"],
    template=(
        """You are an AI language model assistant. Your task is to analyze the following civilian complaint 
        description against a police officer, and the allegations that are raised against the officer. Identify 
        potential acts of misconduct or crimes committed by the officer, and generate {num_allegations} different queries to
        retrieve relevant sections from the Police Rulebook (one query per allegation-description combination), stored in a vector database.
        By generating multiple perspectives on the analysis, your goal is to help the user overcome some of the limitations of the 
        distance-based similarity search. Provide these alternative analyses as distinct queries, separated by newlines.
        
        Allegations made against officer: {allegations}
        Incident description: {description}
        """
    )
)

This is the code section that fetches from Chroma:

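from langchain.retrievers.multi_query import MultiQueryRetriever
# Assumed import path for Chroma; with the langchain-chroma package it
# would be `from langchain_chroma import Chroma` instead.
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings


def fetch_from_chroma(allegations, description, ia_num, llm, k=2):
    """
    Fetches relevant rulebook sections from Chroma using MultiQueryRetriever with similarity search.

    Parameters:
    - allegations (list[str]): Allegations raised against the officer.
    - description (str): Incident description from the complaint.
    - ia_num (int): Internal Affairs number for logging/debugging.
    - llm: Language model used to generate the queries.
    - k (int): Number of results to fetch per query, set to 2 by default.

    Returns:
    - context_text (str): Combined context text from retrieved documents.
    - sources (list): List of source metadata.
    """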
    embedding_function = OpenAIEmbeddings()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)
    
    line_output_parser = LineListOutputParser()
    
    llm_chain = chroma_prompt | llm | line_output_parser
    
    
    retriever = MultiQueryRetriever(
        retriever=db.as_retriever(search_type="similarity", search_kwargs={"k": k}),
        llm_chain=llm_chain,
        parser_key="lines",
    )

    # Invoke the retriever with the input dictionary
    results = retriever.invoke({
        "allegations": ", ".join(allegations),
        "description": description,
        "num_allegations": str(len(allegations)),
    })


    if len(results) == 0:
        print(f"{ia_num} - Unable to find matching results.")
        return "No Context Available", "No Sources Available"

    context_text = "\n\n---\n\n".join([doc.page_content for doc in results])
    sources = [doc.metadata.get("source", None) for doc in results]
    print(f"{ia_num} - Found matching results.")
    return context_text, sources

However, I am getting this error and have no idea why:

KeyError: "Input to PromptTemplate is missing variables {'description', 'allegations', 'num_allegations'}.  Expected: ['allegations', 'description', 'num_allegations'] Received: ['question']\nNote: if you intended {description} to be part of the string and not a variable, please escape it with double curly braces like: '{{description}}'." 

For some reason, it keeps saying that I only passed in the variable 'question', but when I call retriever.invoke(), I am clearly passing in the required variables.

Here is an example input that is being passed in:

{'allegations': 'Conformance to Laws, Conduct Unbecoming, Respectful Treatment, Alcohol off Duty', 'description': 'Officer firstname Lastname fled from a taxicab without paying the fare. Officer Lastname was located by Officers from Area A‐7. Officer Lastname A‐7 where he was uncooperative with Sgt. Lastname and refused to talk to him. Sgt. Lastname escorted Officer Lastname back to the o his department equipment was received by Sgt. Lastname including a Glock 40 Serial # number, Radio # number, Handcuffs #number, 3 magazine Police Badge# number, 1 container of OC Spray and 1 bullet resistant vest. Department equipment to be turned over to Sgt. last name', 'num_allegations': '4'}
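
The `Received: ['question']` part of the error suggests that MultiQueryRetriever calls the query-generation chain with a single `question` key, whatever is passed to `retriever.invoke()`. One possible workaround, sketched here under that assumption, is to fold the three fields into one string and use a prompt whose only variable is `{question}`. `build_question` and `QUESTION_ONLY_TEMPLATE` are hypothetical names, and plain `str.format` stands in for `PromptTemplate` so the sketch runs without LangChain installed:

```python
from typing import List

# Hypothetical template: the only placeholder is {question}, which is the
# key the error message shows the chain actually receives.
QUESTION_ONLY_TEMPLATE = (
    "You are an AI language model assistant. Analyze the complaint below and "
    "generate one query per allegation to retrieve relevant sections from the "
    "Police Rulebook. Separate the queries with newlines.\n\n{question}"
)


def build_question(allegations: List[str], description: str) -> str:
    """Fold the allegations and description into one string (hypothetical helper)."""
    return (
        f"Number of allegations: {len(allegations)}\n"
        f"Allegations made against officer: {', '.join(allegations)}\n"
        f"Incident description: {description}"
    )


question = build_question(
    ["Conduct Unbecoming", "Alcohol off Duty"],
    "Officer fled from a taxicab without paying the fare.",
)
# Plain str.format stands in for PromptTemplate in this sketch.
print(QUESTION_ONLY_TEMPLATE.format(question=question))
```

With LangChain, the same template would go into `PromptTemplate(input_variables=["question"], template=QUESTION_ONLY_TEMPLATE)`, and the retriever would be invoked with the string itself, as in `retriever.invoke(question)`, rather than with a dict. This has not been tested against the package versions in this post.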

System Info:
langchain==0.3.7
langchain-community==0.2.3
langchain-core==0.3.19
langchain-google-genai==2.0.1
langchain-openai==0.2.8
langchain-text-splitters==0.3.2

Using a Mac

Using Python 3.11.9

Tags: python. Source: "KeyError with MultiQueryRetriever and Custom Prompt for Fetching data from ChromaDB", Stack Overflow