This model uses large language models (LLMs) to generate an answer snippet from the most relevant chunks. Currently, SearchAssist supports the OpenAI and Azure OpenAI LLMs for generating answer snippets.
Chunking and Chunk Retrieval in Generative Answers
For Generative Answers, a plain text extraction model is used. This extraction model treats all text the same way: titles, subtitles, and body text are all read as plain content. Chunks are generated according to the configured number of tokens; by default, this value is set to 400. During retrieval, the top n chunks (as configured) that best match the user query are retrieved and sent to the LLM for answer generation.
Note:
- This extraction model extracts all available text but skips images. Hence, it is not recommended for documents that consist primarily of images.
- You can modify the number of tokens in each chunk using custom configs based on the format of the content and your specific use case requirements.
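To make the chunking behavior concrete, here is a minimal sketch of token-based splitting. It is illustrative only and does not reflect SearchAssist's internal implementation; it assumes the tiktoken package purely as a stand-in tokenizer, and the 400-token size simply mirrors the default described above.

```python
# Illustrative sketch only -- not SearchAssist's actual implementation.
# Assumes the `tiktoken` package as a stand-in tokenizer.
import tiktoken

def chunk_text(text: str, max_tokens: int = 400) -> list[str]:
    """Split plain extracted text into fixed-size token windows."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    # Titles, subtitles, and body text are all treated as plain content,
    # so the split is purely by token count.
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```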
Generative Answers Configuration
To enable the Generative Model for answer snippets, the application must be integrated with one or more of the supported LLMs.
For integration, go to the Integrations page under the Manage tab. Refer to this documentation for step-by-step instructions.
After you have successfully integrated one or more of the supported third-party LLMs, you can configure one of them to be used for generating Answer Snippets.
Configure Generative Model
Go to the Answer Snippet configuration page and select the model to be enabled from the drop-down menu. The drop-down lists all the configured LLMs. To enable or disable the selected model, use the slider button. If an enabled integration is removed from the Integrations page, it is automatically disabled for answer snippets.
Set the following configuration properties for the selected integration.
Similarity Score: This is the minimum expected score of the match between the user query and the chunk. It defines how closely a chunk should match the user query to qualify as an answer chunk. The higher the value of this field, the closer the match must be.
Temperature: This field controls the level of randomness and creativity in answer generation. A lower value means the generated text will be more focused and consistent, whereas a higher value produces more creative and varied text. The value can range from 0.1 to 1.0 and can be adjusted in increments of 0.1. The default value for this field is 0.5.
Number of Chunks: This is the number of chunks of data to be sent to the LLM for answer generation. This can vary depending upon the type of input data to the LLM engine. This field can take values from 1 to 5, with 5 as the default. A higher value implies better coverage and more precise answers, but processing more chunks can introduce additional latency and increase the cost to the user.
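Taken together, the Similarity Score and Number of Chunks settings determine which chunks reach the LLM, while Temperature shapes how the LLM phrases the answer. The sketch below is a hypothetical illustration of that interplay, not SearchAssist code; the function name, sample data, and the 0.7 threshold are made up, and only the default of 5 chunks comes from the setting described above.

```python
# Hypothetical illustration of how the retrieval settings interact;
# this is not SearchAssist code.
def select_chunks(scored_chunks: list[tuple[str, float]],
                  similarity_score: float,      # configured minimum match score
                  number_of_chunks: int = 5) -> list[str]:
    """Pick the chunks that would be sent to the LLM for answer generation."""
    # Similarity Score: keep only chunks that match the query closely enough.
    qualified = [(c, s) for c, s in scored_chunks if s >= similarity_score]
    # Number of Chunks: forward at most the top-n qualifying chunks, best first.
    qualified.sort(key=lambda pair: pair[1], reverse=True)
    # The selected chunks are then sent to the LLM along with the configured
    # Temperature value, which controls how focused or creative the answer is.
    return [c for c, _ in qualified[:number_of_chunks]]

# Example: with a threshold of 0.7 and the default of 5 chunks,
# only sufficiently close matches are forwarded.
chunks = [("Chunk about pricing", 0.91), ("Chunk about setup", 0.74),
          ("Unrelated chunk", 0.32)]
print(select_chunks(chunks, similarity_score=0.7))
```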
Prompt Editor: The Prompt Editor allows you to tailor prompts for answer generation by the large language model (LLM). A prompt is an instruction sent to the LLM that enables it to return the desired responses. This feature enables you to customize prompts for specific use cases, ensuring more relevant and domain-specific answers. You can define your own set of instructions, provide use-case-specific few-shot examples, and define specifications for the response from the LLM, such as tone, answer length, and verbiage.
The prompt editor comes with two built-in prompts: a Default Prompt that provides answers adhering to the information in the sources and a Multilingual Prompt that can be used to get answers from the content in multiple languages. You can edit the built-in prompts or write a new prompt.
To add a new prompt, click the Create Prompt button, enter a name and description of the prompt, and click Create. This will create a new prompt that you can edit and use for answer generation.
To edit an existing prompt, click the edit icon in the widget.
Train the application
After enabling Answer Snippets, click the Train button to generate the chunks from the ingested data. Once training is successful, you can start searching over your data.
Guidelines for Prompt Customization
Prompt Structure
It is recommended that the prompt be defined as a set of messages where each message has two properties: role and content. The ‘role’ can take one of three values: ‘system’, ‘user’, or ‘assistant’. The ‘content’ contains the text of the message from the role. Assigning roles to the messages allows you to establish a context and guide the model’s understanding of the inputs and the expected output. Click the info icon to view the guidelines for the prompt format.
SearchAssist provides two dynamic variables that provide contextual information to the prompt.
- chunks – Chunks selected by SearchAssist to be sent to the LLM for answer generation.
- query – User query for which the answer is to be generated.
To use these variables in the prompt, enclose them in double curly braces. Note that it is mandatory to use both of these variables in the prompt for it to be considered a valid prompt.
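To make the variable usage concrete, the following is a minimal sketch of how a prompt that uses {{chunks}} and {{query}} might be filled in at answer-generation time. The prompt wording and the substitution code are purely illustrative; only the two variable names come from SearchAssist.

```python
# Illustrative only: shows where {{chunks}} and {{query}} sit in a prompt
# and how they would be replaced with real values at answer-generation time.
prompt_template = [
    {
        "role": "system",
        "content": (
            "Answer the user's query using only the content below.\n\n"
            "Content:\n{{chunks}}"
        ),
    },
    {"role": "user", "content": "Query: {{query}}"},
]

def render_prompt(template: list[dict], chunks_text: str, query: str) -> list[dict]:
    """Substitute the two mandatory variables into the prompt template."""
    rendered = []
    for message in template:
        content = (message["content"]
                   .replace("{{chunks}}", chunks_text)
                   .replace("{{query}}", query))
        rendered.append({"role": message["role"], "content": content})
    return rendered
```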
Points to remember
- Both system variables must be present in the prompt.
- The system message should define the answer format such that the generated answers provide the chunk_id associated with them. To do so, add an instruction like this in the prompt: "Your response should adhere to the following format: **'Some relevant answer[chunk_id] Another relevant data[chunk_id]'**."
Best Practices for Writing Prompts
- Add clear and detailed instructions for a precise answer.
- Provide examples.
- Specify the approximate length of the response.
Sample Answer Prompt
```json
[
  {
    "role": "system",
    "content": "You are an AI system responsible for generating answers and references based on user-provided context. The user will provide context, and your task is to answer the user's query at the end. Your response should adhere to the following format: **'Some relevant answer[chunk_id] Another relevant data[chunk_id]'**. In this format, you must strictly include the relevant answer or information followed by the chunk_id, which serves as a reference to the source of the data within the provided context. Importantly, place only the correct chunk_ids within square brackets. These chunk_ids must be located exclusively at the end of each content and indicated explicitly with a key 'chunk id.' Do not include any other text, words, or characters within square brackets. Your responses should also be properly formatted with all necessary special characters like new lines, tabs, and bullets, as required for clarity and presentation. If there are multiple answers present in the provided context, you should include all of them in your response. You should only provide an answer if you can extract the information directly from the content provided by the user. If you have partial information, you should still provide a partial answer. Always send relevant and correct chunk_ids with the answer fragments. You must not fabricate or create chunk_ids; they should accurately reference the source of each piece of information. If you cannot find the answer to the user's query within the provided content, your response should be 'I don't know.' GENERATE ANSWERS AND REFERENCES EXCLUSIVELY BASED ON THE CONTENT PROVIDED BY THE USER. IF A QUERY LACKS INFORMATION IN THE CONTEXT, YOU MUST RESPOND WITH 'I don't know' WITHOUT EXCEPTIONS. Please generate the response in the same language as the user's query and context. To summarize, your task is to generate well-formatted responses, including special characters like new lines, tabs, and bullets when necessary, and to provide all relevant answers from the provided context while ensuring accuracy and correctness. Each answer fragment should be accompanied by the appropriate chunk_id, and you should never create chunk_ids. Answer in the same language as the user's query and context."
  },
  {
    "role": "user",
    "content": "From the content that I have provided, solve the query: '{{query}}'. EXCLUDE ANY irrelevant information, such as 'Note:', from the response. NEVER FABRICATE ANY CHUNK_ID."
  }
]
```
Multilingual Support for Generative Snippets
SearchAssist supports Generative Snippets in multiple languages. This means that answers are generated in the language of the query. For example, if the user query is in Spanish, the answer is also generated in Spanish. To enable multilingual support, take the following steps.
- Ensure that the specific language is added and enabled under Index Language. Note that while SearchAssist supports multiple languages, Generative Snippets are currently supported only in English, Spanish, and German.
- Add appropriate instructions in the prompt. You can select the built-in Multilingual Prompt which already has the required instructions to support multilingual answers.
- With the Generative Answers model, when a user asks a question in English and the correct answer is found, it can be presented to the user in French using the LLM, provided that the LLM supports that language. Add corresponding instructions to the prompt. However, this cannot be done with the Extractive model.