This guide covers the configuration of retrieval strategies, answer generation, and search results in Search AI. These settings determine how content is retrieved from your index and how responses are delivered to users.
Navigate to Responses > Retrieval Strategies to access these settings.
Retrieval Strategies
Configure the chunk retrieval strategy and corresponding thresholds for finding relevant content.
Retrieval Methods
Search AI supports two retrieval methods. The choice depends on the nature of your content, the type of queries your users ask, and the precision required.
| Strategy | Description | Best For |
|---|---|---|
| Vector Retrieval | Uses cosine similarity between query vector and chunk vectors. Scores range from 0 (no match) to 1 (complete match) | Semantic similarity matching, contextual queries |
| Hybrid Retrieval | Combines keyword-based matching with vector-based scoring — keyword matching captures exact terms and text patterns while vector scoring handles semantic meaning, leveraging strengths of both approaches | Balanced precision and recall, recommended when content has both structured terminology and natural language |
How to choose
- Use Vector Retrieval when queries are conversational or conceptual and exact keyword matches are less important.
- Use Hybrid Retrieval (default) when content contains specific terminology, product names, or structured data where keyword matching adds precision on top of semantic search.
Hybrid Retrieval is the default retrieval strategy.
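As an illustration of the two methods, here is a minimal sketch, not Search AI's actual scoring: vector retrieval as cosine similarity between embeddings, and hybrid retrieval as a blend of a toy keyword-overlap score with the vector score. The `alpha` weight and both scoring functions are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    # Vector retrieval score: 0 = no match, 1 = complete match
    # (for non-negative embedding vectors).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_overlap(query, text):
    # Toy keyword score: fraction of query terms present in the chunk text.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(query, text, q_vec, c_vec, alpha=0.5):
    # Hybrid retrieval: blend exact-term matching with semantic similarity.
    # alpha is a hypothetical weighting, not a Search AI parameter.
    return alpha * keyword_overlap(query, text) + (1 - alpha) * cosine_similarity(q_vec, c_vec)
```

A query with exact terminology scores well on the keyword component even when its embedding is only moderately close, which is why hybrid retrieval is recommended for content mixing structured terminology with natural language.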
Qualification Criteria
| Parameter | Description | Range | Default |
|---|---|---|---|
| Similarity Score Threshold | Minimum similarity score for a chunk to qualify. Chunks below this score are discarded. Higher values require closer matches. | 0-100 | 20 |
| Proximity Threshold | How close a retrieved chunk must score relative to the highest-ranking chunk. Chunks beyond this threshold are discarded; lower values require chunks to cluster more tightly around the top chunk. | 0-50 | 20 |
| Top K Chunks | Maximum number of qualified chunks sent to the LLM as context for answer generation. | - | 20 |
| Token Budget for Chunks | Maximum tokens allocated for chunks sent to the LLM. The combined total of chunk tokens, prompt, query, conversation context, and expected response must stay within the LLM’s context window. | 1-1,000,000 | 20,000 |
When setting the Token Budget for Chunks, account for your LLM’s total context window size minus the tokens used by the prompt, query, and expected response. See Token Management for guidance.
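The qualification parameters above can be pictured as a filtering pipeline. The sketch below is an assumption about how the stages compose (in particular, it interprets the proximity threshold as the allowed score distance from the top chunk); it is not Search AI's implementation.

```python
def qualify_chunks(chunks, similarity_threshold=20, proximity_threshold=20,
                   top_k=20, token_budget=20_000):
    # Each chunk is a dict with a 'score' (0-100) and a 'tokens' count.
    # 1. Discard chunks below the similarity score threshold.
    qualified = [c for c in chunks if c["score"] >= similarity_threshold]
    if not qualified:
        return []
    qualified.sort(key=lambda c: c["score"], reverse=True)
    # 2. Discard chunks whose score falls too far below the top chunk
    #    (assumed interpretation of the proximity threshold).
    top = qualified[0]["score"]
    qualified = [c for c in qualified if top - c["score"] <= proximity_threshold]
    # 3. Keep at most top_k chunks, staying within the token budget.
    selected, used = [], 0
    for c in qualified[:top_k]:
        if used + c["tokens"] > token_budget:
            break
        selected.append(c)
        used += c["tokens"]
    return selected
```

With the defaults, a chunk scoring 75 survives alongside a top chunk scoring 90 (distance 15), but a chunk scoring 60 (distance 30) is discarded by the proximity stage.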
Default Configuration Summary:
- Retrieval Mechanism: Hybrid Retrieval
- Similarity Score Threshold: 20
- Proximity Threshold: 20
- Top K Chunks: 20
- Token Budget for Chunks: 20,000
Answer Generation
Configure how responses are composed and delivered to users.
Navigate to Responses > Answer Configuration to access these settings.
Answer Components
| Component | Description |
|---|---|
| Answer Text | The generated response addressing the user’s question |
| Snippet Reference | Link to source as citation for further reading |
Answer Types
| Type | Description | Configuration |
|---|---|---|
| Extractive | The top retrieved chunk is presented verbatim, without modification | Configure Response Length (tokens) |
| Generative | Top chunks are sent to configured LLM, which generates a paraphrased answer | Requires LLM integration and enabled Answer Generation in GenAI Tools |
Generative Answer Configuration
Chunk Settings:
Token Budget for Chunks: Specifies the total tokens that can be included in chunks sent to the LLM. Default: 20,000. Maximum: 1,000,000.
To calculate the right value: subtract the tokens used by the prompt, instructions, and expected response from the LLM’s maximum context window. The remainder is the maximum token budget for chunks.
Example: For a 4,096-token context window — if the prompt uses 500 tokens and the response uses 500 tokens, up to 3,096 tokens remain for chunks. At 500 tokens per chunk, that’s 6 chunks maximum. To limit to 3 chunks, set the budget to 1,500.
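The worked example above can be checked with a few lines of arithmetic:

```python
def chunk_token_budget(context_window, prompt_tokens, response_tokens):
    # Tokens left over for chunks after reserving space for the prompt
    # and the expected response.
    return context_window - prompt_tokens - response_tokens

def max_chunks(budget, tokens_per_chunk):
    # Whole chunks that fit within the budget.
    return budget // tokens_per_chunk

budget = chunk_token_budget(4096, 500, 500)   # 3096 tokens remain for chunks
chunks = max_chunks(budget, 500)              # 6 chunks of 500 tokens fit
limit_three = 3 * 500                         # set the budget to 1500 to cap at 3 chunks
```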
Enable Document Level Processing: When enabled, Search AI sends full documents to the LLM instead of individual chunks. This is useful when relevant information is distributed across multiple chunks and sending only a few may result in incomplete answers. Search AI identifies and sends complete documents associated with the most relevant chunks, up to the defined token budget.
| Setting | Description | Default | Max |
|---|---|---|---|
| Token Budget for Chunks | Total tokens for chunks sent to LLM | 20,000 | 1,000,000 |
| Enable Document Level Processing | Send full documents instead of just chunks for richer context | Disabled | - |
| Token Budget for Documents | Maximum tokens when sending full documents | 50,000 | 100,000 |
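A minimal sketch of document-level processing, assuming each chunk carries a `doc_id` and document token counts are known; the real selection logic may differ:

```python
def expand_to_documents(ranked_chunks, doc_tokens, doc_token_budget=50_000):
    # Walk chunks in relevance order, replacing each with its full parent
    # document, until the document token budget is exhausted.
    # 'doc_tokens' maps doc_id -> token count of the full document.
    selected, used, seen = [], 0, set()
    for chunk in ranked_chunks:
        doc_id = chunk["doc_id"]
        if doc_id in seen:
            continue  # document already included via an earlier chunk
        size = doc_tokens[doc_id]
        if used + size > doc_token_budget:
            break
        selected.append(doc_id)
        seen.add(doc_id)
        used += size
    return selected
```

Because whole documents are much larger than chunks, the document budget (default 50,000) caps how many of the top-ranked documents actually reach the LLM.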
Chunk Order Options:
The order of data chunks can affect how the LLM weighs its context and, in turn, the quality of the generated answer. Choose a chunk order that aligns with the goals of the task and the nature of the data being processed.
| Order | Description | Use Case |
|---|---|---|
| Most to Least Relevant | Highest relevance first, then decreasing | Standard prioritization |
| Least to Most Relevant | Lowest relevance first, with the most relevant chunk placed last, immediately before the query | When material closest to the query should receive the most attention in context |
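The two orderings amount to a single relevance sort with an optional reversal; in the least-to-most order, the strongest chunk ends up nearest the query in the prompt. A sketch:

```python
def order_chunks(chunks, order="most_to_least"):
    # Chunks carry a relevance 'score'; sort descending by default.
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)
    if order == "least_to_most":
        # Most relevant chunk lands last, immediately before the query.
        ranked.reverse()
    return ranked
```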
LLM Configuration:
| Setting | Description |
|---|---|
| Select Generative Model | Choose from configured LLM models |
| Answer Prompt | Select prompt template for answer generation |
| Temperature | Controls randomness (lower = more deterministic, higher = more creative) |
| Response Length | Expected answer length in tokens |
Feedback Configuration:
Enable the feedback mechanism to allow users to rate answers. When enabled, the Web SDK automatically displays feedback options to users. Feedback data appears in Answer Insights analytics.
Response Streaming
Enable real-time token-by-token response delivery for Web/Mobile SDK channels, reducing perceived latency for longer answers.
Streaming is configured via prompt settings and is currently supported only for OpenAI and Azure OpenAI models. Not available for API-based responses.
See Enable Response Streaming.
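Conceptually, streaming replaces a single final render with incremental updates. The sketch below fakes a token stream with a generator; a real SDK client would receive tokens over the wire and update the UI as each one arrives:

```python
def stream_tokens(answer):
    # Stand-in for a streaming LLM response: yields the answer piece by piece.
    for token in answer.split():
        yield token + " "

def render_stream(stream):
    # Client side: append each token to the display as it arrives,
    # rather than waiting for the complete answer.
    rendered = ""
    for token in stream:
        rendered += token  # in a real SDK, a UI update happens here
    return rendered.rstrip()
```

The user sees the first words almost immediately, which is what reduces perceived latency for long answers even though total generation time is unchanged.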
Search Results
Search results display a ranked list of documents or chunks by relevance, presenting each with a title and a snippet.
Navigate to Responses > Search Results to enable and configure this feature.
Unlike answers — which provide a single, focused response to a query — search results are more useful when broader information is needed.
When to Use Search Results vs Answers
| Use Case | Recommended |
|---|---|
| Direct, specific questions | Answers |
| Broad topic exploration | Search Results |
| Complex queries requiring comparisons | Search Results |
| Debugging/troubleshooting with multiple sources | Search Results |
How Search Results Are Generated
When enabled, Search AI processes the user’s query, retrieves the most relevant chunks from the index, organizes them by their corresponding documents, and presents them with relevant metadata for each chunk.
When both search results and extractive answers are enabled, the top search result matches the answer. To avoid redundancy, the highest-ranking result is omitted from the search results list — results start from the next most relevant entry.
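The de-duplication rule above amounts to dropping the head of the ranked list when extractive answers are enabled:

```python
def build_search_results(ranked_results, extractive_answer_enabled):
    # The top result duplicates the extractive answer, so skip it
    # and start the list from the next most relevant entry.
    if extractive_answer_enabled and ranked_results:
        return ranked_results[1:]
    return ranked_results
```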
Configuration Settings
| Setting | Description | Range | Default |
|---|---|---|---|
| Number of Search Results | Maximum chunks displayed | 1-100 | 20 |
Filters (Facets)
Filters enable users to narrow results based on specific criteria, useful for large result sets.
Search Results are currently accessible via the Search API only. See the Search API reference for details.
Filter Types:
| Type | Description |
|---|---|
| Static | Fixed, predefined filter values |
| Dynamic | Values derived from search results |
Filter UI Options:
| UI Type | Availability | Selection |
|---|---|---|
| Tabs | Static filters only | Single value, string fields only |
| Single Select | Dynamic filters | One value at a time |
| Multi Select | Dynamic filters | Multiple values concurrently |
Creating Filters
- Provide unique Filter Name
- Select Filter Type (Static or Dynamic)
- Choose Field for filtering
- Select Filter UI style
Filter Rules:
- Only one tab-style filter can be enabled at a time.
- Only string fields can be used with tab-style UI.
- Two filters cannot use the same field concurrently; only one filter per field can be enabled at a time.
- A filter applies only if the search results contain content for the specified field.
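The filter rules can be sketched as a validation pass plus a per-field match. The dict keys and field names below are illustrative, not the Search API's schema:

```python
def enabled_filters(filters):
    # Enforce the rules: at most one tab-style filter, tabs on string
    # fields only, and at most one filter per field.
    active, used_fields, tab_used = [], set(), False
    for f in filters:
        if f["field"] in used_fields:
            continue  # only one filter per field
        if f["ui"] == "tabs":
            if tab_used or f["field_type"] != "string":
                continue  # one tab filter max, string fields only
            tab_used = True
        active.append(f)
        used_fields.add(f["field"])
    return active

def apply_filter(results, field, value):
    # A filter only affects results that actually contain the field.
    return [r for r in results if r.get(field) == value]
```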
Default Filters
Every application includes default filters that can be updated, deleted, or disabled as needed.
Search Results Access
Currently available via Search API only.