This guide covers the configuration of retrieval strategies, answer generation, and search results in Search AI. These settings determine how content is retrieved from your index and how responses are delivered to users.
Navigate to Responses > Retrieval Strategies to access these settings.
Retrieval Strategies
Configure the chunk retrieval strategy and corresponding thresholds for finding relevant content.
Retrieval Methods
Search AI supports two retrieval methods. The choice depends on the nature of your content, the type of queries your users ask, and the precision required.
| Strategy | Description | Best For |
|---|---|---|
| Vector Retrieval | Uses cosine similarity between query vector and chunk vectors. Scores range from 0 (no match) to 1 (complete match) | Semantic similarity matching, contextual queries |
| Hybrid Retrieval | Combines keyword-based matching with vector-based scoring — keyword matching captures exact terms and text patterns while vector scoring handles semantic meaning, leveraging strengths of both approaches | Balanced precision and recall, recommended when content has both structured terminology and natural language |
How to choose
- Use Vector Retrieval when queries are conversational or conceptual and exact keyword matches are less important.
- Use Hybrid Retrieval (default) when content contains specific terminology, product names, or structured data where keyword matching adds precision on top of semantic search.
Hybrid Retrieval is the default retrieval strategy.
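As an illustration of the two methods, here is a minimal sketch, not Search AI's actual scoring: vector retrieval as cosine similarity between embeddings, and hybrid retrieval as a blend of a toy keyword-overlap score with the vector score. The `alpha` weight and both scoring functions are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    # Vector retrieval score: 0 = no match, 1 = complete match
    # (for non-negative embedding vectors).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_overlap(query, text):
    # Toy keyword score: fraction of query terms present in the chunk text.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_score(query, text, q_vec, c_vec, alpha=0.5):
    # Hybrid retrieval: blend exact-term matching with semantic similarity.
    # alpha is a hypothetical weighting, not a Search AI parameter.
    return alpha * keyword_overlap(query, text) + (1 - alpha) * cosine_similarity(q_vec, c_vec)
```

A query with exact terminology scores well on the keyword component even when its embedding is only moderately close, which is why hybrid retrieval is recommended for content mixing structured terminology with natural language.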
Qualification Criteria
| Parameter | Description | Range | Default |
|---|---|---|---|
| Similarity Score Threshold | Minimum similarity score for a chunk to qualify. Chunks below this score are discarded. Higher values require closer matches. | 0-100 | 20 |
| Proximity Threshold | How close a retrieved chunk must score relative to the highest-ranking chunk. Chunks beyond this threshold are discarded; lower values require chunks to cluster more tightly around the top chunk. | 0-50 | 20 |
| Top K Chunks | Maximum number of qualified chunks sent to the LLM as context for answer generation. | - | 20 |
| Token Budget for Chunks | Maximum tokens allocated for chunks sent to the LLM. The combined total of chunk tokens, prompt, query, conversation context, and expected response must stay within the LLM’s context window. | 1-1,000,000 | 20,000 |
When setting the Token Budget for Chunks, account for your LLM’s total context window size minus the tokens used by the prompt, query, and expected response. See Token Management for guidance.
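The qualification parameters above can be pictured as a filtering pipeline. The sketch below is an assumption about how the stages compose (in particular, it interprets the proximity threshold as the allowed score distance from the top chunk); it is not Search AI's implementation.

```python
def qualify_chunks(chunks, similarity_threshold=20, proximity_threshold=20,
                   top_k=20, token_budget=20_000):
    # Each chunk is a dict with a 'score' (0-100) and a 'tokens' count.
    # 1. Discard chunks below the similarity score threshold.
    qualified = [c for c in chunks if c["score"] >= similarity_threshold]
    if not qualified:
        return []
    qualified.sort(key=lambda c: c["score"], reverse=True)
    # 2. Discard chunks whose score falls too far below the top chunk
    #    (assumed interpretation of the proximity threshold).
    top = qualified[0]["score"]
    qualified = [c for c in qualified if top - c["score"] <= proximity_threshold]
    # 3. Keep at most top_k chunks, staying within the token budget.
    selected, used = [], 0
    for c in qualified[:top_k]:
        if used + c["tokens"] > token_budget:
            break
        selected.append(c)
        used += c["tokens"]
    return selected
```

With the defaults, a chunk scoring 75 survives alongside a top chunk scoring 90 (distance 15), but a chunk scoring 60 (distance 30) is discarded by the proximity stage.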
Default Configuration Summary:
- Retrieval Mechanism: Hybrid Retrieval
- Similarity Score Threshold: 20
- Proximity Threshold: 20
- Top K Chunks: 20
- Token Budget for Chunks: 20,000
Answer Generation
Configure how responses are composed and delivered to users.
Navigate to Responses > Answer Configuration to access these settings.
Answer Components
| Component | Description |
|---|---|
| Answer Text | The generated response addressing the user’s question |
| Snippet Reference | Link to source as citation for further reading |
Answer Types
| Type | Description | Configuration |
|---|---|---|
| Extractive | The top retrieved chunk is presented verbatim, without modification | Configure Response Length (tokens) |
| Generative | Top chunks are sent to configured LLM, which generates a paraphrased answer | Requires LLM integration and enabled Answer Generation in GenAI Tools |
Generative Answer Configuration
Chunk Settings:
Token Budget for Chunks: Specifies the total tokens that can be included in chunks sent to the LLM. Default: 20,000. Maximum: 1,000,000.
To calculate the right value: subtract the tokens used by the prompt, instructions, and expected response from the LLM’s maximum context window. The remainder is the maximum token budget for chunks.
Example: For a 4,096-token context window — if the prompt uses 500 tokens and the response uses 500 tokens, up to 3,096 tokens remain for chunks. At 500 tokens per chunk, that’s 6 chunks maximum. To limit to 3 chunks, set the budget to 1,500.
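The worked example above can be checked with a few lines of arithmetic:

```python
def chunk_token_budget(context_window, prompt_tokens, response_tokens):
    # Tokens left over for chunks after reserving space for the prompt
    # and the expected response.
    return context_window - prompt_tokens - response_tokens

def max_chunks(budget, tokens_per_chunk):
    # Whole chunks that fit within the budget.
    return budget // tokens_per_chunk

budget = chunk_token_budget(4096, 500, 500)   # 3096 tokens remain for chunks
chunks = max_chunks(budget, 500)              # 6 chunks of 500 tokens fit
limit_three = 3 * 500                         # set the budget to 1500 to cap at 3 chunks
```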
Enable Document Level Processing: When enabled, Search AI sends full documents to the LLM instead of individual chunks. This is useful when relevant information is distributed across multiple chunks and sending only a few may result in incomplete answers. Search AI identifies and sends complete documents associated with the most relevant chunks, up to the defined token budget.
| Setting | Description | Default | Max |
|---|---|---|---|
| Token Budget for Chunks | Total tokens for chunks sent to LLM | 20,000 | 1,000,000 |
| Enable Document Level Processing | Send full documents instead of just chunks for richer context | Disabled | - |
| Token Budget for Documents | Maximum tokens when sending full documents | 50,000 | 100,000 |
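A minimal sketch of document-level processing, assuming each chunk carries a `doc_id` and document token counts are known; the real selection logic may differ:

```python
def expand_to_documents(ranked_chunks, doc_tokens, doc_token_budget=50_000):
    # Walk chunks in relevance order, replacing each with its full parent
    # document, until the document token budget is exhausted.
    # 'doc_tokens' maps doc_id -> token count of the full document.
    selected, used, seen = [], 0, set()
    for chunk in ranked_chunks:
        doc_id = chunk["doc_id"]
        if doc_id in seen:
            continue  # document already included via an earlier chunk
        size = doc_tokens[doc_id]
        if used + size > doc_token_budget:
            break
        selected.append(doc_id)
        seen.add(doc_id)
        used += size
    return selected
```

Because whole documents are much larger than chunks, the document budget (default 50,000) caps how many of the top-ranked documents actually reach the LLM.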
Chunk Order Options:
The order of data chunks can affect how the LLM weighs its context and, in turn, the quality of the generated answer. Choose a chunk order that aligns with the goals of the task and the nature of the data being processed.
| Order | Description | Use Case |
|---|---|---|
| Most to Least Relevant | Highest relevance first, then decreasing | Standard prioritization |
| Least to Most Relevant | Lowest relevance first, with the most relevant chunk placed last, immediately before the query | When material closest to the query should receive the most attention in context |
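The two orderings amount to a single relevance sort with an optional reversal; in the least-to-most order, the strongest chunk ends up nearest the query in the prompt. A sketch:

```python
def order_chunks(chunks, order="most_to_least"):
    # Chunks carry a relevance 'score'; sort descending by default.
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)
    if order == "least_to_most":
        # Most relevant chunk lands last, immediately before the query.
        ranked.reverse()
    return ranked
```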
LLM Configuration:
| Setting | Description |
|---|---|
| Select Generative Model | Choose from configured LLM models |
| Answer Prompt | Select prompt template for answer generation |
| Temperature | Controls randomness (lower = more deterministic, higher = more creative) |
| Response Length | Expected answer length in tokens |
Feedback Configuration:
Enable the feedback mechanism to allow users to rate answers. When enabled, the Web SDK automatically displays feedback options to users. Feedback data appears in Answer Insights analytics.
Response Streaming
Enable real-time token-by-token response delivery for Web/Mobile SDK channels, reducing perceived latency for longer answers.
Streaming is configured via prompt settings and is currently supported only for OpenAI and Azure OpenAI models. Not available for API-based responses.
See Enable Response Streaming.
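Conceptually, streaming replaces a single final render with incremental updates. The sketch below fakes a token stream with a generator; a real SDK client would receive tokens over the wire and update the UI as each one arrives:

```python
def stream_tokens(answer):
    # Stand-in for a streaming LLM response: yields the answer piece by piece.
    for token in answer.split():
        yield token + " "

def render_stream(stream):
    # Client side: append each token to the display as it arrives,
    # rather than waiting for the complete answer.
    rendered = ""
    for token in stream:
        rendered += token  # in a real SDK, a UI update happens here
    return rendered.rstrip()
```

The user sees the first words almost immediately, which is what reduces perceived latency for long answers even though total generation time is unchanged.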
Search Results
Search results display a ranked list of documents or chunks by relevance, presenting each with a title and a snippet.
Navigate to Responses > Search Results to enable and configure this feature.
Unlike answers — which provide a single, focused response to a query — search results are more useful when broader information is needed.
When to Use Search Results vs Answers
| Use Case | Recommended |
|---|---|
| Direct, specific questions | Answers |
| Broad topic exploration | Search Results |
| Complex queries requiring comparisons | Search Results |
| Debugging/troubleshooting with multiple sources | Search Results |
How Search Results Are Generated
When enabled, Search AI processes the user’s query, retrieves the most relevant chunks from the index, organizes them by their corresponding documents, and presents them with relevant metadata for each chunk.
When both search results and extractive answers are enabled, the top search result matches the answer. To avoid redundancy, the highest-ranking result is omitted from the search results list — results start from the next most relevant entry.
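The de-duplication rule above amounts to dropping the head of the ranked list when extractive answers are enabled:

```python
def build_search_results(ranked_results, extractive_answer_enabled):
    # The top result duplicates the extractive answer, so skip it
    # and start the list from the next most relevant entry.
    if extractive_answer_enabled and ranked_results:
        return ranked_results[1:]
    return ranked_results
```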
Configuration Settings
| Setting | Description | Range | Default |
|---|---|---|---|
| Number of Search Results | Maximum chunks displayed | 1-100 | 20 |
Filters (Facets)
Filters enable users to narrow results based on specific criteria, useful for large result sets.
Search Results are currently accessible via the Search API only. See the Search API reference for details.
Filter Types:
| Type | Description |
|---|---|
| Static | Fixed, predefined filter values |
| Dynamic | Values derived from search results |
Filter UI Options:
| UI Type | Availability | Selection |
|---|---|---|
| Tabs | Static filters only | Single value, string fields only |
| Single Select | Dynamic filters | One value at a time |
| Multi Select | Dynamic filters | Multiple values concurrently |
Creating Filters
- Provide unique Filter Name
- Select Filter Type (Static or Dynamic)
- Choose Field for filtering
- Select Filter UI style
Filter Rules:
- Only one tab-style filter can be enabled at a time.
- Only string fields can be used with tab-style UI.
- Two filters cannot use the same field concurrently; only one filter per field can be enabled at a time.
- A filter applies only if the search results contain content for the specified field.
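The filter rules can be sketched as a validation pass plus a per-field match. The dict keys and field names below are illustrative, not the Search API's schema:

```python
def enabled_filters(filters):
    # Enforce the rules: at most one tab-style filter, tabs on string
    # fields only, and at most one filter per field.
    active, used_fields, tab_used = [], set(), False
    for f in filters:
        if f["field"] in used_fields:
            continue  # only one filter per field
        if f["ui"] == "tabs":
            if tab_used or f["field_type"] != "string":
                continue  # one tab filter max, string fields only
            tab_used = True
        active.append(f)
        used_fields.add(f["field"])
    return active

def apply_filter(results, field, value):
    # A filter only affects results that actually contain the field.
    return [r for r in results if r.get(field) == value]
```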
Default Filters
Every application includes default filters that can be updated, deleted, or disabled as needed.
Search Results Access
Currently available via Search API only.