Web Content

Organizations might already have web pages listing the features or product details that search users are looking for. Business users can leverage this information and enable SearchAssist to respond to user queries without replicating the data.

SearchAssist allows content to be ingested into the application through web crawling. For example, consider a banking website that contains the bulk of the information needed to answer search users' queries. In this scenario, the SearchAssist application is configured to crawl the bank's website and index all its web pages so that the indexed pages can be retrieved to answer users' queries.

Web Crawling

Web crawling allows you to extract and index content from one or more websites so that the content is ready for search.
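
Conceptually, making crawled content "ready for search" means turning each fetched page into plain, indexable text. The following Python sketch is a generic illustration of that step, not SearchAssist's pipeline; the class and function names are hypothetical. It strips markup and skips script and style blocks using only the standard library.

    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Collect the visible text of an HTML page, skipping script/style blocks."""
        def __init__(self):
            super().__init__()
            self.parts = []
            self._skip_depth = 0
        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self._skip_depth += 1
        def handle_endtag(self, tag):
            if tag in ("script", "style") and self._skip_depth:
                self._skip_depth -= 1
        def handle_data(self, data):
            if not self._skip_depth and data.strip():
                self.parts.append(data.strip())

    def page_text(html):
        """Return the concatenated visible text of a page, ready for indexing."""
        parser = TextExtractor()
        parser.feed(html)
        return " ".join(parser.parts)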

To crawl a web domain, follow these steps:

  1. Log in to the application with valid credentials.
  2. Click the Indices tab at the top.
  3. On the left pane, under the Sources section, click Content.
  4. On the Add Content page, click Crawl Web Domain.
  5. In the Crawl Web Domain dialog box, enter the domain URL in the Source URL field.
  6. Enter a name in the Source Title field and a description in the Description field.
  7. To schedule the web crawl, turn on the toggle under the Schedule section.
    • Set the Start Date and Time, and the Frequency at which the crawl should run. These fields are available only when the Schedule toggle is turned on.
  8. Under the Crawl Option section, select an option from the drop-down list:
    • Crawl Everything – Crawl all the URLs that belong to the web domain.
    • Crawl Everything Except Specific URLs – List the URLs within the web domain that you want to exclude from crawling.
    • Crawl Only Specific URLs – List only the URLs from the web domain that you want to crawl.
  9. Select the Crawl Settings as per your requirements (a conceptual sketch of how these options typically constrain a crawl follows these steps):
    • JavaScript-rendered – Crawl websites whose content is rendered through JavaScript code.
    • Crawl Beyond Sitemap – Crawl web pages beyond the URLs provided in the sitemap file of the target website.
    • Use Cookies – Crawl web pages that require cookie acceptance.
    • Respect robots.txt – Honor the directives in the robots.txt file of the web domain.
    • Crawl Depth – The maximum depth to crawl from the source URL; a value of 0 indicates no limit.
    • Max URL Limit – The maximum number of URLs to crawl; a value of 0 indicates no limit.
  10. Click Proceed.
  11. The Crawl Web Domain dialog box appears with the URL validation status.
  12. Choose whether to crawl immediately or later.
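
The crawl options and settings above behave like the corresponding controls of a typical crawler. The following Python sketch is a conceptual illustration only, not SearchAssist's implementation, and its function and parameter names are hypothetical; it shows how an exclusion list, robots.txt directives, a crawl depth, and a maximum URL count usually bound a breadth-first crawl, with 0 meaning "no limit" for the depth and URL settings.

    from collections import deque
    from urllib import robotparser
    from urllib.parse import urljoin, urlparse
    import re

    import requests

    LINK_RE = re.compile(r'href="([^"#]+)"')   # naive link extractor, enough for the sketch

    def crawl(seed, excluded=(), max_depth=2, max_urls=100, respect_robots=True):
        """Breadth-first crawl of a single domain; 0 means no limit for depth/URL count."""
        domain = urlparse(seed).netloc
        robots = robotparser.RobotFileParser(urljoin(seed, "/robots.txt"))
        if respect_robots:
            robots.read()

        seen, pages = {seed}, []
        queue = deque([(seed, 0)])                    # (url, depth from the seed)
        while queue:
            url, depth = queue.popleft()
            if max_urls and len(pages) >= max_urls:
                break                                 # Max URL Limit reached
            if respect_robots and not robots.can_fetch("*", url):
                continue                              # Respect robots.txt
            if any(url.startswith(prefix) for prefix in excluded):
                continue                              # Crawl Everything Except Specific URLs
            html = requests.get(url, timeout=10).text
            pages.append((url, html))                 # page content handed off for indexing
            if max_depth and depth >= max_depth:
                continue                              # Crawl Depth reached; do not go deeper
            for link in LINK_RE.findall(html):
                nxt = urljoin(url, link)
                if urlparse(nxt).netloc == domain and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        return pages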

Management

Once you add content to the application, it needs to be kept up to date because website content is rarely static. You can manage the source (schedule periodic web crawls and edit the crawl configuration) to keep the indexed content in sync with the data on the website.
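
As a minimal illustration of why periodic recrawls matter, a page's ETag or Last-Modified response headers can indicate whether its content has changed since the last crawl. This is generic HTTP behavior, not a SearchAssist feature, and the helper name below is hypothetical.

    import requests

    def page_changed(url, last_etag=None, last_modified=None):
        """Return True unless the server confirms the page is unchanged (HTTP 304)."""
        headers = {}
        if last_etag:
            headers["If-None-Match"] = last_etag           # value saved after the previous crawl
        if last_modified:
            headers["If-Modified-Since"] = last_modified   # likewise
        resp = requests.head(url, headers=headers, allow_redirects=True, timeout=10)
        return resp.status_code != 304                     # 304 Not Modified => still in sync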

Manage

Once the content has been added, you can perform the following actions:

  1. On the Content list view page, for a source in the list, you can:
    • delete the source
    • recrawl it (for web content)
  2. Click any content row to open the content dialog box, which displays the following details:
    • Name
    • Description
    • For web content:
      • Pages crawled and the last updated time
      • The configurations specified above, which are editable
      • Crawl execution details along with the log
    • For file content:
      • Number of pages
      • A document preview and an option to download it
      • Date and time of the last update
