| Specification | Details |
|---|---|
| Repository type | Cloud |
| Supported content | Files in buckets (.pdf, .txt, .ppt, .docx) |
| RACL support | No |
| Content filtering | Yes (Advanced Filters for paths and file extensions) |
- Generate an Access Key for the Amazon S3 account from which content is to be ingested.
- Configure the Amazon S3 connector in Search AI.
Prerequisites
The IAM user whose credentials are used to configure the connector must have the following permissions:
Only buckets from the same region can be used for content ingestion.
Generate an Access Key
- Sign in to the AWS Management Console.
- Navigate to the IAM user’s details page.
- Click the Security credentials tab.
- Under Access keys, click Create access key.
- Follow the prompts and save the Access Key ID and Secret Access Key (download the .csv file). The secret key is shown only once at creation.
Configure the Amazon S3 Connector in Search AI
On the Authorization page of the connector, provide the following fields and click Connect.

| Field | Description |
|---|---|
| Name | Unique name for the connector |
| Access Key | Access Key ID generated in the previous step |
| Secret | Secret Access Key generated in the previous step |
| Region | AWS region of your account |
Content Ingestion
After the Search AI connector is successfully connected to the Amazon S3 account, go to the Configuration tab and set up content synchronization. Use the Sync Now option for an immediate sync, or the Schedule Sync option to schedule future syncs. During a sync, Search AI ingests all files in supported formats from the buckets accessible to the IAM user whose credentials were used to configure the connector. This content is then accessible to all Search AI users.
Advanced Filters
Advanced Filters allow users to refine the content that’s synced from Amazon S3. The following filter options are available:
Paths
The Paths filter lets users specify the folder paths to sync. The connector then syncs only the specified paths, which improves sync performance and gives more flexibility when syncing specific folders. Users can add multiple paths.
Example: ftp.domain.d/web/web/home/domain_file.support/pdf/
The path should always start with the bucket name. In the above example, the bucket name is ftp.domain.d.
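The bucket-first path convention can be illustrated with a short sketch. This is not the connector's implementation: the function names `split_filter_path` and `matches_any_path` are hypothetical, and the assumption that a path is matched as a key prefix inside the named bucket is ours.

```python
from __future__ import annotations


def split_filter_path(path: str) -> tuple[str, str]:
    """Split 'bucket/some/folder/' into (bucket, key_prefix).

    Hypothetical helper: assumes the first path segment is the bucket name.
    """
    cleaned = path.strip().lstrip("/")
    if not cleaned:
        raise ValueError("path must start with a bucket name")
    bucket, _, prefix = cleaned.partition("/")
    # Normalize to a trailing slash so 'web/pdf' does not match 'web/pdfs/...'.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return bucket, prefix


def matches_any_path(bucket: str, key: str, paths: list[str]) -> bool:
    """Return True if the object falls under at least one configured path."""
    for path in paths:
        path_bucket, prefix = split_filter_path(path)
        if bucket == path_bucket and key.startswith(prefix):
            return True
    return False


example = "ftp.domain.d/web/web/home/domain_file.support/pdf/"
bucket, prefix = split_filter_path(example)
print(bucket)  # ftp.domain.d
print(prefix)  # web/web/home/domain_file.support/pdf/
```

Under these assumptions, an object in a different bucket never matches, even if its key starts with the same folder names.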
File Extensions
The File Extensions filter lets users include only particular file types in a sync, which limits the sync to the files that are required.
Incremental Sync
Incremental Sync improves sync efficiency by avoiding redundant processing of unchanged files.
- The first sync will perform a Full Sync.
- From the second sync onward, only newly added and modified files will be processed.
- If filters (such as Advanced Filters) are changed, the system will automatically perform a Full Sync again.
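The three rules above can be sketched as a small planning function. This is a minimal illustration, not the connector's actual logic: `SyncState`, `fingerprint`, and `plan_sync` are hypothetical names, and detecting changes by comparing last-modified timestamps is an assumption.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SyncState:
    """State kept between syncs (hypothetical structure)."""
    filter_fingerprint: Optional[str] = None
    seen: dict = field(default_factory=dict)  # object key -> last-modified value


def fingerprint(filters: dict) -> str:
    """Stable hash of the filter settings, used to detect filter changes."""
    payload = json.dumps(filters, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def plan_sync(state: SyncState, filters: dict, listing: dict) -> tuple[str, list[str]]:
    """Return (mode, keys_to_process) and record the new state.

    A Full Sync runs on the first sync or when the filters changed;
    otherwise only new or modified keys are selected.
    """
    current = fingerprint(filters)
    full = state.filter_fingerprint != current
    if full:
        keys = sorted(listing)
    else:
        keys = sorted(k for k, modified in listing.items()
                      if state.seen.get(k) != modified)
    state.filter_fingerprint = current
    state.seen = dict(listing)
    return ("full" if full else "incremental"), keys
```

For example, calling `plan_sync` twice with the same filters returns `"full"` the first time and `"incremental"` the second time; changing any filter value makes the next call return `"full"` again.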
Policy-Based Sync All
Ensures that the connector syncs only the resources that the configured access token has permission to access. This helps maintain proper access control and improves security during the sync process. To enable this functionality, the connector requires permissions to identify the IAM user and its associated policies. The required IAM permissions are listed in the Prerequisites section above.
Fallback Behavior
If the above IAM permissions aren’t provided, the connector will fall back to the previous approach:
- The system will retrieve all buckets available to the account.
- It will attempt to process files across those buckets.
- Files that the access token has permission for will be processed.
- Files without access permissions will be blocked by Amazon S3.
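The fallback flow can be pictured as a loop that tries every object and skips the ones that are denied. The sketch below is a simulation only: `fallback_sync`, `fake_read_object`, and the `ACCOUNT` data are hypothetical, and a Python `PermissionError` stands in for the access-denied response that Amazon S3 would return.

```python
from typing import Callable, Dict, List, Tuple

ObjectRef = Tuple[str, str]  # (bucket, key)


def fallback_sync(
    buckets: Dict[str, List[str]],
    read_object: Callable[[str, str], bytes],
) -> Tuple[List[ObjectRef], List[ObjectRef]]:
    """Attempt every object in every bucket; collect processed and blocked ones."""
    processed: List[ObjectRef] = []
    blocked: List[ObjectRef] = []
    for bucket, keys in buckets.items():
        for key in keys:
            try:
                read_object(bucket, key)
            except PermissionError:
                # Stand-in for an access-denied response from the storage service.
                blocked.append((bucket, key))
                continue
            processed.append((bucket, key))
    return processed, blocked


# Simulated account contents (illustrative data only).
ACCOUNT = {"public-docs": ["a.pdf", "b.txt"], "restricted": ["secret.docx"]}


def fake_read_object(bucket: str, key: str) -> bytes:
    """Pretend reader: denies everything in the 'restricted' bucket."""
    if bucket == "restricted":
        raise PermissionError(f"AccessDenied: {bucket}/{key}")
    return b"content"


processed, blocked = fallback_sync(ACCOUNT, fake_read_object)
print(processed)  # [('public-docs', 'a.pdf'), ('public-docs', 'b.txt')]
print(blocked)    # [('restricted', 'secret.docx')]
```

The sketch also shows why the policy-based approach is preferable: in the fallback, every denied object still costs a request before it is skipped.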