Amazon S3 Connector

The Amazon S3 Connector allows seamless integration between Amazon S3 and Search AI, enabling the ingestion of files stored in Amazon S3 buckets into the Search AI platform. By connecting to an Amazon S3 account, users can retrieve their content and make it available for intelligent search and analysis.

Specification	Details
Repository type	Cloud
Supported content	Files in buckets (.pdf, .txt, .ppt, .docx)
RACL support	No
Content filtering	Yes (Advanced Filters for paths and file extensions)

To integrate Search AI with the Amazon S3 account and ingest data from it, follow the steps listed below.

Generate an Access Key for the Amazon S3 account from which content is to be ingested.
Configure the Amazon S3 connector in Search AI.

Prerequisites

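The IAM user whose credentials are used to configure the connector must have the following permissions. In the BucketAccess statement, replace the example bucket name prod.ftp.domain.com.d with the name of the bucket to ingest from.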

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListAllBuckets",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    },
    {
      "Sid": "BucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketAcl",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::prod.ftp.domain.com.d",
        "arn:aws:s3:::prod.ftp.domain.com.d/*"
      ]
    },
    {
      "Sid": "AllowSTSCallerIdentity",
      "Effect": "Allow",
      "Action": "sts:GetCallerIdentity",
      "Resource": "*"
    },
    {
      "Sid": "AllowIAMReadForPolicyBasedSync",
      "Effect": "Allow",
      "Action": [
        "iam:GetUser",
        "iam:ListAttachedUserPolicies",
        "iam:ListUserPolicies",
        "iam:GetUserPolicy",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:ListAttachedRolePolicies",
        "iam:GetRole"
      ],
      "Resource": "*"
    }
  ]
}
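
Before configuring the connector, it can help to confirm that a policy document grants every action listed above. The following is a minimal sketch using only the Python standard library. The names `REQUIRED_ACTIONS` and `missing_actions` are illustrative and are not part of Search AI or AWS. The check compares action names literally, so it does not evaluate wildcards, conditions, resources, or Deny statements.

```python
import json

# Actions copied from the policy shown above.
REQUIRED_ACTIONS = {
    "s3:ListAllMyBuckets",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:GetBucketAcl",
    "s3:GetBucketLocation",
    "sts:GetCallerIdentity",
    "iam:GetUser",
    "iam:ListAttachedUserPolicies",
    "iam:ListUserPolicies",
    "iam:GetUserPolicy",
    "iam:GetPolicy",
    "iam:GetPolicyVersion",
    "iam:ListAttachedRolePolicies",
    "iam:GetRole",
}


def missing_actions(policy_json: str) -> set:
    """Return the required actions that no Allow statement names literally."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # "Statement" may be a single object
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted
```

An empty result means every action from the policy above appears in an Allow statement. Any names returned are candidates to add before running a sync.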

Only buckets in the same AWS region as the one configured for the connector can be used for content ingestion.

Generate an Access Key

Sign in to the AWS Management Console.
Navigate to the IAM user’s details page.
Click the Security credentials tab.
Under Access keys, click Create access key.
Follow the prompts and save the Access Key ID and Secret Access Key (download the .csv file). The secret key is shown only once at creation.

Refer to the AWS documentation for detailed instructions.

Configure the Amazon S3 Connector in Search AI

On the Authorization page of the connector, provide the following fields and click Connect.

Field	Description
Name	Unique name for the connector
Access Key	Access Key ID generated in the previous step
Secret	Secret Access Key generated in the previous step
Region	AWS region of your account
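
To check that the Access Key ID, Secret Access Key, and region are valid before entering them on the Authorization page, one option is to call AWS STS directly. The sketch below assumes the `boto3` package is installed. The helper names `build_client_kwargs` and `verify_credentials` are hypothetical and are not part of Search AI, and this is not necessarily how the connector itself validates credentials.

```python
def build_client_kwargs(access_key: str, secret: str, region: str) -> dict:
    """Map the connector's fields onto boto3 client keyword arguments."""
    return {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret,
        "region_name": region,
    }


def verify_credentials(access_key: str, secret: str, region: str) -> str:
    """Return the ARN of the identity behind the key pair.

    boto3 raises a botocore exception if AWS rejects the credentials.
    """
    import boto3  # imported lazily so build_client_kwargs works without boto3

    sts = boto3.client("sts", **build_client_kwargs(access_key, secret, region))
    return sts.get_caller_identity()["Arn"]
```

The call relies on the sts:GetCallerIdentity permission listed in the Prerequisites section.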

Content Ingestion

After successfully connecting the Search AI connector to the Amazon S3 account, go to the Configuration tab and set up content synchronization. Use the Sync Now option for an immediate sync, or the Schedule Sync option to schedule future syncs. Upon sync, Search AI ingests all the files (in supported formats) from the buckets accessible to the IAM user whose credentials are configured in the connector. This content is then accessible to all the users of Search AI.

Advanced Filters

Advanced Filters allow users to refine the content that’s synced from Amazon S3. The following filter options are available:

Paths

Paths is a filter that allows users to directly specify folder paths to sync. This enables the connector to sync only the specified paths, which improves sync performance and provides greater flexibility when syncing specific folders. Users can add multiple paths. Example: ftp.domain.d/web/web/home/domain_file.support/pdf/

The path must always start with the bucket name. In the example path above, the bucket name is ftp.domain.d.
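
The rule that a path starts with the bucket name can be sketched as a small parser that separates the bucket from the folder prefix inside it. This is an illustration only. `split_filter_path` is a hypothetical helper, and the connector's own parsing may differ.

```python
def split_filter_path(path: str) -> tuple:
    """Split 'bucket/folder/sub/' into (bucket, key_prefix)."""
    cleaned = path.strip().lstrip("/")
    if not cleaned:
        raise ValueError("path must start with a bucket name")
    # Everything before the first "/" is the bucket; the rest is the prefix.
    bucket, _, prefix = cleaned.partition("/")
    return bucket, prefix


bucket, prefix = split_filter_path(
    "ftp.domain.d/web/web/home/domain_file.support/pdf/"
)
print(bucket)  # ftp.domain.d
print(prefix)  # web/web/home/domain_file.support/pdf/
```

A path that contains only a bucket name yields an empty prefix, which corresponds to the whole bucket.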

File Extensions

File Extensions is a filter that limits the sync to files with specific extensions. During the sync process, users can choose to include only the file types they need.
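
As a rough illustration of extension-based filtering, the sketch below keeps only object keys whose extension is in an allowed set. `matches_extensions` is a hypothetical helper. Treating extensions as case-insensitive and accepting them with or without a leading dot are assumptions, not documented connector behavior.

```python
from pathlib import PurePosixPath


def matches_extensions(key: str, allowed: set) -> bool:
    """Return True if the object key ends with one of the allowed extensions."""
    normalized = set()
    for ext in allowed:
        ext = ext.lower()
        # Accept both "pdf" and ".pdf" as input.
        normalized.add(ext if ext.startswith(".") else "." + ext)
    return PurePosixPath(key).suffix.lower() in normalized


keys = ["web/report.PDF", "web/notes.txt", "web/image.png"]
selected = [k for k in keys if matches_extensions(k, {"pdf", ".txt"})]
print(selected)  # ['web/report.PDF', 'web/notes.txt']
```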

Incremental Sync

Incremental sync improves sync efficiency by avoiding redundant processing of unchanged files.

The first sync will perform a Full Sync.
From the second sync onward, only newly added and modified files will be processed.
If filters (such as Advanced Filters) are changed, the system will automatically perform a Full Sync again.
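
The three rules above can be sketched as a single decision function. This is a simplified model, not the connector's implementation. `SyncState` and `files_to_process` are hypothetical names, and detecting changes by comparing last-modified timestamps is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class SyncState:
    """What is remembered from the previous run (hypothetical structure)."""
    last_sync: Optional[datetime]  # None means no sync has run yet
    filters: FrozenSet[str]        # filters in effect during the last run


def files_to_process(
    files: Dict[str, datetime],
    state: SyncState,
    current_filters: FrozenSet[str],
) -> List[str]:
    """Return the keys to process, given each file's last-modified time."""
    # Full sync on the first run or whenever the filters have changed.
    if state.last_sync is None or state.filters != current_filters:
        return sorted(files)
    # Otherwise only files added or modified since the previous sync.
    return sorted(k for k, modified in files.items() if modified > state.last_sync)
```

For example, with a previous sync on 2 January and unchanged filters, a file modified on 3 January is returned while one last modified on 1 January is skipped.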

Policy-Based Sync All

Policy-based sync ensures that the connector syncs only the resources that the configured access key has permission to access. This helps maintain proper access control and improves security during the sync process. To enable this functionality, the connector requires permissions to identify the IAM user and its associated policies. The required IAM permissions are listed in the Prerequisites section above.

Fallback Behavior

If the above IAM permissions aren’t provided, the connector will fall back to the previous approach:

The system will retrieve all buckets available to the account.
It will attempt to process files across those buckets.
Files that the access key has permission for will be processed.
Files without access permissions will be blocked by Amazon S3.
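
The fallback steps above amount to trying every file and letting Amazon S3 reject the ones the credentials cannot read. The sketch below models that loop with a caller-supplied `fetch` function standing in for the download. Both `process_with_fallback` and `fetch` are hypothetical, and `PermissionError` is used here as a stand-in for an access-denied response from Amazon S3.

```python
from typing import Callable, Dict, List, Tuple


def process_with_fallback(
    buckets: Dict[str, List[str]],
    fetch: Callable[[str, str], bytes],
) -> Tuple[List[str], List[str]]:
    """Try every file in every bucket; return (processed, blocked) paths."""
    processed: List[str] = []
    blocked: List[str] = []
    for bucket, keys in buckets.items():
        for key in keys:
            path = f"{bucket}/{key}"
            try:
                fetch(bucket, key)  # stands in for downloading the object
            except PermissionError:
                # Stand-in for Amazon S3 denying access to this object.
                blocked.append(path)
            else:
                processed.append(path)
    return processed, blocked
```

Compared with policy-based sync, this approach attempts requests that are certain to be denied, which is why granting the IAM read permissions is the more efficient option.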
