The AWS S3 connector pulls voice recordings and chat transcripts from an S3 bucket into Quality AI Express on a configurable schedule. Use this connector to analyze interactions from third-party Contact Center as a Service (CCaaS) solutions. The connector ingests voice recordings (stereo or mono), chat transcripts, pre-transcribed voice data, and metadata .csv files.
| Section | Description |
| --- | --- |
| Prerequisites | AWS IAM permissions, file formats, and platform requirements. |
| Supported Recording Types | Voice and chat formats: stereo, mono, pre-transcribed audio, and chat transcripts. |
| Data Flow | How data moves from an S3 bucket through the connector into Quality AI Express. |
| CSV Metadata Formats | Required and optional CSV fields for each recording type. |
| JSON Transcript Schemas | JSON structure for voice and chat transcript files. |
| Storage Structure Options | Folder organization options for your S3 bucket. |
| Set Up the Connector | Steps to create, test, and activate the connector. |
| Troubleshooting | Fixes for common authentication, file validation, and data processing issues. |

Prerequisites

AWS Requirements

| Requirement | Details |
| --- | --- |
| S3 bucket | Created in your preferred region with an organized folder structure. |
| IAM permissions | Read-only access (s3:GetObject, s3:ListBucket) via access keys or IAM role. |
| Audio files | WAV or MP3 format, maximum 50 MB each, available via HTTPS. |
| Chat files | JSON format. |
| Test file | A test.csv file with sample data in each configured S3 folder. |

Required IAM policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::your-bucket-name",
                "arn:aws:s3:::your-bucket-name/*"
            ]
        }
    ]
}
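
If you manage several buckets, the policy above can be generated rather than edited by hand. The following is a minimal sketch, not an official tool: `build_read_only_policy` is a hypothetical helper name, and the bucket name is a placeholder. It reproduces the policy shown above, where the bucket-level ARN covers `s3:ListBucket` and the `/*` ARN covers `s3:GetObject`.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Return the read-only policy shown above for one bucket (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # Bucket ARN is needed for listing, object ARN for reading files.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "your-bucket-name" is a placeholder; substitute your own bucket.
    print(json.dumps(build_read_only_policy("your-bucket-name"), indent=4))
```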

Platform Requirements

| Requirement | Details |
| --- | --- |
| Quality AI Express | Feature enabled in platform settings. |
| Agents | All agents onboarded with valid, matching email addresses. |
| Queues | Service queues configured and ready for mapping. |
| Permissions | You have Integrations & Extensions access. |

Supported Recording Types

| Type | Format | Files per Conversation | Channel Assignment | Analytics |
| --- | --- | --- | --- | --- |
| Stereo Voice | WAV/MP3 | 1 | Left = Agent, Right = Customer | Full Analytics |
| Mono Voice | WAV/MP3 | 2 (separate agent/customer files) | N/A | Enhanced Analytics |
| Voice Transcripts | JSON | 1 | Pre-transcribed audio | Text Analytics |
| Chat Scripts | JSON | 1 | Message-level attribution | Full Text Analytics |

Mono Recording Requirement

Mono recordings require two separate audio files: one for the agent and one for the customer. A single mixed mono file is not supported, because combining both speakers in one channel significantly reduces transcription accuracy.
| Supported | Not Supported |
| --- | --- |
| conv-123456-agent.wav (agent only) + conv-123456-customer.wav (customer only) | conv-123456-mixed.wav (both speakers combined) |
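
Before uploading, it can be useful to confirm that every mono conversation has both files. The sketch below assumes your files follow the `<conversationId>-agent` / `<conversationId>-customer` naming used in the example above; that pattern is taken from the example, not a documented requirement, and `find_unpaired_mono` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the example file names above.
MONO_NAME = re.compile(
    r"^(?P<conv>.+)-(?P<role>agent|customer)\.(?:wav|mp3)$", re.IGNORECASE
)


def find_unpaired_mono(file_names):
    """Return {conversation_id: [missing roles]} for incomplete mono pairs."""
    roles_seen = defaultdict(set)
    for name in file_names:
        match = MONO_NAME.match(name)
        if match:  # files without an -agent/-customer suffix are ignored
            roles_seen[match.group("conv")].add(match.group("role").lower())
    expected = {"agent", "customer"}
    return {
        conv: sorted(expected - roles)
        for conv, roles in roles_seen.items()
        if roles != expected
    }


if __name__ == "__main__":
    names = ["conv-123456-agent.wav", "conv-123456-customer.wav", "conv-123457-agent.wav"]
    print(find_unpaired_mono(names))
```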

Data Flow

[S3 Bucket]
└── Audio + Metadata Files
        │
        ▼
[S3 Connector]
└── Validation + Ingestion
        │
        ▼
[Quality AI Express]
└── Transcription + Analysis
        │
        ▼
[Analytics Dashboard]
├── Quality Score
├── Sentiment
└── Topics

CSV Metadata Formats

Each recording type requires specific CSV fields. The core fields are the same across all types; only the recording-specific fields differ.

Stereo Voice Recordings

Configuration: recordingType = stereo, channelType = voice
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time. |
| channelType | Required | String | voice | Always voice for audio. |
| recordingType | Required | String | stereo | Always stereo for this format. |
| chatScriptUrl | Required | String | https://your-domain.com/path/to/chat-transcript.json | Full HTTPS URL to the JSON transcript file. |
| recordingUrl | Required | String | https://your-domain.com/path/to/recording.wav | HTTPS URL to the audio file. |
| transcriptUrl | Required | String | https://your-bucket.s3.amazonaws.com/transcripts/voice-123.json | Full HTTPS URL to the JSON transcript file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| agentChannel | Required | Integer | 0 | Agent audio channel (0 = left, 1 = right). |
| customerChannel | Required | Integer | 1 | Customer audio channel (0 = left, 1 = right). |
| language | Optional | String | en | ISO 639-1 format, defaults to en. |
| asProvider | Optional | String | microsoft | Audio service provider. |

Mono Voice Recordings

Configuration: recordingType = mono, channelType = voice
Mono recordings require two separate CSV rows and two audio files per conversation — one for the agent and one for the customer. Use the same conversationId for both rows.
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Same ID for both agent and customer rows. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time. |
| channelType | Required | String | voice | Always voice for audio. |
| recordingType | Required | String | mono | Always mono for this format. |
| agentRecordings | Required | String | https://s3.amazonaws.com/bucket/conv-123456-agent.wav | URL to the agent audio file. |
| customerRecordings | Required | String | https://s3.amazonaws.com/bucket/conv-123456-customer.wav | URL to the customer audio file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| agentId | Optional | String | agent-789 | Internal agent identifier. |
| language | Optional | String | en | ISO 639-1 format, defaults to en. |
| asProvider | Optional | String | microsoft | Transcription provider. |

Voice Transcripts (Pre-transcribed Audio)

Configuration: recordingType = transcription, channelType = voice
Use this format when you have already transcribed your voice recordings and want to import the text for analysis without reprocessing the audio.
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time. |
| channelType | Required | String | voice | Always voice for audio transcripts. |
| recordingType | Required | String | transcription | Always transcription for this format. |
| transcriptPath | Required | String | transcripts/voice-123.json | Path to the JSON transcript file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| language | Optional | String | en | ISO 639-1 format, defaults to en. |
| asProvider | Optional | String | microsoft | Original audio service provider. |

Chat Scripts (Live Chat Interactions)

Configuration: recordingType = transcription, channelType = chat
Use this format for live chat interactions from web chat, WhatsApp, Facebook Messenger, and other messaging platforms.
For conversations involving agent or queue transfers, use the queueId of the queue where the conversation ended and the agentEmail of the agent who closed the conversation.
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:45:00Z | Must be after start time. |
| channelType | Required | String | chat | Always chat for text interactions. |
| recordingType | Required | String | transcription | Always transcription for chat. |
| transcriptPath | Required | String | transcripts/chat-123.json | Path to the JSON transcript file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| language | Optional | String | en-US | Defaults to en if not specified. |
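
The seven fields that appear in all four tables above can be checked locally before a file is uploaded. This is a minimal sketch of such a check, assuming each CSV row has already been read into a dictionary; `CORE_FIELDS`, `parse_utc`, and `check_core_fields` are hypothetical names, and the connector's own validation may apply additional rules.

```python
from datetime import datetime, timezone

# Fields listed as Required in every table above.
CORE_FIELDS = [
    "conversationId", "agentEmail", "conversationStartTime",
    "conversationEndTime", "channelType", "recordingType", "queueId",
]


def parse_utc(value: str) -> datetime:
    """Parse a timestamp such as 2025-04-10T14:30:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def check_core_fields(row: dict) -> list:
    """Return a list of problems found in one metadata row (empty if none)."""
    problems = [f"missing {name}" for name in CORE_FIELDS if not row.get(name)]
    if problems:
        return problems
    if len(row["conversationId"]) > 50:
        problems.append("conversationId longer than 50 characters")
    try:
        start = parse_utc(row["conversationStartTime"])
        end = parse_utc(row["conversationEndTime"])
    except ValueError:
        problems.append("timestamps must look like YYYY-MM-DDTHH:MM:SSZ")
    else:
        if end <= start:
            problems.append("conversationEndTime must be after conversationStartTime")
    return problems
```

An empty list means the row passed these basic checks; it does not confirm that the agent email or queue exists on the platform.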

Agent and Queue Validation

The system validates agent and queue combinations as follows:
  • The conversation is processed when the queue is onboarded, even if the agent isn’t onboarded.
  • The conversation is processed when both the agent and the queue are onboarded, even if the agent isn’t mapped to the queue.
  • The conversation is rejected when the queue isn’t onboarded or configured.
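
Taken together, the three rules above mean that only the queue status decides the outcome. The sketch below restates them as a function to make that explicit; `accepts_conversation` is a hypothetical name used for illustration and is not part of the platform.

```python
def accepts_conversation(
    queue_onboarded: bool,
    agent_onboarded: bool,
    agent_mapped_to_queue: bool,
) -> bool:
    """Mirror the three validation rules listed above (illustrative only)."""
    # The agent flags are accepted for completeness, but under the stated
    # rules they never change the result: the queue status alone decides.
    return queue_onboarded


if __name__ == "__main__":
    for queue in (True, False):
        for agent in (True, False):
            outcome = "process" if accepts_conversation(queue, agent, False) else "reject"
            print(f"queue onboarded={queue}, agent onboarded={agent}: {outcome}")
```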

JSON Transcript Schemas

Voice Transcript Format

Full example:
{
  "recognizedPhrases": [
    {
      "recognitionStatus": "Success",
      "channel": 0,
      "offset": "PT14S",
      "duration": "PT2.4S",
      "offsetInTicks": 140000000.0,
      "durationInTicks": 24000000.0,
      "durationMilliseconds": 2400,
      "offsetMilliseconds": 14000,
      "nBest": [
        {
          "confidence": 0.8205426,
          "lexical": "yes one four three four two six",
          "itn": "yes 143426",
          "maskedITN": "yes one four three four two six",
          "display": "Yes, 143426.",
          "words": [
            {
              "word": "yes",
              "offset": "PT14S",
              "duration": "PT0.32S",
              "offsetInTicks": 140000000.0,
              "durationInTicks": 3200000.0,
              "durationMilliseconds": 320,
              "offsetMilliseconds": 14000,
              "confidence": 0.51653963
            }
          ]
        }
      ]
    }
  ]
}
Required fields only:
{
  "recognizedPhrases": [
    {
      "channel": 0,
      "offsetInTicks": 140000000.0,
      "nBest": [
        {
          "lexical": "yes one four three four two six",
          "words": [
            {
              "word": "yes",
              "offsetInTicks": 140000000.0,
              "durationInTicks": 3200000.0,
              "confidence": 0.51653963
            }
          ]
        }
      ]
    }
  ]
}
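
To sanity-check a transcript file before upload, you can read back the required fields and convert tick offsets to seconds. This is a sketch under one assumption: the full example above pairs PT14S with 140000000 ticks, which implies 10,000,000 ticks per second. `TICKS_PER_SECOND` and `phrases_to_lines` are hypothetical names.

```python
# Inferred from the example above: PT14S corresponds to 140000000 ticks.
TICKS_PER_SECOND = 10_000_000


def phrases_to_lines(transcript: dict) -> list:
    """Return (channel, start_seconds, text) tuples ordered by start time."""
    lines = []
    phrases = sorted(
        transcript["recognizedPhrases"], key=lambda phrase: phrase["offsetInTicks"]
    )
    for phrase in phrases:
        best = phrase["nBest"][0]  # first recognition candidate
        start_seconds = phrase["offsetInTicks"] / TICKS_PER_SECOND
        lines.append((phrase["channel"], start_seconds, best["lexical"]))
    return lines
```

A `KeyError` raised by this function points to a missing required field in the file.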

Chat Transcript Format

Example:
{
  "1": {
    "type": "AGENT",
    "text": "Good afternoon, how can I help you today?",
    "timestamp": 1749562206000,
    "userId": "johndoe@example.com"
  },
  "2": {
    "type": "USER",
    "text": "I need help with my account balance.",
    "timestamp": 1749562253142,
    "userId": "customer_12345"
  }
}
Required fields:
| Field | Values | Notes |
| --- | --- | --- |
| type | AGENT, USER, or SYSTEM | Identifies the speaker. |
| text | Message content | The message text. |
| timestamp | Unix timestamp in milliseconds | Message send time. |
| userId | Participant identifier | Agent email or customer ID. |
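
If your chat platform exports messages in another shape, a small converter can produce the structure shown above. The following sketch assumes messages arrive as tuples in send order and that keys are sequential strings starting at "1", as in the example; the key scheme is inferred from that example, and `build_chat_transcript` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

ALLOWED_TYPES = {"AGENT", "USER", "SYSTEM"}


def build_chat_transcript(messages) -> dict:
    """Build a transcript from (type, text, sent_at, user_id) tuples.

    sent_at must be a timezone-aware datetime.
    """
    transcript = {}
    for index, (msg_type, text, sent_at, user_id) in enumerate(messages, start=1):
        if msg_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {msg_type}")
        if sent_at.tzinfo is None:
            raise ValueError("sent_at must be timezone-aware")
        transcript[str(index)] = {
            "type": msg_type,
            "text": text,
            # Unix timestamp in milliseconds, as the table above requires.
            "timestamp": round(sent_at.timestamp() * 1000),
            "userId": user_id,
        }
    return transcript


if __name__ == "__main__":
    sent = datetime(2025, 6, 10, 13, 30, 6, tzinfo=timezone.utc)
    demo = [("AGENT", "Good afternoon, how can I help you today?", sent, "johndoe@example.com")]
    print(json.dumps(build_chat_transcript(demo), indent=2))
```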

Storage Structure Options

Choose a folder structure for your S3 bucket before setting up the connector.
Option A: Unified Path
s3://your-bucket/conversations/
├── metadata.csv                    # All interaction metadata
├── audio/
│   ├── conv-123456.wav            # Stereo recording
│   ├── conv-123457-agent.wav      # Mono - agent only
│   ├── conv-123457-customer.wav   # Mono - customer only  
│   └── conv-123458.wav            # Stereo recording
├── chat/
│   ├── chat-123459.json           # Chat transcript
│   └── chat-123460.json           # Chat transcript
└── test.csv                       # Required for validation
Option B: Separate Voice and Chat Paths
s3://your-bucket/
├── voice-interactions/
│   ├── voice_metadata.csv
│   ├── recordings/
│   │   ├── conv-123456.wav            # Stereo
│   │   ├── conv-123457-agent.wav      # Mono agent
│   │   └── conv-123457-customer.wav   # Mono customer
│   └── test.csv
└── chat-interactions/
    ├── chat_metadata.csv
    ├── transcripts/
    │   ├── chat-123459.json
    │   └── chat-123460.json
    └── test.csv

Set Up the Connector

Before you begin, verify that your S3 bucket is ready:
  • All audio files are available via HTTPS URLs.
  • .csv files contain the required fields with correct column headers.
  • Mono recordings have separate agent and customer files.
  • A test.csv file exists in each configured folder.
File and folder names shouldn’t contain any spaces or special characters.
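
The naming rule can be checked in bulk before upload. The source does not list which characters count as special, so this sketch assumes a conservative allow-list of letters, digits, hyphens, underscores, dots, and forward slashes; adjust `SAFE_KEY` if your setup differs. `unsafe_keys` is a hypothetical helper.

```python
import re

# Assumed allow-list; the documentation does not enumerate the exact characters.
SAFE_KEY = re.compile(r"[A-Za-z0-9._/-]+")


def unsafe_keys(keys):
    """Return the object keys that contain spaces or other disallowed characters."""
    return [key for key in keys if not SAFE_KEY.fullmatch(key)]


if __name__ == "__main__":
    candidates = [
        "conversations/audio/conv-123456.wav",
        "conversations/audio/conv 123457.wav",
    ]
    print(unsafe_keys(candidates))
```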

Step 1: Create the Connector

  1. Navigate to Quality AI > Configure > Connectors.
  2. Select + Add Connector > Amazon S3 > Connect.
  3. Enter a Name for the connector.
  4. Select your AWS Region.
  5. Choose an Auth Type and enter your credentials:
    • Access Keys: Enter your Access Key and Secret Key.
    • IAM Role: Enter the IAM Role ARN.
  6. Set the folder path:
    • Unified Path: Enter a single path for both voice and chat (for example, s3://your-bucket/conversations/).
    • Separate Paths: Enter a Voice Path and a Chat Path.

Step 2: Test the Connection

  1. Select the Test tab in the connector configuration.
  2. Select Test to validate the configuration. If validation fails, correct the issue and select Re-Test.
  3. Verify that the following validation checks pass:
    • Authentication: Validates your AWS credentials and S3 bucket access.
    • Filepath Availability & Accessibility: Verifies that the configured S3 path exists and is accessible.
    • File Format Validation: Verifies that supported file formats are present and readable.
    • Metadata Validation: Verifies that required metadata fields are present and valid.
The Test tab displays validation progress and any errors.
The connector must pass all validation checks before you can proceed with queue mapping and activation.

Step 3: Map Queues and Configure the Schedule

  1. Navigate to the Queue tab.
  2. Map each queueId from the .csv files to a Quality AI Express queue. Values must match exactly.
  3. Navigate to the Schedule tab.
  4. Set the Interval (minutes, hours, or days) and Start Time (UTC).
  5. Select Save to activate the connector.
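
Because queue mappings must match the CSV values exactly, it can help to list the distinct queueId values before opening the Queue tab. This is a minimal sketch, assuming the metadata CSV is available locally as text; `distinct_queue_ids` is a hypothetical helper and not part of the connector.

```python
import csv
import io


def distinct_queue_ids(csv_text: str) -> list:
    """Return the sorted, distinct queueId values found in a metadata CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Values are kept exactly as written, so stray whitespace or case
    # differences show up as separate entries instead of being hidden.
    return sorted({row["queueId"] for row in reader if row.get("queueId")})


if __name__ == "__main__":
    sample = "conversationId,queueId\nconv-1,support-tier1\nconv-2,billing\n"
    for queue_id in distinct_queue_ids(sample):
        print(repr(queue_id))
```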
After saving, verify the setup is complete:
  • The queue mappings are saved and validated.
  • The processing schedule is active.
  • The first ingestion job displays in the Log tab.
  • The system reports no processing errors.
The setup is complete when conversations appear in Quality AI Express dashboards and analytics data is available for ingested interactions.

Troubleshooting

Authentication

| Problem | Symptom | Resolution |
| --- | --- | --- |
| Invalid Credentials | Authentication failed. | Verify the access key, secret key, IAM role ARN, and credential expiration. |
| Permission Denied | Access denied to S3 bucket. | Add S3 read permissions to the IAM user or role. Verify the bucket policy and region. |

File Validation

| Problem | Symptom | Resolution |
| --- | --- | --- |
| File Access Error | File or path not found. | Verify the bucket name, region, folder path, and file availability. |
| Invalid File Format or Metadata | Validation failed. | Confirm that test.csv exists and contains the required structure, column headers, and timestamps. |

Data Processing

| Problem | Symptom | Resolution |
| --- | --- | --- |
| Timestamp Errors | Invalid timestamp format. | Use ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ) in UTC. Verify that the end time is after the start time. |
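
Timestamp errors often come from exporting local times without converting them to UTC. The sketch below shows one way to produce the expected format from a timezone-aware value; `to_connector_timestamp` is a hypothetical helper name, and the UTC-4 offset in the usage example is illustrative.

```python
from datetime import datetime, timedelta, timezone


def to_connector_timestamp(moment: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if moment.tzinfo is None:
        # A naive value has no offset, so it cannot be converted safely.
        raise ValueError("naive datetime: attach a timezone before converting")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Example: a call that started at 10:30 local time in a UTC-4 zone.
    local_start = datetime(2025, 4, 10, 10, 30, tzinfo=timezone(timedelta(hours=-4)))
    print(to_connector_timestamp(local_start))
```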

Performance Considerations

Processing time depends on conversation length, file size, ASR transcription latency for voice interactions, and LLM processing time.