The AWS S3 Connector pulls conversation recordings and chat transcripts from an S3 bucket into Quality AI Express on a configurable schedule. Use this connector to analyze interactions from third-party Contact Center as a Service (CCaaS) solutions.
The connector ingests voice recordings (stereo or mono), chat transcripts, pre-transcribed voice data, and metadata .csv files.
| Section | Description |
|---|---|
| Prerequisites | AWS IAM permissions, file formats, and platform requirements. |
| Supported Recording Types | Voice and chat formats: stereo, mono, pre-transcribed audio, and chat transcripts. |
| Data Flow | How data moves from an S3 bucket through the connector into Quality AI Express. |
| CSV Metadata Formats | Required and optional CSV fields for each recording type. |
| JSON Transcript Schemas | JSON structure for voice and chat transcript files. |
| Storage Structure Options | Folder organization options for your S3 bucket. |
| Set Up the Connector | Steps to create, test, and activate the connector. |
| Troubleshooting | Fixes for common authentication, file validation, and data processing issues. |
Prerequisites
AWS Requirements
| Requirement | Details |
|---|---|
| S3 bucket | Created in your preferred region with an organized folder structure. |
| IAM permissions | Read-only access (s3:GetObject, s3:ListBucket) via access keys or IAM role. |
| Audio files | WAV or MP3 format, maximum 50 MB each, available via HTTPS. |
| Chat files | JSON format. |
| Test file | A test.csv file with sample data in each configured S3 folder. |
Required IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
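If you provision this policy for more than one bucket, generating it can avoid copy-paste mistakes in the two Resource ARNs. The following is a minimal sketch that rebuilds the policy shown above for a given bucket name. The function name `build_read_only_policy` is hypothetical and is not part of any AWS SDK or of Quality AI Express.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Return the read-only policy document shown above for one bucket.

    Hypothetical helper for illustration only.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # ListBucket applies to the bucket ARN, GetObject to the objects under it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_read_only_policy("your-bucket-name"), indent=2))
```

Both Resource entries are needed: s3:ListBucket is evaluated against the bucket ARN, while s3:GetObject is evaluated against the object ARNs matched by `/*`.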
Platform Requirements
| Requirement | Details |
|---|---|
| Quality AI Express | Feature enabled in platform settings. |
| Agents | All agents onboarded with valid, matching email addresses. |
| Queues | Service queues configured and ready for mapping. |
| Permissions | You have Integrations & Extensions access. |
Supported Recording Types
| Type | Format | Files per Conversation | Channel Assignment | Analytics |
|---|---|---|---|---|
| Stereo Voice | WAV/MP3 | 1 | Left = Agent, Right = Customer | Full Analytics |
| Mono Voice | WAV/MP3 | 2 (separate agent/customer files) | N/A | Enhanced Analytics |
| Voice Transcripts | JSON | 1 | Pre-transcribed audio | Text Analytics |
| Chat Scripts | JSON | 1 | Message-level attribution | Full Text Analytics |
Mono Recording Requirement
Mono recordings require two separate audio files: one for the agent and one for the customer. A single mixed mono file is not supported, because combining both speakers in one channel significantly reduces transcription accuracy.
| Supported | Not Supported |
|---|---|
| conv-123456-agent.wav (agent only) + conv-123456-customer.wav (customer only) | conv-123456-mixed.wav (both speakers combined) |
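Before uploading, it can help to confirm that every mono conversation has both sides. The sketch below groups file names by conversation ID and reports which side is missing. It assumes the `<conversationId>-agent` / `<conversationId>-customer` naming from the example above, which is a convention used in this guide rather than a documented requirement; `find_unpaired_mono` is a hypothetical helper.

```python
from collections import defaultdict

SIDES = {"agent", "customer"}


def find_unpaired_mono(filenames):
    """Map each conversation ID to the sides it is missing.

    Assumes names like conv-123456-agent.wav and conv-123456-customer.wav.
    Files without an -agent or -customer suffix (for example, mixed files)
    are ignored here and should be reviewed separately.
    """
    found = defaultdict(set)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]  # drop the file extension
        for side in SIDES:
            suffix = f"-{side}"
            if stem.endswith(suffix):
                found[stem[: -len(suffix)]].add(side)
    return {
        conversation_id: sorted(SIDES - sides)
        for conversation_id, sides in found.items()
        if sides != SIDES
    }


if __name__ == "__main__":
    files = ["conv-1-agent.wav", "conv-1-customer.wav", "conv-2-agent.wav"]
    print(find_unpaired_mono(files))
```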
Data Flow
[S3 Bucket]
  └── Audio + Metadata Files
        │
        ▼
[S3 Connector]
  └── Validation + Ingestion
        │
        ▼
[Quality AI Express]
  └── Transcription + Analysis
        │
        ▼
[Analytics Dashboard]
  ├── Quality Score
  ├── Sentiment
  └── Topics
CSV Metadata Formats
Each recording type requires specific CSV fields. The core fields are the same across all types; only the recording-specific fields differ.
Stereo Voice Recordings
Configuration : recordingType = stereo, channelType = voice
| Field | Required | Type | Example | Notes |
|---|---|---|---|---|
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time. |
| channelType | Required | String | voice | Always voice for audio. |
| recordingType | Required | String | stereo | Always stereo for this format. |
| chatScriptUrl | Required | String | https://your-domain.com/path/to/chat-transcript.json | Full HTTPS URL to the JSON transcript file. |
| recordingUrl | Required | String | https://your-domain.com/path/to/recording.wav | HTTPS URL to the audio file. |
| transcriptUrl | Required | String | https://your-bucket.s3.amazonaws.com/transcripts/voice-123.json | Full HTTPS URL to the JSON transcript file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| agentChannel | Required | Integer | 0 | Agent audio channel (0 = left, 1 = right). |
| customerChannel | Required | Integer | 1 | Customer audio channel (0 = left, 1 = right). |
| language | Optional | String | en | ISO 639-1 format, defaults to en. |
| asProvider | Optional | String | microsoft | Audio service provider. |
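Several constraints in the Notes column can be checked locally before a file reaches the connector. The following is a rough sketch of such a pre-check for one stereo row; it is not the connector's own validation, and it covers only a subset of the fields. The function names are hypothetical, and the rule that the two channel values must differ is an assumption drawn from the 0 = left, 1 = right description.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse the YYYY-MM-DDTHH:MM:SSZ form used in the examples."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def validate_stereo_row(row: dict) -> list:
    """Return a list of problems found in one stereo metadata row (empty if none)."""
    errors = []
    conversation_id = row.get("conversationId", "")
    if not conversation_id or len(conversation_id) > 50:
        errors.append("conversationId must be 1-50 characters")
    try:
        start = parse_utc(row["conversationStartTime"])
        end = parse_utc(row["conversationEndTime"])
        if end <= start:
            errors.append("conversationEndTime must be after conversationStartTime")
    except (KeyError, ValueError):
        errors.append("start and end times must look like 2025-04-10T14:30:00Z")
    if not str(row.get("recordingUrl", "")).startswith("https://"):
        errors.append("recordingUrl must be an HTTPS URL")
    # CSV values arrive as strings, so compare the channel numbers as text.
    channels = {str(row.get("agentChannel")), str(row.get("customerChannel"))}
    if channels != {"0", "1"}:
        # Assumption: agent and customer must occupy different channels.
        errors.append("agentChannel and customerChannel must be 0 and 1, one each")
    return errors
```

A row read with `csv.DictReader` can be passed to `validate_stereo_row` directly, since the function expects a mapping of column headers to string values.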
Mono Voice Recordings
Configuration : recordingType = mono, channelType = voice
Mono recordings require two separate CSV rows and two audio files per conversation — one for the agent and one for the customer. Use the same conversationId for both rows.
| Field | Required | Type | Example | Notes |
|---|---|---|---|---|
| conversationId | Required | String | conv-123456 | Same ID for both agent and customer rows. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time. |
| channelType | Required | String | voice | Always voice for audio. |
| recordingType | Required | String | mono | Always mono for this format. |
| agentRecordings | Required | String | https://s3.amazonaws.com/bucket/conv-123456-agent.wav | URL to the agent audio file. |
| customerRecordings | Required | String | https://s3.amazonaws.com/bucket/conv-123456-customer.wav | URL to the customer audio file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| agentId | Optional | String | agent-789 | Internal agent identifier. |
| language | Optional | String | en | ISO 639-1 format, defaults to en. |
| asProvider | Optional | String | microsoft | Transcription provider. |
Voice Transcripts (Pre-transcribed Audio)
Configuration : recordingType = transcription, channelType = voice
Use this format when you have transcribed voice recordings and want to import the text for analysis without reprocessing the audio.
| Field | Required | Type | Example | Notes |
|---|---|---|---|---|
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time. |
| channelType | Required | String | voice | Always voice for audio transcripts. |
| recordingType | Required | String | transcription | Always transcription for this format. |
| transcriptPath | Required | String | transcripts/voice-123.json | Path to the JSON transcript file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| language | Optional | String | en | ISO 639-1 format, defaults to en. |
| asProvider | Optional | String | microsoft | Original audio service provider. |
Chat Scripts (Live Chat Interactions)
Configuration : recordingType = transcription, channelType = chat
Use this format for live chat interactions from web chat, WhatsApp, Facebook Messenger, and other messaging platforms.
For conversations involving agent or queue transfers, use the queueId of the queue where the conversation ended and the agentEmail of the agent who closed the conversation.
| Field | Required | Type | Example | Notes |
|---|---|---|---|---|
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars. |
| agentEmail | Required | String | johndoe@example.com | Must match a platform user account. |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone. |
| conversationEndTime | Required | String | 2025-04-10T14:45:00Z | Must be after start time. |
| channelType | Required | String | chat | Always chat for text interactions. |
| recordingType | Required | String | transcription | Always transcription for chat. |
| transcriptPath | Required | String | transcripts/chat-123.json | Path to the JSON transcript file. |
| queueId | Required | String | support-tier1 | Must exist in queue mapping. |
| language | Optional | String | en-US | Defaults to en if not specified. |
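As an illustration of how a chat metadata file could be produced, the sketch below writes rows with Python's standard `csv` module. The column names come from the table above; the column order, the helper name `write_chat_metadata`, and the choice to fill in `channelType` and `recordingType` automatically are assumptions made for this example.

```python
import csv
import io

# Column headers from the chat table above; the order is an assumption.
CHAT_FIELDS = [
    "conversationId", "agentEmail", "conversationStartTime",
    "conversationEndTime", "channelType", "recordingType",
    "transcriptPath", "queueId", "language",
]


def write_chat_metadata(rows, out):
    """Write chat metadata rows to a file-like object as CSV.

    channelType and recordingType default to the fixed values for chat.
    Optional fields that are missing are written as empty cells.
    """
    writer = csv.DictWriter(out, fieldnames=CHAT_FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({"channelType": "chat", "recordingType": "transcription", **row})


if __name__ == "__main__":
    buffer = io.StringIO()
    write_chat_metadata([{
        "conversationId": "conv-123456",
        "agentEmail": "johndoe@example.com",
        "conversationStartTime": "2025-04-10T14:30:00Z",
        "conversationEndTime": "2025-04-10T14:45:00Z",
        "transcriptPath": "transcripts/chat-123.json",
        "queueId": "support-tier1",
    }], buffer)
    print(buffer.getvalue())
```

For transferred conversations, pass the `queueId` and `agentEmail` of the queue and agent that closed the conversation, as described above.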
Agent and Queue Validation
The system validates agent and queue combinations as follows:
- The conversation is processed when the queue is onboarded, even if the agent is not onboarded.
- The conversation is processed when both the agent and the queue are onboarded, even if the agent isn't mapped to the queue.
- The conversation is rejected when the queue is not onboarded or configured.
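Taken together, these rules mean that only the queue's status decides the outcome. The sketch below encodes that reading as a small function; it models the rules as written here and is not the platform's implementation. The function name and return values are hypothetical.

```python
from itertools import product


def validate_agent_queue(queue_onboarded: bool, agent_onboarded: bool,
                         agent_mapped_to_queue: bool) -> str:
    """Return "process" or "reject" according to the three rules listed above."""
    if not queue_onboarded:
        # An unknown or unconfigured queue always rejects the conversation.
        return "reject"
    # With an onboarded queue, the agent's status does not block processing.
    return "process"


if __name__ == "__main__":
    # Print the full truth table for the three inputs.
    for queue, agent, mapped in product([True, False], repeat=3):
        print(queue, agent, mapped, validate_agent_queue(queue, agent, mapped))
```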
JSON Transcript Schemas
Voice transcript, full example:
{
  "recognizedPhrases": [
    {
      "recognitionStatus": "Success",
      "channel": 0,
      "offset": "PT14S",
      "duration": "PT2.4S",
      "offsetInTicks": 140000000.0,
      "durationInTicks": 24000000.0,
      "durationMilliseconds": 2400,
      "offsetMilliseconds": 14000,
      "nBest": [
        {
          "confidence": 0.8205426,
          "lexical": "yes one four three four two six",
          "itn": "yes 143426",
          "maskedITN": "yes one four three four two six",
          "display": "Yes, 143426.",
          "words": [
            {
              "word": "yes",
              "offset": "PT14S",
              "duration": "PT0.32S",
              "offsetInTicks": 140000000.0,
              "durationInTicks": 3200000.0,
              "durationMilliseconds": 320,
              "offsetMilliseconds": 14000,
              "confidence": 0.51653963
            }
          ]
        }
      ]
    }
  ]
}
Voice transcript, required fields only:
{
  "recognizedPhrases": [
    {
      "channel": 0,
      "offsetInTicks": 140000000.0,
      "nBest": [
        {
          "lexical": "yes one four three four two six",
          "words": [
            {
              "word": "yes",
              "offsetInTicks": 140000000.0,
              "durationInTicks": 3200000.0,
              "confidence": 0.51653963
            }
          ]
        }
      ]
    }
  ]
}
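If your transcription output stores times in seconds, you need to convert them to ticks. In the example above, PT14S corresponds to 140000000 ticks, which implies 10,000,000 ticks per second. The sketch below builds one phrase with only the required fields under that inference. `build_phrase` and `to_ticks` are hypothetical helpers, and using the first word's start time as the phrase offset is an assumption.

```python
import json

# Inferred from the example above: PT14S <-> 140000000 ticks.
TICKS_PER_SECOND = 10_000_000


def to_ticks(seconds: float) -> float:
    """Convert seconds to ticks, rounded to a whole tick."""
    return float(round(seconds * TICKS_PER_SECOND))


def build_phrase(channel: int, words: list) -> dict:
    """Build one recognizedPhrases entry with only the required fields.

    words is a non-empty list of (word, start_seconds, duration_seconds, confidence).
    Assumption: the phrase offset is the start time of its first word.
    """
    return {
        "channel": channel,
        "offsetInTicks": to_ticks(words[0][1]),
        "nBest": [
            {
                "lexical": " ".join(word for word, _, _, _ in words),
                "words": [
                    {
                        "word": word,
                        "offsetInTicks": to_ticks(start),
                        "durationInTicks": to_ticks(duration),
                        "confidence": confidence,
                    }
                    for word, start, duration, confidence in words
                ],
            }
        ],
    }


if __name__ == "__main__":
    phrase = build_phrase(0, [("yes", 14.0, 0.32, 0.51653963)])
    print(json.dumps({"recognizedPhrases": [phrase]}, indent=2))
```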
Chat transcript example:
{
  "1": {
    "type": "AGENT",
    "text": "Good afternoon, how can I help you today?",
    "timestamp": 1749562206000,
    "userId": "johndoe@example.com"
  },
  "2": {
    "type": "USER",
    "text": "I need help with my account balance.",
    "timestamp": 1749562253142,
    "userId": "customer_12345"
  }
}
Required fields for each chat message:
| Field | Values | Notes |
|---|---|---|
| type | AGENT, USER, or SYSTEM | Identifies the speaker. |
| text | Message content | The message text. |
| timestamp | Unix timestamp in milliseconds | Message send time. |
| userId | Participant identifier | Agent email or customer ID. |
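The sketch below shows one way to assemble a chat transcript in this shape from an ordered list of messages. The sequential string keys starting at "1" are inferred from the example above rather than from a stated rule, and `build_chat_transcript` is a hypothetical helper.

```python
import json

ALLOWED_TYPES = {"AGENT", "USER", "SYSTEM"}


def build_chat_transcript(messages):
    """Build a chat transcript dict from (type, text, timestamp_ms, user_id) tuples.

    Messages are expected in send order. Keys are "1", "2", ... as in the
    example above (an inferred convention).
    """
    transcript = {}
    for index, (msg_type, text, timestamp_ms, user_id) in enumerate(messages, start=1):
        if msg_type not in ALLOWED_TYPES:
            raise ValueError(f"message {index}: type must be AGENT, USER, or SYSTEM")
        transcript[str(index)] = {
            "type": msg_type,
            "text": text,
            "timestamp": int(timestamp_ms),  # Unix time in milliseconds
            "userId": user_id,
        }
    return transcript


if __name__ == "__main__":
    chat = build_chat_transcript([
        ("AGENT", "Good afternoon, how can I help you today?", 1749562206000, "johndoe@example.com"),
        ("USER", "I need help with my account balance.", 1749562253142, "customer_12345"),
    ])
    print(json.dumps(chat, indent=2))
```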
Storage Structure Options
Choose a folder structure for your S3 bucket before setting up the connector.
Option A: Unified Path
s3://your-bucket/conversations/
├── metadata.csv # All interaction metadata
├── audio/
│ ├── conv-123456.wav # Stereo recording
│ ├── conv-123457-agent.wav # Mono - agent only
│ ├── conv-123457-customer.wav # Mono - customer only
│ └── conv-123458.wav # Stereo recording
├── chat/
│ ├── chat-123459.json # Chat transcript
│ └── chat-123460.json # Chat transcript
└── test.csv # Required for validation
Option B: Separate Voice and Chat Paths
s3://your-bucket/
├── voice-interactions/
│ ├── voice_metadata.csv
│ ├── recordings/
│ │ ├── conv-123456.wav # Stereo
│ │ ├── conv-123457-agent.wav # Mono agent
│ │ └── conv-123457-customer.wav # Mono customer
│ └── test.csv
└── chat-interactions/
    ├── chat_metadata.csv
    ├── transcripts/
    │   ├── chat-123459.json
    │   └── chat-123460.json
    └── test.csv
Set Up the Connector
Before you begin, verify that your S3 bucket is ready:
- All audio files are available via HTTPS URLs.
- CSV files contain the required fields with correct column headers.
- Mono recordings have separate agent and customer files.
- A test.csv file exists in each configured folder.
- File and folder names contain no spaces or special characters.
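The naming check in this list is easy to automate. This guide does not define which characters count as special, so the sketch below assumes that only letters, digits, underscores, hyphens, dots, and the `/` path separator are safe; adjust the pattern if your setup allows more. `unsafe_keys` is a hypothetical helper.

```python
import re

# Assumed safe set: letters, digits, "_", ".", "/", and "-".
SAFE_KEY = re.compile(r"[A-Za-z0-9_./-]+")


def unsafe_keys(keys):
    """Return the object keys that contain spaces or characters outside the assumed safe set."""
    return [key for key in keys if not SAFE_KEY.fullmatch(key)]


if __name__ == "__main__":
    print(unsafe_keys([
        "voice-interactions/recordings/conv-123456.wav",
        "voice interactions/conv 123457.wav",
    ]))
```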
Step 1: Create the Connector
1. Navigate to Quality AI > Configure > Connectors.
2. Select + Add Connector > Amazon S3 > Connect.
3. Enter a Name for the connector.
4. Select your AWS Region.
5. Choose an Auth Type and enter your credentials:
   - Access Keys: Enter your Access Key and Secret Key.
   - IAM Role: Enter the IAM Role ARN.
6. Set the folder path:
   - Unified Path: Enter a single path for both voice and chat (for example, s3://your-bucket/conversations/).
   - Separate Paths: Enter a Voice Path and a Chat Path.
Step 2: Test the Connection
1. Select the Test tab in the connector configuration.
2. Select Test to validate the configuration. If validation fails, correct the issue and select Re-Test.
3. Verify that the following validation checks pass:
   - Authentication: Validates your AWS credentials and S3 bucket access.
   - Filepath Availability & Accessibility: Verifies that the configured S3 path exists and is accessible.
   - File Format Validation: Verifies that supported file formats are present and readable.
   - Metadata Validation: Verifies that required metadata fields are present and valid.
The Test tab displays validation progress and any errors.
The connector must pass all validation checks before you can proceed with queue mapping and activation.
Step 3: Map Queues
1. Navigate to the Queue tab.
2. Map each queueId from the .csv files to a Quality AI Express queue. Values must match exactly.
Step 4: Schedule and Activate
1. Navigate to the Schedule tab.
2. Set the Interval (minutes, hours, or days) and Start Time (UTC).
3. Select Save to activate the connector.
After saving, verify the setup is complete:
- The system saves and validates the queue mappings.
- The processing schedule is active.
- The first ingestion job displays in the Log tab.
- The system reports no processing errors.
The setup is complete when conversations appear in Quality AI Express dashboards and analytics data is available for ingested interactions.
Troubleshooting
Authentication
| Problem | Symptom | Resolution |
|---|---|---|
| Invalid Credentials | Authentication failed. | Verify the access key, secret key, IAM role ARN, and credential expiration. |
| Permission Denied | Access denied to S3 bucket. | Add S3 read permissions to the IAM user or role. Verify the bucket policy and region. |
File Validation
| Problem | Symptom | Resolution |
|---|---|---|
| File Access Error | File or path not found. | Verify the bucket name, region, folder path, and file availability. |
| Invalid File Format or Metadata | Validation failed. | Confirm that test.csv exists and contains the required structure, column headers, and timestamps. |
Data Processing
| Problem | Symptom | Resolution |
|---|---|---|
| Timestamp Errors | Invalid timestamp format. | Use ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ) in UTC. Verify that the end time is after the start time. |
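Timestamp errors often come from exporting local times without converting them to UTC. As a minimal sketch, the helper below converts a timezone-aware Python `datetime` to the YYYY-MM-DDTHH:MM:SSZ form. The function name `to_connector_timestamp` is hypothetical, and rejecting naive datetimes is a design choice for this example, made to avoid silently guessing a timezone.

```python
from datetime import datetime, timedelta, timezone


def to_connector_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    if moment.tzinfo is None:
        # A naive datetime has no timezone, so the UTC value would be a guess.
        raise ValueError("attach a timezone before converting")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # 10:30 at UTC-4 is 14:30 UTC.
    local = datetime(2025, 4, 10, 10, 30, 0, tzinfo=timezone(timedelta(hours=-4)))
    print(to_connector_timestamp(local))
```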
Processing time depends on conversation length, file size, ASR transcription latency for voice interactions, and LLM processing time.