YouTube is a video-sharing platform that allows users to upload, view, and manage video content organized into channels and playlists. You can configure Search AI to connect to YouTube to enable users to fetch query results using video metadata, descriptions, and transcripts from your YouTube channel.
Search AI supports OAuth 2.0 (Google) authentication for communicating with YouTube. Authentication requires a Client ID and Client Secret obtained from the Google Cloud Console.
Go to Connectors under the Sources page and select YouTube. On the Authentication tab, enter the following details and click Connect.
| Field | Description |
| --- | --- |
| Name | A unique name for this connector instance. |
| Authorization type | OAuth 2.0 (pre-selected). |
| Grant type | Authorization Code (pre-selected). |
| Client ID | Your Google OAuth 2.0 Client ID (for example: 123456789-abcdef.apps.googleusercontent.com). |
| Client Secret | Your Google OAuth 2.0 Client Secret (for example: GOCSPX-xxxxxxxxxxxxxxxxxxxxxx). |
After entering credentials and clicking Connect, you will be redirected to Google’s consent screen to authorize Search AI. Tokens are stored securely and refreshed automatically when they expire. Once successfully authenticated, the status shows as Connected.
Go to the Permissions tab to configure access control for the ingested content. The following options are available.
Same users as in the source system (Restricted Access) - Applies RACL-based access control. When enabled, the connector uses the permissions field from the video data source. Since YouTube videos are inherently public (private videos are skipped), RACL effectively defaults to open access for all ingested content.
Everyone (Public Access) - All users are granted full access to the ingested content.
RACL Sync scheduling isn’t applicable for the YouTube connector since YouTube doesn’t have a user/group permission model for public video content.
Advanced filters let you narrow down which videos are ingested based on specific criteria. These filters are applied before transcript fetching, so filtered-out videos don’t consume API quota.
| Filter Parameter | Description | Example Value |
| --- | --- | --- |
| publishedAfter | Only include videos published after this date. | 2024-01-01 or 2024-01-01T00:00:00Z |
| publishedBefore | Only include videos published before this date. | 2025-12-31 |
| minViewCount | Only include videos with at least this many views. | 1000 |
| minDurationMinutes | Only include videos longer than this (in minutes). | 5 |
| maxDurationMinutes | Only include videos shorter than this (in minutes). | 60 |
Validation Rules:
publishedAfter must be earlier than publishedBefore
minDurationMinutes must be less than maxDurationMinutes
View count and duration values must be non-negative
Dates accept ISO 8601 format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ)
Example: “Only sync tutorial videos published after January 2024 that are between 5 and 60 minutes long and have at least 500 views.”
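The validation rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the connector's actual implementation: the function names `parse_date` and `validate_filters` and the dictionary shape are assumptions, and only the filter parameter names come from the table. Note that the "tutorial" part of the example has no matching filter parameter in the table, so it is left out here.

```python
from datetime import datetime, timezone


def parse_date(value: str) -> datetime:
    """Parse YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ into a UTC datetime."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unsupported date format: {value!r}")


def validate_filters(filters: dict) -> list[str]:
    """Return rule violations; an empty list means the filters are valid."""
    errors = []

    after = filters.get("publishedAfter")
    before = filters.get("publishedBefore")
    if after and before and parse_date(after) >= parse_date(before):
        errors.append("publishedAfter must be earlier than publishedBefore")

    shortest = filters.get("minDurationMinutes")
    longest = filters.get("maxDurationMinutes")
    if shortest is not None and longest is not None and shortest >= longest:
        errors.append("minDurationMinutes must be less than maxDurationMinutes")

    for key in ("minViewCount", "minDurationMinutes", "maxDurationMinutes"):
        value = filters.get(key)
        if value is not None and value < 0:
            errors.append(f"{key} must be non-negative")

    return errors


# Filters approximating the example above (hypothetical payload shape).
example = {
    "publishedAfter": "2024-01-01",
    "minViewCount": 500,
    "minDurationMinutes": 5,
    "maxDurationMinutes": 60,
}
print(validate_filters(example))  # → []
```

Returning a list of violations rather than raising on the first one lets a form report every problem at once.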
A sync fetches all videos from the configured playlists, processes them, and updates the index. After ingestion, a delete queue removes any previously indexed videos that are no longer present in the source.
Scheduling
You can configure an automatic sync schedule to keep content up to date.
Shorts Handling
YouTube Shorts (videos under 60 seconds) are excluded by default.
Only public videos are synced. Private and unlisted videos are automatically skipped.
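As a rough illustration of the two rules above, the sketch below checks a video against the privacy and Shorts conditions. It assumes the video is represented the way the YouTube Data API v3 returns it, with `status.privacyStatus` and an ISO 8601 `contentDetails.duration`. The helper names and the `include_shorts` flag are hypothetical, and the 60-second threshold follows this page's definition of a Short.

```python
import re

# ISO 8601 durations as returned by the API, e.g. "PT45S" or "PT1H2M3S".
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def duration_seconds(iso: str) -> int:
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION.match(iso)
    if not match:
        raise ValueError(f"unsupported duration: {iso!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def is_syncable(video: dict, include_shorts: bool = False) -> bool:
    """Apply the public-only rule and the default Shorts exclusion."""
    if video["status"]["privacyStatus"] != "public":
        return False  # private and unlisted videos are skipped
    if not include_shorts:
        if duration_seconds(video["contentDetails"]["duration"]) < 60:
            return False  # treated as a Short on this page's definition
    return True
```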
Once the sync is complete, go to the Content tab to review the ingested content. The tab displays the count of files that are Successful, Failed, and Skipped, along with the total number of Accessible Files.
Each synced video is converted into a single markdown document. The following key fields are extracted and indexed.
title - The video title.
content - The full transcript organized by chapters (if available) with timestamps, plus the first 2,000 characters of the video description.
doc_id - A unique identifier for the ingested video document, derived from the YouTube video ID.
doc_source_type - Identifies the source as a YouTube video.
doc_created_on / doc_updated_on - Timestamps capturing when the video was published and last updated.
url - Direct link to the video on YouTube (https://www.youtube.com/watch?v=<videoId>).
tags - Video tags extracted from YouTube metadata.
type - Content type classification (for example: tutorial (85%)).
The connector fetches video captions (transcripts) through the captions endpoints of the YouTube Data API v3. Transcript selection follows this priority order:

| Priority | Transcript Type |
| --- | --- |
| 1st | Manual (human-uploaded) captions in English |
| 2nd | Auto-generated captions in English |
| 3rd | Manual captions in any language |
| 4th | Auto-generated captions in any language |
If a video has no captions at all, it’s still synced - the video metadata and description are indexed without a transcript.
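The priority order can be expressed as a sort key. The sketch below assumes caption tracks shaped like the YouTube Data API v3 caption resource, where `snippet.language` holds a language tag and `snippet.trackKind` marks auto-generated tracks as ASR. The function name `pick_track` is hypothetical, and treating every non-ASR track as manual is a simplifying assumption.

```python
from typing import Optional


def pick_track(tracks: list[dict]) -> Optional[dict]:
    """Return the preferred caption track, or None if the video has no captions."""

    def rank(track: dict) -> tuple[bool, bool]:
        snippet = track["snippet"]
        is_auto = snippet.get("trackKind", "").lower() == "asr"
        is_english = snippet.get("language", "").lower().split("-")[0] == "en"
        # False sorts before True: English first, then manual before auto.
        return (not is_english, is_auto)

    return min(tracks, key=rank) if tracks else None


tracks = [
    {"id": "fr-manual", "snippet": {"language": "fr", "trackKind": "standard"}},
    {"id": "en-auto", "snippet": {"language": "en", "trackKind": "asr"}},
]
print(pick_track(tracks)["id"])  # → en-auto
```

A `None` result corresponds to the no-captions case, where only metadata and the description are indexed.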
Chapter markers in the video description (for example: 0:00 Introduction, 5:30 Setup) are automatically detected and used to organize the transcript into sections.
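One plausible way to detect chapter markers and group a transcript by them is sketched below. The regular expression, the function names, and the `(start_seconds, text)` cue format are assumptions for illustration; the connector's real detection logic is not documented here and may be stricter.

```python
import re
from bisect import bisect_right

# Matches lines such as "0:00 Introduction" or "1:02:30 Wrap-up".
_CHAPTER = re.compile(r"^\s*(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\s+(.+?)\s*$")


def parse_chapters(description: str) -> list[tuple[int, str]]:
    """Extract (start_seconds, title) pairs from a video description."""
    chapters = []
    for line in description.splitlines():
        match = _CHAPTER.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((start, title))
    return chapters


def group_transcript(chapters, cues):
    """Assign each (start_seconds, text) cue to the chapter that contains it.

    Assumes chapters are sorted by start time; cues that precede the
    first chapter are dropped.
    """
    starts = [start for start, _ in chapters]
    sections = [(title, []) for _, title in chapters]
    for cue_start, text in cues:
        index = bisect_right(starts, cue_start) - 1
        if index >= 0:
            sections[index][1].append(text)
    return sections


description = "Setup guide\n0:00 Introduction\n5:30 Setup\n"
print(parse_chapters(description))  # → [(0, 'Introduction'), (330, 'Setup')]
```

A pattern this permissive also matches ordinary lines that happen to start with a time, so a production version would likely need extra checks.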
The YouTube Data API v3 has a default daily quota of 10,000 units per Google Cloud project. The connector is designed to be quota-efficient. The main operations cost the following:

| Operation | Quota Cost |
| --- | --- |
| List channels / playlists / videos | 1 unit per request |
| List captions for a video | 50 units per request |
| Download a caption track (transcript) | 200 units per request |
The connector reserves a buffer of 1,000 units and stops processing once its quota budget of 9,000 units is used. If the quota is exhausted mid-sync, the job completes with a “limit exceeded” status and already-processed videos are retained.
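To get a feel for what this budget allows, the following back-of-the-envelope sketch estimates how many videos with transcripts fit into one day's quota. It assumes each video needs exactly one caption list call and one caption download (250 units in total) and that listing calls cost 1 unit each; the function name and the `list_requests` parameter are hypothetical.

```python
DAILY_QUOTA = 10_000          # default units per Google Cloud project
RESERVED_BUFFER = 1_000       # units the connector keeps in reserve
LIST_COST = 1                 # list channels / playlists / videos
CAPTION_LIST_COST = 50        # list captions for a video
CAPTION_DOWNLOAD_COST = 200   # download one caption track


def max_videos_per_day(list_requests: int) -> int:
    """Estimate how many videos with transcripts fit in the daily budget."""
    budget = DAILY_QUOTA - RESERVED_BUFFER             # 9,000 units
    remaining = budget - list_requests * LIST_COST
    per_video = CAPTION_LIST_COST + CAPTION_DOWNLOAD_COST  # 250 units
    return max(remaining, 0) // per_video


print(max_videos_per_day(list_requests=10))  # → 35
```

Under these assumptions a single day covers roughly 35 videos with transcripts, so a larger channel would need several days of syncing or a higher quota.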
Tip: If you have a large channel (hundreds of videos with transcripts), consider requesting a quota increase from Google through the Cloud Console Quotas page.
Search AI provides access control support for content ingested from YouTube. The sys_racl field is used to enforce access control for the ingested content.
| Feature | Status |
| --- | --- |
| RACL Support | Available (can be enabled) |
| RACL Sync | Not applicable |
When RACL is disabled (default), all synced content is publicly accessible to all users searching the index (permission set to *).
When RACL is enabled, the connector uses the permissions field from the video data source. Since YouTube videos are inherently public (private videos are skipped), RACL effectively defaults to open access (['*']) for all ingested content.
RACL Sync scheduling isn’t applicable for the YouTube connector since YouTube doesn’t have a user/group permission model for public video content.