
Multilingual Support in SearchAI

SearchAI provides multilingual support so that users can interact with the platform in their preferred language, from adding content to submitting queries and reading answers.

Core Capabilities

SearchAI's multilingual support enables you to:

  • Add and manage content in multiple languages.
  • Submit queries and receive responses in supported languages.
  • Get search results and answers in the same language as the query.

Key Highlights:

  • 100+ Languages Supported for indexing, querying, and answer generation.
  • Works with any language supported by both your chosen LLM and your vector generation model, provided you use the Text Extraction strategy and the Vector Retrieval method.
  • No additional configuration required.

Commonly Supported Languages

SearchAI supports the languages commonly handled by advanced LLMs and embedding models such as BGE-M3. The following languages are among the most widely supported and most widely used:

English | Spanish | Arabic | French | Chinese
Hindi | Portuguese | Russian | Bengali | Urdu
Japanese | German | Turkish | Korean | Italian
Vietnamese | Persian | Swahili | Thai | Malay
Afrikaans | Albanian | Amharic | Armenian | Assamese
Asturian | Avaric | Azerbaijani | Bashkir | Basque
Bavarian | Belarusian | Bihari | Bishnupriya | Bosnian
Breton | Bulgarian | Burmese | Cantonese | Catalan
Central Bikol | Central Kurdish | Chavacano | Chechen | Cebuano
Chuvash | Cornish | Corsican | Croatian | Danish
Dhivehi | Doteli | Dutch | Egyptian Arabic | Emilian-Romagnol
Erzya | Esperanto | Estonian | Fiji Hindi | Finnish
Galician | Georgian | Goan Konkani | Greek | Gujarati
Haitian Creole | Hebrew | Hill Mari | Hungarian | Ido
Icelandic | Ilocano | Indonesian | Interlingua | Interlingue
Irish | Javanese | Kannada | Karachay-Balkar | Kazakh
Khmer | Komi | Kurdish | Kyrgyz | Lao
Latin | Latvian | Lezghian | Limburgish | Lithuanian
Lojban | Lombard | Low German | Lower Sorbian | Luxembourgish
Macedonian | Maithili | Malagasy | Malayalam | Maltese
Manx | Marathi | Mazanderani | Meadow Mari | Mingrelian

Refer to the official documentation of your LLM or vector generation model for a comprehensive list of supported languages.

Language-Specific Configuration

While core multilingual support is comprehensive, certain modules within SearchAI are language-sensitive and require different strategies or models depending on the language used. The sections below give support details, for some of the most widely used languages, for each of these key components:

  • Extraction Strategies
  • Vector Configuration Models
  • Retrieval Strategies
  • Answer Generation Models

Use this guidance to ensure your multilingual setup is aligned with the most effective techniques for each language or model.

Language-Specific Extraction Capabilities

The table below lists the languages supported by each content extraction method, so you can select the most effective approach for processing multilingual content.

Extraction Strategy | Supported Languages
Text Extraction | All languages listed above
Layout Aware Extraction | English, Ukrainian
Image Extraction | English, Ukrainian, Spanish, Russian, German
Advanced HTML Extraction | English, Ukrainian, German
Markdown Extraction | English, Ukrainian, Spanish, Russian, German, Hungarian, Chinese
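
When planning a multilingual setup, it can help to turn the table above into a small lookup. The following Python sketch is only an illustration: it is not a SearchAI API, and the names EXTRACTION_SUPPORT, ALL_LANGUAGES, and strategies_for are hypothetical. As a simplification, it treats Text Extraction as available for any language rather than checking the language list.

```python
# Hypothetical lookup transcribed from the extraction support table.
# Not part of SearchAI; all names here are illustrative assumptions.
ALL_LANGUAGES = "*"  # stands for "all languages listed above"

EXTRACTION_SUPPORT = {
    "Text Extraction": ALL_LANGUAGES,
    "Layout Aware Extraction": {"English", "Ukrainian"},
    "Image Extraction": {"English", "Ukrainian", "Spanish", "Russian", "German"},
    "Advanced HTML Extraction": {"English", "Ukrainian", "German"},
    "Markdown Extraction": {
        "English", "Ukrainian", "Spanish", "Russian",
        "German", "Hungarian", "Chinese",
    },
}


def strategies_for(language: str) -> list[str]:
    """Return the extraction strategies whose table row includes the language."""
    return [
        name
        for name, langs in EXTRACTION_SUPPORT.items()
        if langs == ALL_LANGUAGES or language in langs
    ]


if __name__ == "__main__":
    for lang in ("English", "Hungarian", "Hindi"):
        print(lang, "->", strategies_for(lang))
```

For a language that appears only in the Text Extraction row, such as Hindi, the lookup returns Text Extraction alone, which matches the default setup described under Key Highlights.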

Language-Specific Vector Generation Support

Vector generation model support varies by language. Use the following models for optimal performance:

  • English: MPNet, E5, BGE-M3, LaBSE
  • Non-English Languages: BGE-M3 and LaBSE

Note: BGE-M3 supports a wide range of languages. Its training data includes many commonly spoken languages; however, performance may be lower for low-resource or underrepresented languages.
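
For content that mixes languages, the model guidance above can be expressed as a short selection helper. This is a minimal Python sketch under stated assumptions: the function names vector_models_for and models_for_corpus are hypothetical, languages are passed as plain English names, and nothing here calls SearchAI itself.

```python
# Model lists copied from the guidance above; helper names are hypothetical.
ENGLISH_MODELS = ["MPNet", "E5", "BGE-M3", "LaBSE"]
NON_ENGLISH_MODELS = ["BGE-M3", "LaBSE"]


def vector_models_for(language: str) -> list[str]:
    """Return the recommended vector generation models for one language."""
    if language.strip().lower() == "english":
        return list(ENGLISH_MODELS)
    return list(NON_ENGLISH_MODELS)


def models_for_corpus(languages: list[str]) -> list[str]:
    """Return the models recommended for every language in a mixed corpus."""
    candidates = list(ENGLISH_MODELS)
    for language in languages:
        allowed = vector_models_for(language)
        # Keep only models that are also recommended for this language.
        candidates = [model for model in candidates if model in allowed]
    return candidates


if __name__ == "__main__":
    print(models_for_corpus(["English"]))
    print(models_for_corpus(["English", "French", "Hindi"]))
```

As soon as the corpus contains any non-English language, the candidates narrow to BGE-M3 and LaBSE, which is why those two are the natural choice for a mixed-language index.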

Language-Specific Retrieval Strategy Support

  • English: Vector Retrieval and Hybrid Retrieval
  • Non-English: Vector Retrieval
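
A configuration check for the rule above might look like the following Python sketch. It assumes a hypothetical helper, is_retrieval_supported, that you would run in your own tooling before applying settings; it does not reflect how SearchAI validates configuration internally.

```python
# Retrieval support as listed above; names are illustrative assumptions.
ENGLISH_RETRIEVAL = {"Vector Retrieval", "Hybrid Retrieval"}
NON_ENGLISH_RETRIEVAL = {"Vector Retrieval"}


def is_retrieval_supported(language: str, strategy: str) -> bool:
    """Check a (language, retrieval strategy) pair against the list above."""
    if language.strip().lower() == "english":
        return strategy in ENGLISH_RETRIEVAL
    return strategy in NON_ENGLISH_RETRIEVAL


if __name__ == "__main__":
    print(is_retrieval_supported("English", "Hybrid Retrieval"))
    print(is_retrieval_supported("Japanese", "Hybrid Retrieval"))
    print(is_retrieval_supported("Japanese", "Vector Retrieval"))
```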

Supported Answer Generation Models

Answer generation quality depends on the language capabilities of the underlying LLM. Please refer to the official list of supported languages from the LLM provider.

Recommendations

To optimize multilingual performance:

  • Choose the right LLM - Select models with strong support for your target languages. Refer to the official list of languages supported by the LLM.
  • Customize prompts - Create language-specific prompts to improve answer quality and relevance.
  • Test performance - Evaluate different LLMs for your specific use case in your target language.
  • Monitor quality - Regularly assess answer quality across languages and adjust configurations as needed.
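
As one way to approach the "Customize prompts" recommendation, the Python sketch below keeps a prompt template per language and falls back to English when no template exists. The template wording, the language codes, and the build_prompt helper are all illustrative assumptions; how prompts are actually configured in SearchAI is not covered here.

```python
# Hypothetical per-language prompt templates; the wording is only an example.
PROMPTS = {
    "en": (
        "Answer the question using only the provided context.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    ),
    "es": (
        "Responde a la pregunta usando solo el contexto proporcionado.\n\n"
        "Contexto:\n{context}\n\nPregunta: {question}"
    ),
}


def build_prompt(language_code: str, context: str, question: str,
                 fallback: str = "en") -> str:
    """Fill the template for the language, or the fallback template if absent."""
    template = PROMPTS.get(language_code, PROMPTS[fallback])
    return template.format(context=context, question=question)


if __name__ == "__main__":
    print(build_prompt("es", "SearchAI admite varios idiomas.", "¿Qué idiomas admite?"))
```

Keeping templates in one mapping makes it easier to compare answer quality across languages when you test and monitor, since each language's prompt can be adjusted independently.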