Multilingual Support in SearchAI
SearchAI offers multilingual capabilities to enhance accessibility and deliver a seamless experience for users interacting in different languages. This feature enables users to engage with the platform in their preferred language, resulting in more intuitive and personalized interactions.
Core Capabilities
SearchAI's multilingual support enables you to:
- Add and manage content in multiple languages.
- Submit queries and receive responses in supported languages.
- Get search results and answers in the same language as the query.
Key Highlights:
- 100+ Languages Supported for indexing, querying, and answer generation.
- Works with any language supported by your chosen LLM and vector generation model, using the Text Extraction strategy and Vector Retrieval method.
- No additional configuration required.
Commonly Supported Languages
SearchAI supports the languages commonly handled by advanced LLMs and embedding models such as BGE-M3. The following languages are among the most widely used and best supported:
English | Spanish | Arabic | French | Chinese |
Hindi | Portuguese | Russian | Bengali | Urdu |
Japanese | German | Turkish | Korean | Italian |
Vietnamese | Persian | Swahili | Thai | Malay |
Afrikaans | Albanian | Amharic | Armenian | Assamese |
Asturian | Avaric | Azerbaijani | Bashkir | Basque |
Bavarian | Belarusian | Bihari | Bishnupriya | Bosnian |
Breton | Bulgarian | Burmese | Cantonese | Catalan |
Central Bikol | Central Kurdish | Chavacano | Chechen | Cebuano |
Chuvash | Cornish | Corsican | Croatian | Danish |
Dhivehi | Doteli | Dutch | Egyptian Arabic | Emilian-Romagnol |
Erzya | Esperanto | Estonian | Fiji Hindi | Finnish |
Galician | Georgian | Goan Konkani | Greek | Gujarati |
Haitian Creole | Hebrew | Hill Mari | Hungarian | Ido |
Icelandic | Ilocano | Indonesian | Interlingua | Interlingue |
Irish | Javanese | Kannada | Karachay-Balkar | Kazakh |
Khmer | Komi | Kurdish | Kyrgyz | Lao |
Latin | Latvian | Lezghian | Limburgish | Lithuanian |
Lojban | Lombard | Low German | Lower Sorbian | Luxembourgish |
Macedonian | Maithili | Malagasy | Malayalam | Maltese |
Manx | Marathi | Mazanderani | Meadow Mari | Mingrelian |
Refer to the official documentation of your LLM or vector generation model for a comprehensive list of supported languages.
Language-Specific Configuration
Core multilingual support is broad, but some SearchAI modules are language-sensitive and require different strategies or models depending on the language. The sections below describe language support for each of these key components:
- Extraction Strategies
- Vector Configuration Models
- Retrieval Strategies
- Answer Generation Models
Use this guidance to ensure your multilingual setup is aligned with the most effective techniques for each language or model.
Language-Specific Extraction Capabilities
The table below outlines the supported content extraction methods for widely used languages, enabling you to select the most effective approach for processing multilingual content.
Extraction Strategy | Language Support |
Text Extraction | All languages listed above |
Layout Aware Extraction | English, Ukrainian |
Image Extraction | English, Ukrainian, Spanish, Russian, German |
Advanced HTML Extraction | English, Ukrainian, German |
Markdown Extraction | English, Ukrainian, Spanish, Russian, German, Hungarian, Chinese |
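The table above can be expressed as a small lookup, which may help when deciding how to process content in a given language. This is a minimal sketch and not part of SearchAI: the `EXTRACTION_SUPPORT` mapping and the `supported_extraction_strategies` function are hypothetical names, and the helper assumes the caller passes a language that SearchAI already supports for Text Extraction.

```python
# Illustrative lookup built from the table above; not a SearchAI API.
# Text Extraction is handled separately because the table lists it
# for all supported languages rather than for a fixed set.
EXTRACTION_SUPPORT = {
    "Layout Aware Extraction": {"English", "Ukrainian"},
    "Image Extraction": {"English", "Ukrainian", "Spanish", "Russian", "German"},
    "Advanced HTML Extraction": {"English", "Ukrainian", "German"},
    "Markdown Extraction": {
        "English", "Ukrainian", "Spanish", "Russian",
        "German", "Hungarian", "Chinese",
    },
}


def supported_extraction_strategies(language: str) -> list[str]:
    """Return the extraction strategies the table lists for `language`.

    Assumes `language` is one of the supported languages, so
    Text Extraction is always included.
    """
    strategies = ["Text Extraction"]
    for name, languages in EXTRACTION_SUPPORT.items():
        if language in languages:
            strategies.append(name)
    return strategies


if __name__ == "__main__":
    for lang in ("English", "German", "Hindi"):
        print(lang, "->", supported_extraction_strategies(lang))
```

A language that appears only in the Text Extraction row, such as Hindi, gets a single-item list, which signals that the other strategies should not be selected for that content.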
Language-Specific Vector Generation Support
Vector generation model support varies by language. Use the following models for optimal performance:
- English: MPNet, E5, BGE-M3, LaBSE
- Non-English Languages: BGE-M3 and LaBSE
Note: BGE-M3 supports a wide range of languages. Its training data includes many commonly spoken languages; however, performance may be lower for low-resource or underrepresented languages.
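When a corpus mixes languages, the model has to suit all of them. The sketch below applies the two rules above to a list of content languages and keeps only the models recommended for every one. The function name `models_for_corpus` and the group keys are assumptions made for illustration; they are not SearchAI identifiers.

```python
# Recommended models per language group, taken from the list above.
MODELS_BY_LANGUAGE_GROUP = {
    "english": ["MPNet", "E5", "BGE-M3", "LaBSE"],
    "non_english": ["BGE-M3", "LaBSE"],
}


def models_for_corpus(languages: list[str]) -> list[str]:
    """Return the models recommended for every language in the corpus."""
    if not languages:
        raise ValueError("at least one language is required")

    # Start from the widest list and narrow it down language by language.
    candidates = list(MODELS_BY_LANGUAGE_GROUP["english"])
    for language in languages:
        is_english = language.strip().lower() == "english"
        group = "english" if is_english else "non_english"
        allowed = MODELS_BY_LANGUAGE_GROUP[group]
        candidates = [model for model in candidates if model in allowed]
    return candidates


if __name__ == "__main__":
    print(models_for_corpus(["English"]))
    print(models_for_corpus(["English", "Hindi", "Spanish"]))
```

For an English-only corpus all four models remain candidates; as soon as any other language is present, only BGE-M3 and LaBSE remain.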
Language-Specific Retrieval Strategy Support
- English: Vector Retrieval and Hybrid Retrieval
- Non-English: Vector Retrieval
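The vector model and retrieval rules can be combined into a simple pre-flight check. The following is a sketch under stated assumptions: `PipelineConfig`, `RULES`, and `check_config` are hypothetical names invented for this example, and SearchAI itself may represent and validate these settings differently.

```python
from dataclasses import dataclass

# Rules transcribed from the vector generation and retrieval sections.
RULES = {
    "english": {
        "vector_models": {"MPNet", "E5", "BGE-M3", "LaBSE"},
        "retrieval_strategies": {"Vector Retrieval", "Hybrid Retrieval"},
    },
    "non_english": {
        "vector_models": {"BGE-M3", "LaBSE"},
        "retrieval_strategies": {"Vector Retrieval"},
    },
}


@dataclass
class PipelineConfig:
    """Hypothetical container for the settings being checked."""
    language: str
    vector_model: str
    retrieval_strategy: str


def check_config(config: PipelineConfig) -> list[str]:
    """Return a list of problems; an empty list means the rules are met."""
    is_english = config.language.strip().lower() == "english"
    rules = RULES["english" if is_english else "non_english"]

    problems = []
    if config.vector_model not in rules["vector_models"]:
        problems.append(
            f"{config.vector_model} is not recommended for {config.language}"
        )
    if config.retrieval_strategy not in rules["retrieval_strategies"]:
        problems.append(
            f"{config.retrieval_strategy} is not listed for {config.language}"
        )
    return problems


if __name__ == "__main__":
    config = PipelineConfig("Spanish", "MPNet", "Hybrid Retrieval")
    for problem in check_config(config):
        print(problem)
```

Returning a list of messages rather than raising on the first mismatch lets a caller report every issue at once.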
Supported Answer Generation Models
Answer generation quality depends on the language capabilities of the underlying LLM. Please refer to the official list of supported languages from the LLM provider.
Recommendations
To optimize multilingual performance:
- Choose the right LLM - Select models with strong support for your target languages. Refer to the official list of languages supported by the LLM.
- Customize prompts - Create language-specific prompts to improve answer quality and relevance.
- Test performance - Evaluate different LLMs for your specific use case in your target language.
- Monitor quality - Regularly assess answer quality across languages and adjust configurations as needed.
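One way to act on the testing and monitoring recommendations is to track answer-quality scores per language and flag the languages that fall below a target. The sketch below assumes you already collect scores between 0 and 1 from your own review process (for example, human ratings); the function name `flag_weak_languages` and the 0.7 threshold are illustrative assumptions, not SearchAI features or recommended values.

```python
from collections import defaultdict
from statistics import mean


def flag_weak_languages(ratings, threshold=0.7):
    """Return {language: average score} for languages below `threshold`.

    `ratings` is an iterable of (language, score) pairs, where each
    score is a number between 0 and 1 from your own evaluation.
    """
    scores_by_language = defaultdict(list)
    for language, score in ratings:
        scores_by_language[language].append(score)

    weak = {}
    for language, scores in scores_by_language.items():
        average = mean(scores)
        if average < threshold:
            weak[language] = average
    return weak


if __name__ == "__main__":
    sample_ratings = [  # made-up values, for illustration only
        ("English", 0.9), ("English", 0.8),
        ("Hindi", 0.6), ("Hindi", 0.5),
    ]
    for language, average in flag_weak_languages(sample_ratings).items():
        print(f"{language}: average {average:.2f} is below target")
```

Languages flagged this way are candidates for a different LLM or a language-specific prompt, followed by a re-test.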