NLP training ensures your assistant accurately identifies user intent. The platform uses multiple engines—ML, FM, KG, Traits, and Ranking & Resolver—each suited to different scenarios.
NLP Preprocessing
Before intent detection, every utterance is preprocessed:
| Step | Description |
|---|---|
| Tokenization | Split the utterance into sentences, then words. Uses the TreeBank Tokenizer for English. |
| toLower() | Convert to lowercase (not applied for German). ML and KG engines only. |
| Stop word removal | Remove low-signal words using a language-specific list. Optional; disabled by default. |
| Stemming | Reduce words to their stem (e.g., "Running" → "run"). The output may not be a real word. |
| Lemmatization | Reduce words to their base dictionary form (e.g., "housing" → "house"). |
| N-grams | Combine co-occurring words for context (e.g., "New York City" as a tri-gram). |
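The steps above can be sketched in a few lines of Python. This is a minimal, self-contained illustration of the pipeline's stages, not the platform's implementation: the real engines use the TreeBank tokenizer, production-grade stemming, and dictionary-based lemmatization, whereas the stop word list, suffix rules, and lemma lookup below are tiny stand-ins invented for the example.

```python
import re

STOP_WORDS = {"the", "a", "an", "to", "is"}      # tiny illustrative list, not the platform's
LEMMAS = {"housing": "house", "ran": "run"}      # tiny illustrative lookup, not a real dictionary

def tokenize(utterance):
    # Naive word tokenizer; the platform uses the TreeBank Tokenizer for English.
    return re.findall(r"[A-Za-z']+", utterance)

def to_lower(tokens):
    # Applied by the ML and KG engines (skipped for German, where case is meaningful).
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    # Crude suffix stripping: fast, but the result may not be a real word.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            token = token[: -len(suffix)]
            break
    if len(token) > 2 and token[-1] == token[-2]:  # "runn" -> "run"
        token = token[:-1]
    return token

def lemmatize(token):
    # Dictionary lookup: always yields a real base form when the word is known.
    return LEMMAS.get(token, token)

def ngrams(tokens, n):
    # Combine co-occurring words so multi-word names stay together.
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(to_lower(tokenize("The housing market is running hot")))
print(tokens)                              # ['housing', 'market', 'running', 'hot']
print([stem(t) for t in tokens])           # ['hous', 'market', 'run', 'hot']
print(lemmatize("housing"))                # 'house'
print(ngrams(["new", "york", "city"], 3))  # ['new york city']
```

Note the contrast the table describes: the stemmer maps "housing" to the non-word "hous", while the lemmatizer returns the dictionary form "house".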
Scoping Your Assistant
Before training, define your assistant’s scope:
1. Define the problem: what the assistant must accomplish; align with BAs and developers.
2. List intents: identify the key result for each; focus on user needs.
3. Sketch example conversations: user utterances and responses, including edge cases and follow-ups.
4. Brainstorm alternate utterances: include idioms and slang for each intent.
Choosing an Engine
| Engine | Best For |
|---|---|
| ML | Large corpus; diverse utterances; flexible and auto-learning. Recommended as the primary training method. |
| KG | Query-type intents; document-based answers; many intents with limited alternate utterances. |
| FM | Idiomatic or command-like sentences; acceptable tolerance for false positives. |
Go to Automation > Natural Language:
| Section | Purpose |
|---|---|
| Training | Add ML utterances, synonyms, concepts, and patterns. |
| Engine Tuning | Set recognition confidence levels and thresholds. |
| Advanced Settings | Configure auto-training settings and negative intent patterns. |
NLP Version 3 (default for new VAs from v10.0):
- Improved Traits Engine accuracy.
- Transformer and KAEN models for English; Transformer for other languages.
- Enables Zero-shot and Few-shot ML models.
As of January 21, 2024, all existing VAs are on Version 3.
For per-engine training and configuration: