NLP training ensures your assistant accurately identifies user intent. The platform uses multiple engines—ML, FM, KG, Traits, and Ranking & Resolver—each suited to different scenarios.
NLP Preprocessing
Before intent detection, every utterance is preprocessed:
| Step | Description |
|---|---|
| Tokenization | Split the utterance into sentences, then words. Uses the TreeBank Tokenizer for English. |
| toLower() | Convert to lowercase (not applied for German). ML and KG engines only. |
| Stop word removal | Remove low-signal words using a language-specific list. Optional; disabled by default. |
| Stemming | Reduce words to their stem (e.g., "Running" → "run"). The output may not be a real word. |
| Lemmatization | Reduce words to their base dictionary form (e.g., "housing" → "house"). |
| N-grams | Combine co-occurring words for context (e.g., "New York City" as a tri-gram). |
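The steps above can be sketched in a few lines of Python. This is a minimal, self-contained illustration of the pipeline's stages, not the platform's implementation: the real engines use the TreeBank tokenizer, production-grade stemming, and dictionary-based lemmatization, whereas the stop word list, suffix rules, and lemma lookup below are tiny stand-ins invented for the example.

```python
import re

STOP_WORDS = {"the", "a", "an", "to", "is"}      # tiny illustrative list, not the platform's
LEMMAS = {"housing": "house", "ran": "run"}      # tiny illustrative lookup, not a real dictionary

def tokenize(utterance):
    # Naive word tokenizer; the platform uses the TreeBank Tokenizer for English.
    return re.findall(r"[A-Za-z']+", utterance)

def to_lower(tokens):
    # Applied by the ML and KG engines (skipped for German, where case is meaningful).
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    # Crude suffix stripping: fast, but the result may not be a real word.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            token = token[: -len(suffix)]
            break
    if len(token) > 2 and token[-1] == token[-2]:  # "runn" -> "run"
        token = token[:-1]
    return token

def lemmatize(token):
    # Dictionary lookup: always yields a real base form when the word is known.
    return LEMMAS.get(token, token)

def ngrams(tokens, n):
    # Combine co-occurring words so multi-word names stay together.
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(to_lower(tokenize("The housing market is running hot")))
print(tokens)                              # ['housing', 'market', 'running', 'hot']
print([stem(t) for t in tokens])           # ['hous', 'market', 'run', 'hot']
print(lemmatize("housing"))                # 'house'
print(ngrams(["new", "york", "city"], 3))  # ['new york city']
```

Note the contrast the table describes: the stemmer maps "housing" to the non-word "hous", while the lemmatizer returns the dictionary form "house".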
Scoping Your Assistant
Before training, define your assistant’s scope:
1. Define the problem: what the assistant must accomplish; align with BAs and developers.
2. List intents: identify the key result for each; focus on user needs.
3. Sketch example conversations: user utterances and responses, including edge cases and follow-ups.
4. Brainstorm alternate utterances: include idioms and slang for each intent.
Choosing an Engine
| Engine | Best For |
|---|---|
| ML | Large corpus; diverse utterances; flexible and auto-learning. Recommended as the primary training method. |
| KG | Query-type intents; document-based answers; many intents with limited alternate utterances. |
| FM | Idiomatic or command-like sentences; acceptable tolerance for false positives. |
Go to Automation > Natural Language:
| Section | Purpose |
|---|---|
| Training | Add ML utterances, synonyms, concepts, and patterns. |
| Engine Tuning | Set recognition confidence levels and thresholds. |
| Advanced Settings | Configure auto-training settings and negative intent patterns. |
NLP Version 3 (default for new VAs from v10.0):
- Improved Traits Engine accuracy.
- Transformer and KAEN models for English; Transformer for other languages.
- Enables Zero-shot and Few-shot ML models.
As of January 21, 2024, all existing VAs are on Version 3.
For per-engine training and configuration: