This API deploys an open-source or fine-tuned model that is in the Ready to Deploy state. You can configure deployment parameters, including hyperparameters, scaling, and optimization settings, to tune how the model scales and performs.
The API response includes the model ID and the model deployment status. After receiving the response, use the dockStatusId to call the Get Dock Status API and verify that the model deployed successfully.
Method: POST
Endpoint: https://{host}/api/public/models/:{modelId}/deploy?modelType={modelType}
Content Type: application/json
Authorization: X-api-key – The API key used for authentication.
Where can I find the API key?
To use the API, you need an API key. Learn more.
Query Parameters
| PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES |
|---|---|---|---|---|
| host | The environment URL. For example, https://ai-for-process.domain.ai/. | String | Required | N/A |
| modelId | The model ID to deploy. | String | Required | N/A |
| modelType | Type of model being deployed. | String | Required | ["openSource", "fineTune"] |
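As an illustration of how these three parameters combine into the endpoint, here is a minimal Python sketch. The helper name `build_deploy_url` is hypothetical and not part of the API; the sketch assumes the path takes the bare model ID, as in the sample requests, and checks `modelType` only against the two documented values.

```python
from urllib.parse import quote, urlencode

# Values documented for the modelType query parameter.
MODEL_TYPES = ("openSource", "fineTune")


def build_deploy_url(host: str, model_id: str, model_type: str) -> str:
    """Return the deploy endpoint URL for a model (hypothetical helper)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"modelType must be one of {MODEL_TYPES}, got {model_type!r}")
    # Accept a host written with or without the scheme or a trailing slash.
    host = host.removeprefix("https://").rstrip("/")
    path = f"/api/public/models/{quote(model_id, safe='')}/deploy"
    return f"https://{host}{path}?{urlencode({'modelType': model_type})}"


# Usage with the example host from the table and a placeholder model ID.
url = build_deploy_url("https://ai-for-process.domain.ai/", "cm-123", "openSource")
```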
Sample Request
For an Open-Source Model
curl --location 'https://{host}/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource' \
--header 'x-api-key: kg-axxxxxxx-5xx3-5xx8-bxxb-9xxxxxxxxxx-ebxxxxxx-5xxb-4xxb-9xx5-cxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "Flant5_model",
    "hyperParameters": {
        "temperature": 1,
        "maxTokens": 512,
        "topP": 1,
        "topK": 50,
        "stopSequence": []
    },
    "scalingParameters": {
        "maxBatchSize": 10,
        "minReplicas": 1,
        "maxReplicas": 2,
        "scaleUpDelay": 30,
        "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
        "optimizationType": "",
        "quantizationType": ""
    },
    "isDeployedPreviously": true
}'
For a Fine-Tuned Model
curl --location 'https://{host}/api/public/models/cm-6xxxxxxxxxxxxxxxxxx9/deploy?modelType=fineTune' \
--header 'x-api-key: kg-2xxxxxxxxxxxxxxxxxxf-7xxxxxxx-7xx8-4xxf-8xx7-dxxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "gpt2",
    "hyperParameters": {
        "temperature": 1,
        "maxTokens": 512,
        "topP": 1,
        "topK": 50,
        "stopSequence": []
    },
    "scalingParameters": {
        "maxBatchSize": 10,
        "minReplicas": 1,
        "maxReplicas": 2,
        "scaleUpDelay": 30,
        "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
        "optimizationType": "",
        "quantizationType": ""
    },
    "isDeployedPreviously": true
}'
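For callers who prefer a script to curl, the same request can be prepared with Python's standard library. This is only a sketch: `build_deploy_request` is a hypothetical helper, the host is the example environment URL from the parameter table, and the API key and model ID are placeholders. The request is built but not sent.

```python
import json
import urllib.request


def build_deploy_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Prepare, but do not send, the POST request shown in the curl samples."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Body copied from the fine-tuned model sample.
body = {
    "name": "gpt2",
    "hyperParameters": {
        "temperature": 1,
        "maxTokens": 512,
        "topP": 1,
        "topK": 50,
        "stopSequence": [],
    },
    "scalingParameters": {
        "maxBatchSize": 10,
        "minReplicas": 1,
        "maxReplicas": 2,
        "scaleUpDelay": 30,
        "scaleDownDelay": 600,
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {"optimizationType": "", "quantizationType": ""},
    "isDeployedPreviously": True,
}

request = build_deploy_request(
    "https://ai-for-process.domain.ai/api/public/models/"
    "cm-6xxxxxxxxxxxxxxxxxx9/deploy?modelType=fineTune",
    "kg-placeholder-api-key",  # placeholder, not a real key
    body,
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body is the JSON document described under Sample Response.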
Body Parameters
The following deployment parameters can be configured and passed in the body:
General Parameters
| PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES |
|---|---|---|---|---|
| name | Name of the model to deploy. | String | Required | N/A |
| isDeployedPreviously | Indicates if the model was deployed before. | Boolean | Optional | [true, false] |
Hyperparameters
| PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE |
|---|---|---|---|---|
| temperature | Controls randomness of output. | Float | Required | 0–2 |
| maxTokens | Maximum tokens allowed. | Int | Required | 0–512 |
| topP | Controls nucleus sampling. | Float | Required | 0–1 |
| topK | Controls top-K sampling. | Int | Required | 1–100 |
| stopSequence | Stop sequences for the model. | Array | Optional | N/A |
Scaling Parameters
| PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE |
|---|---|---|---|---|
| maxBatchSize | Maximum batch size. | Int | Optional | 1–256 |
| minReplicas | Minimum replicas. | Int | Optional | 1–10 |
| maxReplicas | Maximum replicas. | Int | Optional | 1–50 |
| scaleUpDelay | Delay before scaling up (ms). | Int | Optional | 1–1000 |
| scaleDownDelay | Delay before scaling down (ms). | Int | Optional | 50–2000 |
Deployment Device & Optimization
| PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES |
|---|---|---|---|---|
| deviceType | Device type for deployment. | String | Required | ["g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge", "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal"] |
| optimizationInfo | Optimization details. | Object | Optional | N/A |
| optimizationType | Type of optimization. | String | Optional | ["ctranslate2", "vllm"] |
| quantizationType | Type of quantization. | String | Optional | ["no_quantization", "int8_float16"] |
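A request body can be checked against the documented limits before it is sent. The sketch below is a client-side convenience, not the server's validation logic: `validate_deploy_body` is a hypothetical helper, the ranges are assumed to be inclusive, and the optimization fields are left unchecked because the sample requests pass empty strings for them.

```python
# (low, high) limits copied from the parameter tables; assumed inclusive.
HYPER_RANGES = {
    "temperature": (0, 2),
    "maxTokens": (0, 512),
    "topP": (0, 1),
    "topK": (1, 100),
}
SCALING_RANGES = {
    "maxBatchSize": (1, 256),
    "minReplicas": (1, 10),
    "maxReplicas": (1, 50),
    "scaleUpDelay": (1, 1000),
    "scaleDownDelay": (50, 2000),
}
DEVICE_TYPES = {
    "g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge",
    "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal",
}


def in_range(value, low, high) -> bool:
    """True when value is a number within the inclusive range."""
    return isinstance(value, (int, float)) and low <= value <= high


def validate_deploy_body(body: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    if not body.get("name"):
        errors.append("name is required")
    if body.get("deviceType") not in DEVICE_TYPES:
        errors.append("deviceType is missing or not a documented value")
    hyper = body.get("hyperParameters", {})
    for key, (low, high) in HYPER_RANGES.items():
        if key not in hyper:
            errors.append(f"hyperParameters.{key} is required")
        elif not in_range(hyper[key], low, high):
            errors.append(f"hyperParameters.{key} must be in {low}-{high}")
    # Scaling parameters are optional, so only the ones present are checked.
    scaling = body.get("scalingParameters", {})
    for key, (low, high) in SCALING_RANGES.items():
        if key in scaling and not in_range(scaling[key], low, high):
            errors.append(f"scalingParameters.{key} must be in {low}-{high}")
    return errors


sample = {
    "name": "Flant5_model",
    "hyperParameters": {"temperature": 1, "maxTokens": 512, "topP": 1, "topK": 50},
    "scalingParameters": {"minReplicas": 1, "maxReplicas": 2, "scaleDownDelay": 600},
    "deviceType": "g5.xlarge",
}
problems = validate_deploy_body(sample)
```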
Sample Response
{
    "dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
    "modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
    "jobType": "MODELS",
    "action": "DEPLOY",
    "status": "IN_PROGRESS"
}
Response Parameters
| PARAMETER | DESCRIPTION | TYPE |
|---|---|---|
| dockStatusId | The unique identifier for tracking the model deployment. | String |
| modelId | The model that was deployed. | String |
| jobType | Specifies the type of job (e.g., MODELS). | String |
| action | Indicates the performed action (DEPLOY). | String |
| status | Deployment status (SUCCESS, IN_PROGRESS, or FAILED). | String |
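A caller typically needs two things from this response: the dock status ID to pass to the Get Dock Status API, and whether the deployment is still running. The following sketch shows one way to extract them. `parse_deploy_response` is a hypothetical helper; because the sample response spells the key `dock-statusId` while the parameter table lists `dockStatusId`, the sketch accepts either spelling rather than assuming which one the service returns.

```python
import json

# Status values listed in the response parameters.
KNOWN_STATUSES = ("SUCCESS", "IN_PROGRESS", "FAILED")


def parse_deploy_response(raw: str) -> dict:
    """Extract the tracking ID and status from a deploy response body."""
    data = json.loads(raw)
    # Accept both spellings of the dock status key (see the note above).
    dock_status_id = data.get("dockStatusId") or data.get("dock-statusId")
    if not dock_status_id:
        raise KeyError("response does not contain a dock status ID")
    status = data.get("status")
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return {
        "dockStatusId": dock_status_id,
        "modelId": data.get("modelId"),
        "status": status,
        # IN_PROGRESS means the deployment should be checked again later.
        "done": status != "IN_PROGRESS",
    }


raw = """{
    "dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
    "modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
    "jobType": "MODELS",
    "action": "DEPLOY",
    "status": "IN_PROGRESS"
}"""
result = parse_deploy_response(raw)
```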