Model Training

LTprophecy provides a managed ML training pipeline supporting XGBoost, LightGBM, CatBoost, Prophet, and ensemble methods with automated hyperparameter optimization via Optuna.

Available Algorithms

Algorithm     | Best For                                  | Plan
--------------|-------------------------------------------|-----------
Prophet       | Seasonal time series with holiday effects | All
XGBoost       | Tabular data with many features           | All
LightGBM      | Large datasets, faster training           | Growth+
CatBoost      | High-cardinality categoricals             | Growth+
Ensemble      | Maximum accuracy via model stacking       | Enterprise
Custom (BYOM) | Your own scikit-learn pipeline            | Enterprise
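To make the Ensemble row concrete: stacking blends the predictions of several base models through a meta-learner fitted on held-out data. A minimal stdlib-only sketch for two base forecasts, using least-squares blend weights (illustrative only; the platform's ensemble implementation is more elaborate):

```python
# Learn blend weights for two base-model forecasts by least squares
# (normal equations, no intercept). Illustrative sketch only.

def stack_weights(p1, p2, y):
    """Return (w1, w2) minimizing sum((w1*p1 + w2*p2 - y)^2)."""
    a = sum(x * x for x in p1)               # p1 . p1
    b = sum(x * z for x, z in zip(p1, p2))   # p1 . p2
    c = sum(z * z for z in p2)               # p2 . p2
    d = sum(x * t for x, t in zip(p1, y))    # p1 . y
    e = sum(z * t for z, t in zip(p2, y))    # p2 . y
    det = a * c - b * b
    w1 = (d * c - b * e) / det
    w2 = (a * e - b * d) / det
    return w1, w2

def blend(p1, p2, w1, w2):
    """Combine two forecasts with the learned weights."""
    return [w1 * x + w2 * z for x, z in zip(p1, p2)]
```

If one base model already matches the target perfectly, the least-squares solution puts all weight on it, which is a useful sanity check.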

Training Configuration

Basic Options

  • Algorithm — select from the table above
  • Dataset & version — pin to a specific dataset version
  • Target column — the numeric column to forecast
  • Feature columns — additional inputs to the model
  • Validation split — train/val/test ratio (default 70/15/15)
  • Forecast horizon — number of periods to forecast ahead
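For time-series data the validation split is chronological rather than shuffled: the oldest 70% trains the model, the next 15% tunes it, and the newest 15% is held out for testing. A sketch of that split logic (assumed behavior, shown only to make the ratios concrete):

```python
def chronological_split(rows, train=0.70, val=0.15):
    """Split time-ordered rows into train/val/test without shuffling.
    Whatever remains after train and val (15% by default) is the test set."""
    n = len(rows)
    i = int(n * train)      # end of the training window
    j = i + int(n * val)    # end of the validation window
    return rows[:i], rows[i:j], rows[j:]
```

Shuffled splits would leak future observations into training, which is why the windows stay in time order.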

Hyperparameter Optimization

Enable Auto-tune (Optuna) to let the platform search for optimal hyperparameters. Configure:

  • n_trials — number of Optuna trials (default: 50)
  • timeout_minutes — maximum search time
  • Metric — RMSE, MAE, MAPE, or sMAPE
  • CV folds — time-series cross-validation folds (default: 5)
  • Pruning — early stopping of unpromising trials
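The four objective metrics differ mainly in how they scale errors: RMSE penalizes large misses quadratically, MAE linearly, and the percentage metrics normalize by the actual values. Reference formulas as a stdlib-only sketch (percentage metrics returned on a 0–100 scale):

```python
import math

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat):
    """Mean absolute percentage error; undefined when an actual is zero."""
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

def smape(y, yhat):
    """Symmetric MAPE; bounded above by 200."""
    return 100 * sum(2 * abs(b - a) / (abs(a) + abs(b))
                     for a, b in zip(y, yhat)) / len(y)
```

sMAPE is the safer default when actuals can be near zero, since MAPE's denominator blows up there.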

Training Jobs & GPU Queue

Training jobs are queued through Celery workers. CPU workers handle most models; GPU workers (if provisioned) accelerate deep learning and large LightGBM runs. You can monitor job status in real time via Models → Training Runs.

Typical training times:

  • Prophet (≤ 100k rows): < 2 min
  • XGBoost with Optuna (100k rows, 50 trials): 5–15 min
  • Ensemble stack: 20–45 min

Experiment Tracking (MLflow)

Every training run is automatically logged to MLflow with:

  • All hyperparameters
  • Evaluation metrics (RMSE, MAE, R², etc.)
  • Feature importance plots
  • Confusion matrices and residual charts
  • Serialized model artifact (stored in MinIO)

Access the MLflow UI via Models → MLflow Dashboard (admins only in production).

Model Registry & Promotion

After training, models pass through lifecycle stages:

  1. Staging — trained, under evaluation
  2. Production — promoted by an admin, used for forecasts
  3. Archived — retired, artifacts retained

Only Production models can be selected when creating new forecasts. Promoting a model requires the org:models:manage permission.
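The lifecycle above amounts to a small state machine. An illustrative sketch (not the platform's code; we additionally assume a Staging model can be archived without ever being promoted):

```python
# Allowed lifecycle transitions. Promotion to Production also requires
# the org:models:manage permission, which is enforced server-side.
TRANSITIONS = {
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),   # terminal: archived models are not revived
}

def can_transition(current, target):
    """Return True if the stage change is allowed."""
    return target in TRANSITIONS.get(current, set())
```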

Model Evaluation

The evaluation panel shows:

  • Hold-out test set metrics
  • Backtested forecasts vs actuals chart
  • Shapley value feature importance (XGBoost/LightGBM)
  • Drift detection against training data distribution
  • Calibration curves for probabilistic forecasters
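Drift detection compares the live feature distribution against the one seen at training time. One common approach to this (shown here as a sketch; the platform's exact method is not specified) is the Population Stability Index over shared histogram bins:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    Bins span the expected (training) sample's range; out-of-range
    values are clamped into the edge bins.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def freqs(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1
        # Additive smoothing avoids log(0) and division by zero.
        return [(c + 0.5) / (len(sample) + 0.5 * bins) for c in counts]

    e, a = freqs(expected), freqs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

An identical distribution scores 0, while a shifted one quickly crosses the 0.25 "major drift" threshold, which is typically what triggers a retraining alert.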