Data Management
LTprophecy supports structured time-series and tabular datasets. This guide covers uploading, validating, transforming, and versioning your data.
Supported File Formats
- .csv – comma-separated values (UTF-8 encoding required)
- .json – JSON array of row objects
- .parquet – columnar binary format (recommended for large files)
- Direct database connections (PostgreSQL, BigQuery) – Enterprise plan only
Maximum file size: 500 MB (Growth), 5 GB (Enterprise). Files are stored encrypted at rest in MinIO (S3-compatible).
Schema Requirements
Every dataset must have:
- At least one date/time column parseable as ISO 8601 or UNIX timestamp
- At least one numeric target column (the value to forecast)
Optional but recommended:
- Categorical feature columns (e.g., region, channel)
- Numeric feature columns (e.g., marketing_spend, headcount)
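You can run these schema checks locally before uploading. The following pandas sketch mirrors the two hard requirements; it is an illustration of the checks, not LTprophecy's actual validator, and the column names are only examples:

```python
import pandas as pd

def check_schema(df: pd.DataFrame, date_col: str, target_col: str) -> list[str]:
    """Pre-flight checks mirroring the upload schema requirements."""
    problems = []
    # Requirement 1: a date/time column parseable as ISO 8601 or a timestamp
    try:
        pd.to_datetime(df[date_col], errors="raise")
    except (ValueError, TypeError):
        problems.append(f"{date_col} is not parseable as a date/time")
    # Requirement 2: a numeric target column (the value to forecast)
    if not pd.api.types.is_numeric_dtype(df[target_col]):
        problems.append(f"{target_col} is not numeric")
    return problems

df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02"],
    "revenue": [1200.0, 1350.5],
})
print(check_schema(df, "date", "revenue"))  # []
```

An empty list means the dataset meets the minimum schema; anything else would surface as an error in the upload flow.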
Data Quality Report
After upload, LTprophecy automatically runs an Evidently-powered quality report that checks:
- Missing value rates per column
- Outlier detection (IQR and z-score methods)
- Data drift indicators (if a previous version exists)
- Distribution statistics (mean, median, std, min, max)
- Duplicate row count
The report is available under Dataset → Quality Report. Issues are classified as error, warning, or info.
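The report itself runs server-side via Evidently, but two of its checks are easy to reproduce locally. This pandas sketch shows per-column missing rates and IQR-based outlier flagging; it is a rough approximation of the checks, not the report's implementation (the `k = 1.5` fence multiplier is the conventional default, assumed here):

```python
import pandas as pd

def missing_rates(df: pd.DataFrame) -> pd.Series:
    # Fraction of missing values per column
    return df.isna().mean()

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    # Flag points outside [Q1 - k*IQR, Q3 + k*IQR]
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

df = pd.DataFrame({"revenue": [100, 102, 98, 101, 99, 500]})
print(missing_rates(df)["revenue"])       # 0.0
print(iqr_outliers(df["revenue"]).sum())  # 1 (the 500 spike)
```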
Column Transformations
Under Dataset → Transformations, you can apply preprocessing steps that are versioned alongside your data:
- Imputation – mean, median, forward-fill, backward-fill, constant
- Scaling – standard scaler, min-max, robust scaler
- Encoding – one-hot, ordinal, target encoding for categoricals
- Lag features – automatically create lagged versions of columns
- Rolling aggregates – rolling mean/std over configurable windows
- Custom expressions – safe arithmetic expressions (no code injection)
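Lag features and rolling aggregates are the least familiar of these steps; the equivalent pandas operations look roughly like this (column names are illustrative, and this is a sketch of the concept, not the platform's transformation engine):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "revenue": [100.0, 110.0, 105.0, 120.0, 115.0],
})

# Lag feature: yesterday's revenue as a new column
df["revenue_lag_1"] = df["revenue"].shift(1)

# Rolling aggregate: mean over a 3-day window
df["revenue_roll_mean_3"] = df["revenue"].rolling(window=3).mean()

# Imputation: backward-fill the NaNs the shift introduced
df["revenue_lag_1"] = df["revenue_lag_1"].bfill()
print(df)
```

Note that lagging and rolling both introduce leading NaNs, which is why an imputation step is usually chained after them.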
Dataset Versioning
Every upload creates a new version of a dataset. Versions are immutable and identified by a content hash. You can:
- Compare versions side-by-side in the quality report
- Pin a model to a specific dataset version for reproducibility
- Promote a version to active to use it in new training runs
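Content-hash identification can be illustrated in a few lines of Python. The hash algorithm, ID length, and any byte normalization LTprophecy actually applies are internal details, so treat this as a sketch of the idea only:

```python
import hashlib

def dataset_version(raw_bytes: bytes) -> str:
    # The same bytes always map to the same version ID;
    # any byte-level change yields a different one.
    return hashlib.sha256(raw_bytes).hexdigest()[:12]

v1 = dataset_version(b"date,revenue\n2024-01-01,1200\n")
v2 = dataset_version(b"date,revenue\n2024-01-01,1300\n")
print(v1 != v2)  # True
```

Because the ID is derived purely from content, pinning a model to a version guarantees it trains on exactly the bytes you uploaded.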
Soft Delete & Retention
Datasets are soft-deleted when removed. Depending on your plan, deleted datasets are purged from storage after:
- Free – 7 days
- Growth – 30 days
- Enterprise – configurable (default 90 days)
API Usage
# Upload a dataset with curl
curl -X POST https://api.ltprophecy.com/api/datasets \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@sales_data.csv" \
  -F "name=Q1 Sales" \
  -F "target_column=revenue"