Data Management

LTprophecy supports structured time-series and tabular datasets. This guide covers uploading, validating, transforming, and versioning your data.

Supported File Formats

  • .csv — comma-separated values (UTF-8 encoding required)
  • .json — JSON array of row objects
  • .parquet — columnar binary format (recommended for large files)
  • Direct database connections (PostgreSQL, BigQuery) — Enterprise plan only

Maximum file size: 500 MB (Growth), 5 GB (Enterprise). Files are stored encrypted at rest in MinIO (S3-compatible).
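A minimal client-side sketch of the size check implied by these limits, so an upload is rejected locally before it hits the API. The plan names and byte limits mirror the figures above; the function name is hypothetical, not part of LTprophecy.

```python
import os

# Upload limits from the plan tiers above, in bytes
PLAN_LIMITS = {
    "growth": 500 * 1024**2,    # 500 MB
    "enterprise": 5 * 1024**3,  # 5 GB
}

def within_upload_limit(path: str, plan: str) -> bool:
    """Return True if the file at `path` fits the given plan's upload limit."""
    return os.path.getsize(path) <= PLAN_LIMITS[plan]
```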

Schema Requirements

Every dataset must have:

  • At least one date/time column parseable as ISO 8601 or UNIX timestamp
  • At least one numeric target column (the value to forecast)

Optional but recommended:

  • Categorical feature columns (e.g., region, channel)
  • Numeric feature columns (e.g., marketing_spend, headcount)
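The two mandatory requirements can be checked before upload. Below is a sketch using only the standard library; the function names and the list-of-dicts row format are illustrative assumptions, not an LTprophecy API. Note that `datetime.fromisoformat` covers the ISO 8601 case and a float parse covers UNIX timestamps.

```python
from datetime import datetime

def is_timestamp(value: str) -> bool:
    """Accept ISO 8601 strings or UNIX epoch seconds, per the schema rules above."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        pass
    try:
        datetime.fromtimestamp(float(value))
        return True
    except (ValueError, OSError, OverflowError):
        return False

def validate_schema(rows: list, date_col: str, target_col: str) -> list:
    """Return a list of problems; an empty list means the minimum schema is met."""
    problems = []
    if not all(is_timestamp(str(r[date_col])) for r in rows):
        problems.append(f"column '{date_col}' has unparseable timestamps")
    if not all(isinstance(r[target_col], (int, float)) for r in rows):
        problems.append(f"column '{target_col}' is not numeric")
    return problems
```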

Data Quality Report

After upload, LTprophecy automatically runs an Evidently-powered quality report that checks:

  • Missing value rates per column
  • Outlier detection (IQR and z-score methods)
  • Data drift indicators (if a previous version exists)
  • Distribution statistics (mean, median, std, min, max)
  • Duplicate row count
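To make two of these checks concrete, here is a standard-library sketch of the IQR outlier rule and the missing-value rate. This is an illustration of the methods named above, not the Evidently implementation LTprophecy actually runs.

```python
from statistics import quantiles

def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (the IQR method above)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

def missing_rate(column: list) -> float:
    """Fraction of missing (None) entries in a column."""
    return sum(v is None for v in column) / len(column)
```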

The report is available under Dataset → Quality Report. Issues are classified as error, warning, or info.

Column Transformations

Under Dataset → Transformations, you can apply preprocessing steps that are versioned alongside your data:

  • Imputation — mean, median, forward-fill, backward-fill, constant
  • Scaling — standard scaler, min-max, robust scaler
  • Encoding — one-hot, ordinal, target encoding for categoricals
  • Lag features — automatically create lagged versions of columns
  • Rolling aggregates — rolling mean/std over configurable windows
  • Custom expressions — safe arithmetic expressions (no code injection)
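Lag features and rolling aggregates are the least self-explanatory items in this list, so here is a plain-Python sketch of both. The semantics shown (leading slots filled with None until enough history exists) are an assumption about how the platform pads the series, not documented behavior.

```python
def lag(series: list, k: int) -> list:
    """Shift a series back by k steps; the first k slots have no history, so None."""
    return [None] * k + series[:-k] if k else series[:]

def rolling_mean(series: list, window: int) -> list:
    """Mean over a trailing window; None until the window has filled."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            win = series[i + 1 - window : i + 1]
            out.append(sum(win) / window)
    return out
```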

Dataset Versioning

Every upload creates a new version of a dataset. Versions are immutable and identified by a content hash. You can:

  • Compare versions side-by-side in the quality report
  • Pin a model to a specific dataset version for reproducibility
  • Promote a version to active to use it in new training runs
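Content-hash identification means a version id is a pure function of the uploaded bytes: re-uploading identical data yields the same id, and any change yields a different one. A minimal sketch (the docs only say "content hash"; SHA-256 and the 16-character truncation are assumptions):

```python
import hashlib

def dataset_version_id(raw_bytes: bytes) -> str:
    """Content-addressed version id: the same bytes always map to the same id."""
    return hashlib.sha256(raw_bytes).hexdigest()[:16]
```

Because versions are immutable, pinning a model to a version id is enough to make a training run reproducible.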

Soft Delete & Retention

Datasets are soft-deleted when removed. Depending on your plan, deleted datasets are purged from storage after:

  • Free — 7 days
  • Growth — 30 days
  • Enterprise — configurable (default 90 days)
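The retention windows above translate directly into an earliest-purge date. A small sketch (the mapping and function name are illustrative; for Enterprise, the configurable default of 90 days is shown):

```python
from datetime import date, timedelta

# Retention windows from the plan list above, in days
RETENTION_DAYS = {"free": 7, "growth": 30, "enterprise": 90}

def purge_date(deleted_on: date, plan: str) -> date:
    """Earliest date a soft-deleted dataset may be purged from storage."""
    return deleted_on + timedelta(days=RETENTION_DAYS[plan])
```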

API Usage

# Upload a dataset via the API (curl)
curl -X POST https://api.ltprophecy.com/api/datasets \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@sales_data.csv" \
  -F "name=Q1 Sales" \
  -F "target_column=revenue"