Datasets & the Data Vault

The Data Vault brings together every dataset from every pipeline and project in your organization. Instead of opening individual pipelines to find their data, you can browse, search, export, and analyze all of your collected and imported data from one unified view.

Accessing the Data Vault

Navigate to Data Vault in the sidebar. The Data Vault is available to Sys Admins, Org Admins, and Project Admins. Contributors do not have access to the Data Vault.

Project Admins can only see datasets that belong to projects they are assigned to.
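The access rules above can be summarized as a small visibility check. This is a minimal sketch under stated assumptions: the `Role` values, the `User` and `Dataset` shapes, and the `can_view_dataset` helper are hypothetical illustrations, not part of the product. Datasets with no project are treated as invisible to Project Admins, because the text only grants them access to datasets in their assigned projects.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Role(Enum):
    # Hypothetical identifiers for the roles named above.
    SYS_ADMIN = "sys_admin"
    ORG_ADMIN = "org_admin"
    PROJECT_ADMIN = "project_admin"
    CONTRIBUTOR = "contributor"


@dataclass
class User:
    role: Role
    # IDs of the projects this user is assigned to (only used for Project Admins).
    assigned_projects: set[str] = field(default_factory=set)


@dataclass
class Dataset:
    name: str
    project_id: Optional[str] = None  # None when the dataset belongs to no project


def can_view_dataset(user: User, dataset: Dataset) -> bool:
    """Hypothetical visibility check for a single dataset in the Data Vault."""
    if user.role in (Role.SYS_ADMIN, Role.ORG_ADMIN):
        return True
    if user.role is Role.PROJECT_ADMIN:
        # Assumption: a dataset with no project is not visible to Project Admins.
        return dataset.project_id is not None and dataset.project_id in user.assigned_projects
    # Contributors have no Data Vault access at all.
    return False
```

A Project Admin assigned to project `p1` would see a dataset in `p1` but not one in `p2`, while an Org Admin sees both.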

Dataset sources

Every dataset has a source that determines how its data was created:

Source	Description
Pipeline	Created automatically when a pipeline becomes active. Rows come from task data and update live as tasks progress.
File Upload	Imported from a CSV or JSON file.
HuggingFace	Imported from the HuggingFace Hub via a search-and-configure flow.

The source determines which features are available on the dataset:

Feature	Pipeline datasets	Imported datasets
Overview tab	Pipeline link, contributor breakdown, evaluation criteria summary	Row/column counts, source file info, import date
Dataset Content tab	Shows task ID, status, created date, and all field values	Shows imported columns and values
Analytics tab	Evaluation field analytics, comparison insights, inter-rater agreement	Coming soon
Studio tab	Full charting and aggregation workspace	Coming soon
Export	Async export with progress tracking (CSV, JSON, ZIP)	Synchronous download (CSV, JSON, ZIP)
Sharing	Share via link with access controls	Not available
Rename	Not available (name syncs from pipeline)	Click the dataset title to rename inline

Dataset statuses

Every dataset has a status displayed as a badge in the Data Vault listing and detail pages:

Status	Description
Collecting	The source pipeline is active and data is flowing in. Datasets in this status cannot be archived or deleted.
Draft	The source pipeline is in draft or paused. No new data is being collected.
Complete	The source pipeline has been completed or archived, or the dataset was imported. The dataset is finalized.
Orphaned	The source pipeline was deleted. The dataset and its data are preserved but read-only.

For pipeline-backed datasets, the status is automatically synced from the pipeline's lifecycle:

Pipeline status	Dataset status
Draft	Draft
Active	Collecting
Paused	Draft, unless the dataset is already Collecting, in which case it stays Collecting (a dataset is never downgraded from Collecting)
Completed	Complete
Archived	Complete

Imported datasets are created with a Complete status since their data is fully loaded at import time.
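The sync rules above can be expressed as a single function. This is a sketch that assumes the enum names, the `synced_status` and `imported_status` helpers, and the convention that a deleted pipeline is passed as `None`; none of these are product APIs. It encodes the mapping table, the special case for Paused, and the Orphaned and imported cases from the status table.

```python
from enum import Enum
from typing import Optional


class PipelineStatus(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETED = "completed"
    ARCHIVED = "archived"


class DatasetStatus(Enum):
    COLLECTING = "collecting"
    DRAFT = "draft"
    COMPLETE = "complete"
    ORPHANED = "orphaned"


# One entry per row of the pipeline-to-dataset mapping table.
PIPELINE_TO_DATASET = {
    PipelineStatus.DRAFT: DatasetStatus.DRAFT,
    PipelineStatus.ACTIVE: DatasetStatus.COLLECTING,
    PipelineStatus.PAUSED: DatasetStatus.DRAFT,
    PipelineStatus.COMPLETED: DatasetStatus.COMPLETE,
    PipelineStatus.ARCHIVED: DatasetStatus.COMPLETE,
}


def synced_status(current: DatasetStatus, pipeline: Optional[PipelineStatus]) -> DatasetStatus:
    """Hypothetical sync rule; `pipeline` is None when the source pipeline was deleted."""
    if pipeline is None:
        return DatasetStatus.ORPHANED
    if pipeline is PipelineStatus.PAUSED and current is DatasetStatus.COLLECTING:
        return DatasetStatus.COLLECTING  # pausing never downgrades a Collecting dataset
    return PIPELINE_TO_DATASET[pipeline]


def imported_status() -> DatasetStatus:
    # Imported data is fully loaded at import time, so the dataset starts as Complete.
    return DatasetStatus.COMPLETE
```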

The Data Vault listing

The listing page shows all datasets in your organization in a paginated table. Columns include:

Name — click to open the dataset detail page.
Status — badge showing the current lifecycle status.
Rows — number of rows (computed live for pipeline datasets).
Source — icon and label indicating Pipeline, File Upload, HuggingFace, or Export.
Project — the project this dataset belongs to, if any.
Source Pipeline — the pipeline that generated this dataset (hidden by default).
Last Updated — relative timestamp of the last update.
Archived — archived badge (hidden by default).

Use the search bar to filter datasets by name. Use the Status and Source dropdown filters to narrow results. Click Show archived in the toolbar to include archived datasets.
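The listing controls can be modeled as a filter over the dataset list. This is a minimal sketch: the `ListedDataset` shape and `filter_listing` helper are hypothetical, and it assumes that the name search is a case-insensitive substring match and that all active filters must match at once. The page does not state either detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListedDataset:
    name: str
    status: str      # e.g. "Collecting", "Draft", "Complete", "Orphaned"
    source: str      # e.g. "Pipeline", "File Upload", "HuggingFace"
    archived: bool = False


def filter_listing(
    datasets: list[ListedDataset],
    search: str = "",
    status: Optional[str] = None,
    source: Optional[str] = None,
    show_archived: bool = False,
) -> list[ListedDataset]:
    """Hypothetical model of the search bar, the Status/Source filters, and Show archived."""
    needle = search.strip().lower()
    result = []
    for d in datasets:
        if d.archived and not show_archived:
            continue  # archived datasets stay hidden unless Show archived is on
        if needle and needle not in d.name.lower():
            continue  # assumption: case-insensitive substring match on the name
        if status is not None and d.status != status:
            continue
        if source is not None and d.source != source:
            continue
        result.append(d)
    return result
```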

Pipeline-generated datasets

When a pipeline becomes active, a dataset is automatically created in the Data Vault. Each task in the pipeline becomes a row in the dataset, with columns corresponding to the form fields defined in the pipeline's nodes.

These datasets update in real time — as tasks progress through the pipeline, the dataset content reflects their current state, including task status (pending, in progress, finished, etc.).
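The task-to-row mapping described above can be sketched as follows. The `Task` shape and `build_dataset` helper are hypothetical, and two details are assumptions rather than documented behavior: a form field that appears in several nodes becomes a single column, and a field a task has not reached yet is left empty (`None`). The leading task ID, status, and created-date columns follow the Dataset Content tab description.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Task:
    task_id: str
    status: str                 # e.g. "pending", "in progress", "finished"
    created: str                # creation date, kept as a string for simplicity
    values: dict[str, Any] = field(default_factory=dict)  # form field name -> value


METADATA_COLUMNS = ["task_id", "status", "created"]


def build_dataset(node_fields: list[list[str]], tasks: list[Task]) -> tuple[list[str], list[dict[str, Any]]]:
    """Hypothetical sketch: one row per task, one column per form field across the nodes."""
    columns = list(METADATA_COLUMNS)
    for fields in node_fields:          # each inner list holds one node's form fields
        for name in fields:
            if name not in columns:     # assumption: a repeated field name maps to one column
                columns.append(name)
    rows = []
    for task in tasks:
        row = {"task_id": task.task_id, "status": task.status, "created": task.created}
        for name in columns[len(METADATA_COLUMNS):]:
            row[name] = task.values.get(name)  # None until some node fills this field
        rows.append(row)
    return columns, rows
```

Calling the function again after tasks advance produces rows that reflect their new state, which mirrors the live behavior described above without implying how the product implements it.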

If a pipeline-backed dataset is deleted and the pipeline later becomes active again, a new dataset is automatically created to ensure data collection continues.
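The recreation rule behaves like an "ensure a dataset exists" step that runs whenever a pipeline becomes active. This sketch assumes a hypothetical in-memory mapping from pipeline IDs to dataset IDs, a caller-supplied `create_dataset` function, and that deleting a dataset removes its entry from the mapping; none of these are product APIs.

```python
from typing import Callable


def on_pipeline_activated(
    pipeline_id: str,
    datasets_by_pipeline: dict[str, str],
    create_dataset: Callable[[str], str],
) -> str:
    """Hypothetical hook: return the pipeline's dataset ID, creating one if none exists."""
    existing = datasets_by_pipeline.get(pipeline_id)
    if existing is not None:
        return existing  # reactivation reuses the existing dataset instead of duplicating it
    # Covers both the first activation and reactivation after the dataset was deleted.
    new_id = create_dataset(pipeline_id)
    datasets_by_pipeline[pipeline_id] = new_id
    return new_id
```

Keeping the step idempotent means repeated activations of the same pipeline never create duplicate datasets.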

Dataset detail page

Click any dataset to open its detail page. The page has four tabs:

Overview — summary statistics (row count, column count, dates), source information, and pipeline link for pipeline-backed datasets.
Dataset Content — paginated data table showing all rows and fields. Supports column visibility controls and text wrapping toggle.
Analytics — evaluation field analytics, contributor breakdowns, and inter-rater agreement metrics (pipeline datasets only).
Studio — interactive charting and exploration workspace (pipeline datasets only). See Analytics Studio.

The toolbar at the top of the detail page includes:

Share — create a share link (pipeline datasets only). See Sharing.
Export — download the dataset. See Exporting Data.
Archive / Unarchive — toggle the archived state (not available for datasets with Collecting status).
Delete — permanently delete the dataset (not available for datasets with Collecting status).
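The availability rules for these actions, together with the Rename row of the feature table, can be summarized in one function. This is a sketch with a hypothetical `DatasetInfo` shape and `available_actions` helper. It does not model any further restrictions that may apply to Orphaned (read-only) datasets, because this page does not spell them out.

```python
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    source: str          # "Pipeline", "File Upload", or "HuggingFace"
    status: str          # "Collecting", "Draft", "Complete", or "Orphaned"
    archived: bool = False


def available_actions(ds: DatasetInfo) -> list[str]:
    """Hypothetical summary of which detail-page actions apply to a dataset."""
    actions = ["Export"]                 # export is offered for every source
    if ds.source == "Pipeline":
        actions.append("Share")          # share links are pipeline-only
    else:
        actions.append("Rename")         # imported datasets rename inline; pipeline names sync
    if ds.status != "Collecting":
        actions.append("Unarchive" if ds.archived else "Archive")
        actions.append("Delete")
    return actions
```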
