Datasets & the Data Vault
Understand how datasets work, where they come from, and how to manage them.
The Data Vault brings together the datasets from every pipeline and project in your organization in a single place. Instead of navigating to individual pipelines to find their data, you get one unified view to browse, search, export, and analyze all of your collected and imported data.
Accessing the Data Vault
Navigate to Data Vault in the sidebar. The Data Vault is available to Sys Admins, Org Admins, and Project Admins. Contributors do not have access to the Data Vault.
Project Admins can only see datasets that belong to projects they are assigned to.
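The access rules above can be sketched as a small visibility filter. This is an illustration only: the role identifiers, the `User` and `Dataset` classes, and the `visible_datasets` function are hypothetical names, not part of the product. The sketch also assumes that a dataset with no project is hidden from Project Admins, which the text does not state.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical role identifiers for the three roles that can open the Data Vault.
VAULT_ROLES = {"sys_admin", "org_admin", "project_admin"}


@dataclass
class User:
    role: str
    project_ids: Set[str] = field(default_factory=set)  # projects the user is assigned to


@dataclass
class Dataset:
    name: str
    project_id: Optional[str] = None  # a dataset may belong to no project


def visible_datasets(user: User, datasets: List[Dataset]) -> List[Dataset]:
    """Return the datasets this user would see in the Data Vault."""
    if user.role not in VAULT_ROLES:
        return []  # Contributors have no Data Vault access
    if user.role == "project_admin":
        # Assumption: datasets without a project are not shown to Project Admins.
        return [d for d in datasets if d.project_id in user.project_ids]
    return list(datasets)  # Sys Admins and Org Admins see everything
```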
Dataset sources
Every dataset has a source that determines how its data was created:
| Source | Description |
|---|---|
| Pipeline | Created automatically when a pipeline becomes active. Rows come from task data and update live as tasks progress. |
| File Upload | Imported from a CSV or JSON file. |
| HuggingFace | Imported from the HuggingFace Hub via a search-and-configure flow. |
The source determines which features are available on the dataset:
| Feature | Pipeline datasets | Imported datasets |
|---|---|---|
| Overview tab | Pipeline link, contributor breakdown, evaluation criteria summary | Row/column counts, source file info, import date |
| Dataset Content tab | Shows task ID, status, created date, and all field values | Shows imported columns and values |
| Analytics tab | Evaluation field analytics, comparison insights, inter-rater agreement | Coming soon |
| Studio tab | Full charting and aggregation workspace | Coming soon |
| Export | Async export with progress tracking (CSV, JSON, ZIP) | Synchronous download (CSV, JSON, ZIP) |
| Sharing | Share via link with access controls | Not available |
| Rename | Not available (name syncs from pipeline) | Click the dataset title to rename inline |
Dataset statuses
Every dataset has a status displayed as a badge in the Data Vault listing and detail pages:
| Status | Description |
|---|---|
| Collecting | The source pipeline is active and data is flowing in. Datasets in this status cannot be archived or deleted. |
| Draft | The source pipeline is in draft or paused. No new data is being collected. |
| Complete | The source pipeline has been completed or archived, or the dataset was imported. The dataset is finalized. |
| Orphaned | The source pipeline was deleted. The dataset and its data are preserved but read-only. |
For pipeline-backed datasets, the status is automatically synced from the pipeline's lifecycle:
| Pipeline status | Dataset status |
|---|---|
| Draft | Draft |
| Active | Collecting |
| Paused | Draft (but never downgraded from Collecting) |
| Completed | Complete |
| Archived | Complete |
Imported datasets are created with a Complete status since their data is fully loaded at import time.
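As a rough illustration, the sync table can be read as a plain lookup. The lowercase status strings and the names `PIPELINE_TO_DATASET_STATUS`, `IMPORTED_DATASET_STATUS`, and `sync_dataset_status` are hypothetical. The sketch deliberately leaves out the "never downgraded from Collecting" caveat on the Paused row, because the text does not spell out exactly when it applies.

```python
# Pipeline lifecycle status -> dataset status, mirroring the table above.
PIPELINE_TO_DATASET_STATUS = {
    "draft": "draft",
    "active": "collecting",
    "paused": "draft",  # the table's caveat for this row is not modeled here
    "completed": "complete",
    "archived": "complete",
}

# Imported datasets start out Complete because their data is fully loaded at import time.
IMPORTED_DATASET_STATUS = "complete"


def sync_dataset_status(pipeline_status: str) -> str:
    """Look up the dataset status that corresponds to a pipeline status."""
    try:
        return PIPELINE_TO_DATASET_STATUS[pipeline_status]
    except KeyError:
        raise ValueError(f"unknown pipeline status: {pipeline_status!r}")
```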
The Data Vault listing
The listing page shows all datasets you have access to in a paginated table. Columns include:
- Name — click to open the dataset detail page.
- Status — badge showing the current lifecycle status.
- Rows — number of rows (computed live for pipeline datasets).
- Source — icon and label indicating Pipeline, File Upload, HuggingFace, or Export.
- Project — the project this dataset belongs to, if any.
- Source Pipeline — the pipeline that generated this dataset (hidden by default).
- Last Updated — relative timestamp of the last update.
- Archived — archived badge (hidden by default).
Use the search bar to filter datasets by name. Use the Status and Source dropdown filters to narrow results. Click Show archived in the toolbar to include archived datasets.
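The combined effect of the search bar, the dropdown filters, and the Show archived toggle might look like the sketch below. `DatasetRow` and `filter_listing` are hypothetical names, and the case-insensitive substring match on the name is an assumption; the text only says that search filters by name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRow:
    name: str
    status: str
    source: str
    archived: bool = False


def filter_listing(
    rows: List[DatasetRow],
    query: str = "",
    status: Optional[str] = None,
    source: Optional[str] = None,
    show_archived: bool = False,
) -> List[DatasetRow]:
    """Apply every active filter; a row must pass all of them to be listed."""
    needle = query.strip().lower()
    result = []
    for row in rows:
        if row.archived and not show_archived:
            continue  # archived datasets are hidden by default
        if needle and needle not in row.name.lower():
            continue  # assumed: case-insensitive substring match on the name
        if status is not None and row.status != status:
            continue
        if source is not None and row.source != source:
            continue
        result.append(row)
    return result
```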
Pipeline-generated datasets
When a pipeline becomes active, a dataset is automatically created in the Data Vault. Each task in the pipeline becomes a row in the dataset, with columns corresponding to the form fields defined in the pipeline's nodes.
These datasets update in real time — as tasks progress through the pipeline, the dataset content reflects their current state, including task status (pending, in progress, finished, etc.).
If a pipeline-backed dataset is deleted and the pipeline later becomes active again, a new dataset is automatically created to ensure data collection continues.
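To make the row-per-task layout described in this section concrete, here is a minimal sketch of turning one task into one dataset row. The task dictionary shape (`id`, `status`, `created`, `fields`) and the function name `task_to_row` are assumptions for illustration, not the product's actual data model.

```python
from typing import Any, Dict, List


def task_to_row(task: Dict[str, Any], field_names: List[str]) -> Dict[str, Any]:
    """Flatten one task into one row: task metadata first, then one column per form field."""
    row = {
        "task_id": task["id"],
        "status": task["status"],  # e.g. pending, in progress, finished
        "created": task["created"],
    }
    values = task.get("fields", {})
    for name in field_names:
        # A field that has not been filled in yet is left empty (None) in this sketch.
        row[name] = values.get(name)
    return row
```

Because the row is derived from the task's current state, rebuilding it after each task update would give the live view the section describes.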
Dataset detail page
Click on any dataset to open its detail page. The page has four tabs:
- Overview — summary statistics (row count, column count, dates), source information, and pipeline link for pipeline-backed datasets.
- Dataset Content — paginated data table showing all rows and fields. Supports column visibility controls and text wrapping toggle.
- Analytics — evaluation field analytics, contributor breakdowns, and inter-rater agreement metrics (pipeline datasets only).
- Studio — interactive charting and exploration workspace (pipeline datasets only). See Analytics Studio.
The toolbar at the top of the detail page includes:
- Share — create a share link (pipeline datasets only). See Sharing.
- Export — download the dataset. See Exporting Data.
- Archive / Unarchive — toggle the archived state (not available for datasets with "collecting" status).
- Delete — permanently delete the dataset (not available for datasets with "collecting" status).
Managing datasets
Archiving
Archive a dataset to hide it from the default Data Vault listing. Archived datasets are not deleted — they can be restored at any time using Unarchive. Datasets with "collecting" status (i.e., their pipeline is active) cannot be archived.
If a pipeline becomes active while its dataset is archived, the dataset is automatically unarchived.
Deleting
Deleting a dataset is permanent. The confirmation dialog warns that the action cannot be undone. If the associated pipeline later becomes active, a new (empty) dataset will be created automatically.
Datasets with "collecting" status cannot be deleted — pause or complete the pipeline first.
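The archive and delete rules in this section, together with what happens when a pipeline becomes active, can be summarized in a short sketch. The `Dataset` class and the functions `can_archive_or_delete` and `on_pipeline_activated` are hypothetical; they only restate the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    pipeline_id: str
    status: str = "draft"
    archived: bool = False


def can_archive_or_delete(dataset: Dataset) -> bool:
    """Archiving and deleting are both blocked while the dataset is collecting."""
    return dataset.status != "collecting"


def on_pipeline_activated(pipeline_id: str, existing: Optional[Dataset]) -> Dataset:
    """Return the dataset that should receive data once the pipeline is active."""
    if existing is None:
        # No dataset yet, or the earlier one was deleted: create a new one.
        return Dataset(pipeline_id, status="collecting")
    existing.status = "collecting"
    existing.archived = False  # an archived dataset is automatically unarchived
    return existing
```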
Renaming
Imported datasets can be renamed by clicking the dataset title on the detail page. Pipeline-backed datasets inherit their name from the pipeline and cannot be renamed directly.