Pipelines Docs is in beta — content is actively being added.

Datasets & the Data Vault

Understand how datasets work, where they come from, and how to manage them.

The Data Vault brings together all datasets from every pipeline and project in your organization into a single place. Instead of navigating to individual pipelines to find their data, the Data Vault gives you one unified view to browse, search, export, and analyze all of your collected and imported data.

Accessing the Data Vault

Navigate to Data Vault in the sidebar. The Data Vault is available to Sys Admins, Org Admins, and Project Admins. Contributors do not have access to the Data Vault.

Project Admins can only see datasets that belong to projects they are assigned to.
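The access rules above can be summarized as a small filter over the organization's datasets. This is a hypothetical Python sketch, not the platform's implementation. The `Role`, `User`, `Dataset`, and `visible_datasets` names are invented for illustration. It also assumes that datasets with no project are hidden from Project Admins, which the rules above do not state explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SYS_ADMIN = "sys_admin"
    ORG_ADMIN = "org_admin"
    PROJECT_ADMIN = "project_admin"
    CONTRIBUTOR = "contributor"


@dataclass
class User:  # hypothetical user record
    role: Role
    assigned_projects: set[str] = field(default_factory=set)


@dataclass
class Dataset:  # hypothetical dataset record
    name: str
    project_id: str | None = None  # None when the dataset belongs to no project


def visible_datasets(user: User, datasets: list[Dataset]) -> list[Dataset]:
    """Return the datasets a user can see in the Data Vault (illustrative only)."""
    if user.role is Role.CONTRIBUTOR:
        # Contributors have no Data Vault access.
        return []
    if user.role is Role.PROJECT_ADMIN:
        # Project Admins see only datasets in their assigned projects.
        # Assumption: datasets with no project are not visible to them.
        return [d for d in datasets if d.project_id in user.assigned_projects]
    # Sys Admins and Org Admins see every dataset in the organization.
    return list(datasets)
```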

Dataset sources

Every dataset has a source that determines how its data was created:

| Source | Description |
| --- | --- |
| Pipeline | Created automatically when a pipeline becomes active. Rows come from task data and update live as tasks progress. |
| File Upload | Imported from a CSV or JSON file. |
| HuggingFace | Imported from the HuggingFace Hub via a search-and-configure flow. |

The source determines which features are available on the dataset:

| Feature | Pipeline datasets | Imported datasets |
| --- | --- | --- |
| Overview tab | Pipeline link, contributor breakdown, evaluation criteria summary | Row/column counts, source file info, import date |
| Dataset Content tab | Shows task ID, status, created date, and all field values | Shows imported columns and values |
| Analytics tab | Evaluation field analytics, comparison insights, inter-rater agreement | Coming soon |
| Studio tab | Full charting and aggregation workspace | Coming soon |
| Export | Async export with progress tracking (CSV, JSON, ZIP) | Synchronous download (CSV, JSON, ZIP) |
| Sharing | Share via link with access controls | Not available |
| Rename | Not available (name syncs from pipeline) | Click the dataset title to rename inline |

Dataset statuses

Every dataset has a status displayed as a badge in the Data Vault listing and detail pages:

| Status | Description |
| --- | --- |
| Collecting | The source pipeline is active and data is flowing in. Datasets in this status cannot be archived or deleted. |
| Draft | The source pipeline is in draft or paused. No new data is being collected. |
| Complete | The source pipeline has been completed or archived, or the dataset was imported. The dataset is finalized. |
| Orphaned | The source pipeline was deleted. The dataset and its data are preserved but read-only. |

For pipeline-backed datasets, the status is automatically synced from the pipeline's lifecycle:

| Pipeline status | Dataset status |
| --- | --- |
| Draft | Draft |
| Active | Collecting |
| Paused | Draft (but never downgraded from Collecting) |
| Completed | Complete |
| Archived | Complete |

Imported datasets are created with a Complete status since their data is fully loaded at import time.
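The sync rules amount to a lookup table plus one override. The sketch below is a hypothetical Python model, not the platform's code, and all names in it are assumptions. It makes two readings explicit that the table leaves implicit: a deleted pipeline maps to Orphaned (taken from the status descriptions), and "never downgraded from Collecting" is read as "a dataset that is already Collecting stays Collecting when its pipeline is paused".

```python
from __future__ import annotations

from enum import Enum


class PipelineStatus(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETED = "completed"
    ARCHIVED = "archived"
    DELETED = "deleted"  # hypothetical value standing in for "pipeline was deleted"


class DatasetStatus(Enum):
    COLLECTING = "collecting"
    DRAFT = "draft"
    COMPLETE = "complete"
    ORPHANED = "orphaned"


# Base mapping taken from the pipeline-status table, plus the Orphaned case.
STATUS_MAP = {
    PipelineStatus.DRAFT: DatasetStatus.DRAFT,
    PipelineStatus.ACTIVE: DatasetStatus.COLLECTING,
    PipelineStatus.PAUSED: DatasetStatus.DRAFT,
    PipelineStatus.COMPLETED: DatasetStatus.COMPLETE,
    PipelineStatus.ARCHIVED: DatasetStatus.COMPLETE,
    PipelineStatus.DELETED: DatasetStatus.ORPHANED,
}

# Imported datasets start out Complete because all rows load at import time.
IMPORTED_DATASET_STATUS = DatasetStatus.COMPLETE


def sync_status(current: DatasetStatus | None, pipeline: PipelineStatus) -> DatasetStatus:
    """Compute a pipeline-backed dataset's new status (illustrative only)."""
    # Assumed reading of "never downgraded from Collecting": pausing the
    # pipeline leaves an already-Collecting dataset in Collecting.
    if pipeline is PipelineStatus.PAUSED and current is DatasetStatus.COLLECTING:
        return DatasetStatus.COLLECTING
    return STATUS_MAP[pipeline]
```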

The Data Vault listing

The listing page shows all datasets in your organization that you have access to, in a paginated table. Columns include:

  • Name — click to open the dataset detail page.
  • Status — badge showing the current lifecycle status.
  • Rows — number of rows (computed live for pipeline datasets).
  • Source — icon and label indicating Pipeline, File Upload, HuggingFace, or Export.
  • Project — the project this dataset belongs to, if any.
  • Source Pipeline — the pipeline that generated this dataset (hidden by default).
  • Last Updated — relative timestamp of the last update.
  • Archived — archived badge (hidden by default).

Use the search bar to filter datasets by name. Use the Status and Source dropdown filters to narrow results. Click Show archived in the toolbar to include archived datasets.
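The toolbar filters combine so that a dataset must pass every active filter to appear. A minimal Python sketch of that behavior follows. It assumes case-insensitive substring matching on the name and a hypothetical page size of 25; `ListingRow`, `filter_listing`, and `paginate` are illustrative names, not a platform API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ListingRow:  # hypothetical row in the Data Vault listing
    name: str
    status: str  # "collecting", "draft", "complete", or "orphaned"
    source: str  # e.g. "pipeline", "file_upload", "huggingface", "export"
    archived: bool = False


def filter_listing(
    rows: list[ListingRow],
    search: str = "",
    status: str | None = None,
    source: str | None = None,
    show_archived: bool = False,
) -> list[ListingRow]:
    """Apply the search, Status, Source, and Show archived filters (illustrative)."""
    needle = search.strip().lower()
    result = []
    for row in rows:
        if row.archived and not show_archived:
            continue  # archived datasets are hidden by default
        if needle and needle not in row.name.lower():
            continue  # assumed: case-insensitive substring match on the name
        if status is not None and row.status != status:
            continue
        if source is not None and row.source != source:
            continue
        result.append(row)
    return result


def paginate(rows: list[ListingRow], page: int, page_size: int = 25) -> list[ListingRow]:
    """Return one 1-indexed page of results; the page size is hypothetical."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]
```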

Pipeline-generated datasets

When a pipeline becomes active, a dataset is automatically created in the Data Vault. Each task in the pipeline becomes a row in the dataset, with columns corresponding to the form fields defined in the pipeline's nodes.

These datasets update in real time — as tasks progress through the pipeline, the dataset content reflects their current state, including task status (pending, in progress, finished, etc.).

If a pipeline-backed dataset is deleted and the pipeline later becomes active again, a new dataset is automatically created to ensure data collection continues.
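This activation behavior, together with the auto-unarchive rule described under Archiving, amounts to "find or create, then reset". Below is a hypothetical in-memory Python model; `Vault`, `VaultDataset`, and `on_pipeline_activated` are invented names, and datasets are assumed to be keyed by pipeline ID.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VaultDataset:  # hypothetical pipeline-backed dataset record
    pipeline_id: str
    name: str
    status: str = "draft"
    archived: bool = False


@dataclass
class Vault:  # hypothetical in-memory stand-in for the Data Vault
    by_pipeline: dict[str, VaultDataset] = field(default_factory=dict)

    def on_pipeline_activated(self, pipeline_id: str, pipeline_name: str) -> VaultDataset:
        """Find or create the pipeline's dataset and mark it as collecting."""
        dataset = self.by_pipeline.get(pipeline_id)
        if dataset is None:
            # Never created, or deleted earlier: create a new one so
            # data collection continues.
            dataset = VaultDataset(pipeline_id, pipeline_name)
            self.by_pipeline[pipeline_id] = dataset
        dataset.archived = False       # a previously archived dataset is unarchived
        dataset.status = "collecting"  # an Active pipeline maps to Collecting
        dataset.name = pipeline_name   # the name syncs from the pipeline
        return dataset
```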

Dataset detail page

Click on any dataset to open its detail page. The page has four tabs:

  • Overview — summary statistics (row count, column count, dates), source information, and pipeline link for pipeline-backed datasets.
  • Dataset Content — paginated data table showing all rows and fields. Supports column visibility controls and text wrapping toggle.
  • Analytics — evaluation field analytics, contributor breakdowns, and inter-rater agreement metrics (pipeline datasets only).
  • Studio — interactive charting and exploration workspace (pipeline datasets only). See Analytics Studio.

The toolbar at the top of the detail page includes:

  • Share — create a share link (pipeline datasets only). See Sharing.
  • Export — download the dataset. See Exporting Data.
  • Archive / Unarchive — toggle the archived state (not available for datasets with "collecting" status).
  • Delete — permanently delete the dataset (not available for datasets with "collecting" status).

Managing datasets

Archiving

Archive a dataset to hide it from the default Data Vault listing. Archived datasets are not deleted — they can be restored at any time using Unarchive. Datasets with "collecting" status (i.e., their pipeline is active) cannot be archived.

If a pipeline becomes active and its dataset was previously archived, the dataset is automatically unarchived.

Deleting

Deleting a dataset is permanent. The confirmation dialog warns that the action cannot be undone. If the associated pipeline later becomes active, a new (empty) dataset will be created automatically.

Datasets with "collecting" status cannot be deleted — pause or complete the pipeline first.

Renaming

Imported datasets can be renamed by clicking the dataset title on the detail page. Pipeline-backed datasets inherit their name from the pipeline and cannot be renamed directly.
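The management rules above reduce to a few guards checked before each action. This Python sketch is illustrative only: the function names, the `ActionNotAllowed` error, and the string status and source values are assumptions, not the platform's API. Following the toolbar description, the Collecting guard is applied to both Archive and Unarchive.

```python
from __future__ import annotations

from dataclasses import dataclass


class ActionNotAllowed(Exception):
    """Hypothetical error for a blocked management action."""


@dataclass
class ManagedDataset:  # hypothetical dataset record
    name: str
    source: str  # "pipeline", "file_upload", or "huggingface"
    status: str  # "collecting", "draft", "complete", or "orphaned"
    archived: bool = False


def _require_not_collecting(dataset: ManagedDataset, action: str) -> None:
    if dataset.status == "collecting":
        raise ActionNotAllowed(
            f"cannot {action} a collecting dataset; pause or complete the pipeline first"
        )


def archive(dataset: ManagedDataset) -> None:
    _require_not_collecting(dataset, "archive")
    dataset.archived = True  # hidden from the default listing, not deleted


def unarchive(dataset: ManagedDataset) -> None:
    _require_not_collecting(dataset, "unarchive")
    dataset.archived = False


def delete(vault: dict[str, ManagedDataset], key: str) -> None:
    _require_not_collecting(vault[key], "delete")
    del vault[key]  # permanent: there is no undo


def rename(dataset: ManagedDataset, new_name: str) -> None:
    if dataset.source == "pipeline":
        # Pipeline-backed datasets inherit their name from the pipeline.
        raise ActionNotAllowed("pipeline-backed datasets take their name from the pipeline")
    dataset.name = new_name
```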