Dataset Insight Portal

Welcome! This portal helps you explore and manage datasets from our Hugging Face organization.

What is this space for?

This space provides a table of datasets along with metadata. You can:

  • Browse datasets with pagination.
  • Search datasets by various fields.
  • Assign responsibility for reviewing datasets (assigned_to).
  • Track progress using status.
  • Rely on automatic saving: the parquet file is updated and pushed to git every 20 minutes. If the space shows restarting or building, please wait 5 minutes.
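
As a rough illustration of how paginated browsing can work, here is a minimal sketch in Python. The function name `paginate` and the page size of 10 are assumptions made for this example; the portal's actual page size and implementation are not documented here.

```python
import math


def paginate(rows, page, page_size=10):
    """Return (rows_on_page, total_pages) for a 1-based page number.

    page_size=10 is an assumed default, not the portal's real setting.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # An empty table still counts as one (empty) page.
    total_pages = max(1, math.ceil(len(rows) / page_size))
    # Clamp out-of-range page numbers to the valid range.
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return rows[start:start + page_size], total_pages


# Hypothetical rows, for illustration only.
rows = [{"dataset_id": f"org/dataset-{i}"} for i in range(25)]
page_rows, total_pages = paginate(rows, page=3)
```

Clamping the page number keeps a stale "next page" click from failing after the table shrinks.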

Why the table?

The table gives a structured view of all datasets, making it easy to sort, filter, and update information for each dataset. It includes all datasets up to 20-09-2025.

What does the table contain?

Each row represents a dataset. Columns include:

  • dataset_id: Unique identifier of the dataset.
  • dataset_url: Link to the dataset page on Hugging Face.
  • downloads: Number of downloads.
  • author: Dataset author.
  • license: License type.
  • tags: Tags describing the dataset. Obtained from the dataset card.
  • task_categories: Categories of tasks the dataset is useful for. Obtained from the dataset card.
  • last_modified: Date of last update.
  • field, keyword: Metadata columns describing the dataset's purpose, derived from heuristics. Use field and keyword to filter for science-based datasets.
  • category: Category of the dataset (rich means it has a good dataset card; minimal means the card needs improvement for one of the reasons below).
  • reason: Reason why the dataset is classified as minimal. Options: Failed to load card, No metadata and no description, No metadata and has description, Short description.
  • usedStorage: Storage used by the dataset (bytes).
  • assigned_to: Person responsible for the dataset (editable).
  • status: Progress status (editable). Options: todo, inprogress, PR submitted, PR merged.
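
To make the category and reason columns more concrete, here is a hedged sketch of how a card could be mapped to these values. The function `classify_card`, its parameters, the 200-character threshold, and the choice to treat a card with metadata but no description as "Short description" are all assumptions for illustration; the portal's real heuristics are not described in this document.

```python
def classify_card(card_loaded, metadata, description, min_description_chars=200):
    """Return (category, reason) for one dataset card.

    All rules below are assumed for illustration, including the
    200-character threshold for a "short" description.
    """
    if not card_loaded:
        return "minimal", "Failed to load card"
    text = (description or "").strip()
    if not metadata:
        if text:
            return "minimal", "No metadata and has description"
        return "minimal", "No metadata and no description"
    if len(text) < min_description_chars:
        return "minimal", "Short description"
    # Rich cards have no reason attached.
    return "rich", None


# Hypothetical card: metadata is present, but the description is very brief.
category, reason = classify_card(True, {"license": "mit"}, "A small dataset.")
```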

How to use search

  • Select a column from the dropdown.
  • If the column is textual, type your query in the text box.
  • If the column is a dropdown (like assigned_to or status), select the value from the dropdown.
  • Click Search to filter the table.
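
The search behavior described above can be sketched roughly as follows. This assumes text columns are matched by case-insensitive substring and dropdown columns by exact value; the names `search` and `DROPDOWN_COLUMNS` are hypothetical, and the portal's real matching rules may differ.

```python
# Columns assumed to be searched by exact dropdown value.
DROPDOWN_COLUMNS = {"assigned_to", "status"}


def search(rows, column, query):
    """Filter rows (a list of dicts) by one column."""
    if column in DROPDOWN_COLUMNS:
        # Dropdown columns: the selected value must match exactly.
        return [r for r in rows if r.get(column) == query]
    # Text columns: case-insensitive substring match; missing values never match.
    needle = query.strip().lower()
    return [r for r in rows if needle in str(r.get(column) or "").lower()]


# Hypothetical rows, for illustration only.
rows = [
    {"dataset_id": "org/chem-spectra", "field": "Chemistry", "status": "todo"},
    {"dataset_id": "org/news-text", "field": None, "status": "inprogress"},
    {"dataset_id": "org/bio-images", "field": "Biology", "status": "todo"},
]
chemistry_rows = search(rows, "field", "chem")
todo_rows = search(rows, "status", "todo")
```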

How to add or update assigned_to and status

  1. First, search for the dataset_id.
  2. Then, select the dataset_id from the dropdown below the table.
  3. Choose the person responsible in Assigned To. If you are a member of the organization, your username should appear in the list. Otherwise, refresh and try again.
  4. Select the current status in Status.
  5. Click Save Changes to update the table and persist the changes.
  6. Use Refresh All to reload the table and the latest members list.

This portal makes it easy to keep track of dataset reviews, assignments, and progress all in one place.
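
The assignment steps above can be sketched in Python as follows. The function `update_assignment` and the in-memory list of rows are assumptions for this example; only the status options come from the column description. Persisting the change to the parquet file and pushing to git are not shown.

```python
# Status options as listed in the column description.
STATUS_OPTIONS = ("todo", "inprogress", "PR submitted", "PR merged")


def update_assignment(rows, dataset_id, assigned_to, status, members):
    """Set assigned_to and status on the row matching dataset_id.

    Hypothetical helper: validates the inputs, then edits the row in place.
    """
    if status not in STATUS_OPTIONS:
        raise ValueError(f"unknown status: {status!r}")
    if assigned_to not in members:
        raise ValueError(f"{assigned_to!r} is not in the members list")
    for row in rows:
        if row["dataset_id"] == dataset_id:
            row["assigned_to"] = assigned_to
            row["status"] = status
            return row
    raise KeyError(dataset_id)


# Hypothetical data, for illustration only.
rows = [{"dataset_id": "org/chem-spectra", "assigned_to": None, "status": "todo"}]
members = ["alice", "bob"]
updated = update_assignment(rows, "org/chem-spectra", "alice", "inprogress", members)
```

Validating against the members list mirrors step 3: a username that is not in the list cannot be assigned until the list is refreshed.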
