FEAT Add VLGuard multimodal safety dataset loader #1447
Open

romanlutz wants to merge 2 commits into Azure:main
Conversation
Add support for the VLGuard dataset (ICML 2024), which contains image-instruction pairs for evaluating vision-language model safety across 4 categories (Privacy, Risky Behavior, Deception, Hateful Speech) with 8 subcategories.

Supports three evaluation subsets:

- unsafes: unsafe images with instructions (tests refusal)
- safe_unsafes: safe images with unsafe instructions (tests refusal)
- safe_safes: safe images with safe instructions (tests helpfulness)

Downloads from HuggingFace (gated dataset, requires token and terms acceptance).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from cac3cad to 255dd50
ValbuenaVC reviewed Mar 10, 2026
```python
class VLGuardCategory(Enum):
    """Categories in the VLGuard dataset."""
```
Contributor
Nit: Can we add a brief explainer or example of each category?
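One possible shape for that (a sketch only: the four category names come from the PR, but the string values and the per-member explainers here are assumptions, not the PR's actual code):

```python
from enum import Enum


class VLGuardCategory(Enum):
    """Categories in the VLGuard dataset.

    Each member carries a brief explainer of the harm it covers.
    """

    # Requests that expose or infer personal or sensitive data
    PRIVACY = "privacy"
    # Instructions enabling dangerous activities, e.g. unqualified professional advice
    RISKY_BEHAVIOR = "risky_behavior"
    # Misleading or manipulated content, e.g. disinformation
    DECEPTION = "deception"
    # Content attacking people based on protected attributes
    HATEFUL_SPEECH = "hateful_speech"
```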
```python
    models refuse unsafe content while remaining helpful on safe content.

    The dataset covers 4 categories (Privacy, Risky Behavior, Deception, Hateful Speech)
    with 8 subcategories (Personal Data, Professional Advice, Political, Sexually Explicit,
```
Contributor
Why aren't subcategories in the enum? They don't have to be, just curious
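If subcategories were ever promoted to an enum, it could look like this (a hypothetical sketch: the eight names come from the PR description, but the member values are assumptions):

```python
from enum import Enum


class VLGuardSubcategory(Enum):
    """Hypothetical enum for the 8 VLGuard subcategories."""

    PERSONAL_DATA = "personal data"
    PROFESSIONAL_ADVICE = "professional advice"
    POLITICAL = "political"
    SEXUALLY_EXPLICIT = "sexually explicit"
    VIOLENCE = "violence"
    DISINFORMATION = "disinformation"
    DISCRIMINATION_BY_SEX = "discrimination by sex"
    DISCRIMINATION_BY_RACE = "discrimination by race"
```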
```python
            prompts.append(text_prompt)
            prompts.append(image_prompt)

            if self.max_examples is not None and len(prompts) >= self.max_examples * 2:
```
Contributor
Suggested change:

```diff
+ # Note that len(prompts) is divided by two since each prompt includes one image and one text field.
  if self.max_examples is not None and len(prompts) >= self.max_examples * 2:
```
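For context, the two-entries-per-example invariant that comment documents can be sketched as a standalone loop (`truncate_pairs` is a hypothetical helper for illustration; the real logic lives inside the loader class):

```python
def truncate_pairs(examples, max_examples=None):
    """Flatten (text, image) example pairs, stopping after max_examples examples."""
    prompts = []
    for text_prompt, image_prompt in examples:
        prompts.append(text_prompt)
        prompts.append(image_prompt)
        # Each example contributes one text and one image entry, so the
        # length check compares against max_examples * 2.
        if max_examples is not None and len(prompts) >= max_examples * 2:
            break
    return prompts
```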
Summary
Adds support for the VLGuard dataset (ICML 2024), a vision-language safety benchmark that evaluates whether multimodal models refuse unsafe content while remaining helpful on safe content.
What is VLGuard?
VLGuard contains ~2,000 image-instruction pairs across 4 categories (Privacy, Risky Behavior, Deception, Hateful Speech) and 8 subcategories (Personal Data, Professional Advice, Political, Sexually Explicit, Violence, Disinformation, Discrimination by Sex, Discrimination by Race).
It supports three evaluation subsets:

- unsafes: unsafe images with instructions (tests refusal)
- safe_unsafes: safe images with unsafe instructions (tests refusal)
- safe_safes: safe images with safe instructions (tests helpfulness)
Usage
```python
from pyrit.datasets.seed_datasets.remote import _VLGuardDataset, VLGuardCategory, VLGuardSubset

# Load unsafe image examples (default)
loader = _VLGuardDataset(token="hf...")
dataset = await loader.fetch_dataset()

# Load safe images with unsafe instructions, filtered to Privacy category
loader = _VLGuardDataset(
    subset=VLGuardSubset.SAFE_UNSAFES,
    categories=[VLGuardCategory.PRIVACY],
    token="hf...",
)
dataset = await loader.fetch_dataset()
```
Note: This is a gated dataset on HuggingFace. Users must accept the terms at https://huggingface.co/datasets/ys-zong/VLGuard and provide a HuggingFace token.
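Since the token is mandatory, callers may want to fall back to the `HF_TOKEN` environment variable that HuggingFace tooling reads. A hypothetical helper sketching that (`resolve_hf_token` is not part of this PR):

```python
import os


def resolve_hf_token(token=None):
    """Return an explicit token, or fall back to the HF_TOKEN env var."""
    token = token or os.environ.get("HF_TOKEN")
    if token is None:
        raise ValueError(
            "A HuggingFace token is required for the gated VLGuard dataset."
        )
    return token
```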
Changes