
[DRAFT] FEAT: Additional Attack Analysis and Visualization #1439

Draft
jbolor21 wants to merge 17 commits into Azure:main from jbolor21:bjagdagdorj/attack_visualization


jbolor21 (Contributor) commented Mar 5, 2026

Description

Branches off @romanlutz's #1362 to add further analysis (grouping by harm category) and an initial visualization of the analytics (a basic table view).

  • Adding graph views
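A "basic table view" of grouped results can be as simple as fixed-width text. The sketch below is illustrative only: it takes hypothetical `(harm_category, outcome)` string pairs rather than real `AttackResult` objects, and it assumes the success rate is computed over decided outcomes (successes plus failures), which may not match how the PR's `_compute_stats` treats undetermined results.

```python
from collections import defaultdict


def render_success_table(records: list[tuple[str, str]]) -> str:
    """Render hypothetical (harm_category, outcome) pairs as a fixed-width text table."""
    counts: dict[str, dict[str, int]] = defaultdict(
        lambda: {"success": 0, "failure": 0, "undetermined": 0}
    )
    for category, outcome in records:
        counts[category][outcome] += 1  # an unknown outcome string raises KeyError

    header = f"{'harm_category':<16}{'success':>8}{'failure':>8}{'undet.':>8}{'rate':>8}"
    lines = [header, "-" * len(header)]
    for category in sorted(counts):
        c = counts[category]
        decided = c["success"] + c["failure"]
        # Assumption: rate = successes / (successes + failures); undetermined is excluded.
        rate = f"{c['success'] / decided:.2f}" if decided else "n/a"
        lines.append(
            f"{category:<16}{c['success']:>8}{c['failure']:>8}{c['undetermined']:>8}{rate:>8}"
        )
    return "\n".join(lines)
```

Plain text keeps the table usable in logs and CI output without adding any plotting dependency.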

Tests and Documentation

Added unit tests and new cells in the notebook.

Copilot AI and others added 15 commits March 2, 2026 13:56
Co-authored-by: romanlutz <10245648+romanlutz@users.noreply.github.com>
…icts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 5, 2026 02:21
jbolor21 marked this pull request as draft March 5, 2026 02:22

Copilot AI left a comment


Pull request overview

This PR extends PyRIT’s attack result analytics to support richer analysis and initial visualization workflows by adding flexible, dimension-based grouping (including harm categories) and a DataFrame export surface, along with accompanying docs and unit tests.

Changes:

  • Refactors analyze_results to return a structured AnalysisResult with per-dimension and composite-dimension breakdowns, plus a to_dataframe() export.
  • Adds new built-in grouping dimensions (notably harm_category) and support for custom dimension extractors.
  • Adds/updates unit tests and introduces a new analytics documentation notebook (and wires it into the docs TOC).
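To make "dimension-based grouping" concrete, here is a minimal sketch of the idea under stated assumptions. `FakeResult`, `BUILTIN_EXTRACTORS`, and `group_outcomes` are hypothetical names, not the PR's API: an extractor maps one result to a list of dimension values, a string spec groups by one dimension, and a tuple spec groups by every combination of its component values (a composite dimension).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Optional, Union


@dataclass
class FakeResult:
    # Hypothetical stand-in for AttackResult; only the fields this sketch needs.
    attack_type: str
    outcome: str  # "success", "failure" or "undetermined"
    harm_categories: list[str] = field(default_factory=list)


# Hypothetical extractor type: maps one result to zero or more dimension values.
Extractor = Callable[[FakeResult], list[str]]

BUILTIN_EXTRACTORS: dict[str, Extractor] = {
    "attack_type": lambda r: [r.attack_type],
    # A result tagged with several harm categories is counted once per category.
    "harm_category": lambda r: r.harm_categories or ["unknown"],
}

Spec = Union[str, tuple[str, ...]]


def group_outcomes(
    results: list[FakeResult],
    group_by: list[Spec],
    custom_dimensions: Optional[dict[str, Extractor]] = None,
) -> dict:
    extractors = {**BUILTIN_EXTRACTORS, **(custom_dimensions or {})}
    # Drop duplicate specs (order-preserving) so no dimension is counted twice.
    group_by = list(dict.fromkeys(group_by))
    counts = {spec: defaultdict(lambda: defaultdict(int)) for spec in group_by}
    for result in results:
        for spec in group_by:
            if isinstance(spec, str):
                keys = extractors[spec](result)
            else:
                # Composite dimension: one key per combination of component values.
                keys = list(product(*(extractors[name](result) for name in spec)))
            for key in keys:
                counts[spec][key][result.outcome] += 1
    return counts
```

Counting a multi-category result once per category means per-category totals can exceed the number of results; whichever way the PR handles this, it is worth stating in the docs.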

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File | Description
pyrit/analytics/result_analysis.py | Introduces AnalysisResult, dimension extractors, composite grouping, and to_dataframe() export.
pyrit/analytics/__init__.py | Exposes new analytics API symbols (AnalysisResult, DimensionExtractor).
tests/unit/analytics/test_result_analysis.py | Expands test coverage for new grouping dimensions, composite grouping, deprecation aliasing, and DataFrame export.
doc/code/analytics/1_result_analysis.py | Adds a new result analysis documentation page with examples, including DataFrame output.
doc/code/analytics/1_result_analysis.ipynb | Notebook version of the new documentation page for Jupytext/Jupyter Book execution.
doc/_toc.yml | Adds the new analytics doc page to the documentation navigation.

Comment on lines +309 to +315
def analyze_results(
    attack_results: list[AttackResult],
    *,
    group_by: list[Union[str, tuple[str, ...]]] | None = None,
    custom_dimensions: dict[str, DimensionExtractor] | None = None,
) -> AnalysisResult:
    """

Copilot AI Mar 5, 2026


analyze_results is exported from pyrit.analytics and previously returned a plain dict structure; it now returns AnalysisResult with a different shape. If this function is part of the public API, this is a breaking change for external consumers. Consider providing a backwards-compatible path (e.g., keep the legacy dict return behind an option or provide an as_legacy_dict() helper and deprecate the old shape) and document the migration clearly.

Copilot uses AI. Check for mistakes.
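One way to follow the backwards-compatibility suggestion is a deprecated helper that rebuilds the old dict shape. This is a sketch only: `AnalysisResultSketch` is a hypothetical stand-in for the PR's `AnalysisResult`, and the legacy keys `"overall"` and `"by_attack_type"` are assumed for illustration, since the exact pre-refactor key names are not shown here.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class AnalysisResultSketch:
    # Hypothetical shape, not the PR's AnalysisResult: overall counts plus
    # per-dimension counts keyed as {dimension: {value: {outcome: count}}}.
    overall: dict[str, int]
    dimensions: dict[str, dict[str, dict[str, int]]] = field(default_factory=dict)

    def as_legacy_dict(self) -> dict:
        """Return a pre-refactor style dict and emit a DeprecationWarning."""
        warnings.warn(
            "as_legacy_dict() is deprecated; read the AnalysisResult fields instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this method
        )
        # Hypothetical legacy keys: an "overall" block and a "by_attack_type" block.
        by_type = self.dimensions.get("attack_type", {})
        return {
            "overall": dict(self.overall),
            "by_attack_type": {name: dict(counts) for name, counts in by_type.items()},
        }
```

Keeping the legacy shape behind an explicit, warning-emitting method gives external callers a one-line migration path without keeping two return types on `analyze_results` itself.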
# # Result Analysis
#
# The `analyze_results` function computes attack success rates from a list of `AttackResult` objects.
# It supports flexible grouping across built-in dimensions (`attack_type`, `converter_type`, `label`)

Copilot AI Mar 5, 2026


The opening description lists built-in dimensions as attack_type, converter_type, and label, but the implementation/docs below also support harm_category as a built-in dimension. Please update the introductory text (and the later “Default Behavior” section) to include harm_category so the documentation matches the API.

Suggested change
- # It supports flexible grouping across built-in dimensions (`attack_type`, `converter_type`, `label`)
+ # It supports flexible grouping across built-in dimensions (`attack_type`, `converter_type`, `label`, `harm_category`)

"# Result Analysis\n",
"\n",
"The `analyze_results` function computes attack success rates from a list of `AttackResult` objects.\n",
"It supports flexible grouping across built-in dimensions (`attack_type`, `converter_type`, `label`)\n",

Copilot AI Mar 5, 2026


This notebook’s intro lists built-in dimensions as (attack_type, converter_type, label), but the example content/API also treats harm_category as built-in. Update the intro (and any later “Default Behavior” wording) to include harm_category to keep the notebook consistent with the code.

Suggested change
- "It supports flexible grouping across built-in dimensions (`attack_type`, `converter_type`, `label`)\n",
+ "It supports flexible grouping across built-in dimensions (`attack_type`, `converter_type`, `label`, `harm_category`)\n",

Comment on lines +374 to +399
    if group_by is None:
        group_by = list(extractors.keys())

    # Resolve deprecated aliases and validate dimension names
    group_by = [_resolve_dimension_spec(spec=spec, extractors=extractors) for spec in group_by]

    # Accumulators
    overall_counts: defaultdict[str, int] = defaultdict(int)
    by_type_counts: defaultdict[str, defaultdict[str, int]] = defaultdict(lambda: defaultdict(int))
    dim_counts: dict[
        Union[str, tuple[str, ...]],
        defaultdict[Union[str, tuple[str, ...]], defaultdict[str, int]],
    ] = {spec: defaultdict(lambda: defaultdict(int)) for spec in group_by}

    # Single pass over results
    for attack in attack_results:
        if not isinstance(attack, AttackResult):
            raise TypeError(f"Expected AttackResult, got {type(attack).__name__}: {attack!r}")

        outcome = attack.outcome
        attack_type = attack.attack_identifier.class_name if attack.attack_identifier else "unknown"

        if outcome == AttackOutcome.SUCCESS:
            overall_counts["successes"] += 1
            by_type_counts[attack_type]["successes"] += 1
        elif outcome == AttackOutcome.FAILURE:
            overall_counts["failures"] += 1
            by_type_counts[attack_type]["failures"] += 1
        else:
            overall_counts["undetermined"] += 1
            by_type_counts[attack_type]["undetermined"] += 1

    overall_stats = _compute_stats(
        successes=overall_counts["successes"],
        failures=overall_counts["failures"],
        undetermined=overall_counts["undetermined"],
    )
        key = _outcome_key(attack.outcome)
        overall_counts[key] += 1

    by_type_stats = {
        attack_type: _compute_stats(
            successes=counts["successes"],
            failures=counts["failures"],
            undetermined=counts["undetermined"],
        )
        for attack_type, counts in by_type_counts.items()
    }
        for spec in group_by:
            if isinstance(spec, str):
                for dim_value in extractors[spec](attack):
                    dim_counts[spec][dim_value][key] += 1
            else:

Copilot AI Mar 5, 2026


After resolving deprecated aliases, group_by can contain duplicate canonical specs (e.g., user passes both "attack_type" and deprecated "attack_identifier"). Because the code iterates for spec in group_by, duplicates will double-count the same dimension even though dim_counts only has one entry for that spec. Deduplicate the resolved group_by list (preserving order) before building dim_counts/iterating, and add a unit test for the mixed alias+canonical case to prevent regressions.
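The deduplication the review asks for can be done with `dict.fromkeys`, which keeps the first occurrence of each key in insertion order; both string and tuple specs are hashable. The sketch below uses a hypothetical alias table containing only the `attack_identifier` alias named in the comment, and hypothetical helpers `resolve_spec` / `resolve_and_dedupe`; unlike the PR's `_resolve_dimension_spec`, it does not validate dimension names.

```python
from typing import Union

Spec = Union[str, tuple[str, ...]]

# Hypothetical alias table; only "attack_identifier" -> "attack_type" comes from the review.
DEPRECATED_ALIASES = {"attack_identifier": "attack_type"}


def resolve_spec(spec: Spec) -> Spec:
    """Map deprecated names to canonical ones, for single and composite specs."""
    if isinstance(spec, str):
        return DEPRECATED_ALIASES.get(spec, spec)
    return tuple(DEPRECATED_ALIASES.get(name, name) for name in spec)


def resolve_and_dedupe(group_by: list[Spec]) -> list[Spec]:
    resolved = [resolve_spec(spec) for spec in group_by]
    # dict.fromkeys keeps the first occurrence of each spec and preserves order.
    return list(dict.fromkeys(resolved))
```

The assertions below double as the mixed alias-plus-canonical regression test the comment recommends.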

Comment on lines +86 to +89
import plotly.graph_objects as go
import plotly.io as pio

return go, pio
Contributor


plotly is most likely a no-go here; it pulls in a lot of additional dependencies.

If we want interactive visualizations, we should probably add them in the frontend. The analytics module is best suited to static artifacts.

Matplotlib is already a dependency and should be sufficient for static ones.


Labels

None yet

Projects

None yet


4 participants