Merged
rlundeen2
reviewed
Feb 26, 2026
jsong468
reviewed
Feb 26, 2026
Pull request overview
This PR introduces a standardized “component initializer” pattern for AIRT setup by adding a scorer initializer alongside the existing target initializer approach, and refactors the scorer-evaluation script to rely on registry-registered scorers.
Changes:
- Added `pyrit/setup/initializers/components/` with dedicated target/scorer initializer modules and updated package exports.
- Introduced `AIRTScorerInitializer` with a centralized `SCORER_CONFIGS` list for evaluation scorers.
- Refactored `build_scripts/evaluate_scorers.py` to initialize via AIRT initializers and iterate scorers from `ScorerRegistry`.
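The config-list-plus-registry pattern described above can be sketched roughly as follows. This is a minimal illustration, not PyRIT's implementation: `ScorerConfig`, `SimpleRegistry`, `initialize_scorers`, and the rule that an entry is skipped when its environment variable is unset are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class ScorerConfig:
    """Hypothetical config entry: registry name, factory, and required env var."""

    registry_name: str
    factory: Callable[[], object]
    required_env_var: str


class SimpleRegistry:
    """Hypothetical stand-in for a name -> instance registry."""

    def __init__(self) -> None:
        self._instances: Dict[str, object] = {}

    def register(self, name: str, instance: object) -> None:
        self._instances[name] = instance

    def get_names(self) -> List[str]:
        return sorted(self._instances)


def initialize_scorers(
    configs: List[ScorerConfig],
    registry: SimpleRegistry,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Register every config whose required env var is set; return the names registered."""
    env = os.environ if env is None else env
    registered: List[str] = []
    for config in configs:
        # Assumption: entries with missing configuration are skipped, not treated as errors.
        if not env.get(config.required_env_var):
            continue
        registry.register(config.registry_name, config.factory())
        registered.append(config.registry_name)
    return registered
```

Keeping the configs in one list means a consumer such as an evaluation script only has to ask the registry for names, rather than constructing each scorer by hand.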
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `tests/unit/setup/test_airt_targets_initializer.py` | Updates imports to the new `components.targets` module path. |
| `tests/unit/setup/test_airt_scorer_initializer.py` | Adds unit tests for `AIRTScorerInitializer` behavior and config coverage. |
| `pyrit/setup/initializers/components/targets.py` | New module defining `TARGET_CONFIGS` and `AIRTTargetInitializer` registration logic. |
| `pyrit/setup/initializers/components/scorers.py` | New module defining `SCORER_CONFIGS` and `AIRTScorerInitializer` registration logic. |
| `pyrit/setup/initializers/components/__init__.py` | Exposes component initializer types via `__all__`. |
| `pyrit/setup/initializers/__init__.py` | Re-exports `AIRTScorerInitializer` and updates the `AIRTTargetInitializer` import path. |
| `build_scripts/evaluate_scorers.py` | Uses `AIRTScorerInitializer` + `ScorerRegistry` instead of hand-built scorer instances. |
Comments suppressed due to low confidence (1)
`build_scripts/evaluate_scorers.py:72`
`Scorer.evaluate_async()` defaults to `update_registry_behavior=SKIP_IF_EXISTS`, so this script may return cached metrics without re-running evaluations (and still prints “Evaluation complete and saved!”). If the intent is to benchmark scorers on each run, pass `update_registry_behavior=RegistryUpdateBehavior.ALWAYS_UPDATE` (and import the enum) or adjust the status messaging to reflect when cached results were used.
    try:
        print(" Status: Running evaluations...")
        results = await scorer.evaluate_async(
            num_scorer_trials=3,
            max_concurrency=10,
        )
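To make the caching concern concrete, here is a small self-contained sketch of a skip-if-exists versus always-update policy, with a status message that distinguishes the two outcomes. `UpdateBehavior`, `CachedEvaluator`, and `status_message` are hypothetical stand-ins written for this example; they are not PyRIT's `RegistryUpdateBehavior` or `Scorer` classes, and the metric value is a placeholder.

```python
import asyncio
from enum import Enum
from typing import Dict, Tuple


class UpdateBehavior(Enum):
    """Hypothetical stand-in for a registry update policy."""

    SKIP_IF_EXISTS = "skip_if_exists"
    ALWAYS_UPDATE = "always_update"


class CachedEvaluator:
    """Hypothetical evaluator that caches one metric per scorer name."""

    def __init__(self) -> None:
        self._cache: Dict[str, float] = {}
        self.runs = 0  # number of evaluations actually executed

    async def evaluate_async(
        self,
        name: str,
        behavior: UpdateBehavior = UpdateBehavior.SKIP_IF_EXISTS,
    ) -> Tuple[float, bool]:
        """Return (metric, from_cache)."""
        if behavior is UpdateBehavior.SKIP_IF_EXISTS and name in self._cache:
            return self._cache[name], True
        self.runs += 1
        await asyncio.sleep(0)  # placeholder for the real evaluation work
        metric = 0.9  # placeholder value, not a real result
        self._cache[name] = metric
        return metric, False


def status_message(from_cache: bool) -> str:
    """Report honestly whether an evaluation ran or a cached value was reused."""
    if from_cache:
        return "Using cached metrics (evaluation skipped)"
    return "Evaluation complete and saved!"
```

With the default policy, a second call for the same name returns the cached value without running anything, which is why a caller either needs the always-update policy or a message that reflects the cache hit.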
rlundeen2
reviewed
Mar 3, 2026
40e5a8c to 2667502
rlundeen2
reviewed
Mar 4, 2026
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
`build_scripts/evaluate_scorers.py:72`
- This script now iterates over all registered scorers, but many registered scorers (e.g., `TrueFalseCompositeScorer`, `FloatScaleThresholdScorer`) don't configure `evaluation_file_mapping`, so `evaluate_async()` will raise `ValueError` for each of them. Consider filtering `scorer_names` up-front to only scorers with an evaluation mapping (or passing an explicit mapping), and optionally report which scorers were skipped due to missing evaluation datasets.
    registry = ScorerRegistry.get_registry_singleton()
    scorer_names = registry.get_names()

    if not scorer_names:
        print("No scorers registered. Check environment variable configuration.")
        return

    print(f"\nEvaluating {len(scorer_names)} scorer(s)...\n")

    # Use tqdm for progress tracking across all scorers
    scorer_iterator = (
        tqdm(enumerate(scorer_names, 1), total=len(scorer_names), desc="Scorers")
        if tqdm
        else enumerate(scorer_names, 1)
    )

    # Evaluate each scorer
    for i, scorer_name in scorer_iterator:
        scorer = registry.get_instance_by_name(scorer_name)
        print(f"\n[{i}/{len(scorer_names)}] Evaluating {scorer_name}...")
        print(" Status: Starting evaluation (this may take several minutes)...")
        start_time = time.time()
        try:
            print(" Status: Running evaluations...")
            results = await scorer.evaluate_async(
                num_scorer_trials=3,
                max_concurrency=10,
            )
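The up-front filtering suggested in this comment could look something like the sketch below. It assumes each scorer exposes an `evaluation_file_mapping` attribute that is empty or `None` when no evaluation dataset is configured; `StubScorer` and `partition_scorers` are hypothetical names introduced only for this illustration.

```python
from typing import Dict, List, Optional, Tuple


class StubScorer:
    """Hypothetical scorer exposing only the attribute the filter inspects."""

    def __init__(self, evaluation_file_mapping: Optional[dict] = None) -> None:
        self.evaluation_file_mapping = evaluation_file_mapping


def partition_scorers(scorers: Dict[str, object]) -> Tuple[List[str], List[str]]:
    """Split scorer names into (evaluable, skipped) based on the mapping attribute."""
    evaluable: List[str] = []
    skipped: List[str] = []
    for name, scorer in scorers.items():
        # Assumption: a missing, None, or empty mapping means "no evaluation dataset".
        if getattr(scorer, "evaluation_file_mapping", None):
            evaluable.append(name)
        else:
            skipped.append(name)
    return evaluable, skipped


def report_skipped(skipped: List[str]) -> str:
    """Build a one-line summary of the scorers that were not evaluated."""
    if not skipped:
        return "All registered scorers have evaluation datasets."
    return f"Skipped {len(skipped)} scorer(s) without evaluation datasets: {', '.join(skipped)}"
```

Partitioning before the loop keeps the progress counter accurate and avoids relying on a `ValueError` per scorer to discover which ones cannot be evaluated.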
jsong468
reviewed
Mar 4, 2026
rlundeen2
reviewed
Mar 5, 2026
54b81bf to ff06070
romanlutz
approved these changes
Mar 7, 2026
…te-evaluate_scorers
rlundeen2
approved these changes
Mar 9, 2026
Description
Creates a standardized scorer initialization pattern mirroring the existing `AIRTTargetInitializer` approach.
- Added the `pyrit/setup/initializers/components/` subdirectory
- Moved `airt_targets.py` → `components/targets.py` and renamed `TargetConfig` → `AIRTTargetConfig`
- Added `components/scorers.py` with `AIRTScorerInitializer` and 21 scorer configs
- Updated `__init__.py` exports for new module paths
- Refactored `evaluate_scorers.py` to use `TargetRegistry` for base targets and wire in both initializers
Tests and Documentation
- Added `tests/unit/setup/test_airt_scorer_initializer.py`
- Updated `tests/unit/setup/test_airt_targets_initializer.py` for new import paths