[DRAFT] FEAT: Chain-of-Thought Hijacking Attack Strategy and Test Coverage #1438
riyosha wants to merge 1 commit into Azure:main from
Conversation
romanlutz
left a comment
This is really good! While reading it, I couldn't shake the feeling that this is very similar to RedTeamingAttack with the big difference that it cycles through the system prompt templates, of course. I haven't had time to compare with it in detail to see if that would be doable. My hunch is that it would introduce considerable complexity and is probably not worth it but I'd like to be sure...
Other things:
- needs mentioning in api.rst
- needs example notebook (both ipynb and py files) somewhere in doc/executor/attack, which in turn needs to be mentioned in TOC file. Example notebook doesn't need to be elaborate.
- needs integration test, perhaps just one that runs the example notebook. This may be auto-created by test_executor_notebooks.py I think...
    Returns:
        Optional[AttackScoringConfig]: The scoring configuration.
    """
    return AttackScoringConfig(
I'm a bit surprised that we're unpacking the attack scoring config in the constructor into these two below, and then reassembling it here. Is that a pattern you've seen in another executor?
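One way to avoid the unpack-and-reassemble round-trip is to store the config object whole and return it as-is. A minimal sketch of that pattern, using a stand-in dataclass (the real `AttackScoringConfig` fields and the `CoTHijackingAttack` class shape are assumptions, not PyRIT's actual API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttackScoringConfig:
    """Stand-in for PyRIT's AttackScoringConfig; field names are illustrative."""
    objective_scorer: Optional[object] = None
    refusal_scorer: Optional[object] = None


class CoTHijackingAttack:
    def __init__(self, *, attack_scoring_config: Optional[AttackScoringConfig] = None):
        # Keep the config object whole instead of unpacking it into
        # separate attributes and rebuilding it later.
        self._scoring_config = attack_scoring_config or AttackScoringConfig()

    def _get_scoring_config(self) -> AttackScoringConfig:
        # No reassembly needed: return the stored object as-is.
        return self._scoring_config


cfg = AttackScoringConfig(objective_scorer="my_scorer")
attack = CoTHijackingAttack(attack_scoring_config=cfg)
assert attack._get_scoring_config() is cfg
```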
    def __init__(
        self,
        *,
        objective_target: PromptTarget = REQUIRED_VALUE,  # type: ignore[assignment]
Most likely, this assumes we're dealing with a target that has reasoning capabilities, right? @hannahwestra25 is currently working on expanding TargetCapabilities so that could come in handy here for validation purposes.
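A validation hook along those lines could look like the sketch below. The `supports_reasoning` capability flag and the `PromptTarget.capabilities` attribute are assumptions for illustration; the in-progress `TargetCapabilities` API may look quite different:

```python
# Hypothetical sketch: reject non-reasoning targets before the attack runs.
class PromptTarget:
    def __init__(self, capabilities: set[str]):
        self.capabilities = capabilities


def validate_target(target: PromptTarget) -> None:
    # CoT hijacking manipulates the model's reasoning trace, so a
    # reasoning-capable target is a hard requirement.
    if "supports_reasoning" not in target.capabilities:
        raise ValueError(
            "CoT hijacking requires a reasoning-capable objective target."
        )


reasoning_target = PromptTarget({"supports_reasoning"})
validate_target(reasoning_target)  # passes silently

plain_target = PromptTarget(set())
try:
    validate_target(plain_target)
except ValueError as err:
    print(err)
```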
        outcome=final_outcome,
    )

    # AttackLM
    Returns:
        Optional[Score]: The score for the response.
    """
    if not self._objective_scorer:
Personally, I'm not a huge fan of this. If people want that, they can explicitly specify the current fallback scorer as their objective scorer. In fact, we have a fairly good refusal scorer. If you think this one is useful, we can add it as a deterministic option in the score module.
    if score_value >= 1.0:
        self._logger.info("Attack successful!")
        context.attack_succeeded = True
        final_outcome = AttackOutcome.SUCCESS
        break
    if score_value >= 0.8:
        self._logger.info("Attack largely successful (score 0.8+)")
        final_outcome = AttackOutcome.SUCCESS
        break
This should probably be configured via the scoring configuration. In other attacks, we expect a true/false scorer. So here it could be a threshold-based scorer that wraps a float-scale scorer and applies the threshold (0.8 or 1.0) on top of it to determine success.
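The wrapper idea can be sketched as below. This is a toy illustration, not PyRIT's scorer interface: `FloatScaleScorer` and its keyword-matching logic are invented here purely to make the threshold wrapper runnable.

```python
class FloatScaleScorer:
    """Toy scorer returning a value in [0.0, 1.0] (illustrative only)."""

    def score(self, response: str) -> float:
        return 1.0 if "secret" in response else 0.2


class ThresholdScorer:
    """Wraps a float-scale scorer and yields a true/false outcome."""

    def __init__(self, inner: FloatScaleScorer, threshold: float = 0.8):
        self._inner = inner
        self._threshold = threshold

    def score(self, response: str) -> bool:
        # Success is decided here via configuration, rather than being
        # hard-coded as 1.0 / 0.8 branches inside the attack loop.
        return self._inner.score(response) >= self._threshold


scorer = ThresholdScorer(FloatScaleScorer(), threshold=0.8)
assert scorer.score("the secret is out") is True
assert scorer.score("I cannot help with that") is False
```

With this shape, the attack loop only ever sees a boolean verdict, and the 0.8-vs-1.0 policy lives in one configurable place.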
Description
[DRAFT] FEAT: Chain-of-Thought Hijacking Attack Strategy and Test Coverage
Related to issue #897
This PR introduces the Chain-of-Thought (CoT) Hijacking attack strategy, as described in Zhao et al. (2025). The changes include:
- pyrit/executor/attack/multi_turn/cot_hijacking.py
- pyrit/datasets/executors/cot_hijacking/puzzle_generation_{puzzle_type}.yaml
- tests/unit/executor/attack/multi_turn/test_cot_hijacking.py

Related issues: #897
Tests and Documentation
- tests/unit/executor/attack/multi_turn/test_cot_hijacking.py

This is a draft PR and I want to get your thoughts on the implementation so far. I have planned these updates:
Question:
Other attack executors define `async def _teardown_async` even if unused. Should I also add it?