Quality Assurance — Autonomous Agents Role Playbook
Agentic playbook for AI coding agents operating Autonomous Agents in the qa role.
sidebutton install agents
Quality Assurance: Universal Web App Testing
Exercises documented web applications through browser automation, collects screenshots and extracted page content as evidence, and files structured bug reports when behavior diverges from the expected states in the skill pack. Treats _skill.md as the single source of truth.
Environment
| Component | Value |
|---|---|
| Target app URL | (set by operator) |
| SideButton | http://localhost:9876/ |
| Skill pack | (loaded via MCP — check skill:// resources) |
| Source code (optional) | (readonly, for investigating bugs) |
Before Testing Any Module
- Load skill context: ReadMcpResourceTool(server="sidebutton") to read the module's _skill.md
- Load QA playbook: read the module's _roles/qa.md (if it exists)
- Check Module Inventory: read root _skill.md for coverage status
- If no _roles/qa.md exists, use the _skill.md Common Tasks section as your test guide
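The fallback rule in the list above can be sketched as a small lookup. This is a minimal illustration only: `choose_test_guide` and the `available` set are hypothetical names, and how you enumerate skill pack resources depends on your MCP client.

```python
from pathlib import PurePosixPath


def choose_test_guide(module_dir: str, available: set[str]) -> tuple[str, str]:
    """Return (path, section) of the document to use as the test guide.

    `available` is an assumed set of resource paths exposed by the skill pack.
    """
    base = PurePosixPath(module_dir)
    qa_playbook = str(base / "_roles" / "qa.md")
    skill = str(base / "_skill.md")
    if qa_playbook in available:
        # A module-specific QA playbook wins when it exists.
        return qa_playbook, "whole file"
    if skill in available:
        # Otherwise fall back to the Common Tasks section of _skill.md.
        return skill, "Common Tasks"
    raise FileNotFoundError(f"no _skill.md found for module {module_dir!r}")
```

For example, `choose_test_guide("billing", {"billing/_skill.md"})` picks the Common Tasks section because no `_roles/qa.md` is present.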
Testing Depth Levels
| Level | Name | Scope | When to Use |
|---|---|---|---|
| L0 | Smoke | Page loads, no blank screen, primary content visible | After deploy, quick health check |
| L1 | Structure | All expected elements present per _skill.md, correct labels and layout | After UI changes |
| L2 | Interaction | Each button/input responds, dropdowns populate, forms validate, toasts appear | Feature-level QA |
| L3 | Data | CRUD operations persist via API, data consistent across views, reload preserves state | Full regression |
| L4 | Edge | Empty states, boundary inputs, filter combos, rapid clicks, concurrent operations | Pre-release deep QA |
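One way to carry the depth levels through a test run is an ordered enum. The sketch below assumes the levels are cumulative (running L2 implies L0 and L1 first), which the table does not state outright; the trigger labels in `DEPTH_FOR_TRIGGER` are hypothetical shorthand for the "When to Use" column.

```python
from enum import IntEnum


class Depth(IntEnum):
    L0_SMOKE = 0
    L1_STRUCTURE = 1
    L2_INTERACTION = 2
    L3_DATA = 3
    L4_EDGE = 4


# Hypothetical trigger labels mapped to the "When to Use" column.
DEPTH_FOR_TRIGGER = {
    "post-deploy": Depth.L0_SMOKE,
    "ui-change": Depth.L1_STRUCTURE,
    "feature-qa": Depth.L2_INTERACTION,
    "full-regression": Depth.L3_DATA,
    "pre-release": Depth.L4_EDGE,
}


def levels_to_run(target: Depth) -> list[Depth]:
    """Assumption: levels are cumulative, so return every level up to target."""
    return [level for level in Depth if level <= target]
```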
Test Execution Protocol
Per-Test Evidence Pattern
Every test interaction follows:
navigate → snapshot (get refs) → screenshot (before) → action (click/type by ref) → screenshot (after) → verify (snapshot or extract)
Always collect both snapshot (DOM state) and screenshot (visual state) as evidence.
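The evidence sequence can be sketched as a single helper that always runs the six steps in order. The `Browser` protocol here is a hypothetical wrapper whose method names mirror the tool names in this playbook; the real tool signatures and the snapshot shape may differ. `FakeBrowser` is an in-memory stand-in used only to show the call order.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


class Browser(Protocol):
    # Hypothetical wrapper around the browser tools named in this playbook.
    def navigate(self, url: str) -> None: ...
    def snapshot(self) -> dict[str, Any]: ...
    def screenshot(self) -> bytes: ...
    def click(self, ref: int) -> None: ...


@dataclass
class Evidence:
    url: str
    before_snap: dict[str, Any] = field(default_factory=dict)
    before_shot: bytes = b""
    after_shot: bytes = b""
    after_snap: dict[str, Any] = field(default_factory=dict)
    passed: bool = False


def run_test(
    browser: Browser,
    url: str,
    action: Callable[[Browser, dict[str, Any]], None],
    verify: Callable[[dict[str, Any]], bool],
) -> Evidence:
    ev = Evidence(url=url)
    browser.navigate(url)
    ev.before_snap = browser.snapshot()    # snapshot: get refs
    ev.before_shot = browser.screenshot()  # screenshot: before
    action(browser, ev.before_snap)        # action: click/type by ref
    ev.after_shot = browser.screenshot()   # screenshot: after
    ev.after_snap = browser.snapshot()     # snapshot used for verification
    ev.passed = verify(ev.after_snap)
    return ev


class FakeBrowser:
    """In-memory stand-in that records which calls were made, in order."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def navigate(self, url: str) -> None:
        self.calls.append("navigate")

    def snapshot(self) -> dict[str, Any]:
        self.calls.append("snapshot")
        return {"refs": {"Save": 1}}  # assumed shape: label -> ref

    def screenshot(self) -> bytes:
        self.calls.append("screenshot")
        return b"png"

    def click(self, ref: int) -> None:
        self.calls.append(f"click:{ref}")
```

Keeping the before and after artifacts on one `Evidence` record makes it harder to file a bug with only half of the evidence attached.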
Phase 1: Page Load & Structure (L0-L1)
For every module:
| # | Test | Method | Pass Criteria |
|---|---|---|---|
| 1 | Page loads | Navigate to module URL | No blank screen, no error page |
| 2 | Content visible | Snapshot + screenshot | Primary content area populated |
| 3 | Expected elements | Compare snapshot vs _skill.md Key Elements | All documented elements present |
| 4 | Layout correct | Screenshot | Matches _skill.md Page Structure |
| 5 | Navigation works | Click sidebar/topbar links | URL changes, content updates |
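Test 3 in the table above amounts to a set difference between the documented Key Elements and what the snapshot contains. A minimal sketch, assuming the Key Elements are visible labels and the snapshot content is available as plain text (both are assumptions about the data shape):

```python
def missing_elements(expected: list[str], snapshot_content: str) -> list[str]:
    """Return documented labels that do not appear in the snapshot text.

    Matching is case-insensitive substring search, which is deliberately
    loose; a label match does not prove the element is the right type.
    """
    haystack = snapshot_content.casefold()
    return [label for label in expected if label.casefold() not in haystack]
```

An empty result means every documented element was found; anything returned is a candidate L1 failure to confirm against the screenshot.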
Phase 2-N: Feature Testing (L2-L3)
For each feature area in the module's Common Tasks:
| # | Test | Method | Pass Criteria |
|---|---|---|---|
| 1 | Happy path | Follow Common Tasks step-by-step | Action completes, success feedback |
| 2 | Validation | Submit empty/invalid data | Error shown, no crash, field preserved |
| 3 | Persistence | Perform action → reload page | Data still reflects the change |
| 4 | Cancel/undo | Start action → cancel | No side effects, form cleared |
| 5 | State transitions | Check all States from _skill.md | Each state reachable, correct visual |
Phase 3: Cross-Module (L3)
| # | Test | Method | Pass Criteria |
|---|---|---|---|
| 1 | Data consistency | Change in module A → verify in module B | Reflected across views |
| 2 | Navigation | Follow links between modules | Correct destination, no broken links |
| 3 | Back/forward | Browser history | State preserved or correctly reset |
Phase 4: Edge Cases (L4)
| # | Test | Method | Pass Criteria |
|---|---|---|---|
| 1 | Empty state | Remove all items/clear filters | Empty state message shown |
| 2 | Boundary inputs | Max-length text, special characters, empty strings | No crash, graceful handling |
| 3 | Rapid actions | Quick repeated clicks/submits | No duplicates, no race conditions |
| 4 | Long content | Items with very long names/descriptions | No layout break, truncation with tooltip |
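For the boundary-input row, a fixed set of probe strings keeps runs comparable between modules. The sketch below covers the three cases the table names (max-length text, special characters, empty strings); the whitespace-only, over-limit, and non-ASCII entries are extra assumptions, not requirements from this playbook.

```python
def boundary_inputs(max_length: int) -> dict[str, str]:
    """Build named probe strings for a text field with a known length limit."""
    return {
        "empty": "",
        "max_length": "a" * max_length,
        "special_chars": "<b>bold</b> ' \" & % \\ /",
        # Extras beyond what the table lists (assumptions):
        "whitespace_only": "   ",
        "over_max": "a" * (max_length + 1),
        "non_ascii": "Zo\u00eb \u65e5\u672c\u8a9e",
    }
```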
Bug Documentation
When a bug is found, document immediately:
### Bug: {short description}
- **Module**: {module name}
- **Component**: {section > sub-component}
- **Severity**: P0 / P1 / P2 / P3
- **Steps to Reproduce**:
1. Navigate to {URL}
2. {action}
3. {action}
- **Expected**: {what should happen}
- **Actual**: {what actually happens}
- **Evidence**: {screenshot description or reference}
- **URL**: {exact URL at time of failure}
Severity definitions:
| Severity | Definition | Examples |
|---|---|---|
| P0 | Data loss or broken core function | Save deletes data, page crashes |
| P1 | Major feature broken, no workaround | Can't create items, filter doesn't work |
| P2 | Feature broken but workaround exists | Inline edit fails but modal edit works |
| P3 | Cosmetic or minor UX issue | Wrong color, alignment off, typo |
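The template and severity scale above can be captured in a small data class so that every report has the same fields in the same order. This is a sketch: the class and field names are illustrative, and the two-space indent on the numbered steps is a formatting choice so they nest under the "Steps to Reproduce" bullet.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P0 = "P0"  # data loss or broken core function
    P1 = "P1"  # major feature broken, no workaround
    P2 = "P2"  # feature broken but workaround exists
    P3 = "P3"  # cosmetic or minor UX issue


@dataclass
class Bug:
    description: str
    module: str
    component: str
    severity: Severity
    start_url: str       # URL used in step 1
    steps: list[str]     # actions after the initial navigation
    expected: str
    actual: str
    evidence: str
    failure_url: str     # exact URL at time of failure

    def to_markdown(self) -> str:
        lines = [
            f"### Bug: {self.description}",
            f"- **Module**: {self.module}",
            f"- **Component**: {self.component}",
            f"- **Severity**: {self.severity.value}",
            "- **Steps to Reproduce**:",
            f"  1. Navigate to {self.start_url}",
        ]
        lines += [f"  {n}. {step}" for n, step in enumerate(self.steps, start=2)]
        lines += [
            f"- **Expected**: {self.expected}",
            f"- **Actual**: {self.actual}",
            f"- **Evidence**: {self.evidence}",
            f"- **URL**: {self.failure_url}",
        ]
        return "\n".join(lines)
```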
Browser Tool Usage
Navigation
- navigate(url): always use full URL with any query params
- Use query params to control initial state (e.g., ?tab=settings&filter=active)
- Wait for content to load before testing (check for spinner/skeleton removal)
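Building the full URL with the standard library avoids hand-concatenated query strings. A minimal sketch; `with_query` is a hypothetical helper name and the example host is a placeholder.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_query(base_url: str, **params: str) -> str:
    """Append query params to a URL, keeping any query it already has."""
    parts = urlsplit(base_url)
    query = "&".join(q for q in (parts.query, urlencode(params)) if q)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

For example, `with_query("https://app.example.test/projects", tab="settings", filter="active")` produces a URL that can be passed straight to navigate(url).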
Page Inspection
- snapshot(includeContent=true): primary tool for understanding page state
- Returns: URL (verify correct page), content (verify data), refs (for clicking)
- screenshot(): visual evidence, catches layout issues snapshot misses
- Use both together for complete evidence
Interaction
- click(ref=N): preferred over selector-based clicks (refs are unique)
- type(ref=N, text): for text inputs; use submit=true for search-on-enter
- scroll(direction, amount): for content below fold
Common Automation Patterns
| Pattern | Steps | Notes |
|---|---|---|
| Actions/context menu | Click "..." → snapshot → click menu item by ref | Menu may render as portal. Closes on focus loss. |
| Toast/notification | Action → screenshot immediately | Often disappears after 2-5s |
| Table data extraction | snapshot(includeContent=true) → parse content | Look for consistent table structure |
| Modal interaction | Click trigger → wait for heading/overlay → snapshot → interact | Wait for modal to be fully visible |
| Delete with confirm | Action → Delete → snapshot → click confirm | Always verify confirmation dialog |
| Dropdown selection | Click trigger → snapshot → click option by ref | Re-snapshot after opening to get fresh refs |
| Scroll-then-interact | Click any element → scroll → snapshot → interact | Some pages need a click before scroll works |
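The dropdown row is the pattern most likely to fail silently, because refs captured before the menu opens may no longer be valid afterwards. The sketch below shows the re-snapshot step explicitly. It assumes a snapshot that maps visible labels to refs, which is a simplification of whatever the real tool returns; `FakeDropdownPage` exists only to demonstrate the behavior.

```python
from typing import Optional, Protocol


class Browser(Protocol):
    def snapshot(self) -> dict[str, int]: ...  # assumed shape: label -> ref
    def click(self, ref: int) -> None: ...


def select_option(browser: Browser, trigger_label: str, option_label: str) -> None:
    refs = browser.snapshot()
    browser.click(refs[trigger_label])
    refs = browser.snapshot()  # fresh refs: the open menu changes the page
    if option_label not in refs:
        raise LookupError(f"option {option_label!r} not found after opening dropdown")
    browser.click(refs[option_label])


class FakeDropdownPage:
    """Stand-in page whose refs change once the dropdown is open."""

    def __init__(self) -> None:
        self.is_open = False
        self.selected: Optional[str] = None

    def snapshot(self) -> dict[str, int]:
        if self.is_open:
            return {"Status": 7, "Active": 8, "Archived": 9}
        return {"Status": 3}

    def click(self, ref: int) -> None:
        if not self.is_open and ref == 3:
            self.is_open = True
        elif self.is_open and ref in (8, 9):
            self.selected = {8: "Active", 9: "Archived"}[ref]
            self.is_open = False
```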
Known Automation Limitations
| Limitation | Impact | Workaround |
|---|---|---|
| Native <input type="date"> | Cannot set dates via type() | Use visible date input if available |
| Native <select> | Options hard to enumerate | Keyboard arrows or snapshot to read |
| File upload (<input type="file">) | Can't trigger file dialog | Mark as manual test |
| Shadow DOM elements | Not accessible via standard selectors | Note as untestable |
| iframe content | Can't cross frame boundary | Note as untestable |
| CAPTCHA/reCAPTCHA | Can't solve automatically | Requires manual bypass |
Timing and Reliability
- After mutations: wait 1-2s before verifying (API calls may be async)
- After navigation: wait for page-ready indicator (heading, table, or key element)
- Snapshot vs screenshot timing: snapshot reflects DOM (may miss animations); screenshot reflects visual (may show mid-transition). Use both.
- Connection issues: if tools start failing, check get_browser_status(). User may need to refresh browser tab.
- Cache/stale data: some apps cache aggressively. Navigate away and back to force refetch.
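As an alternative to a fixed 1-2s sleep, the waits above can be written as a poll with a timeout. This is a swapped-in technique, not something the playbook prescribes: it returns as soon as the page-ready condition holds and gives up after the deadline. The `condition` callable is whatever check fits the page, for example "the expected heading is in the latest snapshot".

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A `False` return is itself evidence: record it as a timing failure rather than retrying indefinitely.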
Test Coverage Matrix
Maintain a coverage matrix per module:
| Element/Feature | L0 | L1 | L2 | L3 | L4 | Notes |
|---|---|---|---|---|---|---|
| Page load | OK | OK | — | — | — | |
| Add item | — | — | OK | OK | — | |
| Edit item | — | — | OK | Bug #1 | — | Inline edit reverts |
| Delete item | — | — | OK | OK | — | |
| Search/filter | — | — | Observed | — | — | Not fully tested |
Status values:
- OK: tested, works as expected
- Bug #N: tested, bug found and documented
- Observed: element present but not interaction-tested
- Not tested: known to exist but not yet tested
- N/A: not applicable to this module
- Blocked: can't test (automation limitation)
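A coverage matrix is easier to keep consistent when it is rendered from data and the status values are validated against the list above. The following is a sketch under assumptions: `render_matrix` and its input shape (feature name mapped to per-level status) are illustrative, not part of the skill pack format.

```python
import re
from typing import Optional

LEVELS = ["L0", "L1", "L2", "L3", "L4"]
EMPTY = "\u2014"  # same placeholder the matrix above uses for cells with no entry
STATUS_RE = re.compile(r"OK|Bug #\d+|Observed|Not tested|N/A|Blocked")


def render_matrix(
    rows: dict[str, dict[str, str]],
    notes: Optional[dict[str, str]] = None,
) -> str:
    """Render {feature: {level: status}} as a markdown coverage table."""
    notes = notes or {}
    lines = [
        "| Element/Feature | " + " | ".join(LEVELS) + " | Notes |",
        "|---" * (len(LEVELS) + 2) + "|",
    ]
    for feature, statuses in rows.items():
        for level, status in statuses.items():
            if level not in LEVELS:
                raise ValueError(f"unknown level {level!r} for {feature!r}")
            if not STATUS_RE.fullmatch(status):
                raise ValueError(f"unknown status {status!r} for {feature!r}")
        cells = [statuses.get(level, EMPTY) for level in LEVELS]
        lines.append("| " + " | ".join([feature, *cells, notes.get(feature, "")]) + " |")
    return "\n".join(lines)
```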
Test Results Format
After testing a module, write results to {module}-qa-results.md:
# {Module Name} — QA Results
**Date**: {ISO date}
**Tester**: QA Agent
**Target**: {URL}
**Depth**: L0-L{N}
## Summary
- Tests run: {N}
- Passed: {N}
- Failed: {N}
- Blocked: {N}
## Coverage Matrix
{table}
## Bugs Found
{bug documentation}
## Notes
{any observations, recommendations, or follow-up items}
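The summary counts in this format are simple to derive from a list of per-test outcomes, which keeps "Tests run" equal to the sum of the other three lines. A minimal sketch; the outcome strings ("passed", "failed", "blocked") and the function names are assumptions, and the matrix and bug sections are passed in as already-rendered markdown.

```python
from collections import Counter
from datetime import date
from typing import Optional

OUTCOMES = {"passed", "failed", "blocked"}


def results_filename(module: str) -> str:
    return f"{module}-qa-results.md"


def render_results(
    module_name: str,
    target: str,
    max_depth: int,
    outcomes: list[str],
    matrix_md: str,
    bugs_md: str,
    notes: str,
    today: Optional[date] = None,
) -> str:
    counts = Counter(outcomes)
    unknown = set(counts) - OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    day = (today or date.today()).isoformat()
    return "\n".join([
        f"# {module_name} \u2014 QA Results",
        f"**Date**: {day}",
        "**Tester**: QA Agent",
        f"**Target**: {target}",
        f"**Depth**: L0-L{max_depth}",
        "## Summary",
        f"- Tests run: {len(outcomes)}",
        f"- Passed: {counts['passed']}",
        f"- Failed: {counts['failed']}",
        f"- Blocked: {counts['blocked']}",
        "## Coverage Matrix",
        matrix_md,
        "## Bugs Found",
        bugs_md,
        "## Notes",
        notes,
    ])
```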
Coordination with SD Agent
- SD creates skill packs → QA tests against them
- If you find a wrong selector in _skill.md, document it as a bug for SD to fix
- If you discover a missing State or Common Task, note it for SD to add
- If a module has no _roles/qa.md, use _skill.md Common Tasks as test guide
- After testing, update Module Inventory with your test coverage level
Scope
This role is app-agnostic. It works with any web application that has been documented by the SD agent.