
Quality Assurance — Autonomous Agents Role Playbook

Agentic playbook for AI coding agents operating Autonomous Agents in the qa role.

Available free · v1.2.0 · Browser

$ sidebutton install agents

Quality Assurance — Universal Web App Testing

Exercises documented web applications through browser automation, collects screenshots and extracts, and files structured bug reports when behavior diverges from the skill pack's expected states. Operates strictly from _skill.md as the source of truth.

Environment

| Component | Value |
| --- | --- |
| Target app URL | (set by operator) |
| SideButton | `http://localhost:9876/` |
| Skill pack | (loaded via MCP; check `skill://` resources) |
| Source code (optional) | (read-only, for investigating bugs) |

Before Testing Any Module

1. Load skill context: call `ReadMcpResourceTool(server="sidebutton")`, then read the module's `_skill.md`.
2. Load the QA playbook: read the module's `_roles/qa.md` (if it exists).
3. Check the Module Inventory: read the root `_skill.md` for coverage status.
4. If no `_roles/qa.md` exists, use the Common Tasks section of `_skill.md` as your test guide.
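
The fallback rule in the steps above can be sketched as a small lookup. This is an illustration only: the `resources` dict, the `{module}/_skill.md` path layout, and the assumption that Common Tasks is a markdown heading are all hypothetical, not part of the skill pack format.

```python
import re
from typing import Dict, Tuple


def pick_test_guide(resources: Dict[str, str], module: str) -> Tuple[str, str]:
    """Return (source_path, guide_text) for a module.

    `resources` maps hypothetical resource paths to their text content.
    """
    qa_path = f"{module}/_roles/qa.md"
    skill_path = f"{module}/_skill.md"

    # Prefer the dedicated QA playbook when the module has one.
    if qa_path in resources:
        return qa_path, resources[qa_path]

    # Otherwise fall back to the Common Tasks section of _skill.md.
    # Assumption: the section starts at a markdown heading named
    # "Common Tasks" and runs until the next heading or end of file.
    match = re.search(
        r"^#+[ \t]*Common Tasks[ \t]*\n(.*?)(?=^#+\s|\Z)",
        resources[skill_path],
        flags=re.MULTILINE | re.DOTALL,
    )
    if match is None:
        raise LookupError(f"no Common Tasks section in {skill_path}")
    return skill_path, match.group(1).strip()
```

Raising when neither source exists keeps the agent from testing without a guide.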

Testing Depth Levels

| Level | Name | Scope | When to Use |
| --- | --- | --- | --- |
| L0 | Smoke | Page loads, no blank screen, primary content visible | After deploy, quick health check |
| L1 | Structure | All expected elements present per `_skill.md`, correct labels and layout | After UI changes |
| L2 | Interaction | Each button/input responds, dropdowns populate, forms validate, toasts appear | Feature-level QA |
| L3 | Data | CRUD operations persist via API, data consistent across views, reload preserves state | Full regression |
| L4 | Edge | Empty states, boundary inputs, filter combos, rapid clicks, concurrent operations | Pre-release deep QA |
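
One way to carry the depth levels through a test run is an ordered enum. The sketch below assumes the levels are cumulative (testing at L2 implies L0 and L1 were run), which matches the `L0-L{N}` depth notation used in the results file; the names `Depth`, `levels_up_to`, and `depth_label` are illustrative.

```python
from enum import IntEnum
from typing import List


class Depth(IntEnum):
    L0 = 0  # Smoke
    L1 = 1  # Structure
    L2 = 2  # Interaction
    L3 = 3  # Data
    L4 = 4  # Edge


def levels_up_to(target: Depth) -> List[Depth]:
    """List every level from L0 through the target, in order."""
    return [level for level in Depth if level <= target]


def depth_label(target: Depth) -> str:
    """Format the depth range the way the results file reports it."""
    return f"L0-L{int(target)}"
```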

Test Execution Protocol

Per-Test Evidence Pattern

Every test interaction follows:

navigate → snapshot (get refs) → screenshot (before) → action (click/type by ref) → screenshot (after) → verify (snapshot or extract)

Always collect both snapshot (DOM state) and screenshot (visual state) as evidence.
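
The evidence pattern can be sketched as one function that always performs the six steps in order. The `Browser` protocol below is a hypothetical Python wrapper around the browser tools, and the snapshot shape (`refs`, `content`) is assumed for illustration; `StubBrowser` exists only so the sketch runs without a real browser.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Protocol


class Browser(Protocol):
    """Hypothetical wrapper around the browser tools."""

    def navigate(self, url: str) -> None: ...
    def snapshot(self) -> Dict[str, Any]: ...
    def screenshot(self) -> str: ...
    def click(self, ref: int) -> None: ...


@dataclass
class Evidence:
    url: str
    before_snapshot: Dict[str, Any]
    before_screenshot: str
    after_screenshot: str
    after_snapshot: Dict[str, Any]
    passed: bool


def run_click_test(
    browser: Browser,
    url: str,
    pick_ref: Callable[[Dict[str, Any]], int],
    verify: Callable[[Dict[str, Any]], bool],
) -> Evidence:
    """navigate, snapshot, screenshot, click by ref, screenshot, verify."""
    browser.navigate(url)
    before = browser.snapshot()          # get refs from the current DOM
    shot_before = browser.screenshot()   # visual state before the action
    browser.click(pick_ref(before))      # act by ref, not by selector
    shot_after = browser.screenshot()    # visual state after the action
    after = browser.snapshot()           # DOM state used for verification
    return Evidence(url, before, shot_before, shot_after, after, verify(after))


class StubBrowser:
    """In-memory stand-in that records the order of calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.clicked = False

    def navigate(self, url: str) -> None:
        self.calls.append("navigate")

    def snapshot(self) -> Dict[str, Any]:
        self.calls.append("snapshot")
        return {"refs": {"Save": 7}, "content": "Saved" if self.clicked else "Draft"}

    def screenshot(self) -> str:
        self.calls.append("screenshot")
        return f"shot-{len(self.calls)}.png"

    def click(self, ref: int) -> None:
        self.calls.append("click")
        self.clicked = True
```

Returning a single `Evidence` record keeps both snapshots and both screenshots together, so a bug report can cite them without re-running the test.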

Phase 1: Page Load & Structure (L0-L1)

For every module:

| # | Test | Method | Pass Criteria |
| --- | --- | --- | --- |
| 1 | Page loads | Navigate to module URL | No blank screen, no error page |
| 2 | Content visible | Snapshot + screenshot | Primary content area populated |
| 3 | Expected elements | Compare snapshot vs `_skill.md` Key Elements | All documented elements present |
| 4 | Layout correct | Screenshot | Matches `_skill.md` Page Structure |
| 5 | Navigation works | Click sidebar/topbar links | URL changes, content updates |

Phase 2-N: Feature Testing (L2-L3)

For each feature area in the module's Common Tasks:

| # | Test | Method | Pass Criteria |
| --- | --- | --- | --- |
| 1 | Happy path | Follow Common Tasks step-by-step | Action completes, success feedback |
| 2 | Validation | Submit empty/invalid data | Error shown, no crash, field preserved |
| 3 | Persistence | Perform action → reload page | Data still reflects the change |
| 4 | Cancel/undo | Start action → cancel | No side effects, form cleared |
| 5 | State transitions | Check all States from `_skill.md` | Each state reachable, correct visual |

Phase 3: Cross-Module (L3)

| # | Test | Method | Pass Criteria |
| --- | --- | --- | --- |
| 1 | Data consistency | Change in module A → verify in module B | Reflected across views |
| 2 | Navigation | Follow links between modules | Correct destination, no broken links |
| 3 | Back/forward | Browser history | State preserved or correctly reset |

Phase 4: Edge Cases (L4)

| # | Test | Method | Pass Criteria |
| --- | --- | --- | --- |
| 1 | Empty state | Remove all items/clear filters | Empty state message shown |
| 2 | Boundary inputs | Max-length text, special characters, empty strings | No crash, graceful handling |
| 3 | Rapid actions | Quick repeated clicks/submits | No duplicates, no race conditions |
| 4 | Long content | Items with very long names/descriptions | No layout break, truncation with tooltip |

Bug Documentation

When a bug is found, document immediately:

### Bug: {short description}

- **Module**: {module name}
- **Component**: {section > sub-component}
- **Severity**: P0 / P1 / P2 / P3
- **Steps to Reproduce**:
  1. Navigate to {URL}
  2. {action}
  3. {action}
- **Expected**: {what should happen}
- **Actual**: {what actually happens}
- **Evidence**: {screenshot description or reference}
- **URL**: {exact URL at time of failure}

Severity definitions:

| Severity | Definition | Examples |
| --- | --- | --- |
| P0 | Data loss or broken core function | Save deletes data, page crashes |
| P1 | Major feature broken, no workaround | Can't create items, filter doesn't work |
| P2 | Feature broken but workaround exists | Inline edit fails but modal edit works |
| P3 | Cosmetic or minor UX issue | Wrong color, alignment off, typo |
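
To keep reports consistent, the bug template can be filled from a small record type. This is a sketch, not part of the playbook: `BugReport` and its field names are illustrative, and the only rule it enforces is that severity is one of P0 to P3.

```python
from dataclasses import dataclass
from typing import List

SEVERITIES = ("P0", "P1", "P2", "P3")


@dataclass
class BugReport:
    title: str
    module: str
    component: str
    severity: str
    steps: List[str]
    expected: str
    actual: str
    evidence: str
    url: str

    def to_markdown(self) -> str:
        """Render the report in the bug template's field order."""
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        lines = [
            f"### Bug: {self.title}",
            "",
            f"- **Module**: {self.module}",
            f"- **Component**: {self.component}",
            f"- **Severity**: {self.severity}",
            "- **Steps to Reproduce**:",
        ]
        # Number the reproduction steps starting at 1, indented under the bullet.
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += [
            f"- **Expected**: {self.expected}",
            f"- **Actual**: {self.actual}",
            f"- **Evidence**: {self.evidence}",
            f"- **URL**: {self.url}",
        ]
        return "\n".join(lines)
```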

Browser Tool Usage

Navigation

- `navigate(url)`: always use the full URL, including any query params.
- Use query params to control initial state (e.g., `?tab=settings&filter=active`).
- Wait for content to load before testing (check that spinners and skeletons are gone).
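
Building the full URL from a base and a set of query params avoids hand-concatenation mistakes. A minimal sketch using the standard library; the helper name `build_url` and the example host are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_url(base: str, **params: str) -> str:
    """Return `base` with `params` merged into its query string."""
    parts = urlsplit(base)
    # Keep any params already on the base URL, then apply the overrides.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example: open a module directly on a known tab and filter.
settings_url = build_url("https://app.example.test/items", tab="settings", filter="active")
```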

Page Inspection

- `snapshot(includeContent=true)`: primary tool for understanding page state.
  - Returns: URL (verify correct page), content (verify data), refs (for clicking).
- `screenshot()`: visual evidence; catches layout issues that the snapshot misses.
- Use both together for complete evidence.

Interaction

- `click(ref=N)`: preferred over selector-based clicks (refs are unique).
- `type(ref=N, text)`: for text inputs; use `submit=true` for search-on-enter.
- `scroll(direction, amount)`: for content below the fold.

Common Automation Patterns

| Pattern | Steps | Notes |
| --- | --- | --- |
| Actions/context menu | Click "..." → snapshot → click menu item by ref | Menu may render as portal. Closes on focus loss. |
| Toast/notification | Action → screenshot immediately | Often disappears after 2-5s |
| Table data extraction | `snapshot(includeContent=true)` → parse content | Look for consistent table structure |
| Modal interaction | Click trigger → wait for heading/overlay → snapshot → interact | Wait for modal to be fully visible |
| Delete with confirm | Action → Delete → snapshot → click confirm | Always verify confirmation dialog |
| Dropdown selection | Click trigger → snapshot → click option by ref | Re-snapshot after opening to get fresh refs |
| Scroll-then-interact | Click any element → scroll → snapshot → interact | Some pages need a click before scroll works |

Known Automation Limitations

| Limitation | Impact | Workaround |
| --- | --- | --- |
| Native `<input type="date">` | Cannot set dates via `type()` | Use visible date input if available |
| Native `<select>` | Options hard to enumerate | Keyboard arrows or snapshot to read |
| File upload (`<input type="file">`) | Can't trigger file dialog | Mark as manual test |
| Shadow DOM elements | Not accessible via standard selectors | Note as untestable |
| iframe content | Can't cross frame boundary | Note as untestable |
| CAPTCHA/reCAPTCHA | Can't solve automatically | Requires manual bypass |

Timing and Reliability

- After mutations: wait 1-2s before verifying (API calls may be async).
- After navigation: wait for a page-ready indicator (heading, table, or key element).
- Snapshot vs. screenshot timing: a snapshot reflects the DOM (may miss animations); a screenshot reflects the visual state (may show a mid-transition frame). Use both.
- Connection issues: if tools start failing, check `get_browser_status()`. The user may need to refresh the browser tab.
- Cache/stale data: some apps cache aggressively. Navigate away and back to force a refetch.
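
Where a page-ready indicator is available, polling for it is usually more reliable than a fixed sleep. The helper below is a generic sketch; `wait_until` is a hypothetical name, and the condition would typically wrap a snapshot check, such as looking for an expected heading in the content.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A False return is a signal to collect a snapshot and screenshot as evidence rather than to continue with the test step.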

Test Coverage Matrix

Maintain a coverage matrix per module:

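| Element/Feature | L0 | L1 | L2 | L3 | L4 | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Page load | OK | OK | - | - | - | |
| Add item | - | - | OK | OK | - | |
| Edit item | - | - | OK | Bug #1 | - | Inline edit reverts |
| Delete item | - | - | OK | OK | - | |
| Search/filter | - | - | Observed | - | - | Not fully tested |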

Status values:

- OK: tested, works as expected
- Bug #N: tested, bug found and documented
- Observed: element present but not interaction-tested
- Not tested: known to exist but not yet tested
- N/A: not applicable to this module
- Blocked: can't test (automation limitation)
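
The status vocabulary is small enough to validate mechanically before a matrix is written out. In this sketch the matrix is a hypothetical sparse mapping of feature to `{level: status}`, so levels that were not run are simply omitted; `is_valid_status` and `tally` are illustrative names.

```python
import re
from collections import Counter
from typing import Dict

FIXED_STATUSES = {"OK", "Observed", "Not tested", "N/A", "Blocked"}
BUG_PATTERN = re.compile(r"Bug #\d+")


def is_valid_status(value: str) -> bool:
    """Accept the fixed status values plus 'Bug #N' for any number N."""
    return value in FIXED_STATUSES or BUG_PATTERN.fullmatch(value) is not None


def tally(matrix: Dict[str, Dict[str, str]]) -> Counter:
    """Count cells per status, grouping every 'Bug #N' under 'Bug'."""
    counts: Counter = Counter()
    for feature, cells in matrix.items():
        for level, status in cells.items():
            if not is_valid_status(status):
                raise ValueError(f"{feature} {level}: unknown status {status!r}")
            counts["Bug" if status.startswith("Bug #") else status] += 1
    return counts
```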

Test Results Format

After testing a module, write results to {module}-qa-results.md:

# {Module Name} — QA Results

**Date**: {ISO date}
**Tester**: QA Agent
**Target**: {URL}
**Depth**: L0-L{N}

## Summary
- Tests run: {N}
- Passed: {N}
- Failed: {N}
- Blocked: {N}

## Coverage Matrix
{table}

## Bugs Found
{bug documentation}

## Notes
{any observations, recommendations, or follow-up items}
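
The header and summary of the results file can be generated from the run counts. A minimal sketch: the function names are hypothetical, and it assumes "Tests run" equals passed plus failed plus blocked, which the template does not state explicitly.

```python
from datetime import date
from typing import Optional


def results_filename(module: str) -> str:
    """File name for a module's results, following {module}-qa-results.md."""
    return f"{module}-qa-results.md"


def render_results_header(
    module_name: str,
    target: str,
    max_depth: int,
    passed: int,
    failed: int,
    blocked: int,
    on: Optional[date] = None,
) -> str:
    """Render the title, metadata, and Summary section of the results file."""
    on = on or date.today()
    run = passed + failed + blocked  # assumption: every run test has one outcome
    return "\n".join([
        f"# {module_name} \u2014 QA Results",  # \u2014 is the dash in the template title
        "",
        f"**Date**: {on.isoformat()}",
        "**Tester**: QA Agent",
        f"**Target**: {target}",
        f"**Depth**: L0-L{max_depth}",
        "",
        "## Summary",
        f"- Tests run: {run}",
        f"- Passed: {passed}",
        f"- Failed: {failed}",
        f"- Blocked: {blocked}",
    ])
```

The Coverage Matrix, Bugs Found, and Notes sections would be appended after this header.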

Coordination with SD Agent

- SD creates skill packs → QA tests against them.
- If you find a wrong selector in `_skill.md`, document it as a bug for SD to fix.
- If you discover a missing State or Common Task, note it for SD to add.
- If a module has no `_roles/qa.md`, use the `_skill.md` Common Tasks section as the test guide.
- After testing, update the Module Inventory with your test coverage level.

Scope

This role is app-agnostic. It works with any web application that has been documented by the SD agent.
