Testing

Testing runs your solution, evaluation, and cleanup code against a real AWS account. It creates and deletes real resources in your own account. Before testing, review_project must pass.

Testing granularity

You can test at different levels of granularity depending on how much confidence you have in the code.

Request	What runs
`test solution 01.02`	Solution code for step 01.02 only
`test evaluation 01.02`	Evaluation code for step 01.02 only
`test step 01.02`	Solution then evaluation for step 01.02
`test objective 02`	Solution then evaluation for all steps in objective 02
`cleanup objective 02`	Cleanup code for objective 02
`cycle test objective 02`	Test objective 02, clean it up, test it again
`test project solutions`	Solution code for all objectives in sequence
`test project`	Solution then evaluation for all objectives
`test and cleanup project`	Solution, evaluation, then cleanup for all objectives
`test next`	The next untested step based on session state

Start with individual steps when testing for the first time. Once you’re confident the code is solid, run at objective or project scope.

Resource flow

Each test run follows this sequence:

Solution code          Test runner              Evaluation code
─────────────          ───────────              ───────────────
Creates AWS resources  Receives resources list  Receives generated handles
Returns {name,         Creates generated        as events (type + id only)
  type, id}            handles in session       Receives resolved handles
                                                (name + type + id)
                                                Validates configuration
                                                Returns {name, type, id, status}
                       Checks IDs:
                       match → promote to resolved
                       mismatch → hard fail
                       Next step's solution
                       gets all handles as
                       context["resolved"]

Generated handles are created from the solution’s return value. They are unvalidated — the test runner knows the resource exists but hasn’t confirmed it’s correctly configured.

Resolved handles have passed evaluation. They are available to subsequent steps via context["resolved"].

Evaluation code receives two inputs, one for the current step and one for prior steps:

events — the generated handles as unqualified events (type + id, no name)
resolved — all previously validated handles from prior steps (name + type + id)

If evaluation returns a handle in its validated list with status: "found", the test runner promotes it from generated to resolved. If the ID in the evaluation result doesn’t match the generated handle’s ID, the runner hard-fails — this indicates the evaluation code found a different resource than the solution created.
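The promotion rule above can be sketched in a few lines of Python. This is an illustration only, not the test runner's actual implementation: the function name `promote_handles`, the `HandleMismatchError` exception, and the choice to skip evaluation results whose name the solution never returned are all assumptions made for the example.

```python
class HandleMismatchError(Exception):
    """Hypothetical error for the hard-fail case (evaluation ID != solution ID)."""


def promote_handles(generated, validated, resolved):
    """Return a new resolved mapping with confirmed handles promoted.

    generated: dict of name -> {"name", "type", "id"} returned by the solution
    validated: list of {"name", "type", "id", "status"} returned by evaluation
    resolved:  dict of name -> handle carried over from earlier steps
    """
    updated = dict(resolved)
    for result in validated:
        if result["status"] != "found":
            continue  # not confirmed, so the handle stays generated
        handle = generated.get(result["name"])
        if handle is None:
            continue  # assumption: ignore names the solution never returned
        if handle["id"] != result["id"]:
            # Evaluation found a different resource than the solution created.
            raise HandleMismatchError(
                f"{handle['name']}: solution returned {handle['id']!r}, "
                f"evaluation found {result['id']!r}"
            )
        updated[handle["name"]] = handle
    return updated
```

Returning a new mapping rather than mutating `resolved` keeps a failed step from leaving half-promoted state behind, which is a design choice of this sketch rather than documented runner behavior.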

Cycle tests

A cycle test runs an objective, cleans it up, then runs it again. This verifies two things: that the code works correctly, and that cleanup leaves the account in a state where the objective can be run again cleanly.

Cycle test sequence for objective 02:

test_solution scope=objective 02
test_evaluation scope=objective 02
test_cleanup scope=objective 02
test_solution scope=objective 02
test_evaluation scope=objective 02

If cleanup leaves orphaned resources, the second run will typically fail with a naming collision or dependency conflict — which is exactly what you want to catch before publishing.
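As a rough sketch, the cycle sequence can be modeled as an ordered list of tool calls that stops at the first failure. The `call_tool` callable and the `success` key on its result are hypothetical stand-ins for however the tools are actually invoked; only the tool names and their order come from the sequence above.

```python
# Order taken from the cycle test sequence: run, clean up, run again.
CYCLE_SEQUENCE = [
    "test_solution",
    "test_evaluation",
    "test_cleanup",
    "test_solution",
    "test_evaluation",
]


def run_cycle(objective, call_tool):
    """Run the cycle for one objective, stopping at the first failing tool.

    call_tool(tool_name, scope) is a hypothetical callable that returns a
    dict with a boolean "success" key.
    """
    scope = f"objective {objective}"
    ran = []
    for tool in CYCLE_SEQUENCE:
        result = call_tool(tool, scope)
        ran.append(tool)
        if not result.get("success"):
            # Stop immediately; do not run the remaining tools.
            return {"success": False, "failed_at": tool, "ran": ran}
    return {"success": True, "failed_at": None, "ran": ran}
```

A failure on the second `test_solution` call, after cleanup has run, is the signature of orphaned resources described above.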

Session state

The test runner maintains a session in .genlabz/test-session.json (project-local, gitignored). The session tracks:

Resource handles accumulated across steps (both generated and resolved)
Which steps have been tested (solution and evaluation)
Results for each step (success/failure, messages, timestamps)
Cleanup results per objective

The session enables resumability — if a test run is interrupted, you can pick up where you left off. Call get_test_session to inspect the current state.

When some objectives are already deployed from a prior session and you want to test the project, clarify whether to start from the next untested objective or roll back the deployed objectives first.

Failure handling

If any solution, evaluation, or cleanup script fails, stop immediately — do not run further tools. The test runner returns structured output including the error, logs, and handle state.

A typical failure summary looks like:

solution failed at step 01.02

Error: BucketAlreadyExists

Logs:
Failed to create bucket: An error occurred (BucketAlreadyExists)...

Handle	Type	ID	State
01_01_bucket	AWS::S3::Bucket	my-bucket-a3f9c2d1	resolved
01_02_policy	AWS::S3::BucketPolicy	—	not_found

From here you have three options:

Investigate — read the failing code file, understand the error, fix it
Fix and re-run — edit the code, then re-run test solution 01.02 or test evaluation 01.02
Start objective over — run cleanup objective 01, then re-run from the beginning

Presenting results

After every test_* tool call, call get_test_summary scope=last and display the output verbatim. The test tools return a slim JSON response with success/failure status; get_test_summary returns the full formatted markdown with tables, logs, and handles.

To see the full project status at any point, call get_test_summary scope=session.

AWS credentials

The test runner resolves credentials in this order:

Explicit profile and region parameters on the tool call
Project config in config.local.toml under [test]
Default AWS credential chain
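The precedence above could look something like the following sketch. It assumes the `[test]` table uses `profile` and `region` keys and that each setting falls back independently; neither detail is confirmed here, and `resolve_aws_settings` is a hypothetical name.

```python
def resolve_aws_settings(explicit=None, project_config=None):
    """Pick profile and region by precedence, recording where each came from.

    explicit:       parameters passed on the tool call, e.g. {"profile": "dev"}
    project_config: parsed config.local.toml as a dict, e.g. {"test": {...}}
    Key names and per-setting fallback are assumptions for illustration.
    """
    explicit = explicit or {}
    test_cfg = (project_config or {}).get("test", {})
    resolved = {}
    for key in ("profile", "region"):
        if explicit.get(key):
            resolved[key] = (explicit[key], "tool call")
        elif test_cfg.get(key):
            resolved[key] = (test_cfg[key], "config.local.toml [test]")
        else:
            # Nothing set explicitly; defer to the default AWS credential chain.
            resolved[key] = (None, "default AWS credential chain")
    return resolved
```

For example, passing `{"profile": "dev"}` on the tool call alongside a project config that sets both values would take the profile from the tool call and the region from the config.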

The account ID is verified via sts:GetCallerIdentity at session creation and logged, so you know which account the resources are being created in.

What’s next

After all tests pass, the project is ready for IAM policy generation and publishing preparation.

For the full tool signatures and parameter reference, see MCP tools.