test-framework
The test surface. How @test functions are declared, discovered,
run, and reported; how assertions fail with structured
diagnostics; how capability mocking and property tests integrate
with the rest of the language; and what the Arbitrary face
contract is.
Status: draft (v0). The annotation surface (@test, @skip_laws) and the existing diagnostic codes (TYP218, TYP219, ENV020) are already in play across other specs; this file consolidates them and introduces the TST diagnostic band for test-specific failures. Open items at the end name what a v1 revision could expand on.
Design goals
- Tests are ordinary functions. A @test-annotated fn is a function that q64 already understands. No DSL, no separate build target, no macro layer. The test runner finds them, invokes them, and reports what they do.
- Failures are diagnostics. A failing assertion emits the same envelope shape the compiler does (per diagnostics.md). The test runner is a diagnostic source like the type-checker: same renderer, same machine-readable output, same severity field.
- Capability mocks share one mechanism. The with_capabilities block from env.md is the only mocking surface. Tests don’t get a parallel dependency-injection framework.
- Property tests reuse face laws. A law declaration on a face is a property test for every fit of that face per faces.md §Laws. The framework does not introduce a second predicate language.
- Determinism by default. Every test gets a seeded RNG; every property test records the seed in the failure report. Two runs of the same suite with the same seeds produce the same results.
Vocabulary
| Word | Meaning |
|---|---|
| test function | A free fn annotated @test. Discovered by qube test. |
| assertion | A call to assert(...), assert_eq(...), etc. that fails by emitting a TST diagnostic. |
| property test | A test generated from a face law per faces.md. Runs random Arbitrary inputs through the law. |
| fixture | A value constructed for a test’s duration. q64 has no @fixture keyword; fixtures are scope-local bindings. |
| mock | A capability fit produced by Face.mock(...), installed via with_capabilities. |
| shrink | The reduction of a counter-example to a smaller failing input. Performed by Arbitrary::shrink. |
The @test annotation
@test is a category-1 compiler-known marker per
annotations.md. It registers a function
with the test runner.
Position and signature
@test attaches to a free fn. Methods, fits, faces, stages,
and nested closures cannot be tests.

    @test
    fn user_round_trip() {
        let original = User { name: "Ada", age: 30 }
        let encoded = original.to_json()
        let decoded = User.from_json(encoded)
        assert_eq(original, decoded)
    }

Permitted signatures:

    @test fn name()                              // returns (); failure via assertion
    @test fn name() -> Result<(), TestFailure>   // returns Err to fail; try inside

Other return types are TST001. Parameters of any kind are
TST002; a test function takes zero arguments and reads
capabilities (mocked or real) ambiently per
env.md.
Effect inference
The compiler infers the effect set of a @test body normally.
A test that touches the network picks up @network; a test
that allocates picks up nothing special (allocation is the
default). The runner enforces no inferred-effect restriction; a
@realtime assertion inside a test is the body’s choice.
The test runner itself is a @no_realtime, @io, @time
context — it logs progress, schedules tests, and reports
results. A @test body that needs @realtime semantics in its
subject under test invokes that subject through the normal
effect rules.
Discovery
qube test discovers tests two ways:
- In-source tests. Any @test-annotated fn in a module reachable from the qube’s src/ is a test.
- tests/ directory. Any module under tests/ at the qube root is compiled as a separate crate with the qube as a pub dependency. @test-annotated fns in it are discovered the same way.
tests/ modules can import the qube’s pub surface freely;
they cannot reach the qube’s internal items (this is the same
rule the registry boundary enforces). For tests that need
private-symbol access, declare them inside src/ next to the
code they cover.
Filtering

    qube test                       # run every test
    qube test user                  # run tests whose fully-qualified name contains "user"
    qube test --exact user_id       # run only tests whose name is exactly "user_id"
    qube test --members "math/*"    # workspace filter (per qube-cli.md)

The filter is a substring match on the test’s
fully-qualified name (my_qube::auth::test_login) unless
--exact is given.
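The matching rule can be modelled in a few lines. The sketch below is Python rather than q64, and select_tests is a hypothetical helper, not part of qube. It assumes that --exact compares against the final ::-separated segment of the name; the text above does not say whether the full path or only the last segment is compared.

```python
def select_tests(names, pattern=None, exact=False):
    """Return the fully-qualified test names that a filter selects."""
    if pattern is None:
        return list(names)  # no filter: run every test
    if exact:
        # Assumption: --exact matches the last `::` segment, not the full path.
        return [n for n in names if n.rsplit("::", 1)[-1] == pattern]
    # Default: substring match on the fully-qualified name.
    return [n for n in names if pattern in n]


tests = [
    "my_qube::auth::test_login",
    "my_qube::user::user_id",
    "my_qube::user::user_id_round_trip",
]
print(select_tests(tests, "user"))                 # both my_qube::user::* tests
print(select_tests(tests, "user_id", exact=True))  # only my_qube::user::user_id
```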
Assertion API
Assertions live in the auto-prelude. Each assertion that fails
emits a TST020 diagnostic and aborts the test (via panic TestFailure) — the test runner catches that panic at the test
boundary, records the failure, and continues with the next
test.
| Assertion | Fails when |
|---|---|
| assert(cond) | cond is false. |
| assert(cond, msg) | cond is false; msg (any Display) is included in the report. |
| assert_eq(a, b) | a != b per Eq. |
| assert_neq(a, b) | a == b per Eq. |
| assert_matches(value, Pattern) | value does not match the pattern; pattern syntax per grammar.md §Patterns. |
| assert_approx_eq(a, b, eps: f64) | abs(a - b) > eps. |
| assert_panics(\|\| { … }) | The closure does not panic. Returns the panic payload for inspection. |
| assert_panics_with(\|\| { … }, P) | The closure does not panic, or panics with a payload not matching pattern P. |
| unreachable_test() | Marker for paths that should be unreachable in test (TST022). Distinct from runtime unreachable!() per errors.md. |
assert_eq and assert_neq require both arguments fit Eq
and Debug (the latter for the failure report). TST030 is
emitted at compile time when the arguments don’t fit Debug.
Source-location capture
Assertion source locations are captured at the call site, not
inside the assertion implementation. The compiler synthesizes
the location from the lexer span; assertion authors do not
write @caller-style annotations.
Custom assertions
User code may declare its own assertion helpers; they are
ordinary functions that panic TestFailure:

    fn assert_sorted<T: Ord>(items: [T]) {
        for i in 1..items.len() {
            if items[i] < items[i - 1] {
                panic TestFailure {
                    message: "items not sorted at index {i}",
                    location: env.test.caller(),
                }
            }
        }
    }

env.test.caller() returns a Location for the call site of
the function it appears in; it is @test-context-only
(TST040 otherwise).
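As a rough model of what env.test.caller() hands back, the Python sketch below looks one frame past the helper to find the line that called it. This is an analogy under stated assumptions: the Location fields are illustrative, and the spec does not say how q64 resolves the location (a runtime stack walk is only how Python happens to do it).

```python
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    # Illustrative fields; the real Location type is not specified here.
    file: str
    line: int
    function: str


def caller() -> Location:
    """Location of the call site of the function that called caller()."""
    # Frame 0 is caller(), frame 1 is the helper, frame 2 is the helper's caller.
    info = inspect.stack(0)[2]
    return Location(info.filename, info.lineno, info.function)


class TestFailure(Exception):
    def __init__(self, message, location):
        super().__init__(message)
        self.message = message
        self.location = location


def assert_sorted(items):
    for i in range(1, len(items)):
        if items[i] < items[i - 1]:
            raise TestFailure(f"items not sorted at index {i}", caller())


def my_test():
    assert_sorted([1, 3, 2])  # the failure is attributed to this line
```

The point of the design is that the report blames the test's call to assert_sorted, not the raise inside the helper.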
The TestFailure panic type

    pub struct TestFailure {
        message: str,
        location: Location,
        context: Map<str, str>,   // optional key-value attachments
    }

    fit TestFailure : Panic {
        fn code() -> Option<str> { Some("TST020") }
    }

TestFailure is auto-prelude. It is the carrier the assertion
API panics with; the test runner catches it at the test
boundary and converts it to a TST020 envelope. Per
errors.md, panic TestFailure { … } allocates
in the scope arena and unwinds to the nearest scope, which the
runner installs around each test.
A test that prefers Result<(), TestFailure> over panicking
constructs the same value and returns Err(…); the runner
treats either form identically.
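The boundary behaviour described above can be sketched as a small runner loop. This is a Python model, not the q64 runner: exceptions stand in for panics, a returned TestFailure value stands in for Err(…), and run_suite is a hypothetical name. The TST021 branch follows the diagnostic table in this spec.

```python
class TestFailure(Exception):
    """Stand-in for the TestFailure panic payload."""


def run_suite(tests):
    """Run each (name, fn) pair and return a list of (name, code) outcomes."""
    outcomes = []
    for name, fn in tests:
        try:
            result = fn()
            # Result-style tests return the failure instead of raising it.
            code = "TST020" if isinstance(result, TestFailure) else None
        except TestFailure:
            code = "TST020"  # assertion failure caught at the test boundary
        except Exception:
            code = "TST021"  # panic with a non-TestFailure payload
        outcomes.append((name, code))  # record, then continue with the next test
    return outcomes


def passes():
    return None


def fails_by_panic():
    raise TestFailure("expected 1, got 2")


def fails_by_result():
    return TestFailure("expected 1, got 2")


def other_panic():
    return 1 // 0


suite = [
    ("passes", passes),
    ("fails_by_panic", fails_by_panic),
    ("fails_by_result", fails_by_result),
    ("other_panic", other_panic),
]
print(run_suite(suite))
```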
Mocking — the with_capabilities pattern
The mock pattern is the use: override on
with_capabilities, per
env.md §“Testing with mocks”. The test framework
adds no new mechanism; it pins the conventions.
The .mock() convention
A capability face exposes a .mock(...) constructor by
convention. The constructor returns a fit suitable for use as
the use: value:

    pub fit MockNet : Net {
        fn get(self, url: Url) -> Result<Response, IoError> { … }
        pub fn new() -> MockNet { … }
        pub fn on_get(self, url: Url, body: str) -> MockNet { … }
    }

    impl Net {
        pub fn mock() -> MockNet @test { MockNet.new() }
    }

Net.mock() is @test-only: calling it outside a @test
function (or a function transitively called from one) is
ENV020 per env.md. The @test effect marker on
the mock method is how the compiler tracks the boundary; it
is the same marker the @test annotation produces in the
inferred set of the test body.
Test usage

    @test
    fn fetch_users_parses_response() {
        with_capabilities(use: {
            net: Net.mock()
                .on_get(url"https://api.example.com/users",
                        body: r#"[{"name":"Ada"}]"#)
        }) {
            let users = try fetch_users(url"https://api.example.com/users")
            assert_eq(users.len(), 1)
            assert_eq(users[0].name, "Ada")
        }
    }

The production fetch_users reads env.net ambiently; the
test shadows the binding; the test’s MockNet is what the
synthesized parameter resolves to. No alternative entry point,
no dependency-injection framework.
Strictness modes
Mocks are strict by default: any capability call not
explicitly configured fails the test with TST050. A test
that wants a permissive mock declares so:

    with_capabilities(use: { net: Net.mock().permissive() }) { … }

permissive() returns the same fit with unconfigured calls
falling through to a default (empty Response, zero bytes
read, etc.). The strict/permissive choice is a per-fit method,
not a framework flag.
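A minimal Python model of the two modes, assuming a mock that maps URLs to response bodies. The class mirrors the MockNet names used above but is only a sketch: UnmockedCall is a hypothetical exception standing in for the TST050 failure, and an empty string stands in for the empty Response.

```python
class UnmockedCall(Exception):
    """Stand-in for the TST050 failure raised by a strict mock."""
    code = "TST050"


class MockNet:
    def __init__(self, routes=None, strict=True):
        self._routes = dict(routes or {})
        self._strict = strict

    def on_get(self, url, body):
        # Builder style: return a new mock with one more configured route.
        return MockNet({**self._routes, url: body}, self._strict)

    def permissive(self):
        # Same routes, but unconfigured calls fall through to a default.
        return MockNet(self._routes, strict=False)

    def get(self, url):
        if url in self._routes:
            return self._routes[url]
        if self._strict:
            raise UnmockedCall(url)
        return ""  # permissive default: an empty response body


strict = MockNet().on_get("https://api.example.com/users", '[{"name":"Ada"}]')
print(strict.get("https://api.example.com/users"))
print(repr(strict.permissive().get("https://api.example.com/other")))
```

Keeping strictness as a method on the mock, as the text describes, means each test states its own policy at the point where the mock is built.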
Property tests and the Arbitrary face
Property tests are generated from face law declarations per
faces.md §Laws. The framework runs each law
against Arbitrary-generated inputs and reports
counter-examples as TYP218 diagnostics (the band is owned by
faces.md, not this spec).
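The core loop, running a law against generated inputs until one violates it, can be sketched in Python. Everything here is illustrative: check_law and the two generators are hypothetical names, Python's random.Random stands in for the seeded Rng, and the default of 100 samples is the v0 figure quoted under the open items.

```python
import random


def check_law(law, generate, seed, samples=100):
    """Return the first counter-example found, or None if every sample passes."""
    rng = random.Random(seed)  # seeded, so a rerun sees the same inputs
    for _ in range(samples):
        args = generate(rng)
        if not law(*args):
            return args
    return None


def three_ints(rng):
    return tuple(rng.randint(-5, 5) for _ in range(3))


def three_strings(rng):
    return tuple(
        "".join(rng.choice("ab") for _ in range(rng.randint(0, 3)))
        for _ in range(3)
    )


def concat_associative(a, b, c):
    return (a + b) + c == a + (b + c)


def subtract_associative(a, b, c):
    return (a - b) - c == a - (b - c)  # false whenever c != 0


print(check_law(concat_associative, three_strings, seed=0xFACEB00C))  # None
print(check_law(subtract_associative, three_ints, seed=0xFACEB00C))   # a failing (a, b, c)
```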
The Arbitrary face

    pub face Arbitrary<T> {
        fn generate(rng: ref Rng) -> T
        fn shrink(value: T) -> [T]   // smaller candidates
    }

Arbitrary<T> is auto-prelude. Stdlib provides fits for every
primitive type and auto-derives Arbitrary for any struct /
enum whose fields all fit Arbitrary. User code only writes
a manual fit when the auto-derived generator would produce
useless inputs (e.g., a Url parser whose generator would
almost never produce well-formed URLs).
The auto-derive lives under @derive(Arbitrary) per
faces.md §“Auto-derive”. It is opt-in for user
types — adding Arbitrary to a type that has laws and no
manual fit is TYP217 (missing-derive).
Generation strategy
Arbitrary::generate reads from a seeded Rng (one per test,
per property). The framework guarantees:

- Determinism per seed. Two runs with the same seed produce the same input sequence.
- Seed reporting. The seed is included in every counter-example report so the failure is reproducible:

      error[TYP218]: property test law violated: associative
        --> src/math.q:42:5
         |
      42 | law associative: forall a, b, c => …
         |
         = note: seed 0xfaceb00c
         = note: minimal counter-example: a = 1, b = 2, c = 0

- Per-test seed source. The seed defaults to a stable hash of the test’s fully-qualified name. QUBE_TEST_SEED=<u64> in the environment overrides every test to use the same seed.
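The seed-selection rule can be sketched as follows. This is a Python model with assumptions called out: the spec only says "a stable hash", so SHA-256 truncated to 64 bits is a stand-in, seed_for is a hypothetical name, and accepting both decimal and 0x-prefixed values for QUBE_TEST_SEED is a guess at the accepted syntax.

```python
import hashlib
import os
import random


def seed_for(test_name, environ=os.environ):
    """Pick the RNG seed for one test."""
    override = environ.get("QUBE_TEST_SEED")
    if override is not None:
        # Assumption: decimal or 0x-prefixed hex are both accepted.
        return int(override, 0)
    # Assumption: SHA-256 as the stable hash, truncated to a u64.
    digest = hashlib.sha256(test_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


name = "my_qube::math::associative"
seed = seed_for(name, environ={})

# Same seed, same input sequence.
first = [random.Random(seed).randint(0, 99) for _ in range(3)]
second = [random.Random(seed).randint(0, 99) for _ in range(3)]
print(first == second)  # True
```

Deriving the default from the test name, rather than from the clock, is what lets a failure reproduce on another machine without anyone having to copy the seed by hand.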
Shrinking
When a counter-example is found, the framework invokes
Arbitrary::shrink repeatedly: each smaller candidate is
re-tested, and the smallest still-failing input is reported.
Shrinking terminates when shrink returns an empty array or
when every candidate passes.
shrink has no determinism guarantee beyond “every candidate
should be smaller than the input under some structural order.”
The framework treats shrink as advisory; an inefficient
shrinker produces larger counter-examples but does not affect
correctness.
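One way to realise this loop is greedy descent: move to the first candidate that still fails and repeat. The Python sketch below assumes that strategy, which the spec does not mandate; minimize and shrink_int are hypothetical names. A greedy search finds a local minimum, which for this example is also the global one.

```python
def shrink_int(n):
    """Smaller candidates for a non-negative integer: half of it, then one less."""
    if n == 0:
        return []
    return sorted({n // 2, n - 1})


def minimize(value, fails, shrink):
    """Greedy shrinking: step to the first still-failing candidate, then repeat."""
    while True:
        for candidate in shrink(value):
            if fails(candidate):
                value = candidate
                break
        else:
            # shrink returned nothing, or every candidate passed: stop here.
            return value


# The property "n < 10" is violated by any n >= 10.
print(minimize(57, lambda n: n >= 10, shrink_int))  # 10
```

Termination depends on every candidate being strictly smaller than its input, which is the one obligation the text places on shrink.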
Skip via @skip_laws
A fit that declares laws but cannot satisfy them (e.g.,
Monoid<f64> per faces.md §“Opting out”) uses
@skip_laws:

    @skip_laws
    pub fit Monoid<f64> { … }

The fit still works; qube test skips the property tests for
it and emits no TYP218. TYP219 is the diagnostic for
@skip_laws on a fit whose face has no laws.
Fixtures and lifecycle
q64 has no @before_all / @after_each annotation surface.
The scope mechanism from memory.md §“Scope’s implicit
arena” is the fixture mechanism:

    @test
    fn integration_db_query() {
        scope {
            let db = TestDb.start()        // setup
            defer db.shutdown()            // teardown: runs on scope exit
            let result = run_query(db, "SELECT 1")
            assert_eq(result, 1)
        }
    }

defer runs on scope exit (normal or panicking); the
TestDb.shutdown() call is the teardown. The scope arena
collects any intermediate allocations.
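The guarantee that teardown runs on both normal and panicking exit can be illustrated with Python's contextlib.ExitStack, whose registered callbacks play the role of defer. This is an analogy only; run_with_fixture and the log entries are hypothetical, and the scope arena has no counterpart in the sketch.

```python
from contextlib import ExitStack


def run_with_fixture(log, fail):
    with ExitStack() as scope:
        log.append("start")                     # setup
        scope.callback(log.append, "shutdown")  # plays the role of `defer db.shutdown()`
        if fail:
            raise AssertionError("assertion failed")
        log.append("asserted")


ok_log = []
run_with_fixture(ok_log, fail=False)
print(ok_log)  # ['start', 'asserted', 'shutdown']

bad_log = []
try:
    run_with_fixture(bad_log, fail=True)
except AssertionError:
    pass
print(bad_log)  # ['start', 'shutdown']
```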
Shared fixtures across tests
A fixture that’s expensive to construct (a populated database, a compiled model) is shared by wrapping the per-test scope:

    fn with_loaded_model<R>(f: |Model| -> R) -> R {
        static MODEL: Lazy<Model> = Lazy { || Model.load("test-fixtures/tiny.bin") }
        scope { f(MODEL.get()) }
    }

    @test fn classify_smoke()   { with_loaded_model(|m| assert(m.classify("hi").confidence > 0.5)) }
    @test fn classify_unicode() { with_loaded_model(|m| assert(m.classify("👋").confidence > 0.5)) }

Lazy is a property wrapper per
annotations.md; the static binding
constructs the value once across all tests. Tests run
sequentially within a single qube unless --parallel is
passed; with parallelism, the user is responsible for
shared-fixture safety.
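The construct-once behaviour can be modelled in Python with functools.lru_cache standing in for Lazy. This is a sketch under stated assumptions: loaded_model, the dictionary "model", and the loads counter are hypothetical, and nothing here reflects how Lazy is implemented in q64.

```python
import functools

loads = []  # records each time the expensive constructor actually runs


@functools.lru_cache(maxsize=None)
def loaded_model():
    loads.append("load")
    return {"name": "tiny"}  # stand-in for an expensive Model.load(...)


def with_loaded_model(f):
    return f(loaded_model())


def classify_smoke():
    return with_loaded_model(lambda m: m["name"])


def classify_unicode():
    return with_loaded_model(lambda m: len(m["name"]))


classify_smoke()
classify_unicode()
print(len(loads))  # 1: built once, reused by the second test
```

The analogy also carries the caveat about parallel runs: the Python documentation notes that the wrapped function may be called more than once if a second thread calls it before the first call has been cached.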
Parallelism

    qube test                # serial; the v0 default
    qube test --parallel N   # run up to N tests concurrently

Parallel test isolation is at the scope arena boundary: each
test gets its own scope and its own ambient env. Shared
mutable state (static var, @shared structs) is the user’s
responsibility; the framework does not synchronize it.
A test that depends on a host resource (e.g., a fixed network
port) declares so via @allow(TST060) if the implicit
serialization assumption is intentional; the framework cannot
detect cross-test resource conflicts otherwise.
Test output and the diagnostic envelope
qube test emits the same envelope shape as the compiler per
diagnostics.md. The code field carries
the test-specific band (TST020 for assertion failure,
TYP218 for property-law violation, TST050 for unmocked
capability) so downstream tools can filter by failure kind.
Test outcomes are reported per test as a note-severity
envelope on success and an error-severity envelope on
failure. The summary at end-of-run is a plain text line; it is
not part of the structured envelope stream.

    ok    user_round_trip (3ms)
    ok    fetch_users_parses_response (12ms)
    FAIL  classify_smoke (87ms)
      error[TST020]: assertion failed: expected 0.5, got 0.42
        --> src/ml.q:84:9
         |
      84 | assert(m.classify("hi").confidence > 0.5)
         |

    3 tests, 1 failed

The exit code is 1 per qube-cli.md when
any test fails; 0 otherwise.
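How outcomes map to envelopes, the summary line, and the exit code can be sketched in Python. Only the code and severity fields are named in this spec; the "test" key, the report function, and the wording of the summary when nothing fails are assumptions made for illustration.

```python
def report(results):
    """results: list of (test name, failure code or None).

    Returns (envelopes, summary line, exit code).
    """
    envelopes = [
        {
            "test": name,  # hypothetical field, for illustration only
            "severity": "note" if code is None else "error",
            "code": code,
        }
        for name, code in results
    ]
    failed = sum(1 for _, code in results if code is not None)
    summary = f"{len(results)} tests, {failed} failed"
    return envelopes, summary, 1 if failed else 0


results = [
    ("user_round_trip", None),
    ("fetch_users_parses_response", None),
    ("classify_smoke", "TST020"),
]
envelopes, summary, exit_code = report(results)
print(summary)    # 3 tests, 1 failed
print(exit_code)  # 1
```

A downstream tool that only cares about unmocked capability calls could then filter the envelope stream on code == "TST050" and ignore the summary line entirely.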
Diagnostic codes
Test diagnostics use the TST prefix; the prefix is reserved
in diagnostics.md §“Code conventions”.
Numbers are stable, never reused.
| Code | Short message | When |
|---|---|---|
| TST001 | @test function has unsupported return type | Must be () or Result<(), TestFailure>. |
| TST002 | @test function takes parameters | A test function takes zero arguments. |
| TST003 | @test on a non-free function | @test on a method, fit, stage, or nested closure. |
| TST020 | assertion failed | Any assert* failure at runtime. |
| TST021 | panic in test body was not TestFailure | A test panicked with a non-TestFailure payload. Surfaced as a test failure, attributing the original panic. |
| TST022 | unreachable_test() reached | An unreachable_test() marker executed. |
| TST030 | assertion arguments don’t fit Debug | assert_eq(a, b) where a or b cannot be debug-printed. Compile-time. |
| TST031 | assertion arguments don’t fit Eq | assert_eq / assert_neq where the type has no Eq fit. |
| TST040 | env.test API used outside test context | A function calling env.test.caller() is not transitively reachable from a @test. Compile-time. |
| TST050 | unmocked capability call in strict mock | A test’s strict mock saw a capability call it wasn’t configured for. |
| TST051 | mock-configuration unreached | A configured mock entry was never invoked during the test. Warning by default. |
| TST060 | suggestion: parallel-unsafe test in parallel run | (Note severity.) A test with @allow(TST060) runs in --parallel mode; the framework warns that the suppression may be wrong now. |
Property-test counter-examples remain on TYP218 (owned by
faces.md). The choice keeps law-checking under
the faces namespace; only test-specific failures (assertions,
mock policy, fixture lifecycle) live under TST.
All codes are emitted using the standard envelope from
diagnostics.md.
Examples
A minimal unit test

    @test
    fn parse_empty_string() {
        let result = parse("")
        assert_matches(result, Err(ParseError::Empty))
    }

A test with a fixture and teardown

    @test
    fn write_then_read() {
        scope {
            let tmp = TempFile.create()
            defer tmp.cleanup()
            tmp.write("hello")
            assert_eq(tmp.read(), "hello")
        }
    }

A capability-mocked test

    @test
    fn fetch_404_returns_not_found_error() {
        with_capabilities(use: {
            net: Net.mock()
                .on_get(url"https://api.example.com/missing",
                        status: 404, body: "")
        }) {
            let r = fetch(url"https://api.example.com/missing")
            assert_matches(r, Err(FetchError::NotFound))
        }
    }

A property test (via face laws)

    pub face Monoid<T> {
        fn zero() -> T
        fn combine(a: T, b: T) -> T

        law left_id:     forall a: T => combine(zero(), a) == a
        law right_id:    forall a: T => combine(a, zero()) == a
        law associative: forall a, b, c: T =>
            combine(combine(a, b), c) == combine(a, combine(b, c))
    }

    pub fit Monoid<String> {
        fn zero() -> String { "" }
        fn combine(a: String, b: String) -> String { a + b }
    }

    // No @test needed: `qube test` runs the three laws against random Strings.

A manual Arbitrary fit

    pub fit Arbitrary<Url> {
        fn generate(rng: ref Rng) -> Url {
            let host = pick(rng, ["example.com", "q64.dev", "localhost"])
            let path = "/" + rng.alphanumeric(0..16)
            Url.parse("https://{host}{path}").unwrap()
        }

        fn shrink(u: Url) -> [Url] {
            // Single candidate: the same URL with its path dropped (the host root).
            // A URL whose path is already empty has nothing smaller to offer.
            if u.path.is_empty() { [] } else { [u.with_path("")] }
        }
    }

A test that expects a panic

    @test
    fn division_by_zero_panics() {
        let payload = assert_panics(|| { divide(1, 0) })
        assert(payload.code() == Some("ARITH001"))
    }

Open items deferred
- Per-test budgets. Wall-clock timeout, allocation cap, per-test sample count for property tests. v0 has a flat default (100 samples per property test, no timeout); the knobs to override per-test are deferred.
- Snapshot testing. assert_snapshot(value) comparing against an on-disk golden file. Convenient for diagnostic-envelope tests but requires a stable filesystem layout per qube; deferred.
- Test-only items. A @test modifier on struct, fn, or mod declarations to gate code on cfg(test)-style conditional compilation. Today, helpers used only by tests live under tests/ or accept being part of the qube’s surface.
- Distributed test execution. Running qube test --parallel N across multiple machines; v0 runs in one process.
- Coverage instrumentation. Branch and line coverage. A codegen concern more than a test-framework one; lands when codegen does.
- Fuzz integration. A @fuzz annotation distinct from property tests, with corpus management and crash bucketing. Out of scope for v0.
- Arbitrary generators with explicit weights. The current generation is uniform; user-supplied weights for enum variant frequency or string-length distribution land later.
- Test-time effect overrides. Whether @test should weaken some effect bounds (e.g., allow @realtime violation inside a stub) for ergonomic mocking. Today the user wraps the call site.
Related specs
- annotations.md: @test is a category-1 marker; @skip_laws is the per-fit law-opt-out.
- faces.md: law declarations; the Arbitrary face is auto-prelude here; property-test diagnostics live under TYP218 / TYP219.
- env.md: with_capabilities(use: { … }) is the mock-installation mechanism; ENV020 enforces the .mock()-outside-test rule.
- errors.md: panic, Panic face, Result / try; TestFailure participates in the same machinery.
- memory.md: scope arena and defer are the fixture lifecycle mechanism; no test-specific @before_all needed.
- concurrency.md: test parallelism uses the same scope-and-spawn machinery as production code; no test-only scheduler.
- diagnostics.md: envelope format for the TST code band; the renderer is shared with the compiler.
- qube-cli.md: qube test subcommand, exit codes, member filtering.
- modules.md: Arbitrary is auto-prelude; the tests/ directory’s visibility rules.