
Workflow Deep Dive

This page is the long-form integration view of a single APEX run. Where How It Works and Workflow are focused references, this page walks the same pipeline end-to-end and shows how every cross-cutting mechanic — skills, instructions, registries, apex-recall, hooks, the challenger lane, and the lessons-feedback loop — plugs into each stage. Use it once, then jump to the focused references for day-to-day work.

APEX is an opinionated, agent-driven pipeline that turns a natural-language Azure ask into reviewed, deploy-ready Infrastructure-as-Code. Three primitives do the heavy lifting:

  • Agent steps — a single Copilot agent owns one stage, produces versioned artifacts under agent-output/{project}/, and hands off through the Orchestrator.
  • Gates — explicit human or validation checkpoints between steps. The workflow does not auto-advance past a gate.
  • Subagent fan-out — parallel, isolated subagents called by a parent agent for adversarial review, cost queries, deployment previews, or documentation parallelism. Subagents return structured results and never share the parent’s context window.
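
The isolation property of fan-out can be pictured with a small Python sketch. The names (SubagentResult, run_subagent, fan_out) and the toy review body are hypothetical and are not part of APEX. The sketch shows only the contract described above: each subagent receives a deep copy of the context keys it is granted, runs in parallel, and returns a structured, immutable result. It never sees the parent's full context.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentResult:
    name: str
    status: str      # "APPROVED" or "NEEDS_REVISION" in this sketch
    findings: tuple


def run_subagent(name, task_input):
    # Hypothetical subagent body: it can read only task_input, never the parent context.
    findings = tuple(f"{name}: reviewed {key}" for key in sorted(task_input))
    status = "APPROVED" if task_input.get("ok", True) else "NEEDS_REVISION"
    return SubagentResult(name, status, findings)


def fan_out(parent_context, grants):
    """grants maps a subagent name to the parent-context keys it may read."""
    with ThreadPoolExecutor(max_workers=max(1, len(grants))) as pool:
        futures = {
            name: pool.submit(
                run_subagent,
                name,
                copy.deepcopy({k: parent_context[k] for k in keys}),
            )
            for name, keys in grants.items()
        }
        return {name: future.result() for name, future in futures.items()}


context = {"architecture": "hub-spoke", "skus": ["P1v3"], "ok": False, "scratch": "parent-only"}
results = fan_out(context, {"cost-estimate": ["skus"], "challenger": ["architecture", "ok"]})
```

Because each subagent gets a copy of only its granted keys, a subagent cannot mutate or even observe parent-only state such as "scratch".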

State lives in three deliberately separate places:

| Where | What | Lifecycle |
| --- | --- | --- |
| agent-output/{project}/ | Versioned artifacts (markdown, JSON, diagrams) | Per-project, on disk |
| apex-recall session store | Decisions, findings, step status, governance trace | Per-project, queryable |
| .github/skills/workflow-engine/templates/workflow-graph.json | DAG: nodes, edges, gates, return edges, plan-lock | Repo-wide, read-only |
flowchart LR
    H((Human)) -->|approves| G(Gates)
    A[Agent step] --> AR[Artifacts]
    A --> RC[apex-recall]
    A -.->|may invoke| SA[Subagents]
    SA --> A
    AR --> G
    G --> A2[Next agent step]
    RC -.->|context| A2
    classDef gate fill:#fff5d6,stroke:#d4a000;
    class G gate;

Every step pulls from the same five surfaces: skills, instructions, registries, apex-recall, and hooks. Understanding them once removes most of the apparent “magic”.

Skills are domain knowledge packs auto-discovered by the description field in each .github/skills/{name}/SKILL.md. Agents read SKILL.md on demand and load references/*.md only when the body explicitly points to one — there is no digest tier.

Each skill is classified as WORKFLOW (multi-phase procedures), ANALYSIS (read-only investigations), or UTILITY (reusable patterns and defaults). The catalog is large; the slice relevant to each step appears in the Skill ↔ Step matrix.

Instructions are rule files auto-loaded by VS Code Copilot when their applyTo glob matches the file under edit. They never need explicit invocation. The most consequential ones for a workflow run:

| Instruction | Triggered by editing | Role |
| --- | --- | --- |
| agent-operating-frame | .github/agents/*.agent.md | Shared agent operating frame |
| governance-discovery | **/04-governance-constraints.{md,json} | Policy-discovery requirements |
| sku-manifest | **/sku-manifest.{md,json} | Authoring + drift contract for the SKU manifest |
| iac-plan-best-practices | **/04-implementation-plan.md | Plan-level policy + cost rules |
| iac-bicep-best-practices | **/*.bicep | Bicep code rules (AVM, security baseline) |
| iac-terraform-best-practices | **/*.tf | Terraform code rules |
| azure-artifacts | **/agent-output/**/*.md | H2 template enforcement |
| no-interactive-shell | chat-loaded agent/skill/instruction files | Bans -i flags, read -p, heredoc prompts |
| lesson-collection | **/*orchestrator*.agent.md | Lesson-capture protocol |
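
VS Code performs applyTo matching itself. The Python sketch below only approximates it so the trigger table is easier to reason about. The glob translator is an assumed, simplified subset (**/, *, ?, and non-nested {a,b} braces) and may disagree with VS Code's matcher on edge cases. INSTRUCTIONS copies four rows from the table above.

```python
import re


def _expand_braces(pattern):
    # "a.{md,json}" -> ["a.md", "a.json"]; handles non-nested brace groups only.
    m = re.search(r"\{([^{}]*)\}", pattern)
    if not m:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alt in m.group(1).split(","):
        expanded.extend(_expand_braces(head + alt + tail))
    return expanded


def _glob_to_regex(glob):
    i, parts = 0, []
    while i < len(glob):
        if glob.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")      # within a single path segment
            i += 1
        elif glob[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(apply_to, path):
    return any(_glob_to_regex(g).match(path) for g in _expand_braces(apply_to))


INSTRUCTIONS = {
    "governance-discovery": "**/04-governance-constraints.{md,json}",
    "sku-manifest": "**/sku-manifest.{md,json}",
    "iac-bicep-best-practices": "**/*.bicep",
    "azure-artifacts": "**/agent-output/**/*.md",
}


def active_instructions(path):
    return sorted(name for name, glob in INSTRUCTIONS.items() if matches(glob, path))


print(active_instructions("agent-output/demo/sku-manifest.md"))  # ['azure-artifacts', 'sku-manifest']
```

A single edited file can activate several instructions at once. In the example above, editing the SKU manifest markdown loads both the manifest contract and the H2 template rules.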

Five JSON/CSV registries are the source of truth for module choice, deprecation avoidance, and governance fallbacks. A sixth file is a deterministic test fixture:

| File | Read by | When |
| --- | --- | --- |
| avm-bicep-modules.csv | 05-IaC Planner, 06b-Bicep CodeGen | Module discovery and pinning |
| avm-terraform-modules.csv | 05-IaC Planner, 06t-Terraform CodeGen | Module discovery and pinning |
| avm-module-index.json | 05-IaC Planner, 03-Architect | Lifecycle status (Available / Proposed / Orphaned) lookup |
| azure-deprecations.json | 03-Architect, 05-IaC Planner | Block sunset SKUs early |
| governance-policy-baseline.json | 04g-Governance | Fallback baseline when live discovery is empty |
| governance-policy-baseline.fixture.json | Validators + tests | Deterministic test fixture |
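
A registry lookup at planning time might look like the Python sketch below. The real schemas of avm-module-index.json and azure-deprecations.json are not documented on this page. The keys ("lifecycle", "resource_type", "sku") and the Microsoft.Example entries are placeholder assumptions, chosen only to show the pattern: reject non-Available modules and sunset SKUs before pinning.

```python
import json

# Hypothetical registry excerpts; key names and entries are assumptions.
MODULE_INDEX = json.loads("""
{
  "avm/res/storage/storage-account": {"lifecycle": "Available"},
  "avm/res/example/legacy-module": {"lifecycle": "Orphaned"}
}
""")
DEPRECATIONS = json.loads("""
[
  {"resource_type": "Microsoft.Example/widgets", "sku": "Basic"}
]
""")


def module_lifecycle(module_path):
    """Return (usable, status); only 'Available' modules are treated as usable."""
    status = MODULE_INDEX.get(module_path, {}).get("lifecycle", "Unknown")
    return status == "Available", status


def is_deprecated(resource_type, sku):
    return any(
        d["resource_type"] == resource_type and d["sku"] == sku for d in DEPRECATIONS
    )
```

Unknown modules come back as not usable, so a module missing from the index fails closed instead of slipping through.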

All cross-step state flows through the apex-recall CLI — agents never read or write 00-session-state.json directly. The full schema for show --json lives in tools/apex-recall/docs/show-schema.md; the valid decision-keys registry lives in tools/apex-recall/docs/decision-keys.md.

Lifecycle commands used during a run:

apex-recall init <project> --json # new project
apex-recall show <project> --json # full context
apex-recall checkpoint <project> <step> <phase> --json # after each phase
apex-recall complete-step <project> <step> --json # on completion
apex-recall decide <project> --key <k> --value <v> --json # record decision
apex-recall finding <project> --add "<text>" --json # log a finding
apex-recall review-audit <project> <step> ... --json # after challenger

Read-only orientation commands, used by every agent on resume: sessions, files, search '<term>', and decisions. All of them accept --json.
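
Agents that script against apex-recall can build argument lists rather than shell strings. The Python sketch below uses hypothetical helper names (recall_argv, decide_argv, run_recall, decision) and assumes that show --json exposes a top-level "decisions" object. That shape is a guess; tools/apex-recall/docs/show-schema.md is the authority.

```python
import json
import subprocess


def recall_argv(*args):
    """Build an apex-recall argv list; every call requests --json output."""
    return ["apex-recall", *(str(a) for a in args), "--json"]


def decide_argv(project, key, value):
    return recall_argv("decide", project, "--key", key, "--value", value)


def run_recall(argv):
    # Needs the apex-recall CLI on PATH. Passing a list (no shell) avoids shell
    # parsing of project names and decision values.
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)


def decision(show_payload, key, default=None):
    # Assumes a top-level "decisions" object in `show --json` output (unverified).
    return show_payload.get("decisions", {}).get(key, default)
```

For example, run_recall(decide_argv("demo", "iac_tool", "bicep")) would record the IaC track, and decision(run_recall(recall_argv("show", "demo")), "iac_tool") would read it back, provided the payload shape assumption holds.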

Three enforcement layers sit outside the agent prompt:

  1. Lefthook pre-commit pipeline runs serially on staged files: markdown-lint, link-check (site docs only), h2-sync, artifact-validation, agents, model-catalog-sync, instructions, skill-references, sku-manifest-render, safe-shell.
  2. Lefthook pre-push runs tools/scripts/diff-based-push-check.sh which categorises changed files and fires only matching validators.
  3. GitHub Actions complement the local hooks with full-repo validation on PRs.
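
The pre-push step only fires validators whose file category changed. The real rules live in the shell script tools/scripts/diff-based-push-check.sh. The Python sketch below approximates the idea with a hypothetical category table; apart from markdown-lint and agents, which appear in the pre-commit list, the validator names are placeholders.

```python
from fnmatch import fnmatch

# Hypothetical category table. fnmatch's "*" also matches "/", so "*.bicep"
# covers .bicep files in any directory.
CATEGORIES = {
    "bicep": (["*.bicep"], "bicep-validators"),
    "terraform": (["*.tf"], "terraform-validators"),
    "agents": ([".github/agents/*.agent.md"], "agents"),
    "docs": (["site/*", "docs/*"], "markdown-lint"),
}


def validators_for(changed_files):
    """Return the sorted set of validators that the changed paths should fire."""
    fired = set()
    for path in changed_files:
        for patterns, validator in CATEGORIES.values():
            if any(fnmatch(path, pattern) for pattern in patterns):
                fired.add(validator)
    return sorted(fired)
```

A push that touches no categorised file fires nothing locally and leaves full-repo validation to GitHub Actions.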

The 10-Challenger adversarial-review wrapper is a separate enforcement plane — it audits AI-generated creative decisions in artifacts, not file syntax. Hooks and the challenger never overlap responsibilities.

Every stage section follows the same sub-template so it is scannable. Counts of resources, lenses, or passes come from workflow-graph.json — treat that file as authoritative.

Step 0 — Project Init (Orchestrator boot)


Before Step 1 runs, 01-Orchestrator initialises the project. Two project-scoped decisions are captured once, early in the run, and every downstream agent reads them without re-asking:

  • iac_tool: Bicep or Terraform. Captured at Step 1 Phase 2 by 02-Requirements and persisted via apex-recall decide … --key iac_tool. There is no default; the user must choose.
  • review_depth: adversarial-review depth for the whole project, captured at project boot (or at the first gate after init). The default value, default, runs a single comprehensive pass at Steps 1, 2, and 4, plus a governance-reconciliation pass at Step 3.5. Opting into deep turns every challenger call into the rotating-lens multi-pass cascade defined in adversarial-review-protocol.md. Persisted via apex-recall decide … --key review_depth.

Once written, both decisions can be changed only by editing the apex-recall value directly — the orchestrator does not re-prompt. For a single-artifact deep review without flipping the project, invoke 10-Challenger manually. Full contract: 01-orchestrator.agent.md → Computing decisions.review_depth.
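
The effect of review_depth on challenger calls can be summarised as a lookup. The Python sketch below is hypothetical and is not the orchestrator's code. The lens names come from the Step 2 and Step 4 descriptions. Reusing that matrix for Step 1 under deep, and keeping Step 3.5 on governance-reconciliation, are both assumptions.

```python
DEFAULT_PASSES = {
    "1": ["comprehensive"],
    "2": ["comprehensive"],
    "3.5": ["governance-reconciliation"],
    "4": ["comprehensive"],
}
# Lens matrix named for Steps 2 and 4; applying it to Step 1 is an assumption.
DEEP_LENSES = ["security-governance", "architecture-reliability", "cost-feasibility"]


def challenger_passes(step, review_depth="default", include_cost=False):
    if review_depth not in ("default", "deep"):
        raise ValueError(f"unknown review_depth: {review_depth!r}")
    if step not in DEFAULT_PASSES:
        return []  # Steps 3 and 5 are opt-in only; Steps 6 and 7 have no challenger.
    if review_depth == "default" or step == "3.5":
        # Assumption: Step 3.5 keeps its reconciliation pass even at deep depth.
        return list(DEFAULT_PASSES[step])
    return DEEP_LENSES[:] if include_cost else DEEP_LENSES[:2]
```

Because the value is read once per project, flipping to deep changes every mandatory call at once. A manual 10-Challenger invocation is the escape hatch for a single artifact.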

Step 1: Requirements

  • Purpose & inputs: capture the project intent and pin revision 1 of the SKU manifest. requires: []; produces: 01-requirements.md, sku-manifest.json, sku-manifest.md.
  • Driving agent: 02-Requirements (no subagents).
  • Skills auto-loaded: azure-defaults (regions, tags, naming), azure-artifacts (H2 templates).
  • Instructions activated: agent-operating-frame, azure-artifacts, sku-manifest, no-interactive-shell.
  • Data sources: none beyond user answers; Orchestrator init may optionally surface lessons from prior runs.
  • apex-recall touchpoints: init, decide (sets iac_tool, region, complexity, relational_db), checkpoint per phase, complete-step 1.
  • Artifacts: agent-output/{project}/01-requirements.md and sku-manifest.{json,md}. An empty services[] array is the common case; only user-pinned SKUs appear at this step.
  • Challenger review: single-pass comprehensive (mandatory).
  • Gate & approval: gate-1 blocks until the human approves 01-requirements.md.
  • Hooks on commit: markdown-lint, artifact-validation, sku-manifest-render.
  • Common failures: under-specified non-functional requirements, caught by the challenger and routed back via the step-1 → step-1 self-refine edge.

Step 2: Architecture

  • Purpose & inputs: produce a WAF-pillar-scored architecture and a cost estimate. requires: gate-1; mutates sku-manifest.
  • Driving agent: 03-Architect, with cost-estimate-subagent.
  • Skills auto-loaded: azure-defaults, azure-artifacts, microsoft-docs (on demand), context-management.
  • Instructions activated: agent-operating-frame, azure-artifacts, sku-manifest.
  • Data sources: avm-module-index.json (lifecycle status), azure-deprecations.json, Azure Pricing MCP (via the subagent).
  • apex-recall touchpoints: checkpoint per phase; decide for the review-depth default; the cost estimate is stored as an artifact, with a summary in findings.
  • Artifacts: 02-architecture-assessment.md, 03-des-cost-estimate.md, 03-des-sku-comparison.md (when SKU trade-offs exist), and the mutated sku-manifest.
  • Challenger review: single-pass comprehensive (mandatory). Setting decisions.review_depth = "deep" opts into the rotating-lens multi-pass cascade (security-governance, architecture-reliability, and optionally cost-feasibility).
  • Gate & approval: gate-2.
  • Hooks on commit: markdown-lint, artifact-validation, sku-manifest-render.
  • Common failures: Orphaned or Proposed AVM modules selected (caught by the avm-module-index.json check); a missing private-endpoint story (caught by the security-governance lens on deep review).

Step 3: Design (optional)

  • Purpose & inputs: architecture diagrams and ADRs. requires: gate-2; produces 03-des-diagram.drawio, 03-des-adr-*.md.
  • Driving agent: 04-Design. The step is optional; users can skip directly to Step 3.5 governance.
  • Skills auto-loaded: drawio (or python-diagrams), azure-adr, azure-defaults, azure-artifacts.
  • Instructions activated: drawio, azure-artifacts, agent-operating-frame.
  • Data sources: avm-module-index.json for module-aware diagrams.
  • apex-recall touchpoints: checkpoint, complete-step 3.
  • Artifacts: .drawio source + PNG; one ADR file per material decision.
  • Challenger review: opt-in only, with scope design-adr if invoked.
  • Gate & approval: no gate; flows directly to Step 3.5.
  • Hooks on commit: markdown-lint, link-check for ADR references.
  • Common failures: diagrams drifting from the architecture assessment, surfaced by Step 7 drift detection.

Step 3.5: Governance

  • Purpose & inputs: discover effective Azure Policy assignments (including those inherited from management groups) for the target subscription and reconcile them with the approved architecture. requires: gate-2.
  • Driving agent: 04g-Governance, which invokes .github/skills/azure-governance-discovery/scripts/discover.py.
  • Skills auto-loaded: azure-governance-discovery, azure-defaults, azure-artifacts, iac-common (drift routing).
  • Instructions activated: governance-discovery (mandatory policy contract), azure-artifacts.
  • Data sources: Azure Policy REST API (live), with governance-policy-baseline.json as the documented fallback.
  • apex-recall touchpoints: checkpoint, decide --key governance_depth; records the L0 discovery envelope as the first link in the attestation chain.
  • Artifacts: 04-governance-constraints.md + .json (with a discovery_metadata envelope).
  • Challenger review: single-pass governance-reconciliation, skipped when constraints.count == 0.
  • Gate & approval: gate-2.5. Precondition: the reconciliation must not be escalated_to_step-2. If it is, the gate stays closed and Step 2 must be re-approved before reconciliation re-runs. This prevents a livelock between governance and architecture.
  • Hooks on commit: markdown-lint, artifact-validation (the governance JSON has a dedicated schema check).
  • Common failures: a Deny-effect policy on a planned resource, routed back to the Architect via the step-3_5 → step-2 return edge (on_must_fix_governance_conflict).
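
The gate-2.5 rule can be read as a small decision function. In the Python sketch below, the function names and the "reconciled" status value are hypothetical; only escalated_to_step-2 comes from the text. A satisfied precondition still needs human approval.

```python
ESCALATED = "escalated_to_step-2"


def reconciliation_review_needed(constraints):
    # The governance-reconciliation challenger is skipped when nothing was discovered.
    return len(constraints) > 0


def gate_2_5_precondition(reconciliation_status):
    """Return (satisfied, next_action); human approval is still required when satisfied."""
    if reconciliation_status == ESCALATED:
        # Anti-livelock: Step 3.5 does not retry on its own; Step 2 must re-approve first.
        return False, "re-approve Step 2, then re-run Step 3.5 reconciliation"
    if reconciliation_status == "reconciled":  # hypothetical status name
        return True, "await human approval of gate-2.5"
    return False, "finish Step 3.5 reconciliation"
```

The escalated branch never returns "retry Step 3.5" directly, which is what breaks the governance-vs-architecture loop.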

Step 4: IaC Planning

  • Purpose & inputs: produce a machine-readable implementation plan with frozen inputs for code generation. requires: gate-2.5; mutates sku-manifest.
  • Driving agent: 05-IaC Planner, a Sonnet 4.6 agent that branches between Bicep and Terraform on decisions.iac_tool.
  • Skills auto-loaded: azure-defaults, azure-artifacts, python-diagrams, iac-common (plan-consistency-checks + governance-drift-routing), and the track-specific azure-bicep-patterns or terraform-patterns.
  • Instructions activated: iac-plan-best-practices, azure-artifacts, sku-manifest.
  • Data sources: avm-bicep-modules.csv / avm-terraform-modules.csv (pinning), avm-module-index.json (lifecycle), azure-deprecations.json, and the governance constraints from Step 3.5.
  • apex-recall touchpoints: checkpoint per phase; writes the L1 governance attestation (the Governance Compliance Matrix H2); records decisions.governance_trace.
  • Artifacts: 04-implementation-plan.md (with ## 🛡️ Governance Compliance Matrix and ## 📤 Code-Generation Contract H2s), 04-iac-contract.json, 04-policy-property-map.json, 04-environment-manifest.json, and dependency + runtime Python-diagrams (.py + .png).
  • Challenger review: single-pass comprehensive (mandatory). Deep depth opts into rotating lenses (same matrix as Step 2).
  • Gate & approval: gate-3, with two preconditions:
    1. plan-readiness: all challenger passes are APPROVED.
    2. plan-architecture-escalation (anti-livelock): any finding with requires_step == "step-2" re-opens the gate and traverses the step-4 → step-2 on_architecture_must_fix edge.
  • Hooks on commit: markdown-lint, artifact-validation, sku-manifest-render, plus agents/instructions if the planner agent was edited.
  • Common failures: AVM module lifecycle drift; a missing private endpoint on a data-tier resource; a deny-policy conflict surfaced late.
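
The two gate-3 preconditions combine into a single routing decision. The Python sketch below is illustrative: Finding, evaluate_gate_3, and the returned route labels are hypothetical and are not workflow-graph.json edge IDs. Checking the escalation first is a design choice of the sketch, so that an architecture problem never loops inside Step 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    severity: str                        # e.g. "must_fix"
    requires_step: Optional[str] = None  # e.g. "step-2"


def evaluate_gate_3(pass_verdicts, findings):
    """Return the route gate-3 takes; labels are illustrative, not edge IDs."""
    if any(f.requires_step == "step-2" for f in findings):
        return "step-4 -> step-2 (on_architecture_must_fix)"
    if pass_verdicts and all(v == "APPROVED" for v in pass_verdicts):
        return "ready for human approval"
    return "refine within step-4"
```

An empty verdict list is treated as not ready, so a skipped challenger cannot accidentally satisfy plan-readiness.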

Step 5: IaC Code Generation

  • Purpose & inputs: emit ready-to-deploy IaC. requires: gate-3; inputs are frozen (plan-lock) and read-only.
  • Driving agents: 06b-Bicep CodeGen or 06t-Terraform CodeGen, each calling its track's validate subagent.
  • Skills auto-loaded: azure-defaults, azure-artifacts, azure-bicep-patterns or terraform-patterns, iac-common, context-management.
  • Instructions activated: iac-bicep-best-practices or iac-terraform-best-practices, agent-operating-frame.
  • Data sources: the same AVM CSV + index; policy-property-map and environment-manifest from Step 4.
  • apex-recall touchpoints: checkpoint per phase; writes the L2 attestation rows; never edits frozen plan artifacts.
  • Artifacts: infra/bicep/{project}/ or infra/terraform/{project}/, plus 05-iac-handoff.json.
  • Challenger review: opt-in only (artifact_scope: iac-code); skipped by default. Plan-level findings return to Step 4 via the step-5b|t → step-4 on_refine edge; CodeGen never self-edits the plan.
  • Gate & approval: gate-4 is a validation gate (lint, build, and a clean bicep build / terraform validate).
  • Hooks on commit: validate:iac-security-baseline and IaC-specific validators via the pre-push diff check.
  • Common failures: hallucinated AVM parameters (caught by bicep build / terraform validate); attempts to self-edit the frozen plan (caught by plan_readonly enforcement and routed back).
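
This page does not specify how plan_readonly enforcement detects edits. One common way to make frozen inputs tamper-evident is to record SHA-256 digests when gate-3 closes and compare them later. The Python sketch below assumes that approach; fingerprint and lock_violations are hypothetical names.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Map each frozen artifact path to the SHA-256 digest of its bytes."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def lock_violations(locked, paths):
    """List locked artifacts whose content changed or that are no longer tracked."""
    current = fingerprint(paths)
    return sorted(p for p, digest in locked.items() if current.get(p) != digest)
```

A non-empty result would mean CodeGen touched the plan. Per the routing above, that should surface as a finding routed back to Step 4, not as an in-place fix.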

Step 6: Deployment

  • Purpose & inputs: execute the deployment with safety nets. requires: gate-4; mutates sku-manifest on quota or region substitution.
  • Driving agents: 07b-Bicep Deploy (preferring azd provision) or 07t-Terraform Deploy. Each calls policy-precheck-subagent (the L3 live policy check) plus bicep-whatif-subagent or terraform-plan-subagent.
  • Skills auto-loaded: azure-defaults, azure-artifacts, iac-common (circuit-breaker, deploy-shared-workflow, policy-precheck-contract, governance-drift-routing).
  • Instructions activated: azure-yaml if azure.yaml is edited; iac-bicep-best-practices or iac-terraform-best-practices.
  • Data sources: live Azure Policy state via the precheck subagent; Azure Resource Manager for what-if / plan.
  • apex-recall touchpoints: precondition that decisions.governance_trace holds the full L0 → L1 → L2 → L3 chain before az deployment create, azd provision, or terraform apply runs.
  • Artifacts: 06-deployment-summary.md, 06-policy-precheck.json. The deployment summary folds the precheck into an informational H2; there is no separate review.
  • Challenger review: none (deploy artifacts are tool output, not creative decisions).
  • Gate & approval: gate-5, after human approval. On failure, the step-6 → step-5 on_fail edge returns to CodeGen; an architecture gap surfaced at deploy time takes the step-6 → step-2 on_refine edge back to the Architect.
  • Hooks on commit: markdown-lint, artifact-validation.
  • Common failures: quota exhaustion (handled by block-with-escalation substitution plus a sku-manifest mutation), a policy Deny at apply time, and transient ARM 5xx errors (handled by the iac-common circuit breaker).
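
The deploy precondition amounts to a completeness check on the attestation chain. In the Python sketch below, the trace shape (a list of entries carrying a "layer" key) is a hypothetical assumption. The layer-to-step mapping follows this page: L0 at Step 3.5, L1 at Step 4, L2 at Step 5, and L3 from the Step 6 policy precheck.

```python
REQUIRED_LAYERS = ("L0", "L1", "L2", "L3")  # Step 3.5, Step 4, Step 5, Step 6 precheck


def missing_attestations(governance_trace):
    """governance_trace: list of dicts carrying a 'layer' key (hypothetical shape)."""
    present = {entry.get("layer") for entry in (governance_trace or [])}
    return [layer for layer in REQUIRED_LAYERS if layer not in present]


def may_apply(governance_trace):
    # Block az deployment create / azd provision / terraform apply until the chain is complete.
    return not missing_attestations(governance_trace)
```

Returning the missing layers, not just a boolean, tells the deploy agent which upstream step to route back to.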

Step 7: As-Built Documentation

  • Purpose & inputs: produce the as-built documentation suite from the deployed resource state. requires: gate-5.
  • Driving agent: 08-As-Built (subagent fan-out, no further challenger), running seven parallel substeps: design document, operations runbook, cost estimate, compliance matrix, backup/DR plan, resource inventory, documentation index.
  • Skills auto-loaded: azure-defaults, azure-artifacts, drawio, python-diagrams, context-management (Mode A compression).
  • Instructions activated: azure-artifacts, drawio, markdown-docs (for any docs-site copy).
  • Data sources: live Azure Resource Manager (resource inventory); sku-manifest for bidirectional drift detection.
  • apex-recall touchpoints: checkpoint per substep; complete-step 7.
  • Artifacts: 07-design-document.md, 07-operations-runbook.md, 07-ab-cost-estimate.md, 07-compliance-matrix.md, 07-backup-dr-plan.md, 07-resource-inventory.md, 07-documentation-index.md; a final sku-manifest mutation captures drift.
  • Challenger review: none.
  • Gate & approval: no gate; the documentation set is the terminal artifact.
  • Hooks on commit: markdown-lint, artifact-validation, sku-manifest-render.
  • Common failures: drift between planned and deployed SKUs, which is bubbled into lessons-learned for the next run.
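
Bidirectional drift detection compares the manifest against the deployed inventory in both directions. The Python sketch below assumes both sides have already been reduced to a mapping from resource name to SKU string; sku_drift and its output keys are hypothetical.

```python
def sku_drift(planned, deployed):
    """Compare planned (sku-manifest) and deployed SKUs, both keyed by resource name."""
    planned_names, deployed_names = set(planned), set(deployed)
    return {
        "planned_not_deployed": sorted(planned_names - deployed_names),
        "deployed_not_planned": sorted(deployed_names - planned_names),
        "sku_changed": sorted(
            name for name in planned_names & deployed_names
            if planned[name] != deployed[name]
        ),
    }
```

Checking both directions catches resources the plan promised but never landed, as well as resources that appeared outside the plan, such as a substituted SKU after a quota block.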

The Orchestrator follows the lesson-collection protocol throughout the run (not just at the end). Triggers: challenger must_fix, user rejection, subagent NEEDS_REVISION, Azure Policy violation surfaced at what-if, explicit user concern. Each trigger appends one lesson to 09-lessons-learned.json; the markdown twin renders at workflow completion.
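
Appending a lesson is a read-modify-write on the JSON log. The Python sketch below is hypothetical. The trigger labels paraphrase the list above; only "challenger must_fix" matches the illustrative entry later on this page verbatim, and the other spellings are assumptions. Real entries should validate against tools/schemas/lesson-log.schema.json, which this sketch does not load.

```python
import json
from pathlib import Path

TRIGGERS = {
    "challenger must_fix",
    "user rejection",
    "subagent NEEDS_REVISION",
    "policy violation at what-if",
    "explicit user concern",
}


def append_lesson(log_path, lesson):
    """Append one lesson to 09-lessons-learned.json and return the new lesson count."""
    if lesson.get("trigger") not in TRIGGERS:
        raise ValueError(f"unknown trigger: {lesson.get('trigger')!r}")
    path = Path(log_path)
    # Orchestrator init normally creates the file; the sketch tolerates its absence.
    log = json.loads(path.read_text()) if path.exists() else {"lessons": []}
    log["lessons"].append(lesson)
    path.write_text(json.dumps(log, indent=2))
    return len(log["lessons"])
```

Rejecting unknown triggers keeps the log limited to the protocol's events instead of free-form notes.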

The diagram below collapses every step into a single orchestration view showing agent → subagent → gate → artifact lanes plus the shared context surfaces (skills, apex-recall, registries, lessons store) and the challenger lane.

Figure: APEX end-to-end orchestration (agents, subagents, gates, and shared context surfaces across Steps 1 to 7).

Per-stage routing details (gate preconditions, return edges) stay inline as Mermaid in each stage section above — this diagram is the spatial overview.

09-lessons-learned.json is initialised at Orchestrator init, appended throughout execution by the lesson-collection triggers, and queried by the next project’s Orchestrator at its own init. Findings and decisions recorded in apex-recall reinforce the lessons store — the two stores are complementary, not redundant.

Figure: Lessons-learned feedback loop (initialisation, append-during-run, and seeding the next run's Orchestrator init).

The tools/scripts/lessons-to-checklists.mjs script (npm run report:challenger-gaps) distils recurring lessons into candidate hardening for the challenger lenses themselves — the loop also closes back into the reviewer.

The schema lives in tools/schemas/lesson-log.schema.json. The entry below is fabricated for illustration only and never appeared in a real run:

agent-output/{project}/09-lessons-learned.json (illustrative)
{
  "workflow_mode": "production",
  "project": "{project}",
  "lessons": [
    {
      "step": 4,
      "phase": "phase_3_module_selection",
      "category": "factual-accuracy",
      "trigger": "challenger must_fix",
      "observation": "Planner pinned avm/res/storage/storage-account at a version that lacked the requireInfrastructureEncryption flag required by an inherited deny policy.",
      "root_cause": "AVM module-index lifecycle was Available but the version chosen predated the policy property.",
      "action": "Move policy-property-map.json check earlier in Phase 2, before module pinning.",
      "telemetry": { "iterations": 2, "wall_time_min": 18 }
    }
  ]
}
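
Distilling recurring lessons, as tools/scripts/lessons-to-checklists.mjs does, starts with grouping. The Python sketch below is not that script, whose logic is not shown here. It is an assumed approximation that counts lessons by (step, category) across logs shaped like the illustrative entry above and surfaces pairs that repeat.

```python
from collections import Counter


def recurring_lessons(lessons, min_count=2):
    """Return (step, category) pairs that recur at least min_count times."""
    counts = Counter((lesson["step"], lesson["category"]) for lesson in lessons)
    return sorted(key for key, n in counts.items() if n >= min_count)
```

A pair such as (4, "factual-accuracy") recurring across runs would be a candidate for hardening the Step 4 challenger lens.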

APEX assumes that an Azure Landing Zone (ALZ) is already deployed. ALZ provides the platform-level guardrails — management group hierarchy, Azure Policy assignments, RBAC role definitions, connectivity (hub-spoke or Virtual WAN), and identity — that APEX consumes rather than recreates. Understanding this boundary is critical: APEX operates inside the landing zone, not instead of it.

What greenfield means in APEX

In APEX, “greenfield” refers to a net-new application or workload project: the application code, its IaC, and the surrounding pipeline artifacts are being created from scratch. It says nothing about the target Azure environment, which may be a mature ALZ tenant with strict inherited policies, a partially configured subscription, or a freshly minted empty subscription.

This matters because two unrelated concepts often get collapsed under the same word:

  • Application-greenfield (APEX sense) — no prior app code, no prior IaC for this workload. APEX is designed for this case, and the “greenfield CAF tag fallback” in azure-defaults/references/tag-strategy.md uses this sense.
  • Environment-greenfield — no ALZ, no inherited policy, an empty subscription. APEX handles this separately via the no-ALZ fallback documented in When there is no landing zone below.

The two cases are independent: an application-greenfield project can land in a mature ALZ, and an environment-greenfield subscription can host a brownfield migration. APEX assumes ALZ is present by default; the no-ALZ fallback is a documented exception, not the norm.

| ALZ layer | What it gives APEX |
| --- | --- |
| Management groups | Inheritance scope for Azure Policy. The governance-policy-baseline workflow crawls this hierarchy at Step 3.5. |
| Azure Policy | Deny/audit/DINE rules for security, tagging, allowed regions, and allowed SKUs. APEX discovers these live and encodes them into 04-governance-constraints.json. |
| Connectivity | Hub VNet or Virtual WAN hub with ExpressRoute/VPN gateways, Azure Firewall or NVA, and centralized Private DNS Zones. |
| Identity | Entra ID tenant, privileged identity governance, break-glass accounts. APEX assumes Managed Identity and Entra-only auth. |
| Logging | Central Log Analytics workspace + Defender for Cloud assignment. APEX references the existing workspace rather than creating per-workload workspaces. |
| Diagnostics & monitoring | The Log Analytics workspace resource ID is surfaced through governance discovery and wired into every Azure resource via the platform's diagnostic-settings policy/module. APEX consumes the ID; it never provisions one. |

This matrix maps each ALZ layer to the APEX step that reads it and the artifact or decision key where the value lands.

| ALZ layer | APEX step that consumes it | Where it lands |
| --- | --- | --- |
| Management groups | Step 3.5 (04g-Governance) | 04-governance-constraints.json (discovery_metadata.scope) |
| Azure Policy | Step 3.5 + Step 4 | 04-governance-constraints.{md,json}; Step 4 Governance Compliance Matrix |
| Connectivity (hub-spoke / vWAN) | Step 2 Phase 6b + Step 4 | decisions.vnet_mode, decisions.existing_vnet_id; Step 4 plan |
| Identity | Step 4 + Step 5 | decisions.identity_model; least-privilege role assignments in IaC |
| Logging | Step 5 (CodeGen) | Diagnostic-settings module wiring per resource |
| Diagnostics workspace | Step 5 + Step 7 | Diagnostic-settings module ID; Step 7 compliance matrix |

How ALZ guardrails accelerate and de-risk APEX

| Accelerator | How ALZ provides it | How APEX consumes it |
| --- | --- | --- |
| Pre-populated governance | Policy assignments inherited from management-group scopes. | Step 3.5 (04g-Governance) discovers required tags, denied public endpoints, and mandatory encryption directly, so the challenger review becomes a reconciliation pass against known facts rather than a speculative audit. |
| Overlapping security baseline | Tenant-wide Azure Policy enforces TLS, public-access denials, and stricter rules (e.g. deny public network access on all PaaS). | APEX's non-negotiable baseline (TLS 1.2+, HTTPS-only, no public blob, Managed Identity) is a subset; when ALZ is stricter, Step 3.5 captures the stricter rule and the IaC Planner honours it at Step 4. |
| Known network boundaries | Address spaces, peering topology, DNS resolution, and firewall rules are pre-established. | The VNet planning gate (Architect Phase 6b) slots a spoke into the existing topology instead of designing one from scratch, selected via decisions.vnet_mode = use-existing. |
| Scoped RBAC | Platform team pre-assigns roles at subscription / resource-group scope. | APEX records decisions.identity_model and generates least-privilege role assignments that fit within the existing RBAC structure, validated by the azure-rbac skill. |
| Compounding cost governance | Subscription-level budget alerts + cost-management policies. | APEX's per-project cost-monitoring baseline (Wave 4 of CodeGen) stacks beneath ALZ: ALZ catches subscription-wide anomalies, APEX catches project-level overruns. |

When there is no landing zone

If the target subscription has no inherited policies (the governance-policy-baseline workflow returns an empty envelope), APEX falls back to the no-ALZ defaults documented in azure-defaults: lowercase 4-tag set, swedencentral region, and the full non-negotiable security baseline. The challenger review flags the absence of inherited guardrails as an informational finding so the team is aware they are operating without platform-level safety nets. This is the environment sense of “greenfield”; see What greenfield means in APEX for the disambiguation. The no-ALZ fallback is orthogonal to whether the application itself is new.
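
The fallback decision can be sketched as a branch on the discovery envelope. In the Python sketch below, the "policies" key and the function name are hypothetical. The default values are the ones named in the paragraph above; the individual tag names live in azure-defaults and are not repeated here.

```python
# Values named on this page; the tag names themselves live in azure-defaults.
NO_ALZ_DEFAULTS = {
    "region": "swedencentral",
    "tag_set": "lowercase 4-tag set",
    "security_baseline": "full non-negotiable baseline",
}


def resolve_governance(envelope):
    """envelope: discovery result with a 'policies' list (hypothetical key)."""
    policies = (envelope or {}).get("policies", [])
    if policies:
        return {"mode": "alz", "policies": policies, "findings": []}
    return {
        "mode": "no-alz",
        "policies": [],
        "defaults": dict(NO_ALZ_DEFAULTS),
        "findings": [{
            "severity": "informational",
            "text": "No inherited policy guardrails found; running on no-ALZ defaults.",
        }],
    }
```

The informational finding travels with the result, so the team learns about the missing guardrails even though nothing blocks.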

APEX’s VNet planning gate — triggered at Architect Phase 6b — handles the two scenarios that arise in an ALZ environment: bring your own VNet (spoke already provisioned by the platform team) or create a new VNet (spoke provisioned by APEX inside an application landing zone subscription).

The user chooses via decisions.vnet_mode:

| Mode | When to use | What APEX does |
| --- | --- | --- |
| use-existing | Platform team pre-provisions spoke VNets, peering, UDRs, and NSGs centrally. Common in regulated environments. | Validates that the VNet exists (az network vnet show), imports its address space, and plans subnets within the existing CIDR. IaC code references the VNet by resource ID; it does not create or modify it. |
| create-new | Application teams own their spoke lifecycle, or the workload lands in a dedicated subscription with no pre-provisioned network. | Generates a full VNet module (AVM-first), with subnets sized per the workload's SKU-aware subnet matrix. The Planner wires peering to the hub if the architecture calls for it. |

The choice is captured by apex-recall decide --key vnet_mode and flows to the IaC Planner (Step 4) and CodeGen (Step 5). When vnet_mode = use-existing, CodeGen emits a data source (Terraform) or an existing resource reference (Bicep) — never a create block for the VNet itself.
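
The use-existing branch boils down to parsing decisions.existing_vnet_id and emitting a reference instead of a resource. The Python sketch below uses hypothetical function names. Two details are assumptions: the Bicep API version, which should follow whatever the plan pins, and the Terraform provider, which is assumed to be already configured for the VNet's subscription (a cross-subscription VNet would need a provider alias).

```python
def parse_vnet_id(resource_id):
    # /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworks/{name}
    parts = resource_id.strip("/").split("/")
    if (len(parts) != 8 or parts[0].lower() != "subscriptions"
            or parts[2].lower() != "resourcegroups" or parts[4].lower() != "providers"
            or parts[5].lower() != "microsoft.network"
            or parts[6].lower() != "virtualnetworks"):
        raise ValueError(f"not a VNet resource ID: {resource_id}")
    return {"subscription": parts[1], "resource_group": parts[3], "name": parts[7]}


def vnet_reference(decisions, iac_tool):
    """Return an IaC snippet that references (never creates) an existing VNet."""
    mode = decisions.get("vnet_mode")
    if mode == "create-new":
        return None  # CodeGen emits a full AVM-first VNet module instead
    if mode != "use-existing":
        raise ValueError(f"unknown vnet_mode: {mode!r}")
    vnet_id = decisions.get("existing_vnet_id")
    if not vnet_id:
        raise ValueError("existing_vnet_id is required when vnet_mode = use-existing")
    v = parse_vnet_id(vnet_id)
    if iac_tool == "terraform":
        # Assumes the azurerm provider already targets the VNet's subscription.
        return (f'data "azurerm_virtual_network" "spoke" {{\n'
                f'  name                = "{v["name"]}"\n'
                f'  resource_group_name = "{v["resource_group"]}"\n'
                "}\n")
    if iac_tool != "bicep":
        raise ValueError(f"unknown iac_tool: {iac_tool!r}")
    api = "2023-09-01"  # assumed API version; use the one the plan pins
    return (f"resource spoke 'Microsoft.Network/virtualNetworks@{api}' existing = {{\n"
            f"  name: '{v['name']}'\n"
            f"  scope: resourceGroup('{v['subscription']}', '{v['resource_group']}')\n"
            "}\n")
```

Failing fast on a missing or malformed existing_vnet_id keeps the error at plan time, before any code is generated against the wrong network.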

Decision capture. The VNet planning gate persists registered apex-recall keys plus two proposed ones that describe behaviour the planner already exhibits:

  • decisions.vnet_mode: create-new or use-existing (registered).
  • decisions.existing_vnet_id: required when vnet_mode = use-existing (registered).
  • decisions.identity_model: managed-identity (default) or one of the alternatives (registered).
  • decisions.hub_topology: hub-spoke or virtual-wan (proposed; not yet in decision-keys.md).
  • decisions.dns_zone_strategy: central-reference, spoke-create, or escalate (proposed; not yet in decision-keys.md).

Registering the two proposed keys in decision-keys.md and the apex-recall CLI validator is deferred to a follow-up PR so this documentation change ships standalone.

Both ALZ connectivity models — hub-spoke (Azure Firewall / NVA in a hub VNet) and Virtual WAN (Microsoft-managed hub with integrated routing) — are supported. APEX does not provision the hub or the WAN itself; it provisions the spoke and assumes connectivity to the hub is established via peering (hub-spoke) or a Virtual WAN VNet connection (vWAN). The subnet plan produced at Phase 6b accounts for hub-side constraints such as forced-tunnel UDRs and NSG rules inherited from ALZ policy. The canonical subnet sizing matrix and the two-step existing-VNet validation live in azure-defaults/references/vnet-planning.md.
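
The canonical sizing rules live in vnet-planning.md. The basic geometric check on any subnet plan can still be shown with Python's standard ipaddress module: every subnet must sit inside the VNet's address space, and no two subnets may overlap. The Python sketch below (subnet_plan_errors is a hypothetical name) does not model forced-tunnel UDRs or inherited NSG rules.

```python
import ipaddress


def subnet_plan_errors(vnet_prefixes, subnet_prefixes):
    """Check containment in the VNet address space and pairwise non-overlap (IPv4 or IPv6)."""
    vnets = [ipaddress.ip_network(p) for p in vnet_prefixes]
    subnets = [ipaddress.ip_network(p) for p in subnet_prefixes]
    errors = []
    for s in subnets:
        if not any(s.version == v.version and s.subnet_of(v) for v in vnets):
            errors.append(f"{s} is outside the VNet address space")
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.version == b.version and a.overlaps(b):
                errors.append(f"{a} overlaps {b}")
    return errors
```

For use-existing, vnet_prefixes would come from the imported address space; for create-new, from the planned VNet module.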

Private DNS Zones — enumeration and reuse


In a well-architected ALZ, Private DNS Zones live centrally — typically in a connectivity subscription or a shared-services resource group — and are linked to the hub VNet (or the Virtual WAN hub’s DNS proxy). When a spoke workload creates a private endpoint, it registers an A record in the appropriate zone (e.g. privatelink.vaultcore.azure.net for Key Vault).

APEX handles DNS zone resolution in a pattern analogous to governance policy discovery:

  1. Enumeration. A scheduled GitHub Actions workflow (following the same pattern as the governance-policy-baseline workflow) can query the target subscription and connected scopes for existing Private DNS Zones via az network private-dns zone list. The result — a JSON inventory of zone names, resource IDs, and VNet links — is committed to .github/data/ as a checked-in baseline, just as governance-policy-baseline.json captures policy state.

    The proposed inventory file would parallel governance-policy-baseline.json in shape (filename private-dns-zone-baseline.json is proposed — the file is not committed today):

    {
      "discovery_metadata": {
        "discovered_at": "2026-05-21T12:00:00Z",
        "scope": "/subscriptions/<connectivity-sub-id>",
        "source": "live"
      },
      "zones": [
        {
          "name": "privatelink.vaultcore.azure.net",
          "resource_id": "/subscriptions/.../privateDnsZones/privatelink.vaultcore.azure.net",
          "subscription_id": "<connectivity-sub-id>",
          "vnet_links": [{ "vnet_id": "/subscriptions/.../virtualNetworks/hub-vnet", "registration_enabled": false }],
          "linked_services": ["keyvault"]
        }
      ]
    }
  2. Decision at plan time. When the IaC Planner encounters a service that requires a private endpoint, it checks whether the corresponding privatelink.* zone already exists in the enumerated inventory:

    • Zone exists centrally — the plan references the zone by resource ID and creates only the DNS record group (A record) for the private endpoint. No new zone is created.
    • Zone does not exist and policy allows creation — the plan includes a new Private DNS Zone resource (AVM module) linked to the spoke VNet. This is the common path for greenfield environments or when a particular privatelink.* zone is not yet provisioned centrally.
    • Zone does not exist and policy denies creation — the Step 3.5 governance constraints (or a live deny policy on Microsoft.Network/privateDnsZones) block the creation. The planner raises a must_fix finding that routes back to the Architect (or to the platform team for manual provisioning).
  3. VNet link wiring. When using a centrally managed zone, the plan creates a VNet link from the zone to the spoke VNet (if one does not already exist). When creating a new zone, the link is part of the same module.

flowchart TD
    PE[Private Endpoint needed] --> CHK{Zone in inventory?}
    CHK -->|Yes| REF[Reference existing zone by ID]
    CHK -->|No| POL{Policy allows creation?}
    POL -->|Yes| NEW[Create zone + VNet link]
    POL -->|No| BLK[must_fix → platform team]
    REF --> LNK{VNet link exists?}
    LNK -->|Yes| REC[Create A record only]
    LNK -->|No| ADDLNK[Add VNet link + A record]

This three-way branch ensures that APEX never duplicates a centrally managed DNS zone (which would break resolution), never violates a deny policy, and always degrades gracefully to a human escalation when the platform team needs to act.
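
The three-way branch translates directly into a planning function over the proposed inventory shape shown earlier. In the Python sketch below, private_dns_plan and the returned action labels are hypothetical. creation_allowed stands in for whatever the Step 3.5 constraints say about Microsoft.Network/privateDnsZones.

```python
def private_dns_plan(zone_name, inventory, creation_allowed, spoke_vnet_id):
    """Decide how to handle one privatelink.* zone for a new private endpoint."""
    zone = next((z for z in inventory.get("zones", []) if z["name"] == zone_name), None)
    if zone is None:
        if not creation_allowed:
            return {"action": "must_fix", "escalate_to": "platform team"}
        return {"action": "create_zone", "zone": zone_name,
                "vnet_link": spoke_vnet_id, "a_record": True}
    # Central zone exists: reference it by ID and add a link only if the spoke lacks one.
    linked = any(link["vnet_id"] == spoke_vnet_id for link in zone.get("vnet_links", []))
    return {"action": "reference_zone", "zone_id": zone["resource_id"],
            "add_vnet_link": not linked, "a_record": True}
```

No path creates a zone that already exists in the inventory, which is the duplication the paragraph above warns against.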

Appendix A — Artifact contract reference


The full H2 templates for every agent-output/ artifact live in azure-artifacts/SKILL.md and its templates/ folder. The SKU manifest contract lives in sku-manifest.instructions.md; the governance JSON shape is documented inside governance-discovery.instructions.md. This page deliberately links rather than duplicates.

Appendix B: Skill ↔ Step matrix

| Step | Always-loaded skills | On-demand skills |
| --- | --- | --- |
| 1 | azure-defaults, azure-artifacts | microsoft-docs |
| 2 | azure-defaults, azure-artifacts, context-management | microsoft-docs, azure-compute, azure-storage |
| 3 | azure-defaults, azure-artifacts, azure-adr | drawio or python-diagrams |
| 3.5 | azure-defaults, azure-artifacts, azure-governance-discovery, iac-common | microsoft-docs |
| 4 | azure-defaults, azure-artifacts, iac-common, python-diagrams, track-specific patterns | microsoft-docs, azure-rbac |
| 5 | azure-defaults, azure-artifacts, track-specific patterns, iac-common, context-management | azure-rbac, entra-app-registration |
| 6 | azure-defaults, azure-artifacts, iac-common | azure-quotas, azure-validate, azure-deploy |
| 7 | azure-defaults, azure-artifacts, drawio, python-diagrams, context-management | azure-resources, azure-compliance |

Appendix C — Instruction ↔ trigger matrix

| Instruction | applyTo glob | Effective at step |
| --- | --- | --- |
| agent-operating-frame | .github/agents/*.agent.md | All |
| azure-artifacts | **/agent-output/**/*.md | 1–7 |
| sku-manifest | **/sku-manifest.{md,json} | 1, 2, 3.5, 4, 6, 7 |
| governance-discovery | **/04-governance-constraints.{md,json} | 3.5 |
| iac-plan-best-practices | **/04-implementation-plan.md | 4 |
| iac-bicep-best-practices | **/*.bicep | 5b, 6b |
| iac-terraform-best-practices | **/*.tf | 5t, 6t |
| azure-yaml | **/azure.yaml | 5, 6 |
| drawio | **/*.drawio | 3, 7 |
| lesson-collection | **/*orchestrator*.agent.md | Throughout |
| no-interactive-shell | chat-loaded agent/skill/instruction files | Authoring only |
| no-hardcoded-counts | repo-wide markdown + scripts | Authoring only |
| markdown-docs | site/src/content/docs/**, docs/** | Doc authoring |

Terse pointers only — full definitions live in the linked concept docs.

| Term | See |
| --- | --- |
| Challenger / lens | Workflow Engine & Quality |
| Gate | Workflow Engine & Quality |
| Fan-out | Agent Architecture |
| Frozen inputs | workflow-graph.json plan_lock block (linked above) |
| L0–L3 attestation | workflow-graph.json attestation_chain |
| Skill tiers | Skills & Instructions |