Submission Lifecycle & AI Review
A submission moves through a state machine driven by the five admin-MCP tools. This page documents the states, the pipeline, and the AI-review rubric that decides whether a submission autopublishes.
State machine
Every partner submission has a `state` (a column in `partner_submissions`) that the pipeline transitions through; a sketch of the transitions follows the table:
| State | Meaning |
|---|---|
| `draft` | The submission exists and is editable. Created by `admin_submit_skill` / `admin_submit_workflow`. Re-running the same submit tool with the same `submissionId` upserts in place. |
| `submitted` | Transient. `admin_request_review` has been accepted and the gate is being scheduled. Most partners will not see this state. |
| `in_review` | The gate pipeline is executing (Stage-1 lint, sandbox, AI review). |
| `approved` | All gates passed and `aiReview.verdict === "pass"`. The artifact has been materialized as an `agent_blueprints` row with `partnerOwned = true` and is live in the catalog. |
| `rejected` | At least one gate failed, or the AI-review verdict was not `pass`. The submission is editable again — make changes, then call `admin_request_review` to retry. |
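For orientation, here is a minimal TypeScript sketch of those states and the transitions the pipeline drives between them. The type and map names are illustrative, not part of the admin-MCP surface, and the `rejected` edge assumes you edit via `admin_submit_*` before retrying, as described later on this page.

```typescript
// Illustrative only. State names match the table above; the transition map
// summarizes the documented pipeline rather than a published API.
type SubmissionState = "draft" | "submitted" | "in_review" | "approved" | "rejected";

const transitions: Record<SubmissionState, SubmissionState[]> = {
  draft: ["submitted"],                // admin_request_review accepted
  submitted: ["in_review"],            // gate pipeline scheduled
  in_review: ["approved", "rejected"], // Stage-1 lint + sandbox + AI review resolve
  approved: [],                        // terminal: live in the catalog as an agent_blueprints row
  rejected: ["draft"],                 // re-run admin_submit_* with the same submissionId, then retry
};
```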
Happy-path pipeline
Most submissions go through this sequence: `admin_submit_skill` / `admin_submit_workflow` → `admin_validate_submission` (Stage-1 lint) → `admin_test_submission` (sandbox run) → `admin_request_review` (the full gate).
You can skip admin_validate_submission and admin_test_submission and go straight from submit to admin_request_review — the gate composes all three checks internally. The standalone tools exist so you can iterate faster (Stage-1 lint without paying for a sandbox run, sandbox run without committing to the AI review).
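A hedged sketch of the full sequence, using a generic `callTool` helper as a stand-in for whatever MCP client you use. The tool names match this page; the argument shapes and the `submissionId` value are illustrative only.

```typescript
// Happy-path driver: submit, optionally lint and sandbox-test, then request review.
// `callTool` is a placeholder for your MCP client; payload fields are illustrative.
async function submitAndReview(
  callTool: (name: string, args: Record<string, unknown>) => Promise<any>,
): Promise<string> {
  const submissionId = "sub_example"; // illustrative ID

  await callTool("admin_submit_skill", { submissionId /* ...artifact fields */ });
  await callTool("admin_validate_submission", { submissionId }); // optional: Stage-1 lint only
  await callTool("admin_test_submission", { submissionId });     // optional: sandbox run only

  const gate = await callTool("admin_request_review", { submissionId });
  return gate.status; // "approved" | "rejected" | "in_review"
}
```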
AI-review rubric
The AI reviewer evaluates every submission against four criteria. Each criterion can produce one or more findings; each finding has a severity of error, warning, or info. The reviewer's overall verdict is computed as a floor:
| Severity present | Verdict |
|---|---|
| Any `error` finding | `fail` |
| No `error`, at least one `warning` finding | `warnings` |
| All `info` or no findings | `pass` |
If the reviewer's declared verdict is more lenient than the floor implied by its findings, the floor wins. The verdict cannot be softened below what the findings demand.
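The floor is easy to express in code. A minimal sketch, assuming a finding carries at least a severity field; the type and function names are illustrative, not the reviewer's actual schema.

```typescript
type Severity = "error" | "warning" | "info";
type Verdict = "pass" | "warnings" | "fail";

// The floor implied by a set of findings: any error => fail,
// otherwise any warning => warnings, otherwise pass.
function verdictFloor(findings: { severity: Severity }[]): Verdict {
  if (findings.some(f => f.severity === "error")) return "fail";
  if (findings.some(f => f.severity === "warning")) return "warnings";
  return "pass";
}

// The declared verdict can be stricter than the floor, never more lenient.
function effectiveVerdict(declared: Verdict, findings: { severity: Severity }[]): Verdict {
  const order: Verdict[] = ["pass", "warnings", "fail"]; // least to most severe
  const floor = verdictFloor(findings);
  return order.indexOf(declared) >= order.indexOf(floor) ? declared : floor;
}
```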
rubric_prompt_injection
The reviewer flags content that attempts to manipulate downstream agents or coerce model behavior outside the artifact's stated purpose.
- What triggers it: Markdown bodies or prompt bodies that include instructions targeting the model itself — phrases like "ignore previous instructions," "disregard the system prompt," "act as if you are X," or hidden directives embedded inside tool descriptions.
- How to fix: Strip imperative directives that target the LLM rather than the user. Phrase examples as what the user will see, not what the model should do. Keep prompt-engineering scaffolding in `promptBody` (workflows) — the rubric is more permissive there because the agent runs the prompt.
- Typical severity: `error` when the injection is overt or could escape the artifact's sandbox; `warning` when ambiguous.
rubric_off_topic
The reviewer flags content that doesn't match the artifact's stated name, description, and heroCopy.
- What triggers it: A skill called "Competitor lookup" whose markdown body actually describes how to file expense reports; a workflow whose `description` promises a buying-committee briefing but whose `promptBody` produces marketing emails.
- How to fix: Align the title, description, hero copy, and body. If the artifact has evolved, update the metadata to match the body — or split it into two submissions.
- Typical severity: `warning` when partially off-topic; `error` when wholly misaligned.
rubric_security
The reviewer flags content that could enable abuse — leaking secrets, exfiltrating data, or facilitating account takeover.
- What triggers it: Skills that ask agents to dump environment variables; workflows that fetch a URL based on user input without validation; descriptions that reference credentials, tokens, or PII in ways that imply storage or transmission.
- How to fix: Remove any instruction that would cause an agent to disclose, persist, or transmit credentials. Constrain user-controlled URLs to a known allowlist. Don't reference real customer PII in marketing copy.
- Typical severity: `error` is the default — security findings rarely de-escalate to `warning`.
rubric_brand_alignment
The reviewer flags content that conflicts with the host platform's positioning or includes inappropriate competitor references.
- What triggers it: Promotional copy that disparages other vendors, sample outputs that name competitor products as recommendations, profanity, or tone that doesn't match a B2B SaaS context.
- How to fix: Edit the description, hero copy, and `marketingUseCases` to remove competitor names, soften comparative claims, and keep the tone professional. Reference your own product favorably; don't reference competitors at all.
- Typical severity: `warning` — most brand-alignment findings can be addressed with a copy edit.
The autopublish gate
A submission autopublishes if and only if `admin_request_review`'s top-level `status` is `approved`, which happens when all three of the following hold (a sketch of the check follows this list):
- Stage-1 lint passes — `gate.stage1.status === "pass"`.
- Sandbox run succeeds — `gate.sandbox.status === "succeeded"`.
- AI review returns `pass` — `gate.aiReview.status === "completed"` AND `gate.aiReview.verdict === "pass"`.
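Expressed as code, the conjunction looks like the sketch below. The `GateResult` interface is an illustrative shape containing only the fields these conditions reference, not the full response schema.

```typescript
// Illustrative shape: only the fields the autopublish conditions reference.
interface GateResult {
  status: string;
  stage1: { status: string };
  sandbox: { status: string };
  aiReview: { status: string; verdict?: string; advisoryMode?: boolean };
}

// All three conditions must hold for the top-level status to come back "approved".
function autopublishes(gate: GateResult): boolean {
  return (
    gate.stage1.status === "pass" &&
    gate.sandbox.status === "succeeded" &&
    gate.aiReview.status === "completed" &&
    gate.aiReview.verdict === "pass"
  );
}
```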
At launch, aiReview.verdict === "warnings" does not autopublish. The gate returns status: "rejected" with rejectionReason describing the warnings so you can address them and resubmit.
An operator-side advisory mode lever exists for periods when Phoenix wants every non-failing submission to land in front of a human reviewer (e.g. during a rubric tuning window). When advisory mode is on, every pass, warnings, or queued outcome is routed to status: "in_review" instead of approved/rejected; fail still rejects. The lever is not partner-controllable, but its activation is visible: when advisory mode is the reason a run is held, gate.aiReview.advisoryMode: true appears on the wire. If you see that flag, your submission is queued for human review — poll admin_request_review to get the final decision once the reviewer acts.
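If you want to branch on that flag programmatically, a check along these lines (reusing the illustrative `GateResult` shape above) is enough:

```typescript
// True when advisory mode is why the run is being held for a human reviewer.
function heldForHumanReview(gate: GateResult): boolean {
  return gate.status === "in_review" && gate.aiReview.advisoryMode === true;
}
```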
A rejected submission stays editable — re-run admin_submit_* with the same submissionId, then admin_request_review again. There is no penalty for repeated submissions other than rate limits (Troubleshooting → 429).
When admin_request_review returns in_review
A small fraction of submissions don't get a verdict on the first synchronous call. When that happens, admin_request_review returns status: "in_review" (rather than approved or rejected) and gate.aiReview.runId is the handle Phoenix uses to track the pending AI-review run.
Two reasons drive this state:
- AI-review queue — the synchronous AI-review call timed out, so the run is finishing in the background. The gate will eventually resolve to `approved` or `rejected` once the run completes. `gate.aiReview.status` is `queued` and `gate.aiReview.advisoryMode` is absent.
- Advisory mode — Phoenix has flipped advisory mode for the submission pipeline; every non-`fail` outcome is held for a human reviewer. `gate.aiReview.advisoryMode` is `true` on the wire. The accompanying `gate.aiReview.status` is `completed` when the reviewer returned `pass` or `warnings`, or `queued` when the sync run was queued and advisory mode was on (both conditions can coincide).
In both cases, poll by re-calling admin_request_review against the same submissionId. The response will flip to approved or rejected once the background work concludes (or the human reviewer acts).
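A minimal polling sketch under the same assumptions as the earlier snippets: `callTool` stands in for your MCP client, and the 30-second interval is an arbitrary illustration rather than a documented recommendation, so keep the rate limits (Troubleshooting → 429) in mind.

```typescript
// Re-call admin_request_review until the gate settles to approved or rejected.
async function waitForDecision(
  callTool: (name: string, args: Record<string, unknown>) => Promise<any>,
  submissionId: string,
): Promise<"approved" | "rejected"> {
  for (;;) {
    const gate = await callTool("admin_request_review", { submissionId });
    if (gate.status === "approved" || gate.status === "rejected") return gate.status;
    // Still in_review: a queued AI-review run, or advisory mode holding for a human reviewer.
    await new Promise(resolve => setTimeout(resolve, 30_000));
  }
}
```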
Separately, Phoenix monitors a false-approval-rate (FAR) signal as a post-hoc dashboard metric — it does not influence individual gate decisions. Partners cannot observe it from the API.