A blunt outside review of the Codex53 / Codex54 / Codex55 main traders and the CodexF / Codex54F / Codex55F fade lanes. Verified against runtime logs, Vanta execution logs, configs, and trade-diary outputs. No code edits, no service restarts — analysis only.
The current Codex pipeline is not a trading agent. It is a four-layer veto stack with an LLM at the top emitting
mostly-FLAT opinions, a stalker that only sees pre-filtered near-price ideas, a suspicion gate that fires hard
vetoes any time the model defaults to phase="unclear", and a Vanta layer that re-litigates everything
with another participation gate and a 2–3 point staleness guard. The pre-May-5 version was a money-losing
signal-spammer at a ~37% paper win rate. The post-May-5 version replaced it with a money-saving silence machine.
Neither is the “cunning directional trader” the design called for.
Functional failure against the stated goal. The pipeline is partially viable for risk control and not viable for opportunity capture. It avoids the prior loss rate by simply not trading. The architecture is overweight on serial vetoes, underweight on directional planning, and the diary feedback loop only knows about kills — not about kills-that-should-not-have-been-kills. Continuing with this stack and merely “loosening filters” will not fix the conceptual problem.
| Lane | Runtime lines | First event | Last event |
|---|---|---|---|
| Codex53 trader | 182,329 | 2026-04-10 | 2026-05-15 11:21 |
| Codex54 trader | 189,279 | 2026-04-05 | 2026-05-15 11:21 |
| Codex55 trader | 26,953 | 2026-04-23 | 2026-05-15 11:21 |
| Vanta Codex53 | 122,534 | — | 2026-05-15 |
| Vanta Codex54 | 79,176 | — | 2026-05-15 |
| Vanta Codex55 | 13,839 | — | 2026-05-15 |
| Vanta CodexF | 73 | — | 2026-05-15 |
| Vanta Codex54F | 53 | — | 2026-05-15 |
| Vanta Codex55F | 21 | — | 2026-05-15 |
| Lane | Total decisions | FLAT % | LONG % | SHORT % |
|---|---|---|---|---|
| Codex53 | 90,140 | 90.6% | 5.3% | 4.1% |
| Codex54 | 88,358 | 90.6% | 4.9% | 4.5% |
| Codex55 | 8,128 | 89.6% | 4.7% | 5.8% |
Roughly 9% of polls produce any directional opinion. The prompt biases the model toward FLAT and tells it the diary lesson is that “persuasive entries tagged moved_too_soon have been toxic.” Recent diary memory after a losing paper run makes FLAT the safer answer for the model on every poll.
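| day | armed | trig | susp | badloc | sigval_fail | ati_sub | fills |
|---|---|---|---|---|---|---|---|
| 2026-05-08 | 3 | 3 | 0 | 3 | 1 | 1 | 2 |
| 2026-05-11 | 49 | 18 | 0 | 381 | 260 | 10 | 10 |
| 2026-05-12 | 164 | 54 | 0 | 663 | 381 | 26 | 26 |
| 2026-05-13 | 138 | 41 | 14 | 911 | 385 | 21 | 21 |
| 2026-05-14 | 223 | 7 | 86 | 1095 | 398 | 4 | 4 |
| 2026-05-15 | 126 | 2 | 14 | 463 | 415 | 0 | 0 |

| day | armed | trig | susp | badloc | sigval_fail | ati_sub | fills |
|---|---|---|---|---|---|---|---|
| 2026-05-08 | 37 | 14 | 0 | 195 | 81 | 10 | 19 |
| 2026-05-11 | 74 | 26 | 0 | 926 | 141 | 18 | 26 |
| 2026-05-12 | 80 | 35 | 0 | 585 | 132 | 12 | 12 |
| 2026-05-13 | 75 | 29 | 3 | 817 | 127 | 13 | 13 |
| 2026-05-14 | 109 | 4 | 34 | 835 | 163 | 1 | 1 |
| 2026-05-15 | 56 | 3 | 14 | 360 | 122 | 1 | 1 |

| day | armed | trig | susp | badloc | sigval_fail | ati_sub | fills |
|---|---|---|---|---|---|---|---|
| 2026-05-11 | 31 | 3 | 0 | 138 | 60 | 2 | 2 |
| 2026-05-12 | 48 | 5 | 0 | 171 | 69 | 2 | 2 |
| 2026-05-13 | 51 | 11 | 1 | 229 | 88 | 4 | 4 |
| 2026-05-14 | 35 | 0 | 4 | 135 | 47 | 0 | 0 |
| 2026-05-15 | 18 | 0 | 0 | 79 | 43 | 0 | 0 |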
Theses arm at 100–200/day; only 0.5–4% reach a stalker trigger. Codex55 is a smaller-volume clone of Codex54 and is producing 0 fills on the last two days. CodexF (the fade for Codex53) cannot be fed from Codex55 because Codex55 produces almost no qualifying suspicion-blocks.
| Filter stage | Count | What kills it |
|---|---|---|
| LLM FLAT | 15,784 / 20,626 | 76.5% of polls; the model declines |
| Stalker bad-location (logged each poll) | 3,516 | Thesis alive but price not at value |
| signal_validation_failed total | 1,917 | entry_too_far (1,074), generic_structure_not_enough (426), counter_bias (266), remembered_resistance/support (113), rr_below_min (38) |
| entry_stalk_suspicion_blocked | 114 | Hard vetoes from suspicion gate |
| ny_no_entry_cutoff_standby | 1,312 | NY cutoff 15:30–20:00 ET |
| Vanta participation_gate:decision_not_allow:BLOCK | 1,053 | ATLAS decision packet says don’t trade |
| Vanta codex_execution_guard_blocked at ≤3 pts | 21 | Price drifted from planned entry |
| Vanta stale_source_signal | 15 | Source signal older than 45s |
| Reached ati_order_submitted | ~86 | — |
00:06:41 SHORT 29526.75/29528.75→29517.0 risk=2 reward=9.75 RR=4.9
vetoes=[unclear_setup_failed_cross_exam, no_defended_trade_location]
00:43:13 LONG 29578.25/29576.25→29588.75 risk=2 reward=10.5 RR=5.25
vetoes=[unclear_setup_failed_cross_exam, no_defended_trade_location]
00:50:44 LONG 29583.5 /29581.5 →29593.0 risk=2 reward=9.5 RR=4.75
vetoes=[no_defended_trade_location]
00:55:20 SHORT 29579.75/29581.75→29568.75 risk=2 reward=11.0 RR=5.5
vetoes=[unclear_neutral_middle_requires_hard_evidence,
generic_middle_without_extra_confirmation,
unclear_setup_failed_cross_exam,
no_defended_trade_location]
These have ≥4.5 RR with tight 2-pt stops at the band/value zone. They were killed not because the geometry was bad but because the LLM defaulted phase to “unclear” and didn’t volunteer enough free-form trap-language or preferred-structure tags. The gate is taxing the vocabulary of the LLM, not the structure of the setup.
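The reward-to-risk figures in these log lines follow directly from the price geometry. A minimal Python sketch, assuming each logged triple reads entry/stop→target (the helper name `rr_from_geometry` is hypothetical and not part of the Codex code), reproduces them for the four blocked setups:

```python
from typing import Tuple


def rr_from_geometry(side: str, entry: float, stop: float, target: float) -> Tuple[float, float, float]:
    """Return (risk, reward, rr) in points for a LONG or SHORT setup."""
    if side == "LONG":
        risk, reward = entry - stop, target - entry
    elif side == "SHORT":
        risk, reward = stop - entry, entry - target
    else:
        raise ValueError(f"unknown side: {side}")
    if risk <= 0 or reward <= 0:
        raise ValueError("stop and target must sit on opposite sides of entry")
    return risk, reward, reward / risk


# The four suspicion-blocked candidates quoted above.
blocked = [
    ("SHORT", 29526.75, 29528.75, 29517.0),
    ("LONG", 29578.25, 29576.25, 29588.75),
    ("LONG", 29583.5, 29581.5, 29593.0),
    ("SHORT", 29579.75, 29581.75, 29568.75),
]

for side, entry, stop, target in blocked:
    risk, reward, rr = rr_from_geometry(side, entry, stop, target)
    print(f"{side:5s} risk={risk} reward={reward} RR={rr}")
```

None of these inputs depend on the phase label or on free-form tags, which is why a geometry check can be evaluated separately from the LLM's vocabulary.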
codex_fade_router.jsonl since 2026-05-14 20:35 startup: 2 startups, 12 fade signals published, 32 rejected.setup_location_stale (drift > 3 pt by the time the router built the fade), 15 non_fade_suspicion_block (block was qualified but had no listed “qualifying veto”), 1 session_fade_limit, 1 missing_geometry.

| Analyst | Closed trades | Target | Stop | Loss-exit | Profit-exit | Win rate |
|---|---|---|---|---|---|---|
| OpenClaw Codex (53) | 465 | 154 | 203 | 86 | 21 | 37.7% |
| OpenClaw Codex54 | 355 | 111 | 139 | 87 | 17 | 36.2% |
| OpenClaw Codex55 | 18 | 0 | 1 | 15 | 2 | 11.1% |
Default MIN_RR = 1.15 with an RR ladder of 1.25–2.25. With this ladder a 37% win rate is net
losing in any honest reckoning. The skepticism layer added in early May was a rational response to a real loss
record — it then went too far the other way.
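The relationship between win rate and RR can be made explicit. The sketch below is only an approximation: it assumes every trade resolves at its full target (+RR) or its full stop (−1R), which the loss-exit and profit-exit columns show is not the case in practice, so realized expectancy depends on the actual exit mix. The function names are illustrative.

```python
def breakeven_win_rate(rr: float) -> float:
    """Win rate at which expectancy is zero when wins pay +rr R and losses cost 1 R."""
    return 1.0 / (1.0 + rr)


def expectancy_r(win_rate: float, rr: float) -> float:
    """Expected R per trade under the full-target / full-stop assumption."""
    return win_rate * rr - (1.0 - win_rate)


WIN_RATE = 0.37  # approximate paper win rate from the diary table

for rr in (1.15, 1.25, 1.75, 2.25):
    print(
        f"RR={rr:.2f} breakeven={breakeven_win_rate(rr):.1%} "
        f"expectancy@37%={expectancy_r(WIN_RATE, rr):+.3f} R"
    )

# RR needed to break even at a 37% win rate: (1 - p) / p
print(f"breakeven RR at 37%: {(1 - WIN_RATE) / WIN_RATE:.2f}")
```

Under that assumption a 37% win rate needs roughly 1.70 RR to break even, so the MIN_RR floor of 1.15 and the lower rungs of the ladder sit well inside negative expectancy.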
The diary additionally has 487 trade / published_not_executed records, 336 trade / blocked,
98 thesis / skipped, and 26 thesis / blocked. None of these have
post-hoc R-multiple reconstruction — the diary never asks “what would this rejected or unexecuted
candidate have done if taken?”
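A post-hoc reconstruction of that kind could look roughly like the sketch below. It is not the diary's code: the `Bar` and `Candidate` types, the touch-based fill rule, and the pessimistic "stop wins ties inside one bar" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Bar:
    high: float
    low: float


@dataclass(frozen=True)
class Candidate:
    side: str  # "LONG" or "SHORT"
    entry: float
    stop: float
    target: float


def replay_r_multiple(c: Candidate, bars: Iterable[Bar]) -> Optional[float]:
    """Replay bars that follow the candidate's timestamp.

    Returns -1.0 if the stop is hit, +reward/risk if the target is hit,
    or None if the entry never fills or the trade never resolves.
    """
    if c.side not in ("LONG", "SHORT"):
        raise ValueError(f"unknown side: {c.side}")
    risk = abs(c.entry - c.stop)
    if risk == 0:
        raise ValueError("entry and stop must differ")
    reward_r = abs(c.target - c.entry) / risk

    filled = False
    for bar in bars:
        if not filled:
            if not (bar.low <= c.entry <= bar.high):
                continue  # price has not reached the planned entry yet
            filled = True
        if c.side == "LONG":
            hit_stop, hit_target = bar.low <= c.stop, bar.high >= c.target
        else:
            hit_stop, hit_target = bar.high >= c.stop, bar.low <= c.target
        if hit_stop:  # pessimistic: the stop wins if both are touched in one bar
            return -1.0
        if hit_target:
            return reward_r
    return None


cand = Candidate("LONG", 29578.25, 29576.25, 29588.75)
print(replay_r_multiple(cand, [Bar(29580.0, 29577.0), Bar(29590.0, 29579.0)]))
```

Running this over every blocked and published_not_executed record would give each kill a realized R, which is the missing half of the feedback loop.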
LLM call (90% FLAT)
→ If LONG/SHORT: _validate_trade
reject if |entry - price| > 25 (kills planned distant entries)
reject if RR < 1.15
reject if counter_bias
reject on location_block_reason (remembered_resistance/support,
generic_structure_not_enough)
→ Arm value-entry thesis (TTL 1200s)
→ Stalker:
wait for trigger (lower-band rejection, middle-band pullback,
BB-structure boundary, band-pressure reclaim, etc.)
if triggered, run suspicion gate
hard_vetoes set if phase == "unclear" without explicit
cross-exam survival
score ≥ 4.0 of 5 pillars required AND zero hard_vetoes
if pass, _validate_trade again
if pass, per_bot_session_gate.check
publish signal
→ Vanta polls signal file:
participation_gate decision_not_allow → skip
max_source_age 45s → skip if stale
level_confluence 15pt → skip if missing
codex_execution_guard 2pt deviation → skip if drifted
require_consecutive_same_signal = 3
min_seconds_between_entries = 600s
finally submit ATI order
A defensive veto chain, not a trading agent. There are at least eight independent reasons a trade can be killed (model FLAT, main-path validation, location discipline, stalker bad-location, suspicion gate, session gate, participation gate, execution-guard staleness). Any one can kill any candidate. In practice, all eight approve the same setup only a few times per day on the best days and not at all on bad days.
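The compounding effect of eight gates in series is easy to underestimate. The sketch below multiplies per-gate pass rates under an independence assumption that does not hold exactly in the real stack. Only the first rate is taken from this review (about 9.4% of Codex53 polls are non-FLAT); the other seven are placeholder values chosen for illustration, not measurements.

```python
from math import prod
from typing import Dict


def chain_pass_rate(pass_rates: Dict[str, float]) -> float:
    """Probability that a candidate clears every gate, assuming independent gates."""
    return prod(pass_rates.values())


gates = {
    "model_not_flat": 0.094,         # from the decision table (Codex53)
    "main_path_validation": 0.70,    # placeholder
    "location_discipline": 0.70,     # placeholder
    "stalker_location": 0.70,        # placeholder
    "suspicion_gate": 0.70,          # placeholder
    "session_gate": 0.70,            # placeholder
    "participation_gate": 0.70,      # placeholder
    "execution_guard": 0.70,         # placeholder
}

rate = chain_pass_rate(gates)
print(f"end-to-end pass rate: {rate:.4%}")
```

Even with every downstream gate passing 70% of the time, fewer than 1% of polls would survive end to end, so loosening any single gate changes little while the others stay in place.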
A directional planner: pick a side per market regime, build a named entry zone (e.g. lower band + EMA20 + Asian high pivot 29580), wait there, and if price reaches it intact, execute. The LLM should do the thesis
and zone-naming, deterministic code should do the waiting and triggering, and a single post-mortem layer should
ask “did the kill prove right?” Today the LLM does the thesis and each tick gets a new
chance to be vetoed. The stalker is not stalking — it is reacting to whatever near-price candidate
_validate_trade lets through.
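The split between LLM planning and deterministic waiting can be sketched as a small state check that runs on every tick. All names here (`ZoneThesis`, `watch_tick`, the four return states) are hypothetical; the 1200-second expiry mirrors the thesis TTL in the current pipeline, and the zone and invalidation prices are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneThesis:
    side: str            # "LONG" or "SHORT", chosen once by the LLM
    zone_low: float      # named entry zone, lower edge
    zone_high: float     # named entry zone, upper edge
    invalidation: float  # price that kills the thesis before entry
    expires_at: float    # seconds since arming


def watch_tick(thesis: ZoneThesis, price: float, now: float) -> str:
    """Deterministic watcher: no LLM call, no re-litigation of the thesis."""
    if now >= thesis.expires_at:
        return "EXPIRED"
    if thesis.side == "LONG" and price <= thesis.invalidation:
        return "INVALIDATED"
    if thesis.side == "SHORT" and price >= thesis.invalidation:
        return "INVALIDATED"
    if thesis.zone_low <= price <= thesis.zone_high:
        return "EXECUTE"
    return "WAIT"


thesis = ZoneThesis("LONG", zone_low=29578.0, zone_high=29582.0,
                    invalidation=29574.0, expires_at=1200.0)
for now, price in [(10.0, 29590.0), (20.0, 29585.0), (30.0, 29580.0)]:
    print(now, price, watch_tick(thesis, price, now))
```

The point of the design is that the thesis is judged once, when it is written, and again only after the fact; between those moments the watcher can only wait, execute, or drop the plan.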
- _validate_trade runs before thesis-arming. entry_too_far > 25pt kills the LLM’s planned-entry intent. This contradicts the thesis-arming prompt that explicitly tells the model to plan distant entries. The prompt says yes; the code says no.
- unclear_setup_failed_cross_exam and no_defended_trade_location fire whenever the LLM doesn’t explicitly volunteer trap-language or preferred-structure tags. Sound geometric setups with 5+ RR get killed because the model wrote phase="unclear".
- codex_max_entry_deviation_points: 2.0 is too tight for the median NQ minute range. Combined with the fade router’s 3-point MAX_SETUP_DRIFT_POINTS, signals that pass everything else commonly fail in the 5–30s between publish and ATI.
- participation_gate is the most prolific kill (1,053 + 1,202 + 410 since May 8). Most of those are against standby packets, but the volume tells us upstream and downstream gates have near-redundant rejection logic.
- The most common veto (unclear_setup_failed_cross_exam) is not in QUALIFYING_VETOES, so the majority of rejections never reach the fade lane. Codex55F has produced zero fills since startup; source-side rules starve it.
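For the execution-guard finding, one alternative to a hard 2-point reject is to re-price the trade at the live price and reject only if the geometry no longer works. The sketch below is an assumption about how such a check might look, not the Vanta guard itself: `guard_with_slippage` is a hypothetical name, the 1.15 floor is the MIN_RR default quoted earlier, and the 8-point cap is the top of the 5–8 pt range this review suggests.

```python
from typing import Tuple


def guard_with_slippage(side: str, planned_entry: float, stop: float, target: float,
                        live_price: float, min_rr: float = 1.15,
                        max_drift: float = 8.0) -> Tuple[bool, str]:
    """Accept a drifted entry if RR recomputed at the live price still clears min_rr."""
    if side not in ("LONG", "SHORT"):
        raise ValueError(f"unknown side: {side}")
    if abs(live_price - planned_entry) > max_drift:
        return False, "drift_exceeds_cap"
    if side == "LONG":
        risk, reward = live_price - stop, target - live_price
    else:
        risk, reward = stop - live_price, live_price - target
    if risk <= 0 or reward <= 0:
        return False, "geometry_invalid"
    rr = reward / risk
    if rr < min_rr:
        return False, "rr_below_min_after_slippage"
    return True, f"ok rr={rr:.2f}"


# A LONG planned at 29578.25 with price 3 pts higher by the time Vanta sees it.
print(guard_with_slippage("LONG", 29578.25, 29576.25, 29588.75, live_price=29581.25))
# The same plan with 6 pts of drift no longer pays for its risk.
print(guard_with_slippage("LONG", 29578.25, 29576.25, 29588.75, live_price=29584.25))
```

A fixed 2-point guard would reject the first case even though it still offers 1.5 RR; the re-priced check rejects only the second, where the remaining reward no longer covers the widened risk.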
Hybrid LLM planner + deterministic execution watcher with mandatory shadow accounting.
- Stop _validate_trade running before thesis-arming. Move validation to the moment the stalker triggers, not the moment the LLM emits.
- Drop codex_max_entry_deviation_points: 2.0 as a hard reject. Replace it with a slippage adjustment or widen it to 5–8 pts.
- Of require_consecutive_same_signal: 3 (Vanta), REQUIRE_CONSECUTIVE_POLLS: 1 (trader), and min_seconds_between_entries: 600s, pick one.
- Add was_correct_kill = bool, populated by the replay layer.
- Update QUALIFYING_VETOES to match the vetoes that actually fire most often, or run fade routers in shadow-only until source quality improves.

A new architecture is justified only if all of these are true:
If any of (1)–(4) cannot be met in design, the next version is the wrong next version.
- Replay published_not_executed and blocked candidates against bar history for 2026-05-01 → 2026-05-15. Compute realized R-multiple per candidate.
- Audit entry_stalk_rejected_bad_location while a thesis is armed: how often did price reach the planned entry within TTL but fail the stalker’s extra bounce/boundary requirements?
- Audit participation_gate decisions. Determine how many BLOCKs were against real entry signals vs standby packets.

The system is currently optimizing for “avoid the last loss type” rather than “execute the next good idea.” A 90% FLAT rate from a model whose prompt-memory tells it “your aggressive entries were toxic” combined with eight independent vetoes is not a trading agent. Fix the asymmetry (measure what the gates kill, not just that they fired) before any more code changes.