Editor's note (updated 2026-05-12): This editorial has been revised to reflect Season 1 scope refinement. XAUUSD and USDJPY were removed from the experiment universe; the numbers and references below reflect the updated 4-instrument scope (NAS100, US30, US500, EURUSD). Original publication date preserved.
AI Trading Benchmark Week 1 Results: Claude vs GPT-5.4 (April 13-17, 2026)
Claude closed +$958.51 on 7 trades (57.1% win rate). GPT-5.4 shows zero in-scope trades after the season scope narrowed to EURUSD and three US indices.
Key Findings
- Claude Opus 4.6 closed Week 1 at $50,958.51 — up 1.92% — after 7 trades at a 57.1% win rate and +1.23R net.
- GPT-5.4 closed Week 1 at $50,000.00 — flat. The model's only attempted entries were in instruments later removed from the experiment scope; after the methodology refinement, GPT's Week 1 ledger is empty.
- Friday April 17 was a split session for Claude. Two trades: the EURUSD long stopped out at -$1,078.92 and the US30 long banked +$1,256.60 at TP1, so the day netted +$177.68 on the back of the US30 trade alone.
- Biggest dollar print of the week: Claude's US30 long on April 17 (+1.24R, +$1,256.60). Biggest loss: Claude's EURUSD long on April 14 (-1.00R, -$1,182.20).
Season Scorecard
| Model | Win Rate | Season R | Net P&L | Trades |
|---|---|---|---|---|
| Claude | 57.1% | +1.2R | +$959 | 7 |
| GPT | 0.0% | +0.0R | +$0 | 0 |
The Week in Macro
Week 1 of the AI Trading Benchmark coincided with a week of disinflationary surprises and a weakening dollar. The single dominant event was Tuesday's PPI release. Core PPI printed 0.1% against a 0.4% forecast; headline PPI came in at -0.5% against a 1.1% forecast. Neither is a mild miss. The combination reset Fed-tightening expectations across the curve.
The dollar responded immediately. DXY traded down to 98.06 intraday — its weakest level in the current sequence — and closed the week below 98.20. That is a meaningful move off the prior week's 99.30 zone. US 10-year yields followed the dollar lower, dropping to 4.27% and closing below the 5-day EMA of 4.296%. For risk-sensitive longs — index futures and EURUSD specifically — the two most important macro inputs moved in alignment.
VIX spent most of the week between 18 and 20, normalizing from prior-week readings above 21. The combination of falling volatility and falling yields is unusual outside of explicit central-bank accommodation; it implies the bond market is pricing easier conditions while equities are comfortable enough to stop paying the safe-haven premium.
Friday was different. By April 17, the market had digested the dovish repricing, and the trades that worked on Monday through Thursday began to reverse. EURUSD, which had ground higher on dollar weakness, turned lower. NAS100 weakened into the weekend. This is the backdrop against which Claude took its Friday trades, and it explains, at least in part, why the model had a difficult session that day.
The next macro beat sits in Week 2: CPI Wednesday, retail sales Thursday, and the FOMC minutes released late in the week. A CPI that confirms Tuesday's PPI print would extend the dollar's slide. A surprise in either direction could invalidate some of the trades that worked this week.
Equity Curve
[Chart: equity curve, Apr 13 to Apr 17]
Head-to-Head
| Metric | Claude | GPT |
|---|---|---|
| Trades | 7 | 0 |
| Wins | 4 | 0 |
| Losses | 3 | 0 |
| Win Rate | 57.1% | — |
| Net R | +1.2R | 0.0R |
| Net P&L | +$959 | $0 |
| Biggest Win | +$1,257 | — |
| Biggest Loss | -$1,182 | — |
| Peak Balance | $53,026 | $50,000 |
| Trough Balance | $49,702 | $50,000 |
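The table above can be reproduced from the per-trade dollar results quoted in this editorial. The sketch below is only an illustration: the `TRADES` list, the `summarize` helper, and the ordering of trades within a single day are assumptions, not the benchmark's actual ledger code.

```python
from decimal import Decimal

START_BALANCE = Decimal("50000.00")

# (date, instrument, net P&L in USD) as quoted in this editorial.
# The ordering of trades within one day is an assumption.
TRADES = [
    ("2026-04-13", "NAS100", Decimal("711.85")),
    ("2026-04-13", "US30", Decimal("1106.80")),
    ("2026-04-13", "EURUSD", Decimal("1207.58")),
    ("2026-04-14", "EURUSD", Decimal("-1182.20")),
    ("2026-04-16", "NAS100", Decimal("-1063.20")),
    ("2026-04-17", "EURUSD", Decimal("-1078.92")),
    ("2026-04-17", "US30", Decimal("1256.60")),
]


def summarize(trades, start_balance):
    """Replay trades in order and collect the head-to-head metrics."""
    balance = start_balance
    peak = trough = start_balance
    for _, _, pnl in trades:
        balance += pnl
        peak = max(peak, balance)
        trough = min(trough, balance)
    pnls = [pnl for _, _, pnl in trades]
    wins = sum(1 for p in pnls if p > 0)
    return {
        "trades": len(pnls),
        "wins": wins,
        "losses": len(pnls) - wins,
        "win_rate_pct": round(100 * wins / len(pnls), 1),
        "net_pnl": balance - start_balance,
        "biggest_win": max(pnls),
        "biggest_loss": min(pnls),
        "peak": peak,
        "trough": trough,
    }


summary = summarize(TRADES, START_BALANCE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The peak falls after Monday's third trade regardless of ordering. The trough depends on Friday's EURUSD stop landing before the US30 win, which matches the reported trough balance. The helper assumes at least one trade, so it would need a guard before being applied to an empty ledger such as GPT's.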
Claude's Week
Claude Opus 4.6 took seven trades this week. Three on Monday, one on Tuesday, none on Wednesday, one on Thursday, two on Friday. That distribution tells you most of what you need to know about Claude's trading style in Week 1: it engages when it sees a setup and sits on its hands when it doesn't.
The Monday sprint (+$711.85 NAS100, +$1,106.80 US30, +$1,207.58 EURUSD) was the strongest day any model has posted on this benchmark so far. All three trades were long instruments that benefited from the dovish macro repricing that carried over from the prior Friday. None of the three required extended deliberation — Claude's own decision journals show entry decisions in 2-4 evaluations each, with confidence jumping cleanly from 40-50% to 65-72% at the trigger candle. That is what pattern recognition looks like when the macro backdrop, the technical structure, and the entry trigger all converge.
Tuesday was a single-trade day and it hit a stop. The EURUSD long ran into session resistance without a pullback and got taken out for -1.00R (-$1,182.20). That print was the week's biggest dollar loss and the only trade through Tuesday's close that did not work for Claude.
Thursday's NAS100 short was Claude's only truly contrarian position of the week. With VIX normalizing and equity indices extending, a short on NAS100 required conviction that the prior week's risk-on tone had exhausted. The trade hit stop for -1.00R (-$1,063.20). That is the cost of taking a trade against the prevailing tone — Claude recognized the setup but could not override the trend.
Friday's two trades (-$1,078.92 EURUSD long, +$1,256.60 US30 long) were a weaker version of Monday's sprint. The same bias (long risk-sensitive instruments into a falling-dollar backdrop) worked on one trade and failed on the other. The US30 long was the week's single biggest dollar winner. The EURUSD long ran into the Friday reversal that The Week in Macro describes.
The net effect: Claude held a small positive on the week. The Monday cushion plus the Friday US30 print absorbed the EURUSD and NAS100 stops. The 57.1% win rate alone does not explain the +$959 outcome; the R-distribution does. Claude's winners averaged +1.05R. Its losers averaged -1.00R — clean stop-outs at the structural level. That is the profile of a model with structural discipline in stops and opportunistic allocation into winners.
GPT's Week
GPT-5.4's Week 1 ledger is empty. The model evaluated and entered positions during the week, but every one of GPT's Week 1 attempted trades was in an instrument that has since been removed from the Season 1 scope. After the methodology refinement narrowed the experiment universe to NAS100, US30, US500, and EURUSD, none of GPT's Week 1 entries qualify for the benchmark ledger.
The scoreboard reflects that directly. Zero trades, zero net R, $50,000.00 closing balance — the model is exactly where it started the season. That is not a result the model produced; it is a result of the scope refinement applied retroactively across the experiment. GPT will begin posting numbers to the benchmark from Week 2 onward, when the model's first in-scope trades hit the ledger.
The rest of the season will tell us how GPT trades the four in-scope instruments. Week 1 simply does not contribute to that comparison.
Frequency is a strategy, not an accident. Claude's seven entries delivered a small positive week; GPT-5.4 has yet to put an in-scope trade on the board.
Claude vs GPT: Week 1 Head-to-Head
This is the first week of a multi-month benchmark comparing two frontier language models trading the same instruments under the same risk framework. Claude Opus 4.6 and GPT-5.4 each started Monday with $50,000 in a demo account at the same broker. The models decide what to trade, when to enter, and where to place stops and targets. The broker executes exactly what they ask. The ledger records what happened.
After five trading days, Claude is at $50,958.51 and GPT is at $50,000.00. Claude's seven in-scope trades produced +$958.51 in net P&L and +1.23R. GPT's Week 1 ledger is empty after the Season 1 scope refinement; the model's first benchmark-eligible trades will appear in Week 2.
Why Claude Closed the Week Green
Claude traded seven times across three of the four in-scope instruments: NAS100, US30, and EURUSD. The Monday sprint produced three wins in a single session, building a cushion that survived the Tuesday EURUSD stop, the Thursday NAS100 short stop, and a split Friday. That is the structural advantage frequency provides: more rolls of the dice, more chances to compound a single good session.
Loss discipline was clean. Claude's three losses averaged -$1,108 per stop, each one roughly 2% of the account balance at the time of entry. There was no revenge trading, no doubled-up follow-up after a stop, no chase. The 57.1% win rate combined with -1.00R losers and +1.05R average winners produces a small positive expectancy across the week, which is exactly what the ledger shows.
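The expectancy claim can be checked with a few lines of arithmetic. This is a minimal sketch using the standard per-trade expectancy formula (win rate times average win, minus loss rate times average loss); the function name `expectancy_r` is hypothetical and the inputs are the rounded figures quoted above.

```python
def expectancy_r(win_rate, avg_win_r, avg_loss_r):
    """Expected R per trade; avg_loss_r is given as a positive magnitude."""
    return win_rate * avg_win_r - (1 - win_rate) * avg_loss_r


wins, trades = 4, 7
per_trade = expectancy_r(wins / trades, avg_win_r=1.05, avg_loss_r=1.00)
week_total = per_trade * trades

print(f"expectancy per trade: {per_trade:+.2f}R")  # +0.17R
print(f"over {trades} trades: {week_total:+.2f}R")  # +1.20R
```

The result lands at about +1.20R for the week, in line with the +1.2R on the scorecard. The small gap to the reported +1.23R is presumably rounding in the quoted average winner.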
The Friday Split
April 17 is where the week resolves. Claude took two trades. The EURUSD long bought into the Friday reversal and stopped for -$1,078.92. The US30 long banked +$1,256.60 at TP1 before the broader pullback arrived. Net Friday: +$177.68 — a small green print on what was a difficult session for the dovish-repricing trades that had carried the early week.
This is the session where a model's stop discipline matters more than its read. Claude's read on EURUSD was wrong relative to the Friday tape. The stop was placed structurally and the position closed for exactly its budgeted -1R. The model did not re-enter, did not flip short, did not chase the reversal. That is the behavior the benchmark is built to surface.
What the Numbers Tell Us
Claude finished +1.92% in dollar terms, +1.23R in risk-adjusted terms. GPT finished flat after the Week 1 scope removal. The week tells us very little about how the two models compare directly — only one of them has a Week 1 ledger to compare. The interesting comparisons start in Week 2, where both models post in-scope trades.
We will be watching the broader methodology documentation for any phase rule changes, and tracking how both models handle Week 2's CPI print on Wednesday.
About reported results. Each model outputs three take-profit targets (TP1, TP2, TP3) per trade. In live execution, models typically scale out at TP1 for risk management — the broker position records this as a TP1 exit. The R-multiples and dollar returns shown throughout this editorial reflect the full potential of each trade: where the market actually traveled to (the highest take-profit hit, or stop loss) before the setup was invalidated or exhausted. This lets readers see the complete arc of each setup, not just where the position was closed.
What to Watch in Week 2
Week 2 starts Monday April 20 and closes Friday April 24. Three macro events dominate the calendar: CPI on Wednesday, retail sales on Thursday, and the FOMC meeting minutes on Friday afternoon. Each has the potential to extend or invalidate the dovish repricing that defined Week 1.
A CPI print that confirms the Tuesday PPI miss would likely extend the dollar slide and keep risk-on instruments bid. A surprise upside CPI could invalidate that setup inside a single session. Either outcome is tradeable — the question is whether either model adjusts its playbook to the macro pivot or continues trading the prior week's pattern.
At the instrument level, the Week 1 scoreboard suggests two watchpoints for Week 2. First, EURUSD has now stopped out in two of Claude's three entries; the pair's volatility is not cooperating with breakout-retest setups. Second, the index complex (NAS100, US30, US500) is where the cleanest dollar prints have come from; whether that pattern repeats under Week 2's heavier macro calendar will matter.
For Claude, the question is whether the Monday-sprint rhythm repeats or whether Week 1's frequency was a function of the setup density, not the model's baseline. For GPT, the question is simpler: when the in-scope ledger opens, what cadence does the model post? Both answers matter. We'll have them by Friday.
Frequently Asked Questions
- Who won Week 1 of the AI Trading Benchmark?
- Claude Opus 4.6 closed Week 1 at $50,958.51 (+1.92%) after 7 in-scope trades at a 57.1% win rate and +1.23R net. GPT-5.4 closed at $50,000.00 — flat. After the Season 1 scope refinement, GPT's Week 1 ledger is empty, so the head-to-head comparison effectively begins in Week 2.
- Why does GPT show zero trades in Week 1?
- GPT-5.4 attempted entries during Week 1, but all of them were in instruments later removed from the experiment universe during a Season 1 scope refinement. The benchmark now tracks four instruments: NAS100, US30, US500, and EURUSD. After the refinement was applied retroactively, GPT's Week 1 ledger is empty.
- What was the biggest single trade of Week 1?
- Claude's US30 long on Friday April 17 was the biggest dollar print of the week at +$1,256.60 (+1.24R, TP1). The trade banked profit before the broader Friday pullback in the index complex arrived and was the print that pulled Claude's Friday session green despite the EURUSD stop on the same day.
- Why did Claude finish the week with a small gain?
- Volume plus early-week cushion. The Monday sprint produced three wins in a single session — NAS100, US30, and EURUSD all long, all profitable. That cushion absorbed Tuesday's EURUSD stop, Thursday's NAS100 short stop, and Friday's EURUSD stop. Net for the week: +$958.51 across seven trades.
- What macro events drove markets in Week 1?
- Tuesday's PPI release was the dominant event. Core PPI printed 0.1% against 0.4% expected; headline PPI came in at -0.5% against 1.1% expected. The dovish surprise weakened the dollar to 98.06, drove 10-year yields to 4.27%, and supported risk-on trades through Thursday before the Friday reversal.
Related Reading
- Apr 17, 2026 · Claude · EURUSD · LONG
EURUSD Pullback Long Stopped Out — 10 Evaluations, Then the Floor Broke
- Apr 17, 2026 · Claude · US30 · LONG
US30 Long Hits TP1 Then Reverses — Claude Banks +$1,256 Before the Pullback
- Apr 16, 2026 · Claude · NAS100 · SHORT
NAS100 VWAP Short Stopped Out — Claude's Counter-Trend Gamble Fails at -1.0R
- Apr 14, 2026 · Claude · EURUSD · LONG
EURUSD Long Hit the Stop — Claude's First Loss of the Season
- Apr 13, 2026 · Claude · EURUSD · LONG
EURUSD Pullback Long Sweeps All Three Targets — +3.1R
- Apr 13, 2026 · Claude · NAS100 · LONG
NAS100 Long Banks TP3 in the Monday Sprint — +$711.85
- Apr 13, 2026 · Claude · US30 · SHORT
US30 Short Banks TP1 — Claude's Third Win of the Monday Sprint
Methodology
This weekly editorial aggregates trading results from April 13-17, 2026. All numbers come from the live broker execution ledger — no simulation, no backtest.
How P&L is computed. Week P&L is calculated as weekEndBalance - weekStartBalance, never as the sum of individual trade net P&L. The two can differ slightly due to rounding in partial exits; the broker balance is always authoritative.
Week rollover. Each week's starting balance is the previous week's ending balance. Week 1 uses the experiment's initial capital ($50,000 per model). This is why account balances — not trade sums — are the ground truth for performance tracking.
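The two rules above (balance-based P&L and week-to-week rollover) can be sketched as follows. The names `week_pnl` and `rollover` are hypothetical; only the formula `weekEndBalance - weekStartBalance` and the $50,000 initial capital come from the methodology text.

```python
from decimal import Decimal

INITIAL_CAPITAL = Decimal("50000.00")


def week_pnl(week_start_balance, week_end_balance):
    """Week P&L is the change in broker balance, not a sum of trades."""
    return week_end_balance - week_start_balance


def rollover(initial_capital, week_end_balances):
    """Return (start, end, pnl) per week; each start is the prior week's end."""
    rows = []
    start = initial_capital
    for end in week_end_balances:
        rows.append((start, end, week_pnl(start, end)))
        start = end
    return rows


# Week 1 closing balances as reported in this editorial.
claude_weeks = rollover(INITIAL_CAPITAL, [Decimal("50958.51")])
gpt_weeks = rollover(INITIAL_CAPITAL, [Decimal("50000.00")])

print(claude_weeks[0][2])  # 958.51
print(gpt_weeks[0][2])     # 0.00

# The Week 2 starting balance is simply the Week 1 ending balance.
claude_week2_start = claude_weeks[-1][1]
```

For Week 1 the balance difference and the sum of the seven quoted trade results happen to agree at $958.51, but the balance is the figure the methodology treats as authoritative.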
Net R vs. Net P&L. Net R is a risk-adjusted measure (sum of each trade's reward/risk multiple). Net P&L is the literal dollar change in account balance. Both are reported; R-multiples are more comparable across instruments with different tick values.
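As a rough illustration of why R-multiples compare more cleanly than dollars, the sketch below assumes an R-multiple is the trade's P&L divided by the dollars risked at entry. The helper names are hypothetical, and the risk figures are inferred from the stop-outs quoted in this editorial rather than taken from the ledger.

```python
def r_multiple(pnl_usd, risk_usd):
    """Reward/risk multiple: P&L divided by the dollars risked at entry."""
    if risk_usd <= 0:
        raise ValueError("risk_usd must be positive")
    return pnl_usd / risk_usd


def net_r(multiples):
    """Net R is the plain sum of per-trade multiples."""
    return sum(multiples)


# A full stop-out loses exactly the amount risked, so each of Claude's
# three stops scores -1.00R even though the dollar losses differ.
stops = [
    r_multiple(-1182.20, 1182.20),  # Apr 14 EURUSD long
    r_multiple(-1063.20, 1063.20),  # Apr 16 NAS100 short
    r_multiple(-1078.92, 1078.92),  # Apr 17 EURUSD long
]

reported_net_r = 1.23
# Whatever the stops cost in R, the winners must have made up the rest.
implied_winner_r = reported_net_r - net_r(stops)

print(f"stops: {net_r(stops):+.2f}R")
print(f"implied total from the four winners: {implied_winner_r:+.2f}R")
```

Three different dollar losses collapse to the same -1.00R, which is the sense in which R is comparable across instruments. The implied winner total of about +4.23R across four trades is consistent with the roughly +1.05R average winner quoted earlier.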
Weekend handling. Daily balance series forward-fill Saturday and Sunday from the prior Friday close, since markets are closed. This keeps chart visuals continuous without fabricating activity.
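A minimal sketch of the weekend forward-fill, assuming the daily series is a mapping from date to closing balance. The function name is hypothetical, and the daily closes are derived here from the per-trade results quoted in this editorial, not read from the broker.

```python
from datetime import date, timedelta


def forward_fill_daily(balances, start, end):
    """Fill every calendar day in [start, end] with the last known balance."""
    filled = {}
    last = None
    day = start
    while day <= end:
        if day in balances:
            last = balances[day]
        if last is not None:
            filled[day] = last
        day += timedelta(days=1)
    return filled


# Claude's end-of-day balances for Week 1 (derived, see note above).
closes = {
    date(2026, 4, 13): 53026.23,
    date(2026, 4, 14): 51844.03,
    date(2026, 4, 15): 51844.03,  # no trades on Wednesday
    date(2026, 4, 16): 50780.83,
    date(2026, 4, 17): 50958.51,
}

series = forward_fill_daily(closes, date(2026, 4, 13), date(2026, 4, 19))
for day, balance in series.items():
    print(day.isoformat(), day.strftime("%a"), f"{balance:,.2f}")
```

Saturday April 18 and Sunday April 19 both carry Friday's close of 50,958.51, so the chart stays continuous without implying any weekend activity.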
Methodology stability. Rules don't change mid-phase. If any rule is updated for a future phase, the change is documented on the methodology page.
Scope refinement. This editorial was retroactively updated on 2026-05-12 to remove XAUUSD and USDJPY from the experiment universe.
Week 2 begins Monday. Claude carries $959 of cushion; GPT's in-scope ledger opens Monday at $50,000. What I'll be watching is whether Tuesday's PPI setup repeats on Wednesday's CPI print, and whether GPT's first benchmark-eligible trades arrive with the same conviction the model was showing on out-of-scope setups in Week 1.