GPT Stops Out on NAS100 — Three Evaluations, One Quick Loss
Day 18. GPT entered the pullback at 28781.5, the bid never appeared, the trade stopped at 28704 for -1R. The closing week of Season 1 starts with a coordinated trap.
This article is part of the Season 1 archive. Season 1 of the AI Trading Benchmark ran from April 13 to May 12, 2026 — Claude Opus 4.6 versus GPT-5.4 across four instruments (NAS100, US30, US500, EUR/USD) after a mid-season scope refinement removed XAUUSD and USDJPY. Final standings: GPT-5.4 +8.90% / $54,448.54, Claude Opus 4.6 +4.53% / $52,266.67, combined portfolio +6.72% / $106,715.21. Season 2 is now live; the homepage carries the current scorecard.
Day 18 is the start of Week 4. GPT entered Thursday with a comfortable lead over Claude and a season balance of $53,377.24 after a quiet May 6. The May 7 NAS100 long was the first attempt of the closing week: a clean pullback into structural support that drew confluence from a Fib retracement, an EMA cluster, and overnight liquidity at 28781.5. GPT took the entry at 82% confidence after three evaluations. After 77.5 points of adverse motion, the position stopped at 28704 for -$1,032.28 (SL) on a 13.10-lot fill, or -1.0R on the framework side.
The trade is straightforward as data: marginal entry, fast resolution, full-R loss. The interesting question is why GPT and Claude — two models with different evaluation frameworks — both reached the same conclusion on the same chart in the same hour and both lost the same trade.
About reported results. Each model outputs three take-profit targets (TP1, TP2, TP3) per trade. Under Season 1's exit policy, broker positions close 100% at TP1 — the realized R-multiple is always TP1's R when any take-profit is reached. On stop-outs, the realized R is -1.0R and the broker dollar figure reflects the full position loss. The numbers in this article are the realized outcomes.
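The exit policy above reduces to a small rule. The following is a minimal Python sketch for a long position, not the benchmark's own code. The function name and the `outcome` labels are hypothetical, and 28840 (the lower bound of the Setup 1 TP1 range) is used only as an illustrative target.

```python
def realized_r(entry: float, stop: float, tp1: float, outcome: str) -> float:
    """Realized R-multiple for a long trade under a close-100%-at-TP1 policy.

    outcome is a hypothetical label: "tp" if any take-profit was reached,
    "sl" if the stop was hit first.
    """
    risk = entry - stop  # distance risked per unit, in price points
    if risk <= 0:
        raise ValueError("stop must sit below entry for a long")
    if outcome == "sl":
        return -1.0  # the full planned risk is lost
    if outcome == "tp":
        # TP2 and TP3 never change the realized figure under this policy
        return (tp1 - entry) / risk
    raise ValueError(f"unknown outcome: {outcome!r}")


# Day 18 entry and stop from the article; 28840 is an assumed TP1 for illustration.
stop_out_r = realized_r(28781.5, 28705.0, 28840.0, "sl")
tp1_r = realized_r(28781.5, 28705.0, 28840.0, "tp")
```

On these inputs the stop-out case returns -1.0R regardless of how far the targets sat from entry, which is why every losing trade in the season is reported as a full-R loss.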
Market Environment — May 7, 2026
Thursday's New York open had the right shape for a continuation long.
NAS100 had spent the late Asian session and early London hours grinding higher off the overnight low at 28701, working through 28820 by 14:30 UTC on three relatively orderly 15-minute candles. The session's first pullback rolled into the 28740–28770 zone at 14:55, hit 28765 on a clean wick, and reversed with what looked at first read like a textbook reaction. By 15:00 the tape had reclaimed 28780, parking exactly inside the EMA cluster at 28755–28790. The Fib 50% retracement of the morning's range printed at 28755. The overnight low at 28701 sat below as the protective structural reference.
The Trend Authority Agent read NAS100 at BULLISH 73% with a TRENDING regime tag — the cleanest trend signal NAS100 had given GPT all week. EMA stacking was tight across the 5m, 15m, and 60m timeframes. The Macro Agent leaned bull at 53% on a calendar window with no high-impact USD prints scheduled. The dollar was at 98.31 — half a tick weaker than yesterday's close. The VIX had eased to 14.92. Jobless claims had cleared the data window cleanly at 232K, below the 240K consensus.
Three filters lined up: bullish trend authority, supportive macro, and a structural pullback into a four-way confluence (50% Fib, EMA cluster, overnight liquidity sweep, prior intraday support). The setup looked like a continuation long the framework would normally rate four-of-five.
The one filter that should have been a flag was the quality of the pullback. Price had spent two and a half hours setting up the geometry, but the actual retracement into the zone happened on three short, indecisive wicks rather than one clean reaction candle. The 5-minute MACD histogram had flipped negative ten minutes before GPT's entry window opened. The 15-minute RSI was rolling over from overbought 82 toward 70. Structural read was correct. Momentum read was less convincing. The framework rates trend authority above momentum confirmation when the two diverge — which is most of the time the right call, and occasionally the wrong one.
NAS100 LONG
Setup: Pullback Buy into breakout support
Analysis by SkyAnalyst AI
Strategy Analysis
The pullback long that GPT rated tradeable
The setup template is the standard intraday continuation. After a directional opening move, price retraces toward a confluence zone with at least three independent levels. The entry trigger is a completed 5-minute bullish candle inside the zone. The structural stop sits below the zone's lower edge. Targets ladder up to extension levels.
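The template can be read as two checks, one on the zone and one on the trigger candle. The sketch below is a minimal Python illustration under stated assumptions. The `Candle` fields, the function names, and the reading of "inside the zone" as "the close lands between the zone edges" are all hypothetical interpretations of the prose. The numbers in the usage lines are invented for illustration and are not the May 7 levels.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    close: float
    complete: bool  # False while the 5-minute bar is still forming


def zone_has_confluence(zone: tuple[float, float], levels: list[float],
                        min_levels: int = 3) -> bool:
    """True when at least min_levels independent levels fall inside the zone."""
    low, high = zone
    return sum(low <= level <= high for level in levels) >= min_levels


def trigger_fired(zone: tuple[float, float], candle: Candle) -> bool:
    """True for a completed bullish candle that closes inside the zone."""
    low, high = zone
    return candle.complete and candle.close > candle.open and low <= candle.close <= high


# Illustrative values only (not the Day 18 levels).
zone = (100.0, 110.0)
levels = [101.5, 104.0, 109.0, 120.0]
confluent = zone_has_confluence(zone, levels)
forming = trigger_fired(zone, Candle(open=102.0, close=106.0, complete=False))
fired = trigger_fired(zone, Candle(open=102.0, close=106.0, complete=True))
```

The `complete` flag matters in practice: the same candle that would qualify once closed does not count as a trigger while it is still forming.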
GPT's read on the NAS100 chart was direct. The trend authority signal was unambiguous at 73% bullish in a TRENDING regime. The pullback geometry was clean. The four-way confluence at 28740–28770 was the kind of zone the framework is calibrated to take. The entry trigger candle at 14:58 UTC had pushed back above 28780, closing inside what GPT considered the zone's upper resolution. The 15-minute RSI was rolling from overbought toward neutral — which the framework reads as healthy reset, not bearish reversal, when trend authority is strong.
Three evaluations between 14:42 and 15:05 walked through the read. The first two flagged the trigger as marginal. The third — at 82% confidence, GPT's highest read of the day — committed. From the model's own reasoning: "Pullback has completed into 28740–28770 with the entry candle closing back above 28780. Trend authority at 73% BULLISH TRENDING confirms direction. Macro stable. RSI reset on the 15m is supportive, not bearish. Position sized at 13.10 lots with structural stop at 28705 immediately below overnight liquidity."
Three evaluations is not many. GPT's typical evaluation count on a marginal-trigger pullback is five to eight. Today's three reflects high conviction on the trend authority signal and a willingness to act fast on what looked like a clean confluence stack. The framework rewarded that profile twice in the prior week — the May 5 US30 long was a one-evaluation entry at 74% confidence and went to TP3. Today the framework took the same shape and the market did not cooperate.
What happened after the entry
The trade was over in forty-six minutes.
Entry was 15:11:58 UTC at 28781.5, the top of the entry zone, on a 13.10-lot fill with $1,067.54 at risk. The peak price after entry was 28781.5, which means the position never went into profit at all. Price drifted lower from the entry candle. By 15:25 the tape was at 28760, working under the EMA cluster. By 15:40 NAS100 was inside the 28720s. The structural stop at 28705, placed four points above the overnight low at 28701, was tagged at 15:58:00 UTC. Exit price was 28704, one point below the stop. The broker recorded the position closing at -$1,032.28 (SL) on the 13.10-lot fill, -1.0R on the framework side.
No oscillation. No second-touch test. No bid materializing at the confluence zone. The pullback geometry that drew the entry simply did not produce buyers. The Fib level at 28755 broke. The EMA cluster broke. The overnight liquidity sweep at 28701 was tested, the stop was triggered, and the trade was done.
The number that matters is the peak price after entry: 28781.5. That is the entry price itself. The trade never produced a single tick of favorable excursion. In benchmark trading, this is a specific and recognizable failure mode — the entry was either too early, too late, or against the prevailing micro-structure. Forty-six minutes from entry to stop with zero positive excursion is a clean signal that the trigger was the wrong candle.
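The zero-excursion diagnostic is easy to compute from a price path. The sketch below is a minimal Python version for a long position. The function name is hypothetical, and the path holds only the prices this article quotes exactly (entry, the 15:25 print, and the exit), not the full tick record.

```python
def excursions(entry: float, prices_after_entry: list[float]) -> tuple[float, float]:
    """Return (max favorable, max adverse) excursion in points for a long."""
    mfe = max(0.0, max(prices_after_entry) - entry)  # best move in the trade's favor
    mae = max(0.0, entry - min(prices_after_entry))  # worst move against the trade
    return mfe, mae


# Prices quoted in the article: entry 28781.5, 28760 at 15:25, exit at 28704.
path = [28781.5, 28760.0, 28704.0]
mfe, mae = excursions(28781.5, path)
```

With these inputs the favorable excursion is 0.0 points and the adverse excursion is 77.5 points, matching the entry-to-exit distance.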
Why the trigger fired anyway
This is the diagnostic part of the post-mortem.
GPT's evaluation framework weights trend authority and macro confirmation heavily — by design, the framework is calibrated to act on strong directional signals even when the trigger candle is imperfect. The trade-off is explicit: a higher base-rate of entries in trending regimes, at the cost of occasional losses on entries a stricter trigger-rule would have filtered. Across enough trades, the calibration produces a season-long return curve that depends on how often the trending regime tag accurately predicts continuation.
In the May 7 reading, the regime tag was TRENDING at 73% bullish. The framework treated that as load-bearing. The trigger candle at 14:58 — which closed back above 28780 inside the zone — was rated good enough given the strength of the trend authority signal. A strict-trigger version of the rule would have required a textbook bullish reversal candle with a clean wick into the zone's lower edge. That candle did not print. Price hovered at the zone's upper edge, never tested the zone's lower edge cleanly, and the entry triggered on a tepid mid-zone candle.
The quiet tell was the MACD histogram on the 5-minute. It had flipped negative ten minutes before the entry candle closed. A strict-momentum filter would have held the position out of the trade on that basis. GPT's framework does not currently weight 5-minute MACD as a primary input — the model treats it as a secondary confirmation that can be overridden by strong trend authority. Today the trend authority signal was strong and the momentum signal was weak, and the framework's calibration weighted the trend authority. The market resolved in favor of the momentum read.
This is not a framework failure. It is a calibration trade-off resolving against the framework on a single trade. Across the season, the same calibration produced GPT's +8.90% finish. The closing-week loss on this entry is the cost of that calibration on Day 18.
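The difference between the two calibrations can be sketched as a single switch. This is a hypothetical Python illustration, not GPT's actual decision logic. The `Reading` type and `allows_long` function are invented names, the 60% threshold is borrowed from the "Trend Agent bullish >60" line of the setup checklist and may not be the real override threshold, and -0.1 is a placeholder standing in for "negative" because the article does not give the MACD histogram value.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    trend_direction: str     # e.g. "BULLISH"
    trend_confidence: float  # percent, from the Trend Authority Agent
    macd_hist_5m: float      # only the sign is used here


def allows_long(r: Reading, strict_momentum: bool,
                min_trend_conf: float = 60.0) -> bool:
    """Decide whether a long entry passes the trend and momentum checks."""
    trend_ok = r.trend_direction == "BULLISH" and r.trend_confidence > min_trend_conf
    if not trend_ok:
        return False
    if strict_momentum:
        # Strict variant: a negative 5-minute histogram vetoes the entry
        return r.macd_hist_5m >= 0
    # Trend-weighted variant: momentum is secondary and gets overridden
    return True


day18 = Reading("BULLISH", 73.0, -0.1)  # -0.1 is a placeholder negative value
trend_weighted = allows_long(day18, strict_momentum=False)
strict = allows_long(day18, strict_momentum=True)
```

Under the trend-weighted variant the Day 18 reading passes; under the strict variant it is filtered out. Which variant performs better over a season depends on how often the trend tag predicts continuation.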
The parallel chart — Claude on the same read
The coordinated trap is the part of today worth dwelling on.
Claude took NAS100 long the same morning, nine minutes after GPT, on essentially the same confluence stack. Different evaluation framework, different trigger rule, six evaluations versus GPT's three, and the same conclusion: the pullback is tradeable, the structural read is correct, the entry should fire. Claude entered at 28770 at 15:21 UTC. GPT had entered at 28781.5 at 15:11 UTC. Both stops were placed in the same structural zone — Claude's at 28690, GPT's at 28705. Both stops triggered within minutes of each other. Claude's loss was -$1,014.00 (SL) and -1.0R. GPT's loss was -$1,032.28 (SL) and -1.0R. The aggregate loss across both models was -$2,046.28.
When two independent models with different evaluation frameworks reach the same conclusion on the same chart and the market rejects both, the explanation lies in the instrument, not in either model's discipline. The NAS100 chart on May 7 was a trap setup: strong structural confluence, weak follow-through, no buyers when buyers were required. Both models read the structure correctly and both models lost the same trade for the same reason.
This is the part of the AI Trading Benchmark that is hardest to control for. Two models can both be right about what the chart shows and both be wrong about what the chart does next. The benchmark measures cumulative outcomes across many trades. Single coordinated losses like today's are not evidence of framework failure — they are evidence that two models can reach the same incorrect conclusion when the structural read invites it.
How this fits GPT's Season 1 profile
GPT closes Season 1 at +8.90%. The May 7 loss is part of that closing balance, not an outlier from it.
GPT's Season 1 profile is a model that takes more trades than Claude, acts faster on high-confluence setups, and pays for that aggression with occasional fast losses on marginal triggers. The cumulative ledger going into Day 18 sits at +$3,377.24 net across ten recorded broker trades — a +6.75% return before today's stop is booked. The wins have come on setups where the trend authority signal aligned with momentum and the trigger candle was unambiguous. The losses have come on setups where the trend authority signal was strong but the trigger was imperfect — exactly the pattern today's NAS100 produced.
Looking at the prior month, the same pattern is visible. The April 22 US30 long was a strict-trigger entry that went to TP3 cleanly. The April 27 US30 long was another textbook entry that delivered. The April 24 NAS100 long was a marginal-trigger entry that stopped fast, the same failure mode as today. The framework's win/loss distribution is consistent with its calibration. The timing of marginal-trigger losses is also consistent: they cluster around regime transitions where trend authority remains strong but momentum starts to break.
Today's stop puts GPT at $52,344.96 — a step back from the May 6 close at $53,377.24. The season balance is intact, but the lead over Claude has compressed to $78 after both models stopped on the same NAS100 setup. The closing week will be decided by the next two trading days, May 8 and May 11. A small swing in either direction reshuffles the season standings entirely.
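The balance figures in this section follow from arithmetic on numbers quoted in the article. The short Python check below uses `decimal` so the cents stay exact. The variable names are illustrative, and the $50,000 starting balance comes from the Methodology section.

```python
from decimal import Decimal

START = Decimal("50000.00")           # starting balance per model
balance_may6 = Decimal("53377.24")    # GPT balance after May 6
gpt_loss = Decimal("-1032.28")        # GPT's NAS100 stop on May 7
claude_loss = Decimal("-1014.00")     # Claude's NAS100 stop on May 7

balance_may7 = balance_may6 + gpt_loss
season_return_pct = (balance_may7 - START) / START * 100
combined_loss = gpt_loss + claude_loss
```

The result is a May 7 balance of $52,344.96, a season return that rounds to 4.69%, and a combined loss across both models of $2,046.28.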
What the closing-week setup looks like
Friday is May 8. The macro calendar carries the BoC speech in the afternoon and a light US data window. The dollar will be the deciding input — DXY has been ranging between 98.20 and 98.50 for three sessions, and the May 7 close at 98.31 is mid-range with no clear directional bias.
For GPT, the strategic question is whether to take the May 8 setup if it presents. The model's bias has been to act on high-confluence reads even after a fresh loss. The framework treats prior-trade outcomes as independent of the current setup's edge, which is the correct statistical posture but occasionally produces back-to-back losses when the regime continues to disappoint. The Season 1 closing-week record will reflect whether May 8 and May 11 produce setups GPT's calibration is built for, or setups it is calibrated against.
Today's data point is clean. The trade was marginal. The execution was disciplined. The loss was fast and contained. The framework took a high-confluence read, the market rejected it, the position resolved at the structural stop. That is the framework working exactly as designed — every part of it except the part that matters today, which is the market's vote on whether the structural read was right.
BEARISH 62%
Regime: TRANSITIONING
Macro: HEADWIND, bull 62%
USD structural support from extended Fed hold (confirmed by Core CPI beat) and rising real yields (+270bp differential vs. EUR) creates strong bearish bias for G3 pairs, while predetermined BoC legislative cuts through Q3 2026 drive exceptional USDCAD bull conviction at 88%. Geopolitical escalation (Strait of Hormuz closure) sustains oil at $106.86, creating asymmetric cross-pair impacts. Group bias leans USD-bullish with 62 confidence, tempered by AUDUSD binary catalyst tonight and GBPUSD capitulation mechanics.
17.98 (normal) | 98.293 bullish | 4.465 rising | 106.86 bullish
What the AI Actually Saw
Full 5.6k-character analysis produced by the AI before entering this trade. Unedited.
- Multi-timeframe technical analysis
- Key support and resistance levels
- Pattern recognition and trade setup identification
- Risk-reward assessment with defined invalidation
- Macro context and cross-asset correlation check
Powered by SkyAnalyst AI
SkyAnalyst AI
NAS100 NY AM Environment
Primary bias: Bullish, but only on pullback support or confirmed breakout. No short setup here.
1) 10Y yield assessment
- US10Y = 4.342% vs 5D EMA = 4.371%
- Yield is below its 5-day EMA, which is the key first-order bullish input for NAS100.
- It is not making a new 5-day high or spiking above recent highs.
- Today’s low 4.316% is below yesterday’s low 4.334%, so rates are still leaning softer.
Default directional bias from rates: Bullish NAS100
2) Macro + cross-asset confirmation
- Macro Agent NAS100 bias: Bullish, 72% confidence
- Rate factor is explicitly cited: lower 10Y yields support mega-cap tech duration
- Caveat: oil collapse introduces some recession/growth-scare noise
Cross-asset check:
- VIX 17.2 < 5D EMA 17.45 → supportive / not bearish
- DXY 97.80 < 5D EMA 98.19 → supportive / no USD headwind
Conclusion: yield + VIX + DXY all lean supportive for NAS100 longs.
3) Trend structure and key levels
- Trend Agent: BULLISH, 72% confidence, TRENDING
- Key levels:
  - Resistance: 28827
  - VWAP: 28647
  - Support / invalidation: 28610
60m structure:
- Price is above fast and slow EMAs
- RSI has stayed firmly bullish (mid/high 60s to 70s)
- MACD is above zero, and latest bar shows re-acceleration after prior cooling
- Daily context remains strong: above yesterday’s high (28705.3) and above 5D EMA
Gap context:
- Opening push was roughly <100 pts vs prior close, so this is not a classic large-gap-fill setup.
Trend foundation: Strongly bullish
4) Lower timeframe read
15m:
- EMA fast > EMA slow
- RSI 75.8: overbought but still trend-supportive
- MACD histogram positive/strong
5m:
- EMA fast > EMA slow
- RSI cooled from overbought to 68
- MACD histogram still positive, but weakening
- Price is trading well above VWAP and near upper 2SD, so do not chase blindly
Execution style now:
Best long ideas are:
- Pullback long into support
- Breakout-retest long above the session high
Breadth / rotation note
- ADD = -175, so breadth is not broadly strong.
- This is not the specific “NYAD positive divergence” warning you mentioned, but it does imply narrow leadership risk: NAS100 strength may be concentrated in large-cap tech.
Qualified Setups Only
Setup 1 — Pullback Buy into breakout support
Type: Trend-continuation pullback long
Confluence: 6/7
Quality: High (about 8.0/9.5)
Why it qualifies
- ✅ 10Y yields support longs
- ✅ Macro Agent bullish >60 with rate support cited
- ✅ Trend Agent bullish >60
- ✅ 60m trend/EMA structure bullish
- ✅ 15m RSI >50 with positive MACD histogram
- ✅ No high-impact USD event within 30 min
- ⚠️ Need the 5m directional reaction at support to trigger the trade
Entry zone
- 28768 to 28782
Entry trigger
- Wait for a 5m rejection candle / bullish engulfing / hold-and-close back above 28780
- Ideal sequence: dip into 28770s, hold above 28755/28760, then reclaim 28780
Stop loss zone
- 28705 to 28712
- This is:
  - below nearby pullback structure
  - below prior breakout area
  - wide enough for NAS100 overshoot
  - above the broader trend invalidation 28610
Take profit levels
- TP1: 28840 to 28852
- TP2: 28920 to 28935
- TP3: 29000 to 29020
Trade management
- If TP1 hits, consider reducing risk and trailing under the latest 5m higher low or 5m EMA9.
- If price loses 28755 impulsively before trigger, stand aside and reassess lower.
Setup 2 — Session-high breakout retest long
Type: Opening-range / session-high breakout continuation
Confluence: 5/7 to 6/7
Quality: Medium-High to High (about 7.2–8.0/9.5)
Why it qualifies
- ✅ Yields support
- ✅ Macro supports
- ✅ Trend supports
- ✅ 60m structure bullish
- ✅ 15m momentum bullish
- ✅ No event risk nearby
- ⚠️ Because price is extended above VWAP, this must be a confirmed breakout + retest, not a market order chase
Entry zone
- 28856 to 28870
Entry trigger
- Need a 5m close above 28853.2 (today’s high), then a retest hold above 28845/28850
- Best trigger is a retest that prints a bullish 5m candle and holds the breakout shelf
Stop loss zone
- 28795 to 28803
- Gives room beyond the retest structure and satisfies the minimum volatility allowance
Take profit levels
- TP1: 28925 to 28935
- TP2: 28995 to 29010
- TP3: 29080 to 29100
Trade management
- If breakout occurs on weak follow-through and immediately falls back below 28845, cancel the idea.
- Do not take this if price simply spikes through highs without a retest or volume confirmation.
Execution Notes
- No short setups while:
  - 10Y remains below its 5D EMA
  - VIX remains below its 5D EMA
  - DXY remains below its 5D EMA
  - Trend Agent remains bullish above 28610
- Because current price is already far above VWAP, buying at market is poor location.
- If neither setup triggers cleanly within the late-AM window, No Trade.
Risk Guidance
- Use volatility-adjusted sizing, not tighter stops.
- Normal conditions: risk around 1% of equity or less
- If in drawdown: scale down
- Avoid increasing risk above 2% max, even if momentum looks strong
If you want, I can turn these into an automation-ready order template with exact trigger logic and R-multiple mapping.
3 Evaluations — What Changed in 4 Minutes
The AI evaluated NAS100-Pepperstone 3 times before reaching a final decision. Each step shows confidence, reasoning, and whether the model was ready to enter.
Evaluation 1: I would not enter yet. Price has pulled back cleanly into the 28768-28782 entry zone and is still holding above the key 28755/28760 support area, but the required 5m confirmation is not complete: the active candle is still forming, volume is light, and live price has slipped back under 28780 after only a brief reclaim. The broader intraday structure remains bullish with price above the 5m and 15m EMAs and MACD still above zero, so the setup is close, but I want a completed 5m rejection/engulfing candle or a firm close back above 28780 before executing.
Evaluation 2: AI evaluation unavailable - using conservative default
Evaluation 3: Price is back inside the 28768-28782 entry zone and the latest completed 5-minute candle delivered the requested trigger: it dipped into the 28770s, held above the key 28755/28760 support area, and closed back above 28780 at 28793.2. That is a valid breakout-support retest for a long, with 5m and 15m trend structure still bullish, price above both EMAs, and MACD remaining positive. The setup is not perfect because volatility is elevated and the market is still below the 28851 high, but the reclaim confirms enough edge to enter now.
Account Performance
Profit taken at TP1 — the full position is closed at the first target to keep results measurable and comparable across models.
Key Takeaways
Three observations from a fast stop on the opening day of Season 1's closing week:
- Zero positive excursion is a clean diagnostic. GPT's NAS100 long never traded above its entry price. Peak price after entry was 28781.5, the exact entry tick. When a position never produces favorable motion, the entry candle was either too early, too late, or against the prevailing micro-structure. The realized loss of -$1,032.28 (SL) and -1.0R inside forty-six minutes is the textbook signature of a trigger that fired on a marginal read in an otherwise strong-looking confluence zone.
- Three-evaluation entries can lose, and that is consistent with the framework. GPT's typical evaluation count on a marginal trigger is five to eight. Today's three reflects high conviction on the 73% bullish trend authority signal and a fast-acting calibration that rewarded the model twice in the prior week. The same calibration that produced the May 5 US30 win at +1.67R produced today's stop. A calibration trade-off yields a distribution of outcomes, not a guarantee, and Day 18 sits in the losing tail of that distribution.
- Two models, same trap. Claude took NAS100 long nine minutes after GPT, on the same confluence stack, with a different evaluation framework and a higher evaluation count, and stopped at -1.0R for -$1,014.00 (SL). When two independent frameworks read the same chart the same way and the market rejects both, the explanation lies in the chart, not in either framework. Coordinated losses are not framework failures; they are evidence that structural confluence is necessary but not sufficient for a trade to work.
Season 1 has two trading days left. GPT closes Day 18 at $52,344.96 — a +4.69% return on starting capital — and the lead over Claude has compressed to $78 after the coordinated NAS100 loss. The framework profile that built the season's gains is the same one that took today's stop. The honest closing-week question is not whether GPT can avoid losses on May 8 and May 11, but whether the calibration that produced both the lead and the loss continues to read the residual macro window correctly. — Eduardo, Senior Research Editor
Methodology
Both AI models receive identical market data, identical infrastructure, and identical risk parameters. No prompt engineering. No human intervention. Standard API temperature (0.0). Trades executed on demo accounts with institutional spread conditions via Pepperstone Markets. Each model operates with a $50,000 starting balance and 2% risk per trade. All positions are closed at TP1 — the first take-profit target — to keep results measurable and directly comparable across models.
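The 2% risk parameter can be checked against the Day 18 trade. The $1,067.54 risk figure quoted for the entry is consistent with 2% of the $53,377.24 pre-trade balance. That the budget is computed on the current balance, not the $50,000 start, is an inference from those two numbers and is not stated in the methodology. A minimal Python sketch with a hypothetical function name:

```python
from decimal import Decimal, ROUND_HALF_UP

RISK_FRACTION = Decimal("0.02")  # 2% risk per trade, from the methodology


def risk_budget(balance: Decimal) -> Decimal:
    """Dollar risk for the next trade, rounded to cents.

    Assumes the budget is taken on the current balance (an inference).
    """
    return (balance * RISK_FRACTION).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


budget = risk_budget(Decimal("53377.24"))  # pre-trade balance on May 7
```

Converting that dollar budget into a lot size also requires the broker's contract value per point, which the article does not state, so the sketch stops at the budget.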
Forex pairs and gold (XAUUSD) have standardized pricing across brokers — the prices in this article will closely match what you see on your own platform. US index CFDs (NAS100, US30, US500) are different: each broker constructs its own index price feed, so entry prices, stop distances, and P&L figures for index trades are specific to Pepperstone Markets. All trades in this experiment were analyzed, executed, and settled on Pepperstone demo accounts using Pepperstone's price feed.
Why This Cannot Be Replicated in ChatGPT or Claude Alone
Copying the analysis prompt into ChatGPT or Claude will not reproduce these results. Neither model has access to live market data — and the data is the foundation of everything.
Every analysis session, SkyAnalyst AI assembles a structured data packet of 50,000–100,000 tokens per instrument from live broker APIs. This is not a price quote. It contains 5 hours of multi-timeframe candle data across 60-minute, 15-minute, and 5-minute charts — each candle carrying full indicator overlays: EMA fast/slow, ATR, MACD with histogram, RSI, volume with SMA, VWAP with standard deviation bands, and others. On top of that: session structure levels (Tokyo, London, New York highs and lows), Fibonacci retracement and extension levels, a rolling 5-day macro window covering the 10Y yield, DXY, VIX, NYAD breadth, oil, and gold — along with additional proprietary data layers, all formatted as structured JSON specifically designed for LLM consumption.
The model never starts from raw data. Before Claude or GPT sees anything, two proprietary SkyAnalyst AI agents — among other internal systems — have already processed the environment: the Macro Analysis Agent produces directional bias with confidence scores and tradeability ratings across intraday and multi-day horizons, while the Trend Authority Agent evaluates technical structure — EMA alignment, momentum, regime classification — and outputs direction, confidence, key levels, and invalidation prices. The trading model synthesizes what these agents and preprocessing layers have already evaluated. This multi-agent pipeline is what produces the quality of analysis shown in this article — a single prompt to a single model, no matter how detailed, cannot replicate what multiple specialized systems produce in sequence.
The goal is to emulate what a professional trader actually does: read the macro environment, analyze multi-timeframe technicals, identify a setup with defined risk, wait for precise entry conditions, and execute with discipline. SkyAnalyst AI provides the infrastructure that gives the trading model everything it needs to do this — live data, preprocessed context, real-time monitoring, and broker execution. This is not a chatbot experiment. It is an institutional-grade trading pipeline where the AI model is the decision-maker, operating under the same conditions and constraints a professional desk would demand.
Trading involves substantial risk of loss. Past performance is not indicative of future results. These are AI model results shared for educational and research purposes only. Not financial advice.