Google shipped Gemini 3.5 Flash and the release looks very interesting. It beats Gemini 3.1 Pro on coding and agentic benchmarks, runs at more than 280 tokens per second, and lands in the 4th position on our aggregated Leaderboard Ranking - within three points of Claude Opus 4.7. In context, though, there's a number that sits a bit awkwardly in the announcement: $1.50 input / $9.00 output per million tokens. For reference, Gemini 3 Flash launched at $0.50 / $3.00. That's a 3x price increase, 6x if you were on Flash-Lite; and Gemini 3.1 Pro is $2.00 / $12.00.
The headline framing from Google is that 3.5 Flash is "cheaper than the frontier models it now beats." And yeah, technically, that's true: GPT-5.5 launched at $5.00 / $30.00, and Claude Opus 4.7 is $5.00 / $25.00. But developers who built pipelines on Flash did so precisely because it was the budget efficient tier, the model you routed volume through without watching costs balloon, implicit contract that is now repriced. Running the Artificial Analysis benchmark against 3.5 Flash costed ~$1,500, 75% more than doing the same against 3.1 Pro Preview and 450% more than doing it against Gemini 3 Flash.
This fits a pattern we've been tracking. As we wrote when Anthropic released Code Review, the era of subsidized AI inference is winding down. OpenAI's GPT-5.5 was a 2x price jump over 5.4. Claude Opus 4.7 costs the same as Opus 4.6, but needs 30% more tokens on average. Now, 3.5 Flash continues the trend at the Flash tier; the exact place where "cheap enough to run at scale" was the whole value proposition. The labs spent two years buying market share with aggressive pricing, and they're now asking for some of that back.
Google says 3.5 Pro is currently in internal testing and due next month. If Flash already scores so high, Pro could push into genuine frontier range, putting it in direct competition with GPT-5.5 and Claude Opus 4.7 or even surpassing them. Anyway, at this rate, the most interesting number at launch won't be the benchmark score.