OpenAI has rolled out its latest frontier model, GPT-5.4, across ChatGPT, Codex and its API, promising stronger reasoning, agent capabilities and long-context performance for professional tasks.
On Thursday, the company released two versions, GPT-5.4 and GPT-5.4 Pro, aimed at different performance tiers. It says the model can handle complex work across more than forty professions, including finance, law and engineering. Additionally, OpenAI claims GPT-5.4 reduces the need for separate coding models in most use cases.
The system can also operate desktops and web browsers using screenshots, mouse input and keyboard commands. On the OSWorld-Verified benchmark, GPT-5.4 achieved a 75 per cent task success rate. That score exceeds both GPT-5.2’s 47.3 per cent and the 72.4 per cent human baseline.
The model performed strongly on web navigation tests such as WebArena-Verified and Online-Mind2Web. Consequently, OpenAI positions it as more reliable for automated website workflows and agent-based tasks.
OpenAI also introduced a new tool-search method that cuts token use in large systems by nearly half. Furthermore, the long-context version processes about 1.05 million tokens, while the standard version handles 272,000 tokens before extra charges apply.
The company said GPT-5.4 produces fewer false claims than GPT-5.2. OpenAI also provides more visibility into the model’s reasoning steps through its system card, giving users a clearer view of how it reaches answers.
OpenAI says GPT-5.4 can also write code, search the web and use tools. The company adds that the model can handle spreadsheet work, slide creation and document editing inside longer professional workflows.
GPT-5.4 competes well against Claude
The company said GPT-5.4 leads its earlier model on hard web-browsing and tool-use tests. Mercor CEO Brendan Foody called it the “best model we’ve ever tried” for professional services work.
Claude still offers a strong comparison point, especially for very large research jobs. Anthropic says Claude Opus 4.6 carries a 1 million-token context window and performs well in coding, tool use and long-context reasoning.
However, OpenAI argues GPT-5.4 now combines those strengths with lower-cost general-purpose agent work and native computer use in one model. OpenAI prices GPT-5.4 at USD$2.50 per 1 million input tokens and USD$15.00 per 1 million output tokens for long-context standard use.
By contrast, Claude Opus 4.6 costs USD$5.00 per 1 million input tokens and USD$25.00 per 1 million output tokens. Anthropic still has a narrow edge on SWE-bench Verified, however, where Claude Opus 4.6 scored 78.7 per cent against GPT-5.4’s 76.9 per cent. Thomson Reuters (TSE: TRI) CTO Joel Hron described Claude’s long-context gains as a “meaningful leap.”
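For readers who want to see what the list prices above mean for a single job, the rough Python sketch below applies the quoted per-million-token rates to a workload. The workload size (200,000 input tokens and 10,000 output tokens) and the `job_cost` helper are hypothetical, chosen only for illustration. The calculation ignores any long-context surcharges, caching discounts or other pricing tiers either company may apply.

```python
# Prices in USD per 1 million tokens, as quoted in this article.
PRICES = {
    "GPT-5.4": {"input": 2.50, "output": 15.00},
    "Claude Opus 4.6": {"input": 5.00, "output": 25.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of one job (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a long document in, a short report out.
    for model in PRICES:
        cost = job_cost(model, input_tokens=200_000, output_tokens=10_000)
        print(f"{model}: USD${cost:.2f}")
```

At these rates the hypothetical job costs about USD$0.65 on GPT-5.4 and USD$1.25 on Claude Opus 4.6. Actual bills would depend on each provider's full pricing schedule.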