OpenAI has introduced GPT‑5.3‑Codex, describing it as its most capable agentic coding model to date and saying it combines the coding strengths of GPT‑5.2‑Codex with the reasoning and professional-knowledge capabilities of GPT‑5.2. OpenAI also says the model runs 25% faster for Codex users and is designed to support longer tasks involving research, tool use, and complex execution.
The company positions GPT‑5.3‑Codex as a shift from code-only help to broader “computer work,” with the ability to stay interactive while it works so users can steer it without losing context. OpenAI says the model is available to people on paid ChatGPT plans across Codex surfaces including the app, CLI, IDE extension, and web, with API access planned once it can be safely enabled. A separate write-up for developers also frames the release around “agent-style development workflows,” emphasizing tool use, computer operation, and end-to-end task completion.
What OpenAI says is new
OpenAI says GPT‑5.3‑Codex can handle long-running workflows more like a colleague, providing frequent updates and allowing real-time back-and-forth as work progresses. The company highlights a “steering” option in the Codex app, noting it can be enabled under Settings → General → Follow-up behavior.
OpenAI also claims GPT‑5.3‑Codex was “instrumental in creating itself,” saying early versions helped debug its own training, manage deployment, and diagnose evaluation results. A developer-focused summary repeats this internal-use claim.
Beyond coding, OpenAI says the model is intended to support tasks across the software lifecycle and related professional work, including debugging, deployment, monitoring, product documents, copy editing, user research, tests, and metrics. One third-party overview similarly describes GPT‑5.3‑Codex as a merged approach meant to cover both “coding agent” and “reasoning” strengths, aimed at a more general agentic experience.
Benchmarks highlighted in the release
OpenAI reports scores for GPT‑5.3‑Codex across several benchmarks it uses to measure coding, agentic, and real-world capabilities, including SWE‑Bench Pro, Terminal‑Bench 2.0, OSWorld‑Verified, and GDPval. The figures from OpenAI’s table:

Benchmark                                  GPT‑5.3‑Codex   GPT‑5.2‑Codex   GPT‑5.2
SWE‑Bench Pro (Public)                     56.8%           56.4%           55.6%
Terminal‑Bench 2.0                         77.3%           64.0%           62.2%
OSWorld‑Verified                           64.7%           38.2%           37.9%
GDPval (wins or ties)                      70.9%           —               70.9%
Cybersecurity Capture The Flag Challenges  77.6%           67.4%           67.7%

OpenAI notes that the results in the post were run with “xhigh” reasoning effort. A separate developer-oriented summary reproduces the same benchmark values and adds that, per OpenAI, humans score around 72% on OSWorld‑Verified.
Availability and performance claims
OpenAI says GPT‑5.3‑Codex is 25% faster for Codex users, attributing the gain to infrastructure and inference-stack improvements and framing it as delivering quicker interactions and results. A developer write-up echoes the 25% speed claim and likewise ties it to “faster interactions” for Codex users.
For access, OpenAI says the model is available with paid ChatGPT plans anywhere Codex is available (app, CLI, IDE extension, and web), while it works to enable API access safely. A third-party overview also states that the model is not yet available in the OpenAI API and says API access is expected “soon,” without providing token pricing details.
OpenAI also says GPT‑5.3‑Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems. A developer summary repeats this hardware note and ties it to the release’s infrastructure story.
Safety and cybersecurity posture
OpenAI says GPT‑5.3‑Codex is the first model it classifies as “High capability” for cybersecurity-related tasks under its Preparedness Framework and that it is deploying additional mitigations and access controls as a result. The company says it is taking a precautionary approach, stating it lacks definitive evidence the model can automate cyberattacks end-to-end but is still deploying its “most comprehensive cybersecurity safety stack” to date.
OpenAI says its mitigations include safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines that include threat intelligence. It also says some requests flagged as elevated cyber risk may be routed from GPT‑5.3‑Codex to GPT‑5.2, and that developers can apply for fuller access through a “Trusted Access for Cyber” pilot program.
In the system card, OpenAI adds that GPT‑5.3‑Codex is being treated as High capability on biology and deployed with safeguards used for other GPT‑5 family models, while stating it does not reach High capability on AI self-improvement. The system card also says this is the first launch OpenAI is treating as High capability in the cybersecurity domain under its framework, again describing a layered safety stack intended to impede threat actors while supporting cyber defenders.
