On Tuesday, OpenAI and Broadcom unveiled Jalapeño, OpenAI’s first custom inference ASIC and the opening move in what the two companies are framing as a jointly built, multi-generation compute platform. The chip is inference-only, which is the entire strategic point: it’s designed to serve the models behind products like ChatGPT at a fraction of the per-query cost of running the same workloads on Nvidia GPUs.

Broadcom CEO Hock Tan told Bloomberg that early testing is showing roughly 50% cost savings versus typical AI graphics processing units. That number, if it survives contact with production scale, is the kind of figure that reshapes a P&L. Inference is the recurring cost of every ChatGPT session; pre-training is the capex bet. OpenAI has been paying Nvidia margins on both, and Jalapeño is a direct attempt to claw the inference side back in-house.

The development timeline is the other headline. OpenAI says Jalapeño went from initial design to manufacturing tape-out in nine months, which the company describes as “what may be the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors.” OpenAI President Greg Brockman, speaking to CNBC’s David Faber, credited the compression to the team’s use of OpenAI’s own models in the design loop and said the resulting speedup surprised them.

That detail is doing a lot of narrative work. It positions OpenAI not just as a buyer of silicon but as a lab whose models are now fast enough at chip design to reshape semiconductor timelines. It’s also, conveniently, a sales pitch.

The strategic limits are real. ASICs are cheaper and tuned to specific workloads but less flexible than Nvidia’s GPUs, and industry observers expect the heavier pre-training runs to stay on Nvidia hardware for the foreseeable future. Tan flagged “small prototype development” in late 2026 before broader scale-up, with gigawatt-scale deployments alongside Microsoft and other partners beginning this year.

Markets have already priced the direction of travel. Broadcom shares are up 10% in 2026 and have multiplied nearly sevenfold since the end of 2022, lifted by exactly this kind of custom-silicon deal with hyperscalers and frontier labs. The Nvidia premium isn’t disappearing. It’s being routed around, one inference workload at a time.
