OpenAI and Broadcom on Wednesday unveiled Jalapeño, an LLM-optimized inference chip that Hock Tan, Broadcom’s president and CEO, told Bloomberg should run OpenAI’s workloads at roughly 50% lower cost than typical AI GPUs. It’s OpenAI’s first custom silicon, and it exists to do one thing: reduce structural dependence on Nvidia.

The economics are the whole story. Inference, not training, is now the line item that scales with ChatGPT usage, and a half-priced accelerator dedicated to a single customer’s model stack changes the unit math behind every conversation. Tan told CNBC that prototype work begins in late 2026, the real ramp lands in 2027, and the program goes “full tilt” in the first half of 2028, with a next-generation version slated for 2028 and new chips arriving annually thereafter. Gigawatt-scale deployment alongside Microsoft and other partners is set to begin this year.

The development arc is the part that’ll be studied. Greg Brockman, OpenAI’s president, told CNBC’s David Faber that the chip went from initial design to manufacturing tape-out in nine months, with OpenAI’s own models doing meaningful work on the design itself. Broadcom is characterizing it as possibly the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors. Engineering samples are already running ML workloads at production frequency and power, including GPT-5.3-Codex-Spark. TSMC will handle mass production; Samsung Electronics and SK Hynix are supplying memory. Charlie Kawwas, Broadcom’s president of Semiconductor Solutions, helped present the first sample to Sam Altman.

Markets understood the signal. Broadcom shares are up 10% in 2026 and have multiplied nearly sevenfold since the end of 2022, as OpenAI joins Google and Amazon among the hyperscaler custom-silicon franchises Wall Street now prices as a distinct category.

The framing on both sides was scarcity. Brockman says OpenAI “cannot get compute fast enough.” Tan describes demand from his six biggest customers as “simply insatiable.” Jalapeño is what happens when the largest model lab decides that renting from Nvidia isn’t a permanent strategy.
