Chip capacity
-
Here is a 300mm wafer comprising approx. 60 Blackwell chips. A typical defect-free yield is 29-30 chips, circa 50%.
TSMC provides a back-end packaging process called CoWoS. This is the process whereby chips are stacked onto a single silicon interposer, which is then mounted onto a substrate. This is the area of constrained capacity. Recently Amkor was added to the list of companies able to provide advanced packaging services. In 2024 total TSMC capacity is circa 35,000 wafers per month, rising to 75,000 in 2025 and 135,000 in 2026. Based on reports from JP Morgan, we believe capacity is ramping faster than previously thought, which provides quality insight into output growth: specifically, 2025 output will be over 100% greater than 2024. We also know that Blackwell chips are approx. 40% more expensive than Hopper, which would support a 140-150% increase in revenues. Every chip is spoken for.
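The yield and capacity figures above can be sanity-checked with some back-of-envelope arithmetic. This is a sketch using the post's own estimates, not official TSMC/Nvidia data, and it treats the CoWoS wafer count as a simple proxy for output:

```python
# Back-of-envelope sketch of the wafer-yield and CoWoS-capacity figures above.
# All inputs are the post's estimates, not official TSMC/Nvidia data.
dies_per_wafer = 60      # ~60 Blackwell-class dies on a 300 mm wafer
yield_rate = 0.50        # ~29-30 defect-free dies, circa 50%
good_dies = dies_per_wafer * yield_rate   # -> 30 good dies per wafer

cowos_wafers_per_month = {2024: 35_000, 2025: 75_000, 2026: 135_000}
growth_2025 = cowos_wafers_per_month[2025] / cowos_wafers_per_month[2024] - 1
print(f"2025 capacity growth vs 2024: {growth_2025:.0%}")  # 114%, i.e. over 100%
```

The 75k-vs-35k wafer ramp alone gives the "over 100%" output growth; the 40% Blackwell price premium is then layered on top to reach the revenue claim.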
The same report also states that Blackwell's successor, Rubin, which was scheduled for production in 2026, is currently 6 months ahead of schedule, with working prototypes already in the wild. At that point we expect Nvidia to transition from Grace to its next-generation CPU, Vera, and to adopt HBM4 memory, which will improve efficiency significantly. What is interesting is that the existing architecture peaks with the NVL-72: 72 Blackwell GPUs in one rack plus 36 Grace CPUs, consuming 120kW of power. Compare this to the Rubin Ultra servers, which will pack up to 576 GPUs per rack plus 288 Vera CPUs and consume 960kW of power per rack! The math suggests these racks will cost up to $30M each, which is a staggering number, but the important point here is efficiency: a rack that is 10X the cost but 100X faster than its predecessor equates to significant savings in capex and opex. We believe Microsoft is planning a 2,000-rack installation. Yes, $60B, running on nuclear power (green).
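The rack economics above reduce to a few multiplications. A quick sketch using the post's figures (the 10X-cost/100X-speed ratio is the post's claim, not a published spec):

```python
# Sketch of the rack economics in the paragraph above (post's figures).
nvl72 = {"gpus": 72, "cpus": 36, "power_kw": 120}
rubin_ultra = {"gpus": 576, "cpus": 288, "power_kw": 960}

gpu_multiple = rubin_ultra["gpus"] / nvl72["gpus"]            # 8x the GPUs per rack
power_multiple = rubin_ultra["power_kw"] / nvl72["power_kw"]  # 8x the power per rack

cost_multiple = 10    # Rubin Ultra rack assumed ~10X the cost...
speed_multiple = 100  # ...but ~100X the performance
perf_per_dollar_gain = speed_multiple / cost_multiple
print(f"Perf per dollar vs prior gen: {perf_per_dollar_gain:.0f}X")  # 10X

# The rumoured Microsoft installation:
racks = 2_000
rack_cost = 30e6          # up to $30M per rack
total_cost = racks * rack_cost
total_power_mw = racks * rubin_ultra["power_kw"] / 1_000
print(f"${total_cost / 1e9:.0f}B, {total_power_mw:,.0f} MW")  # $60B, 1,920 MW
```

Note the power draw scales with GPU density (both 8x per rack), so the efficiency gain comes from per-GPU performance, not from lower power.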
What will all this compute be used for? There are 7 new foundational LLMs (large language models) being built today, each with trillions of parameters (ChatGPT 5 is rumoured to be 'several' trillion). What is a parameter?
In the context of Large Language Models (LLMs) like ChatGPT, a parameter is like a tiny "knob" or "switch" that the model adjusts during training to learn patterns from data. These parameters help the model make decisions, such as predicting the next word in a sentence or understanding the relationship between words.
Here’s a simple way to think about it. Analogy: a recipe for a cake.
Imagine you are baking a cake. The ingredients (like sugar, flour, and eggs) and how much of each you use are like parameters.
• Too much sugar? The cake gets too sweet.
• Too little flour? The cake won’t hold together.
By fine-tuning these amounts, you get the perfect cake.
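The "knob" idea can be made concrete with a toy sketch: one trainable parameter instead of trillions, with made-up numbers, nudged during training until the model's predictions match the data:

```python
# A toy illustration of a "parameter" as a knob tuned during training.
# One parameter w; we nudge it so that predict(x) = w * x matches the data.
# (Hypothetical numbers; an LLM does this with trillions of such knobs.)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs -> targets (here y = 2x)

w = 0.0           # the knob starts untuned
lr = 0.05         # how hard we turn it each step
for _ in range(200):
    for x, y in data:
        error = w * x - y
        w -= lr * error * x   # turn the knob a little to reduce the error

print(round(w, 2))  # the knob settles near 2.0
```

Training an LLM is this same loop at vast scale: every parameter is adjusted slightly in the direction that reduces the prediction error.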
Similarly, in an LLM, parameters adjust how the model combines different pieces of information to generate good results.

Masa Son (SoftBank) thinks that if only 3 of the 7 new models see the light of day, we will need 20M new GPUs to train them. This only takes into account LLM training, let alone enterprise, government and other general use. That is 35k racks of Rubin Ultra and over $1T of investment by end-2027.
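The 35k-rack and $1T figures above follow directly from the earlier rack numbers. As straight arithmetic, using the post's own estimates:

```python
# The rack count and investment figures above, as straight arithmetic
# (using the post's own estimates).
gpus_needed = 20_000_000   # Masa Son's 20M training GPUs
gpus_per_rack = 576        # a Rubin Ultra rack
rack_cost = 30e6           # up to $30M per rack

racks = gpus_needed / gpus_per_rack
investment = racks * rack_cost
print(f"{racks:,.0f} racks, ${investment / 1e12:.2f}T")  # ~34,722 racks, ~$1.04T
```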
-
Great info Adam, thanks..
I do wonder though: company X buys tens of thousands of Blackwells, eventually makes money off them, then the next gen comes out... how do you dump billions of dollars of Blackwells and afford to buy billions more of the next gen, let alone the logistics of out with the old, in with the new!
-
The beauty of Nvidia chips, unlike many other vendors', is that the chips are all compatible. NVL is a four-PCIe-card package with an NVLink interconnect, so think of the chip as just computation. You will have Hopper racks, Blackwell racks and, later, Rubin racks, all 'computing' and all connected.
Over time old racks will be replaced but over a 12-18 month chip generation transition you will find different racks in the same data centre all working together. Customers simply add more power!
The above example might be a CSP renting out the various nodes to end users: you would charge, say, $10 for a Blackwell GPU and $5 for a Hopper GPU.
-
There will also be a secondary market where 'old' GPUs are repurposed. The math works at present. The naysayers will tell you there is no monetisation, which is clearly false: Meta is making record profits and so is MSFT. This is only scratching the surface in areas like recommender systems, co-pilots and chatbots. The real money will come from new inventions. ChatGPT 5 is supposedly capable of running entire enterprises and entire call centres, and obviously the goal is for AI to get so smart it reinvents the wheel! It might one day solve Tesla's FSD problem.
In the interim, trillions will be spent in the pursuit of these goals. Will they succeed? It's not certain, but the major players are very committed, and all we can ever hope for (with investing) is visibility out 2 years at best, and what we see looks very positive. The naysayers can keep repeating the same thing over and over, but valuation-wise, earnings matter. Nvidia has a real PE of about 25, MSFT about the same, Apple closer to 30, Tesla 140, Palantir 180. These are static multiples; you then need to factor in growth rates. It used to be acceptable if a growth stock exhibited growth equal to its PE. MSFT just about fits that measure, Apple does not, and Tesla is nowhere near it (maybe 15% growth), which suggests Tesla is grossly overvalued, along with Palantir. Nvidia's growth rate is about 50% on a 5-year forward average, which suggests it's cheap. It gets even worse for Tesla when you look at cash flow.
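The "growth equal to its PE" rule of thumb is just a PEG ratio. A quick screen using only the multiples and growth rates quoted above (rough figures, not live market data):

```python
# PEG screen using the rough multiples and growth rates quoted above.
# "Growth equal to its PE" means PEG ~= 1; higher is more expensive.
stocks = {
    "Nvidia": (25, 50),    # (PE, growth %) -- the post's figures
    "Tesla": (140, 15),
    "Palantir": (180, 20),
}
pegs = {name: pe / growth for name, (pe, growth) in stocks.items()}
for name, peg in pegs.items():
    print(f"{name}: PEG {peg:.1f}")
# Nvidia comes out at 0.5 (cheap); Tesla and Palantir at ~9 (grossly overvalued)
```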
We have been asked before why we don't buy Palantir. The answer: its valuation is sky high. It's valued at 50X revenue and earns $500M against a valuation of $164B. Not what I would call value, not when its operating margin is under 14% and its revenue growth rate is 20% at best. Compare this to Nvidia's operating margin of 63% and revenue growth rate of 100% today (I've used 50%).
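Cross-checking the quoted Palantir figures (these are the post's numbers, not a live quote):

```python
# Cross-check of the Palantir valuation figures quoted above (post's numbers).
valuation = 164e9        # ~$164B market cap
revenue_multiple = 50    # "valued at 50X revenue"
implied_revenue = valuation / revenue_multiple
print(f"Implied revenue: ${implied_revenue / 1e9:.2f}B")  # $3.28B

# Margin and growth comparison from the paragraph above:
palantir = {"op_margin": 0.14, "rev_growth": 0.20}
nvidia = {"op_margin": 0.63, "rev_growth": 0.50}   # the post's conservative 50%
for name, m in [("Palantir", palantir), ("Nvidia", nvidia)]:
    print(f"{name}: margin {m['op_margin']:.0%}, growth {m['rev_growth']:.0%}")
```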
-
Earlier this week at the OpenAI summit, OpenAI CEO Sam Altman said their next model, released in 2025, will be more disruptive than most people expect: 'The industry will see the first examples of AGI in 2025 and will use different tools to achieve it.' Altman added that initially the impact on humanity will be minimal but in time will become intense: 'In 2025 we will have systems that people will say "wow, this changes what I expected".' He added that in time AGI will bring significant human job displacement. It will be transformative to economies.