During its press tech talk, NVIDIA talked about several technologies surrounding the upcoming GeForce RTX 40 graphics cards based on the Ada Lovelace GPUs. Some of the technologies that were highlighted included the Ada Lovelace GPU itself, the latest DLSS 3 technology, and coolers featured on the brand new Founders Edition models.
NVIDIA Further Details Ada Lovelace GPUs, DLSS 3, GeForce RTX 40 Graphics Cards & More
NVIDIA will be launching its first GeForce RTX 40 series graphics card, the RTX 4090, on the 12th of October, followed by the RTX 4080 series in November. There’s a lot to talk about, so let’s get started.
NVIDIA’s AD102 ‘Ada Lovelace’ GPU – The Next-Gen Powerhouse
At the heart of the NVIDIA GeForce RTX 4090 graphics card lies the Ada Lovelace AD102 GPU. The GPU measures 608.4 mm² and utilizes the TSMC 4N process node, which is an optimized version of TSMC’s 5nm (N5) node designed for the green team. The GPU features an insane 76.3 billion transistors.
The NVIDIA Ada Lovelace AD102 GPU features up to 12 GPCs (Graphics Processing Clusters). That is 5 more GPCs than the Ampere GA102 GPU. Each GPC will consist of 6 TPCs with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. What’s changed is the FP32 & INT32 core configuration. Each SM will include 64 FP32 units, but the combined FP32+INT32 units will go up to 128. This is because the FP32 units are not shared with the INT32 units: the 64 FP32 cores are separate from the 64 INT32 cores.
So in total, each sub-core will consist of 16 FP32 plus 16 INT32 units for a total of 32 units. Each SM will have a total of 64 FP32 units plus 64 INT32 units for a total of 128 units. And since there are a total of 144 SM units (12 per GPC), we are looking at a total of 18,432 cores. Each SM will also include two Warp Schedulers (32 threads/CLK) for 64 warps per SM & their own L0 i-cache. This is a 33% increase in warps/threads vs the GA102 GPU. The register file size is 16,384 x 32-bit. Each SM also carries its own 128 KB of L1 data cache and shared memory, so that’s 18 MB of L1 cache.
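The totals quoted above follow directly from the per-level counts. As a quick sanity check, here is a minimal Python sketch that multiplies them out; the variable names are illustrative only, and the inputs are the figures stated in this article for the full AD102 die.

```python
# Per-level counts for the full AD102 die, as stated in the article.
GPCS = 12                # Graphics Processing Clusters
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
SUB_CORES_PER_SM = 4
FP32_PER_SUB_CORE = 16
INT32_PER_SUB_CORE = 16
L1_KB_PER_SM = 128       # L1 data cache / shared memory per SM

sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
units_per_sm = SUB_CORES_PER_SM * (FP32_PER_SUB_CORE + INT32_PER_SUB_CORE)
total_cores = sms * units_per_sm
total_l1_mb = sms * L1_KB_PER_SM / 1024

print(f"SMs: {sms}")                      # 144
print(f"Units per SM: {units_per_sm}")    # 128
print(f"Total cores: {total_cores}")      # 18432
print(f"Total L1: {total_l1_mb} MB")      # 18.0 MB
```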
Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache will be increased to 96 MB as mentioned in the leaks. This is a 16x increase over the Ampere GPU that hosts just 6 MB of L2 cache. The cache will be shared across the GPU. The GPU will also feature up to 192 ROPs for the full-die.
There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. Overall, the Ada Lovelace AD102 GPU will offer:
71% More GPCs (Versus Ampere)
71% More Cores (Versus Ampere)
50% More L1 Cache (Versus Ampere)
16x More L2 Cache (Versus Ampere)
71% More ROPs (Versus Ampere)
4th Gen Tensor & 3rd Gen RT Cores
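Most of the percentages in this list can be reproduced with simple ratios. The sketch below is a rough check rather than official math: the Ampere GPC count of 7 is inferred from the "5 more" statement above, the 112 ROPs and 6 MB of L2 come from this article, and the 10,752-core figure for the full GA102 die is an assumption that does not appear in the article.

```python
def pct_more(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100

# Full AD102 figures from the article.
ada = {"GPCs": 12, "Cores": 18432, "ROPs": 192}

# GA102 figures: 7 GPCs inferred from the article's "5 more" statement,
# 112 ROPs from the article's spec table, and 10,752 cores for the full
# GA102 die (assumption, not stated in the article).
ampere = {"GPCs": 7, "Cores": 10752, "ROPs": 112}

for name in ada:
    print(f"{name}: {pct_more(ada[name], ampere[name]):.0f}% more")

# L2 cache: 96 MB on AD102 vs 6 MB on GA102, both from the article.
l2_ratio = 96 / 6
print(f"L2 cache: {l2_ratio:.0f}x")
```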
The full die has not been featured on any GPU so far, not even the L40 which has 2 SMs disabled. It is likely that as yields progress, we will eventually see a gaming and workstation product using the full-fat AD102. Till then, the RTX 4090 is the top gaming graphics card while the RTX 6000 Ada is the top workstation solution.
NVIDIA AD102 ‘Ada Lovelace’ Gaming GPU Block Diagram:
NVIDIA AD102 ‘Ada Lovelace’ Gaming GPU ‘SM’ Block Diagram:
NVIDIA Founders Edition Designed To Utilize Up To 600W of Power For Higher Overclocking
As for its brand new Founders Edition cards, the GeForce RTX 4090 24 GB and RTX 4080 16 GB, NVIDIA has produced a compact PCB similar to the ones we saw on the previous generation. Designing a PCB like this helps improve airflow and cooling performance.
NVIDIA says that it has further optimized the Dual Axial Flow Through system, increasing fan sizes and fin volume by 10%, offering 20% higher airflow, and upgrading to a 23-phase power supply (20+3 phase for the RTX 4090). Memory temperatures are reduced, and the new, substantially more powerful Ada GPUs are kept cool in ventilated cases, giving gamers excellent overclocking headroom. NVIDIA went through a rigorous testing procedure and is said to have evaluated as many as 50 fan designs before finalizing the one we are getting on the new cards. The cooler dissipates heat from a heatsink assembly that includes a vapor chamber, which is also a big jump from the previous design.
The NVIDIA GeForce RTX 4080 also uses the same cooler as the RTX 4090 Founders Edition and since it has a lower TDP, it should deliver even better thermal performance.
Each GeForce RTX 40 Series Founders Edition graphics card reduces cable clutter by leveraging the new standard GPU power input of next-gen ATX 3.0 power supplies, the PCIe Gen-5 16-pin Connector. This enables you to power GeForce RTX 40 Series graphics cards with just a single cable, improving the aesthetics of your build. If you are using a previous-gen power supply, an adapter cable is included in the box, allowing you to plug in three 8-pin power connectors, with an optional fourth connector for more overclocking headroom. ATX 3.0 power supplies will be available in October from ASUS, Cooler Master, FSP, Gigabyte, iBuyPower, MSI, and ThermalTake, with more models to come.
One advantage of the new 16-pin connector is that while the Founders Edition cards are rated at 450W & 320W, respectively, they can use the extra headroom the connector provides for extreme overclocking, with the RTX 4090 able to go for the full 600W. The new power delivery design also gives the RTX 40 series a 10x improvement in power transient response compared to the previous generation.
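The adapter arithmetic lines up with these power figures. The following is a small sketch under one assumption that is not stated in the article: each PCIe 8-pin connector is rated for 150W. It also ignores any power drawn through the PCIe slot itself.

```python
# Assumption (not from the article): a PCIe 8-pin connector is rated at 150W.
EIGHT_PIN_WATTS = 150

def adapter_budget(num_8pin):
    """Power available through the adapter with `num_8pin` connectors attached."""
    return num_8pin * EIGHT_PIN_WATTS

RTX_4090_TBP = 450   # rated board power, from the article
RTX_4090_MAX = 600   # maximum with overclocking headroom, from the article

print(adapter_budget(3))   # three connectors
print(adapter_budget(4))   # with the optional fourth connector

headroom_pct = (RTX_4090_MAX / RTX_4090_TBP - 1) * 100
print(f"Overclocking headroom: {headroom_pct:.0f}% above rated power")
```

Under that assumption, three connectors cover the rated 450W and the optional fourth is what unlocks the 600W ceiling.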
The new cards also feature DP 1.4a (4K 12-bit HDR @ 240Hz) and HDMI 2.1 (4K 120Hz HDR / 8K 60Hz HDR). All cards are compliant with the PCIe Gen 4 interface on existing motherboards and also feature full compliance with the Resizable-BAR technologies.
NVIDIA GeForce RTX 4090 Founders Edition PCB:
Next-Gen Micron GDDR6X Dies Run 10C Cooler Thanks To New Process Node
NVIDIA has also leveraged Micron’s latest GDDR6X memory chips for its GeForce RTX 40 graphics cards. These run 10C cooler and are more power efficient, and since they are all 16Gb DRAM dies, they can be placed on one side of the PCB, where they are cooled better than dual-sided memory.
NVIDIA DLSS 3: Compatibility, Feature Set, Gaming Performance & More
Now, let’s dive into the technological advancements that allow these incredible achievements. To begin with, NVIDIA engineers started with DLSS Super Resolution and added something called Optical Multi Frame Generation based on Ada’s Optical Flow Accelerator. This accelerator analyzes two sequential frames from a particular game, capturing pixel details such as particles, reflections, lighting, and shadows.
On top of that, NVIDIA DLSS 3 also takes into account conventional game engine information such as motion vectors. The DLSS Frame Generation AI convolutional autoencoder network will then decide how to use each of the four inputs (current and prior frames, optical flow field, and motion vectors) to recreate intermediate frames in the best possible way.
NVIDIA DLSS 3 is said to reconstruct 3/4 of the first frame with DLSS Super Resolution and the full second frame with the help of the aforementioned DLSS Frame Generation. Overall, NVIDIA DLSS 3 reconstructs 7/8 of the two total frames displayed, which explains the massive performance uplift.
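The 7/8 figure is simply the average of the two per-frame fractions. A minimal sketch using exact fractions, with the two inputs taken from the paragraph above:

```python
from fractions import Fraction

# Share of each displayed frame that is reconstructed rather than rendered,
# as described in the article.
frame_1 = Fraction(3, 4)   # DLSS Super Resolution frame
frame_2 = Fraction(1)      # fully generated by DLSS Frame Generation

reconstructed = (frame_1 + frame_2) / 2
rendered = 1 - reconstructed

print(f"Reconstructed: {reconstructed}")   # 7/8
print(f"Rendered:      {rendered}")        # 1/8
```

In other words, only one eighth of the displayed pixels across the two frames is rendered traditionally.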
Additionally, the new version of the Deep Learning Super Sampling image reconstruction technique also includes the latency-lowering NVIDIA Reflex technology.
As for GPU support, DLSS 3 with full DLSS Frame Generation will be available on all RTX 40 series GPUs. On the older RTX 20 & RTX 30 series, the technology will be available as the DLSS Super Resolution suite (which RTX 40 also supports). Lastly, NVIDIA Reflex will be supported by GeForce 900 series and above.
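The support breakdown can be summarized as a small lookup table. This is only an illustrative sketch: the `SUPPORT` table and `supports` helper are hypothetical names, and the table lists just the series named in this article ("900 series and above" also covers generations between the 900 series and RTX 20, which are left out here).

```python
# Hypothetical lookup table built from the support breakdown in the article.
SUPPORT = {
    "RTX 40": {"Frame Generation", "Super Resolution", "Reflex"},
    "RTX 30": {"Super Resolution", "Reflex"},
    "RTX 20": {"Super Resolution", "Reflex"},
    "GTX 900": {"Reflex"},
}

def supports(series, feature):
    """Return True if the given GPU series supports the DLSS 3 component."""
    return feature in SUPPORT.get(series, set())

print(supports("RTX 40", "Frame Generation"))   # True
print(supports("RTX 30", "Frame Generation"))   # False
print(supports("GTX 900", "Reflex"))            # True
```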
Cyberpunk 2077 has been shown running NVIDIA DLSS 3, the brand new Ray Tracing Overdrive, and NVIDIA Reflex with up to 4x improved performance and up to 2x reduced latency. That’s not all, as NVIDIA is even promising benefits for CPU-bound games, which generally didn’t run much faster with DLSS 2.0. For example, the notoriously CPU-heavy Microsoft Flight Simulator gets up to 2x improved performance with the new DLSS. Overall, NVIDIA said that over 35 games and apps have already pledged support for NVIDIA DLSS 3:
A Plague Tale: Requiem
Atomic Heart
Black Myth: Wukong
Bright Memory: Infinite
Chernobylite
Conqueror’s Blade
Cyberpunk 2077
Dakar Rally
Deliver Us Mars
Destroy All Humans! 2 – Reprobed
Dying Light 2 Stay Human
F1 22
F.I.S.T.: Forged In Shadow Torch
Frostbite Engine
HITMAN 3
Hogwarts Legacy
ICARUS
Jurassic World Evolution 2
Justice
Loopmancer
Marauders
Microsoft Flight Simulator
Midnight Ghost Hunt
Mount & Blade II: Bannerlord
Naraka: Bladepoint
NVIDIA Omniverse
NVIDIA Racer RTX
PERISH
Portal with RTX
Ripout
S.T.A.L.K.E.R. 2: Heart of Chornobyl
Scathe
Sword and Fairy 7
SYNCED
The Lord of the Rings: Gollum
The Witcher 3: Wild Hunt
THRONE AND LIBERTY
Tower of Fantasy
Unity
Unreal Engine 4 & 5
Warhammer 40,000: Darktide
The NVIDIA GeForce RTX 4080 16 GB and RTX 4080 12 GB graphics cards will be launching in November and be priced at $1199 US and $899 US, respectively.
NVIDIA GeForce RTX 40 Series Preliminary Specs:
| Graphics Card Name | NVIDIA GeForce RTX 4090 | NVIDIA GeForce RTX 4080 16G | NVIDIA GeForce RTX 4080 12G | NVIDIA GeForce RTX 3090 Ti |
| --- | --- | --- | --- | --- |
| GPU Name | Ada Lovelace AD102-300? | Ada Lovelace AD103-300? | Ada Lovelace AD104-400? | Ampere GA102-225 |
| Process Node | TSMC 4N | TSMC 4N | TSMC 4N | Samsung 8nm |
| Die Size | 608mm2 | ~450mm2 | ~450mm2 | 628.4mm2 |
| Transistors | 76 Billion | TBD | TBD | 28 Billion |
| CUDA Cores | 16384 | 9728 | 7680 | 10240 |
| TMUs / ROPs | TBD | TBD | TBD | 320 / 112 |
| Tensor / RT Cores | 576 / 144 | TBD / TBD | TBD / TBD | 320 / 80 |
| Base Clock | 2230 MHz | 2210 MHz | 2310 MHz | 1365 MHz |
| Boost Clock | 2520 MHz | 2510 MHz | 2610 MHz | 1665 MHz |
| FP32 Compute | 83 TFLOPs | 49 TFLOPs | 40 TFLOPs | 40 TFLOPs |
| RT TFLOPs | 191 TFLOPs | 113 TFLOPs | 82 TFLOPs | 78 TFLOPs |
| Tensor-TOPs | 1321 TOPs | 780 TOPs | 641 TOPs | 320 TOPs |
| Memory Capacity | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X |
| Memory Bus | 384-bit | 256-bit | 192-bit | 384-bit |
| Memory Speed | 21.0 Gbps | 23.0 Gbps | 21.0 Gbps | 19 Gbps |
| Bandwidth | 1008 GB/s | 736 GB/s | 504 GB/s | 912 GB/s |
| TBP | 450W | 320W | 285W | 350W |
| Price (MSRP / FE) | $1599 US | $1199 US | $899 US | $1199 |
| Launch (Availability) | October 2022 | November 2022 | November 2022 | 3rd June 2021 |
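The bandwidth and FP32 compute rows for the three RTX 40 cards can be derived from other rows in the table. The sketch below uses the usual rules of thumb, which are assumptions rather than formulas given in the article: bandwidth is bus width times per-pin data rate divided by 8 bits per byte, and peak FP32 throughput is two operations (one fused multiply-add) per CUDA core per clock at the boost frequency. The function names are illustrative.

```python
def bandwidth_gb_s(bus_bits, speed_gbps):
    """Memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_bits * speed_gbps / 8

def fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 TFLOPs, assuming 2 FLOPs (one FMA) per core per clock."""
    return cuda_cores * 2 * boost_mhz / 1e6

# (name, CUDA cores, boost MHz, bus width in bits, memory speed in Gbps),
# all taken from the spec table in the article.
cards = [
    ("RTX 4090",     16384, 2520, 384, 21.0),
    ("RTX 4080 16G",  9728, 2510, 256, 23.0),
    ("RTX 4080 12G",  7680, 2610, 192, 21.0),
]

for name, cores, boost, bus, speed in cards:
    print(f"{name}: {fp32_tflops(cores, boost):.0f} TFLOPs, "
          f"{bandwidth_gb_s(bus, speed):.0f} GB/s")
```

Rounded to whole numbers, these match the FP32 Compute and Bandwidth rows listed for the three Ada cards.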
The post NVIDIA Details Ada Lovelace GPU Block Diagram, Streaming Multi-Processor, DLSS 3 & GeForce RTX 40 Founders Edition Cooler by Hassan Mujtaba appeared first on Wccftech.