Today was the big day! Today was the day NVIDIA was supposed to make its big announcements. And it definitely didn't disappoint. But we'll cover that a little later and offer our thoughts.
The setting was REALLY impressive. We've been to many conferences, but you can tell the difference when the pros do it. The screens and the epic music all contributed to an excellent atmosphere. As we are quite exhausted, and actually looking forward to spending our remaining energy playing a bit with our new NVIDIA Shields, we will keep it short and expand on the content in the next few days. And now, on to the announcements.
NVLINK
New interconnect technology to increase memory bandwidth between CPU and GPU, as well as across multiple GPUs. We hope to get more details soon, but it looks fantastic: great for single-node, multi-GPU configurations, which also profit from Unified Memory in CUDA 6.0.
PASCAL
This was one of the announcements that got us really excited. There had already been talk about 3D stacked memory, but it wasn't clear whether it would actually be feasible to build chips that way. Well, it turns out that it is. And it looks like it will revolutionize the way chips are made: in the words of Jen-Hsun Huang (NVIDIA's CEO), it makes it possible to maintain the speedup predicted by Moore's law, all in a GPU the size of two credit cards. Just the name change from last GTC, from Volta to Pascal, was kind of surprising.

As for the practical effect of Pascal, it increases memory bandwidth to 1 TB/sec, basically removing the bottleneck in memory-bound algorithms, which in practice are most of them. Performance-wise, many applications should see very nice speedups, larger than the "official" GFLOPS difference between Maxwell and Pascal. Pascal will also continue to improve energy efficiency, with a nice jump in performance per watt. This bodes well for NVIDIA in the HPC market. The energy savings they can achieve are simply amazing: for some projects, like Google Brain, they can replicate computations using 0.3% of the energy that conventional servers use. With numbers like those, they could just change their business model and start charging a percentage of the saved energy bill.
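To see why a bandwidth jump helps memory-bound code so much, here is a quick back-of-the-envelope sketch. The ~1 TB/s figure comes from the keynote; the current-generation baseline of ~288 GB/s is our own assumption (roughly a high-end Kepler card), not an NVIDIA number:

```python
# Rough speedup estimate for a purely memory-bound kernel.
# Assumption: ~288 GB/s for a current high-end Kepler-class card;
# ~1 TB/s is the figure quoted for Pascal's stacked memory.
kepler_bw_gbs = 288.0   # assumed baseline bandwidth, GB/s
pascal_bw_gbs = 1000.0  # ~1 TB/s from the keynote

def memory_bound_speedup(old_bw, new_bw):
    """A kernel limited purely by memory traffic scales with bandwidth."""
    return new_bw / old_bw

print(f"Estimated speedup: {memory_bound_speedup(kepler_bw_gbs, pascal_bw_gbs):.1f}x")
```

Under those assumptions a memory-bound kernel would see around 3.5x, regardless of what the raw GFLOPS numbers say.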
GOOGLE BRAIN and MACHINE LEARNING
One of the most amazing examples: 0.3% of the energy cost and 0.25% of the hardware cost for the same computations!!! This is actually along the same lines as what we are experiencing in some of our own projects, where we have $3,000 workstations going 6x faster than a $300,000 cluster.
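Just to put our own workstation-vs-cluster numbers from above into a single figure of merit, here is the arithmetic:

```python
# Performance-per-dollar comparison using the numbers from this post:
# a $3,000 GPU workstation running ~6x faster than a $300,000 cluster.
workstation_cost = 3_000
cluster_cost = 300_000
workstation_speed = 6.0  # relative to the cluster
cluster_speed = 1.0

perf_per_dollar_ratio = (workstation_speed / workstation_cost) / (cluster_speed / cluster_cost)
print(f"Perf-per-dollar advantage: {perf_per_dollar_ratio:.0f}x")
```

Six times the speed at one hundredth of the cost works out to a 600x advantage in performance per dollar.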
Yay! Big savings!!
PHYSICS SOLVER / FLEX
The usual amazing stuff from the physics simulations; NVIDIA has a fantastic staff that always brings very impressive demos. We will analyze Flex in more detail in a future post. There was also a nice real-time fight demo running on Unreal Engine 4.
Tegra K1 and JETSON TK1
The Tegra K1 development board, aka Jetson TK1, will be available for developers in April. The K1 is looking pretty sweet: the performance slides look really good, it is very easy to program, and there are plenty of tools to do so. We can hardly wait to get our hands on one of those babies. With 192 cores, 326 GFLOPS, and 4x the energy efficiency of the A15, it's no wonder most car companies are getting on board with Tegra.
No word on the 64-bit K1, also known as Denver. There is already a lot of speculation about it in "the tubes". For one, it looks like NVIDIA will start to be more careful with Tegra announcements, a bit like they do with the GeForce line. When competition is this tough, you want to keep your cards well guarded until you are ready to play your hand. So we don't think the 64-bit Tegra is out of the picture; they are probably saving the announcement for a better time. We've seen this time and again with the GeForce cards.
Of course, as usually happens with technology, the moment you get your hands on one desired piece of it, the next version is announced, and then you only have eyes for what you cannot enjoy just yet. In Tegra's case that will be Erista (Logan's son... the naming of chips is starting to look like a soap opera). Erista will feature the Maxwell architecture, and we expect a much lower TDP (1.5 W maybe?). Left to speculation is the number of CUDA cores: since a Maxwell SMM has 128 cores against the Kepler SMX's 192, we are left to wonder whether Erista will use two of them, for a nice 256 CUDA cores. This seems to be the pattern for the Kepler-to-Maxwell transition: more multiprocessors in exchange for the reduction from 192 to 128 cores per multiprocessor.
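Our Erista guess above boils down to simple multiprocessor arithmetic. To be clear, the two-SMM configuration is pure speculation on our part, not an announced spec:

```python
# Core counts per multiprocessor in each architecture (published figures):
KEPLER_CORES_PER_SMX = 192   # Tegra K1 carries a single Kepler SMX -> 192 cores
MAXWELL_CORES_PER_SMM = 128

# Our speculation: Erista pairs two Maxwell SMMs.
erista_smm_count = 2
erista_cores = erista_smm_count * MAXWELL_CORES_PER_SMM
print(f"Speculated Erista CUDA cores: {erista_cores}")
```

If that guess holds, Erista would end up with more CUDA cores than the K1 despite the slimmer 128-core multiprocessor.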
And finally... in an Oprah-like move, every GTC attendee got a SHIELD!! :D
PS: We will go through the rest of the day in future posts; our Shields just finished charging...