A $1200 Computational Powerhouse for Deep Learning Apps

I have two objectives in this post. First, I would like to present some details about NVIDIA’s latest Graphics Processing Unit (GPU) chip. Second, I would like to explain why I believe that the availability of this GPU, combined with Google’s recent release of TensorFlow (an Artificial Intelligence framework) as an open source platform, marks a defining moment in the world of Artificial Intelligence (AI), especially Machine Learning (ML).

Two weeks ago I attended a product launch event hosted by NVIDIA for their latest GPU, called Pascal GP100. I was blown away by the magnitude of NVIDIA’s accomplishments that made such a remarkable product a reality. This was a somewhat unusual launch, since it was held at an AI Meetup event at Stanford attended by more than a thousand AI professionals and researchers. Below I have attempted to capture some of the key features of this modern-day marvel:

  1. Strictly speaking, this event was the launch of a graphics accelerator card called Titan X that is powered by GP100. GP100 is based on a brand-new architecture developed by NVIDIA, targeting Deep Machine Learning, High Performance Computing (HPC), and High-End Gaming applications
  2. The chip packs 15B transistors into a 600 mm^2 die (not a typo: this is the die size, NOT the package size). It was claimed to be the largest chip manufactured by TSMC (using its 16nm FinFET process)
  3. It is able to deliver 10.6 TFLOPS (trillion floating point operations per second) of single-precision (FP32) performance. The base clock is 1328 MHz (1480 MHz boost clock)
  4. While the following statement is a gross simplification, the GP100 essentially has 3840 (single-precision) computational cores.  This is a “dream-come-true” for the developers of Deep Learning algorithms
  5. A true packaging marvel: “Die on Wafer on Substrate”. The chip supports the most advanced DRAM interface technology to date (HBM2). The package contains eight DRAM dies stacked on top of each other in close proximity to the GPU, with nearly 4000 connections between the DRAM and the GPU
  6. In addition to massively parallel processing, deep learning algorithms require sharing massive amounts of data between GPUs. NVIDIA has developed a proprietary GPU-to-GPU interconnect technology called NVLink that supports 160 GB/s of bidirectional bandwidth
  7. By way of background, high-end graphics processors need a very high degree of parallelism to render the complex, dynamic, high-resolution images that yield stunning scenes in games. This makes them ideal for deep learning applications as well, since training a large neural network requires a large number of parallel cores working in unison. As an analogy, imagine that we are tasked with calibrating hundreds of instruments continuously, in real time, based on a continuous stream of new information, with the caveat that all of the instruments have to be tuned simultaneously. In other words, calibrating one instrument before starting on the second will distort the results of all subsequent calibrations. The only valid way to recalibrate a neural network is to have multiple computational resources run simultaneously, in parallel.
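
To make the calibration analogy a little more concrete, here is a minimal sketch in plain Python. It is a toy of my own construction, not anything NVIDIA or Google showed: a two-weight linear model with a squared-error loss, and made-up function names. It contrasts updating every weight from the same snapshot (each gradient can then be computed independently, which is what lets many cores share the work) with tuning one weight at a time.

```python
# Toy illustration (hypothetical names, pure Python): one gradient-descent
# step on a linear model y = w[0]*x[0] + w[1]*x[1] with squared-error loss.

def gradients(w, x, y):
    """Gradient of 0.5 * (prediction - y)**2 with respect to each weight."""
    error = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [error * xi for xi in x]

def simultaneous_step(w, x, y, lr):
    """Compute every gradient from the same weights, then update all at once."""
    g = gradients(w, x, y)
    return [wi - lr * gi for wi, gi in zip(w, g)]

def sequential_step(w, x, y, lr):
    """Tune one weight at a time; later weights see the earlier updates."""
    w = list(w)
    for i in range(len(w)):
        g_i = gradients(w, x, y)[i]  # recomputed after each earlier update
        w[i] -= lr * g_i
    return w

start = [0.0, 0.0]
x, y, lr = [1.0, 2.0], 1.0, 0.1
print("simultaneous:", simultaneous_step(start, x, y, lr))
print("sequential:  ", sequential_step(start, x, y, lr))
```

With these inputs the two approaches agree on the first weight but not on the second (about 0.2 versus about 0.18), because the sequential version tunes the second "instrument" against a model that has already been changed.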

The technical accomplishments here are astounding, but what makes this solution remarkable is the pricing of the accelerator card based on GP100. This card (Titan X) is not much different from other NVIDIA graphics cards in shape or form, and it is priced at $1200. That is definitely a steep price for a gamer, but it is a bargain for companies and institutions working on complex deep learning algorithms. In a way, you can view it as a powerful supercomputer that can be plugged into a desktop or a server on the network and become a computing powerhouse accessible to a group of developers working on real-time ML applications such as image processing and the like. Many server OEMs (Dell, HP, IBM, . . .) will introduce servers based on GP100 in the not-too-distant future.
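
For readers who like to check the arithmetic, here is a back-of-the-envelope sketch in Python that uses only the figures quoted in this post. The one assumption of mine (not an NVIDIA statement) is that each core completes one fused multiply-add per clock cycle, counted as two floating point operations. The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic using figures quoted in this post.
# Assumption (mine): each core retires one fused multiply-add per clock,
# which is conventionally counted as 2 floating point operations.
FLOPS_PER_CORE_PER_CLOCK = 2

def peak_tflops(cores, clock_mhz):
    """Theoretical peak throughput in TFLOPS for a given core count and clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12

def dollars_per_tflops(price_usd, tflops):
    """Price divided by throughput: a crude cost-efficiency figure."""
    return price_usd / tflops

print("peak at base clock :", round(peak_tflops(3840, 1328), 2), "TFLOPS")
print("peak at boost clock:", round(peak_tflops(3840, 1480), 2), "TFLOPS")
print("price per quoted TFLOPS:", round(dollars_per_tflops(1200, 10.6), 1), "USD")
```

Under that assumption, 3840 cores give roughly 10.2 TFLOPS at the base clock and roughly 11.4 TFLOPS at the boost clock, so the quoted 10.6 TFLOPS falls between the two estimates. That may reflect a different number of active cores or a different sustained clock, and this sketch does not settle which. Either way, the quoted figures work out to a little over $113 per TFLOPS.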

The second key recent milestone in the world of AI has come from none other than Google. They recently released TensorFlow as an open source platform (or framework). This framework targets machine learning applications and offers tremendous resources that can immensely simplify the life of algorithm developers, allowing them to focus on the core algorithms (and not so much on development tools). You may think of TensorFlow as a super-fancy Matlab-like toolkit specifically developed to cater to AI applications.

What is significant here is that armies of programmers can now build amazing AI applications with a minimal hardware investment ($1200) and powerful open source TensorFlow tools.