Most of us are already familiar with Nvidia’s silicon, which is found inside a large number of smartphones and tablets, including the Nexus 7. Nvidia’s current chip is the quad-core Tegra 4, which the company is using inside the upcoming Nvidia Shield and other devices from various manufacturers. Yet Tegra 4 didn’t really take off, primarily due to intense competition from Qualcomm’s more efficient SoCs.
But rest assured, something big is coming! Nvidia’s latest promotional efforts take the spotlight off the fully integrated solution to focus on the company’s bread and butter: the Kepler GPU. Indeed, Nvidia will be bringing the full-fledged Kepler architecture to the mobile world. It is obviously no GTX Titan, since the implementation is based on a shrunken-down and simplified Kepler design, but it nevertheless smashes records as far as mobile GPU computing is concerned.
Last year Nvidia’s CEO Jen-Hsun Huang said his company would bring its powerful Kepler graphics processor to “superphones.” Yesterday, at a private event at Siggraph 2013 in Anaheim, California, Nvidia demonstrated functional silicon of Logan, its next-generation Tegra SoC, for the very first time. The Kepler GPU fits into a tablet form factor and has a power envelope of just two watts!
According to Nvidia, Kepler is also its first architecture designed to scale from supercomputers all the way down to mobile devices!
So what’s inside? Kepler Mobile features a single Kepler SMX (next-generation streaming multiprocessor). That basically means we’ll be looking at 192 CUDA cores, 16 texture units and, presumably, 64 KB of L1 cache. Nvidia isn’t talking about CPU cores, but it’s safe to assume that Logan will use another 4+1 arrangement of cores or perhaps be built around the ARM big.LITTLE design, though likely still based on ARM’s Cortex-A15 IP.
The architectural unification of Nvidia’s Tegra SoC and GeForce lineups further closes the gap between the company’s PC and mobile GPUs. This marks the point where Tegra makes the switch to unified shaders, which brings full support for modern APIs such as OpenGL ES 3.0, OpenGL 4.4, DirectX 11, OpenCL and CUDA 5.0, none of which are currently available on Tegra 4’s NV40-class architecture.
In contrast, when Nvidia introduced Tegra 4, it shrugged off the lack of OpenGL ES 3.0 support by saying there wouldn’t be much content any time soon due to developers’ propensity for targeting the lowest common denominator of mobile devices.
For developers and publishers, this would mean that porting existing games and apps may take less time than it once did. This also means that future ARM-based Windows 8 tablets and phones could use the regular DirectX 11 API, making PC apps and games even easier to port. More importantly, developers will have less code to build and maintain and could focus on gameplay and eye candy instead.
To sum up, that’s 192 full-blown unified shaders compared to Tegra 4’s 24 vertex and 48 pixel shaders. Nvidia didn’t introduce the unified shader design with Tegra 4 since it wasn’t necessarily the best setup for its mobile chips and the company was not yet ready to make the transition.
In older graphics architectures, vertices and pixels are processed by different types of compute units. With a unified architecture, all the compute units are the same and can be allocated to any compute task. With this setup, hardware utilization is higher because more units can be put to work at any given time, regardless of the nature of the scene (wireframe vs. flat-shaded vs. heavily textured). This translates into higher performance and efficiency.
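To make the utilization argument concrete, here is a minimal toy model in Python. It is only a sketch: the function names and the per-clock workloads are hypothetical, and the 24/48 split simply reuses the Tegra 4 vertex and pixel shader counts mentioned earlier. Real GPUs schedule work in far more complicated ways.

```python
# Toy model: what fraction of shader units is busy in one clock,
# given some pending vertex and pixel work.
# Assumption: the 24/48 split mirrors Tegra 4's shader counts; the
# workloads and function names below are invented for illustration.

def split_utilization(vertex_jobs, pixel_jobs, vertex_units=24, pixel_units=48):
    """Fraction of units busy when vertex and pixel units are separate."""
    busy = min(vertex_jobs, vertex_units) + min(pixel_jobs, pixel_units)
    return busy / (vertex_units + pixel_units)


def unified_utilization(vertex_jobs, pixel_jobs, units=72):
    """Fraction of units busy when any unit can take any job."""
    busy = min(vertex_jobs + pixel_jobs, units)
    return busy / units


# Hypothetical scenes: (pending vertex jobs, pending pixel jobs).
scenes = {
    "wireframe (vertex-heavy)": (60, 12),
    "heavily textured (pixel-heavy)": (6, 66),
    "balanced": (24, 48),
}

for name, (v, p) in scenes.items():
    print(f"{name}: split={split_utilization(v, p):.2f}, "
          f"unified={unified_utilization(v, p):.2f}")
```

In this model the fixed split only reaches full utilization when the scene happens to match the hardware ratio, while the unified pool stays fully busy in all three cases.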
A single SMX may not sound like a lot, but to put this in perspective, Nvidia was quick to remind us that it packs more peak theoretical ALU-bound performance than the PS3 (for which Nvidia built the “RSX” GPU) or the once-mighty GeForce 8800 GTX (memory bandwidth is another story, however). Note that we’re looking at a comparison of GFLOPS and NOT game performance here.
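For readers wondering where a peak GFLOPS figure comes from, the usual back-of-the-envelope formula is cores × FLOPs per core per clock × clock speed. The sketch below assumes each CUDA core retires one fused multiply-add (counted as two FLOPs) per clock. Nvidia has not announced Logan’s GPU clock, so the clock values here are purely hypothetical placeholders.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_clock=2):
    """Peak theoretical throughput in GFLOPS.

    Assumes one fused multiply-add per core per clock, which is
    conventionally counted as 2 floating-point operations.
    """
    return cores * flops_per_core_per_clock * clock_ghz


SMX_CORES = 192  # one Kepler SMX, as described in the article

# Hypothetical clocks only; the real Logan GPU clock is not public.
for clock_ghz in (0.5, 1.0):
    print(f"{SMX_CORES} cores @ {clock_ghz} GHz -> "
          f"{peak_gflops(SMX_CORES, clock_ghz):.0f} GFLOPS peak")
```

As the article stresses, this is a theoretical ALU ceiling. It says nothing about memory bandwidth or actual game performance.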
Nvidia is using a 2 W figure to describe Mobile Kepler, comparing it to the GeForce GTX Titan’s 250 W maximum TDP. But it is even more interesting to compare it with existing mobile hardware. According to Nvidia, Kepler uses one-third the power of the iPad 4’s GPU to render the same graphics.
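The arithmetic behind those two power claims is simple enough to write down. In the sketch below the 250 W and 2 W figures come from Nvidia’s comparison, while the iPad 4 GPU wattage is a made-up placeholder, since Nvidia only gave a ratio and no absolute number.

```python
TITAN_TDP_W = 250.0      # GeForce GTX Titan maximum TDP, per the article
MOBILE_KEPLER_W = 2.0    # Nvidia's quoted figure for Mobile Kepler


def power_ratio(big_watts, small_watts):
    """How many times more power the first part draws than the second."""
    return big_watts / small_watts


def one_third_power(reference_watts):
    """Power draw implied by the claimed one-third ratio for the same workload."""
    return reference_watts / 3


print(f"Titan vs. Mobile Kepler: {power_ratio(TITAN_TDP_W, MOBILE_KEPLER_W):.0f}x")

# Hypothetical iPad 4 GPU draw for some workload; not a measured value.
hypothetical_ipad4_gpu_w = 3.0
print(f"Implied Kepler draw: {one_third_power(hypothetical_ipad4_gpu_w):.1f} W")
```

Keep in mind that the Titan figure is a maximum TDP for a far larger chip, so the ratio describes scale, not equal performance at lower power.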
Nvidia took its Ira demo, originally run on a Titan at GTC 2013, and got it up and running on a Logan development board. Ira did need some work to make the transition to mobile. The skin shaders were simplified, smaller textures were used and the rendering resolution was decreased. The mobile version of Ira isn’t quite as jaw-dropping as its desktop counterpart, but it’s still easily the most impressive demonstration of its kind on mobile hardware.
The real-time face simulation tries to hop right over the uncanny valley, using an incredibly detailed polygonal model, high-resolution textures, HDR lighting effects that work through multiple layers of simulated skin, and conventional graphics effects like FXAA, tone mapping and bloom. Particular attention has been paid to sub-surface scattering, a shading technique that simulates light passing through skin (note the ears). This version of the Ira face is running at full HD (1920×1080) instead of 4K.