Thursday, July 29, 2010

Intel, AMD, NVIDIA and High Performance Computing

In the past few weeks we've had a few conversations at work with engineers from ATI/AMD, Intel, and NVIDIA about their offerings for high performance computing.

Intel is really pushing its Ct software framework and its benefits around code maintainability. It is basically C++, and it abstracts away the need for hardware-specific low-level code. For example, to add two 2-D arrays together, you can just write code like:

resultArray = array1 + array2;

There is no need for low-level intrinsics to access the SIMD instruction sets. Ct also supports other higher-level abstractions like list homomorphisms. So theoretically, your code can use nice high-level data abstractions and still run fast. As new CPU features are added in the future, the same codebase should take advantage of them through Ct.
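To make the idea concrete, here is a minimal sketch in plain C++ of how an expression like the one above can hide the element-wise loop behind an overloaded operator. This is not the Ct API: `Array2D` and its members are hypothetical names invented for illustration, and nothing here does the kind of hardware-specific code generation Ct is meant to do. At best, an optimizing compiler may vectorize the simple loop on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D array type for illustration only; this is NOT the Ct API.
struct Array2D {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // row-major storage

    Array2D(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}

    // Element access by (row, column).
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Element-wise addition of two arrays with identical shapes.
// The flat loop has no dependencies between iterations, so a compiler
// may be able to turn it into SIMD instructions without intrinsics.
inline Array2D operator+(const Array2D& a, const Array2D& b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    Array2D out(a.rows, a.cols);
    for (std::size_t i = 0; i < out.data.size(); ++i) {
        out.data[i] = a.data[i] + b.data[i];
    }
    return out;
}
```

With this in place, `Array2D resultArray = array1 + array2;` compiles and the calling code never mentions loops or instruction sets, which is the maintainability argument in miniature.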

As one might expect, Ct is really aimed at Intel CPU-like architectures and not GPUs. Intel does a good job in its whitepaper of explaining the kinds of algorithms and applications that fall in Ct's sweet spot.

Now, it sounded to me like the Intel team was saying that Ct would also generate code for GPUs. But as the whitepaper describes, it seems to me that your algorithm would still need to conform to the limitations and advantages of the GPU in order to run fast on the GPU.

So for now, I believe that data-parallel algorithms that are suited to the GPU will probably need to be implemented using NVIDIA's CUDA or with OpenCL. One interesting tidbit was that NVIDIA claimed to us that their OpenCL drivers should run just as fast as their CUDA drivers. You might expect NVIDIA to neglect OpenCL: CUDA already has a strong following, and it gives NVIDIA some vendor lock-in since CUDA is NVIDIA-only. But NVIDIA says they want to differentiate and sell their products on hardware performance, not through software API lock-in, which of course is a good thing and something we developers are happy to hear. Assuming this is true, I think that OpenCL is very attractive since it can run on ATI, NVIDIA, or even CPU architectures with the same codebase.
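For readers who have not seen the style, here is a rough sketch of the shape that CUDA and OpenCL ask you to write data-parallel code in: a small "kernel" that computes one output element from its index. This is plain C++ running on the CPU, not real CUDA or OpenCL code; `add_kernel` and `launch_add` are hypothetical names, and the loop merely stands in for the runtime launching many kernel instances in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One "work item": computes a single output element from its index alone.
// In CUDA or OpenCL the index would be supplied by the runtime, not a loop.
inline void add_kernel(std::size_t i, const float* a, const float* b, float* out) {
    out[i] = a[i] + b[i];
}

// CPU stand-in for a kernel launch (an assumption for illustration):
// runs the kernel once per index, sequentially. A GPU would execute
// these independent invocations in parallel.
inline std::vector<float> launch_add(const std::vector<float>& a,
                                     const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        add_kernel(i, a.data(), b.data(), out.data());
    }
    return out;
}
```

The point of the shape is that each invocation touches only its own element, so the work can be spread across however many cores the hardware offers. Algorithms that do not decompose this way are the ones that tend to fit the GPU poorly.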

