Monday, January 21, 2008 20:52
I did not get any feedback on the .NET extensions forum - perhaps this is a more appropriate forum for my question:
We keep reading about the future "80-core" PC as the time frame for realizing massive parallelism, and while that is true for the common consumer and commodity hardware, several hardware platforms support massive parallelism today. Many of these are closed, proprietary, custom, and cost-prohibitive for most - except one (that I am aware of).
The Tesla platform provides a very cost effective massively parallel platform today. While very useful for scientific research, the commercial sector is also looking at it for large data analysis. I am in the finserv sector and many of my peers and I are seriously looking at it for massive number crunching. The one blocking factor is CUDA. While it is an adequate language, it is a new language and we would like to be able to leverage our existing developer skills and tools.
Question: Have you looked into and considered providing support for the Task Parallel Library to run on Tesla or similar hardware attached to a PC? This would allow us (for the cost of a couple of regular servers) to start realizing massive parallelism today with existing languages and tools.
Tuesday, January 22, 2008 15:22
Until now, dual-CPU systems were mostly found in servers. Massive data processing with complex algorithms is usually done by dedicated processors (DSPs), and today several vendors offer DSPs with more than 8 cores, each using a dedicated operating system and compiler.
Most code today was written assuming that a PC has a single CPU, a server may have two, and a DSP may have more. The problem is application design: servers usually use multiple threads to perform wait operations, and only DSP applications really get good performance out of parallelism.
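To make the distinction concrete, here is a minimal sketch (in Python, purely for illustration - the thread itself names no code) of the server-style use of threads: many threads overlapping wait operations. The threads buy throughput on waits, not compute speed - for number crunching you still need real parallel hardware.

```python
# Sketch: threads overlapping wait operations (the server pattern above).
# Ten blocked "I/O waits" of 0.1 s each complete in roughly the time of
# one wait, because the threads sleep concurrently rather than serially.
import time
from concurrent.futures import ThreadPoolExecutor

def wait_task(seconds):
    # Stands in for a thread blocked on I/O (a "wait operation").
    time.sleep(seconds)
    return seconds

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(wait_task, [0.1] * 10))
elapsed = time.time() - start
print(elapsed < 0.5)  # overlapped waits: far less than the 1.0 s serial total
```

The same trick does nothing for CPU-bound work, which is exactly why the post distinguishes server threading from DSP-style parallelism.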
The big problem is that DSPs are coded assuming that the application runs in a closed environment.
The GPU is the first DSP to run in an open system. The next step is to allow open software to run on these processors.
You need to remember that MMX is essentially a DSP built into the CPU on your PC, but (and it is a big but) it is tightly coupled with the CPU and driven by assembly instructions that the Pentium family introduced with the Pentium MMX.
The new GPU is an external device just like a USB device, only it is on the motherboard for better performance.
DirectX solved this by introducing a special set of APIs that you can call from your application. Such an API call goes to the device driver and from there to the hardware - this is why it is called hardware acceleration.
You can only perform discrete operations this way - something like a query - because your code runs on the CPU with CPU registers, while the GPU has its own registers on a separate hardware device.
Debugging such an application is not far from remotely debugging a PocketPC device.
Bottom line: it is possible to incorporate such code, but you will most probably end up with the same design as DirectX, where discrete algorithms run on the GPU and your application drives them with transaction-like calls.
Thursday, August 7, 2008 10:46
Have you looked at the Accelerator project from Microsoft Research?
It does not implement all the functionality CUDA does, but it may be sufficient for the task at hand.