The $400 pricing was determined by the $27.5 million NSF award, which targeted the creation of a 10 PFLOPS machine. Intel shipped the early boards at that price so that TACC could hit its target.
The 2.6 PFLOPS TOP500 submission came from a partial build-out of the final 10 PFLOPS system. Fewer than 2,000 of the Xeon Phi boards were installed at that point.
"In order to reach the 10 PFLOPS target, the computational power in Stampede is split in two parts. First 2 PFLOPS come from 6,400 nodes carrying two Intel Xeon E5 Series processors (Sandy Bridge-EP) and 32GB of DDR3 memory. Second part of the system comes from the countless MIC cards (now known as Xeon Phi), which were supposed to deliver 8 PFLOPS. As it turns out, the "countless MIC coprocessors" fell a bit short of the target, with TACC expecting more than 7PFLOPS, but less than 8PFLOPS. Third part of the Stampede system is 16 memory nodes with 1TB of DDR3 memory and two NVIDIA Tesla K20 boards. Furthermore, Tesla K20 boards are located in 128 out of 6,400 compute nodes for computational purposes, bringing the total number of K20 boards to 144. This number pales in comparison to around 6,500 Xeon Phi boards. The ScaleMP virtual SMP solution is used in order to create a shared memory environment, spanning across all 16TB of memory. This part will mostly target "big data".
While the prices of Intel Xeon E5 systems and the Tesla boards were delivered at special but still realistic pricing, we were quite surprised to learn that the computing center only paid around $400 per Xeon Phi board. Given that competing Tesla K20 boards retail for $3199 (available in December), this can be viewed from a price dumping perspective. Bear in mind the TACC only had $2.4 million for Xeon Phi boards, and reaching 8PFLOPS e.g. 7+ PFLOPS requires around 6,000-7,000 boards. At $400, it is quite a steal. "
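As a rough check on the quoted figures, the arithmetic can be sketched in a few lines of Python. The inputs ($2.4 million budget, $400 per board, $3199 K20 retail price, 7 to 8 PFLOPS target) come from the quote; the variable and function names are made up for illustration, and the per-board throughput is only what those numbers imply, not a measured spec.

```python
# Figures taken from the quoted article; names are illustrative only.
PHI_BUDGET_USD = 2_400_000      # budget for Xeon Phi boards
PHI_PRICE_USD = 400             # reported price per Xeon Phi board
K20_RETAIL_USD = 3199           # quoted Tesla K20 retail price


def boards_affordable(budget_usd: int, price_usd: int) -> int:
    """Whole boards that fit in the budget at the given unit price."""
    return budget_usd // price_usd


def implied_tflops_per_board(target_pflops: float, boards: int) -> float:
    """Per-board throughput needed to reach the target (1 PFLOPS = 1000 TFLOPS)."""
    return target_pflops * 1000 / boards


phi_boards = boards_affordable(PHI_BUDGET_USD, PHI_PRICE_USD)
k20_boards = boards_affordable(PHI_BUDGET_USD, K20_RETAIL_USD)
same_count_at_k20_price = phi_boards * K20_RETAIL_USD

print(f"Xeon Phi boards at $400: {phi_boards}")
print(f"K20 boards for the same budget at retail: {k20_boards}")
print(f"Cost of {phi_boards} boards at K20 retail: ${same_count_at_k20_price:,}")
for target in (7, 8):
    needed = implied_tflops_per_board(target, phi_boards)
    print(f"{target} PFLOPS over {phi_boards} boards: {needed:.2f} TFLOPS each")
```

At $400 the budget covers 6,000 boards, which matches the low end of the 6,000-7,000 range in the quote; at K20 retail pricing the same money would buy only about 750 boards.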