For each algorithm and program, there are architectures that are better than others. Some computation may need a lot of FLOPS, but FLOPS are not the only thing to consider. Communication and memory bandwidth and latency are as important as computational power, specially since memory speed and CPU speed are decoupled.