Lessons learned from dataflow (Memory latency)
Microprocessors: An unsolved problem is the memory latency caused by cache misses.
Example: SGI Origin 2000:
- latencies are 11 processor cycles for an L1 cache miss,
- 60 cycles for an L2 cache miss,
- and can be up to 180 cycles for a remote memory access.
- In principle, these latencies should be multiplied by the degree of superscalarity (the issue width) to give the number of instruction issue slots lost per miss.
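The multiplication in the last point can be made concrete with a short calculation. The sketch below uses the three latencies quoted above; the issue width of 4 and the names `LATENCY_CYCLES`, `ISSUE_WIDTH`, and `lost_issue_slots` are assumptions chosen for illustration, not figures from the example.

```python
# Latencies in processor cycles, as quoted in the SGI Origin 2000 example.
LATENCY_CYCLES = {
    "L1 cache miss": 11,
    "L2 cache miss": 60,
    "remote memory access": 180,
}

# Assumed issue width (instructions per cycle); illustrative only.
ISSUE_WIDTH = 4


def lost_issue_slots(latency_cycles: int, issue_width: int) -> int:
    """Issue slots left unused while the processor waits out one miss."""
    return latency_cycles * issue_width


if __name__ == "__main__":
    for event, cycles in LATENCY_CYCLES.items():
        slots = lost_issue_slots(cycles, ISSUE_WIDTH)
        print(f"{event}: {cycles} cycles -> {slots} lost issue slots")
```

Under this assumption, a single remote memory access costs as many issue slots as 720 instructions could have filled.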
Microprocessors: Only a small part of the memory latency can be hidden by out-of-order execution, write buffers, cache preload hardware, lockup-free caches, and a pipelined system bus.
As a result, microprocessors often idle and cannot exploit the high degree of internal parallelism that a wide superscalar design provides.
Dataflow: Rapid context switching avoids idling: when one context stalls on a memory access, execution switches to another context that is ready to run.
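How many contexts are needed to keep the processor busy can be estimated with a simple model. The sketch below is an idealized assumption, not something stated in the notes: every context runs for a fixed number of cycles, then stalls for a fixed memory latency, and switching contexts costs nothing. The function names and the run length of 20 cycles are hypothetical.

```python
import math


def utilization(contexts: int, run_cycles: int, latency_cycles: int) -> float:
    """Fraction of cycles spent on useful work in the idealized model.

    Each context alternates between run_cycles of work and latency_cycles
    of waiting; context switches are assumed to be free.
    """
    return min(1.0, contexts * run_cycles / (run_cycles + latency_cycles))


def contexts_to_hide(run_cycles: int, latency_cycles: int) -> int:
    """Smallest number of contexts that keeps the processor fully busy."""
    return math.ceil((run_cycles + latency_cycles) / run_cycles)


if __name__ == "__main__":
    run = 20  # hypothetical cycles of work between two misses
    for latency in (11, 60, 180):  # latencies from the example above
        print(
            f"latency {latency}: "
            f"1 context -> {utilization(1, run, latency):.2f} utilization, "
            f"{contexts_to_hide(run, latency)} contexts to hide it"
        )
```

In this model a single context facing a 60-cycle latency keeps the processor busy only a quarter of the time, while four contexts hide the latency completely. A real machine would also pay a switching cost, which the model deliberately ignores.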