A 16-wide-issue processor will need to execute about eight loads/stores per cycle.
The primary design goal of the data-cache hierarchy is therefore to supply the bandwidth needed to sustain those eight loads/stores per cycle.
A single, monolithic first-level data cache with enough ports for eight accesses per cycle would likely be so large that its access time would jeopardize the cycle time.
We therefore expect the first-level data cache to be replicated, with the copies together providing the required ports.
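The port arithmetic behind replication can be illustrated with a toy model. This is a minimal sketch under assumptions not stated in the text: the names `ReplicatedL1`, `num_copies` and `ports_per_copy` are hypothetical, every copy holds identical contents, a load needs one port on any single copy, and a store must write every copy (using one port on each) to keep the copies consistent. Tags, capacity and misses are ignored.

```python
class ReplicatedL1:
    """Toy model of a replicated first-level data cache (hypothetical sketch).

    Assumptions: all copies hold identical data; a load uses one port on any
    one copy; a store must write every copy, using one port on each.
    """

    def __init__(self, num_copies, ports_per_copy):
        self.num_copies = num_copies
        self.ports_per_copy = ports_per_copy
        # Each copy is modeled as a plain addr -> value map.
        self.copies = [{} for _ in range(num_copies)]

    def cycle(self, ops):
        """Issue (kind, addr, value) ops in program order until a port is missing.

        Returns (values returned by the issued loads, number of ops issued).
        """
        free = [self.ports_per_copy] * self.num_copies
        results = []
        issued = 0
        for kind, addr, value in ops:
            if kind == "store":
                if min(free) == 0:
                    break  # a store needs a free port on every copy
                for i, copy in enumerate(self.copies):
                    copy[addr] = value
                    free[i] -= 1
            else:
                # A load may use any copy; pick the one with the most free ports.
                i = max(range(self.num_copies), key=lambda j: free[j])
                if free[i] == 0:
                    break  # no port left on any copy this cycle
                results.append(self.copies[i].get(addr, 0))
                free[i] -= 1
            issued += 1
        return results, issued


cache = ReplicatedL1(num_copies=4, ports_per_copy=2)
loads, n = cache.cycle([("load", a, None) for a in range(10)])
print(n)  # 8 loads issue; the remaining two wait for the next cycle
```

In this model, four 2-ported copies give eight load ports per cycle, but only two stores can issue per cycle, because each store occupies a port on every copy. Replication multiplies load bandwidth, not store bandwidth.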
Further features of the data supply system:
- A larger second-level data cache that needs fewer ports.
- Data prefetching.
- Processors will predict the addresses of loads, allowing loads to execute before the operands needed for their address calculation have been computed.
- Processors will predict dependencies between loads and stores, allowing them to predict, for example, that a given load always depends on some older store and so should not be executed ahead of it.
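The text does not say how load addresses would be predicted. One common scheme is per-instruction stride prediction, sketched below purely as an illustration. The class name `StrideAddressPredictor`, the `threshold` and `max_conf` parameters, and the use of the load's PC as the table key are assumptions, not details from the source. A prediction is offered only after the same stride has been observed `threshold` times in a row; a similar table is also commonly used to generate addresses for stride prefetching.

```python
class StrideAddressPredictor:
    """Per-PC stride predictor for load addresses (illustrative sketch).

    Each entry holds [last address, last stride, saturating confidence].
    A prediction is offered only once confidence reaches `threshold`.
    """

    def __init__(self, threshold=2, max_conf=3):
        self.table = {}  # load PC -> [last_addr, stride, confidence]
        self.threshold = threshold
        self.max_conf = max_conf

    def predict(self, pc):
        """Return a predicted address for the next instance of this load, or None."""
        entry = self.table.get(pc)
        if entry is None or entry[2] < self.threshold:
            return None  # not confident: wait for the real address
        last_addr, stride, _ = entry
        return last_addr + stride

    def update(self, pc, actual_addr):
        """Train the entry with the address the load actually accessed."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [actual_addr, 0, 0]
            return
        last_addr, stride, conf = entry
        new_stride = actual_addr - last_addr
        # Same stride again: gain confidence; otherwise start over.
        conf = min(conf + 1, self.max_conf) if new_stride == stride else 0
        self.table[pc] = [actual_addr, new_stride, conf]


pred = StrideAddressPredictor()
for addr in (1000, 1008, 1016, 1024):  # a load walking an array of 8-byte elements
    pred.update(0x40, addr)
print(pred.predict(0x40))  # 1032
```

A load whose address is predicted can access the cache speculatively; if the computed address later differs, the load and its dependents must be re-executed.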
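Load/store dependence prediction can likewise be sketched in a few lines. The version below is a hypothetical simplification, loosely modeled on store-set style prediction and not taken from the source: when a load is caught executing before an older store to the same address (a memory-order violation), the predictor records that store's PC against the load's PC. Later instances of the load wait for any in-flight store with a recorded PC, while all other loads may issue ahead of older stores. The names `DependencePredictor`, `record_violation` and `must_wait_for` are illustrative.

```python
class DependencePredictor:
    """Toy load/store dependence predictor (hypothetical, simplified).

    Learns, per load PC, which store PCs that load has conflicted with.
    """

    def __init__(self):
        self.deps = {}  # load PC -> set of store PCs it has conflicted with

    def record_violation(self, load_pc, store_pc):
        """Called when the load read memory before an older store to the same address."""
        self.deps.setdefault(load_pc, set()).add(store_pc)

    def must_wait_for(self, load_pc, inflight_store_pcs):
        """Return the older in-flight stores this load is predicted to depend on."""
        predicted = self.deps.get(load_pc, set())
        return [pc for pc in inflight_store_pcs if pc in predicted]


dp = DependencePredictor()
print(dp.must_wait_for(0x100, [0x80, 0x90]))  # [] : issue the load early
dp.record_violation(0x100, 0x90)
print(dp.must_wait_for(0x100, [0x80, 0x90]))  # [144] : wait for the store at 0x90
```

Once a load is predicted to depend on a particular store, a processor could go further and forward that store's data directly to the load instead of merely delaying it.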