Chapter 4

Table of Contents

Chapter 4

Multiple-issue processors

Components of a superscalar processor

Floorplan of the PowerPC 604

Superscalar pipeline (PowerPC- and enhanced Tomasulo-scheme)

Superscalar pipeline without reservation stations

Superscalar pipeline with decoupled instruction windows


Reservation station(s)

Dispatch (PowerPC- and enhanced Tomasulo-scheme)



Precise interrupt (Precise exception)


Explanation of the term “superscalar”

Explanation of the term “superscalar”

Explanation of the term “superscalar”

Please recall: architecture, ISA, microarchitecture

Sections of a superscalar processor

Temporal vs. spatial parallelism

I-cache access and instruction fetch

Instruction fetch

Prefetching and instruction fetch prediction

Branch prediction

Misprediction penalty

Branch-Target Buffer or Branch-Target Address Cache

Branch-Target Buffer or Branch-Target Address Cache

Static branch prediction

Dynamic branch prediction

One-bit predictor

One-bit vs. two-bit predictors

Two-bit predictors (Saturation Counter Scheme)

Two-bit predictors (Hysteresis Scheme)

Two-bit predictors

Two-bit predictors and correlation-based prediction


One-bit predictor initialized to “predict taken”

Two-bit saturation counter predictor initialized to “predict weakly taken”

Two-bit predictor (Hysteresis counter) initialized to “predict weakly taken”

Predictor behavior in example

Correlation-based predictor

Correlation-based prediction (2,2)-predictor

Prediction behavior of (1,1) correlating predictor

Prediction behavior of (1,1) correlating predictor

Prediction behavior of (1,1) correlating predictor

Prediction behavior of (1,1) correlating predictor

Prediction behavior of (1,1) correlating predictor

Two-level adaptive predictors

Implementation of a GAg(4)-predictor

Mispredictions can be reduced by additionally using:

Implementation of a GAp(4) predictor

GAs(4, 2n)

Compare correlation-based (2,2)-predictor (left) with two-level adaptive GAs(4,2n) predictor (right)

Two-level adaptive predictors: Per-address history schemes



Two-level adaptive predictors: Per-set history schemes



Two-level adaptive predictors

Estimation of hardware costs

Two-level adaptive predictors: Simulations of Yeh and Patt using the SPEC89 benchmarks

gselect and gshare predictors

Hybrid predictors

Simulations [Grunwald]


Predicated instructions and multipath execution - Confidence estimation

Implementation of a confidence estimator

Predicated instructions

Predication example


Eager (multipath) execution

(a) Single path speculative execution (b) Full eager execution (c) Disjoint eager execution

Prediction of indirect branches

Branch handling techniques and implementations

High-bandwidth branch prediction

Details of superscalar pipeline

Decode stage

Decoding variable-length instructions


Rename stage

Two principal techniques to implement renaming

Register rename logic

Issue and dispatch

Instruction window organizations

The following issue schemes are commonly used

Single-level, two-window issue

Two-level issue with multiple windows

Wakeup logic

Selection logic

Execution stages

Types of FUs

Media processors and multimedia units

Media processors and multimedia units

Multimedia extensions in today's microprocessors

3D graphical enhancement

Finalizing pipelined execution - completion, commitment, retirement and write-back

Precise interrupts

Precise interrupts

Reorder buffers

Reorder buffer variations

Other recovery mechanisms

Relaxing in-order retirement

The Intel P5 and P6 family

Micro-dataflow in the PentiumPro (1995)

PentiumPro and Pentium II/III

Pentium® Pro Processor and Pentium II/III Microarchitecture

Pentium II/III

Pentium II/III: The in-order section

Pentium II/III: The in-order section (Continued)

The fetch/decode unit

The out-of-order execute section

Latencies and throughput for Pentium II/III FUs

Issue/Execute Unit

The in-order retire section

Retire unit

The Pentium II/III pipeline

Pentium® Pro processor basic execution environment

Application programming registers

Pentium II/III summary and offspring

Pentium 4

Pentium 4 features

Advanced dynamic execution

First level caches

Second level caches

NetBurst microarchitecture

Streaming SIMD extensions 2 (SSE2) technology

400 MHz Intel NetBurst microarchitecture system bus

Pentium 4 data types

Pentium 4 offspring


VLIW and superscalar

EPIC: a paradigm shift

EPIC: a paradigm shift

The fusion of VLIW and superscalar techniques

Many EPIC features are taken from VLIWs

Shortcomings of early VLIWs

EPIC design challenges

EPIC processors, Intel's IA-64 ISA and Itanium

IA-64 Architecture

Today’s architecture challenges

Intel's IA-64 ISA

IA-64’s large register file

Intel's IA-64 ISA

IA-64 bundles

IA-64: Explicitly parallel architecture

IA-64 scalability

Predication in IA-64 ISA

If-then-else statement

Predication in IA-64 ISA

Speculative loading

Speculative loading - “control speculation”

Speculative loading

Speculative loading - “data speculation”

Speculative loading/checking

Software pipelining via rotating registers

SW pipelining by modulo scheduling

SW pipelining by register rotation

SW pipelining by register rotation - Counted loop example

SW pipelining by register rotation - Counted loop example

SW pipelining by register rotation - Optimizations and limitations

IA-64 register stack

IA-64 support for procedure calls

Full binary IA-32 instruction compatibility

Full binary compatibility for PA-RISC

Delivery of streaming media

IA-64 3D graphics capabilities

IA-64 for scientific analysis

Memory support for high performance technical computing

IA server/workstation roadmap


Conceptual view of Itanium

Itanium processor core pipeline

Itanium processor

Itanium die plot

Itanium vs. Willamette (P4)

Author: Jurij Silc
