National Dong Hwa University Library

Utilizing Subword Serialization and Parallelism to Design Efficient High-Performance Processors.

Record type:	Bibliographic record - electronic resource : Monograph/item
Title/Author:	Utilizing Subword Serialization and Parallelism to Design Efficient High-Performance Processors.
Author:	Jackson, Paul J.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2024
Extent:	157 p.
Note:	Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
Contained By:	Dissertations Abstracts International 85-12B.
Subject:	Computer engineering.
Electronic resource:	https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31294692
ISBN:	9798382810232

Utilizing Subword Serialization and Parallelism to Design Efficient High-Performance Processors. - Ann Arbor : ProQuest Dissertations & Theses, 2024 - 157 p.

Thesis (Ph.D.)--Princeton University, 2024.

Since the first digital computers, architects have exploited transistor scaling to drive innovation. Decreasing transistor sizes, and the resulting increase in transistor budgets, have naturally led to increasingly complex processor designs. Ultimately, advancements in processor architecture aim to improve performance on the computational workloads that enable computers to run high-level applications. For decades, transistors have scaled exponentially, but they are now approaching fundamental physical limits. The power wall identified at the end of Dennard scaling prompted a revolution in architectural design: because the power density of transistors no longer stayed constant as transistors scaled, architects developed methods to extract better performance without increasing power consumption. A similar wall now looms over the architecture community: the end of Moore's law. When transistors stop scaling, architects must find ways to improve performance without using more transistors. This transitional point, at which the priorities among design parameters shift, provides an opportunity to take research in many directions. This thesis adopts a forward-looking perspective in which silicon area is a fixed limitation that must be considered alongside performance and power.

This thesis identifies bit-level parallelism as a particularly impactful form of parallelism whose nature and implications have not been thoroughly studied in prior art. The effects of bit-level parallelism are studied using Nibbler, a parameterized subword-serial SIMD architecture. Subword-serial architectures like Nibbler perform traditional word-wide computation over multiple cycles, operating on one subword-sized portion of the inputs per cycle. This work is the first to isolate and directly study the impact of bit-level parallelism in processor microarchitecture. The analysis presents multiple design points, ranging from a fully bit-serial processor to one with a full 32-bit word-wide datapath. Nibbler is evaluated and characterized for its area, timing, energy, and throughput. These characterizations, in combination with additional sensitivity studies, shed light on the effectiveness of serialization as a design technique. This thesis shows that a subword-serial SIMD processor can achieve simultaneous improvements in all four metrics when compared to a non-serial, word-wide SIMD processor. The results are further analyzed to discuss the impact of serialization on processor microarchitecture, identifying opportunities for future improvements and highlighting key considerations when designing processors that use serialized execution.

One core idea presented throughout this thesis is that architectural concepts find strength when grounded in reality. While high-level models are effective for estimating the limits of a technology or concept, physical prototypes demonstrate the feasibility of ideas and provide a lower bound on performance. Following this philosophy, this thesis contains detailed characterizations of two manycore academic chips: Piton and the DECADES test chip. These data, used in conjunction with the chips' open-source RTL and EDA infrastructure, provide anchoring data points that others can use as a reference in future architecture evaluation studies.

Piton is a 25-core homogeneous tiled processor taped out in the IBM 32nm SOI process technology. Two characterization studies are performed on Piton. The first breaks down the chip's power and energy consumption on SPECint 2006 benchmarks. The second compares the effectiveness of two parallelization techniques, multicore execution and fine-grained multithreading, to provide insight into which technique works better when optimizing for power, energy, or area.

DECADES is a 108-tile heterogeneous tiled processor taped out in the IBM 12nm process technology. The design effort behind Nibbler culminates in its contribution to the DECADES chip, which contains 23 instances of a 64-lane, 8-bit-wide Nibbler processor. This thesis details the considerations that must be made when reconciling a theoretically optimal design point with the feasibility of creating a performant chip.

Overall, this thesis presents an end-to-end analysis of serialization as a design parameter, as well as of related parallelization techniques. The analyses take a forward-looking perspective that emphasizes feasibility and practicality, resulting in a complete picture that ranges from theoretical reasoning to realization in silicon.
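The subword-serial execution described in the abstract can be illustrated with a small behavioral model. The sketch below is a hypothetical Python illustration, not taken from the thesis or from Nibbler's RTL: it assumes unsigned 32-bit words and a subword width that divides 32, and it models only addition, with a carry latched between cycles. Each loop iteration stands for one cycle in which a subword-wide adder consumes one slice of each operand; the names `subword_serial_add` and `simd_subword_serial_add` are invented for this example.

```python
# Hypothetical behavioral model of subword-serial addition.
# Illustrative sketch only: it is not Nibbler's RTL or the thesis's model.
# Assumes unsigned words of WORD_BITS bits and a subword width dividing WORD_BITS.

WORD_BITS = 32  # assumed word width, matching the 32-bit word-wide design point


def subword_serial_add(a: int, b: int, subword_bits: int,
                       word_bits: int = WORD_BITS) -> tuple[int, int]:
    """Add a and b one subword per cycle; return (sum mod 2**word_bits, cycles)."""
    if subword_bits <= 0 or word_bits % subword_bits != 0:
        raise ValueError("subword width must be positive and divide the word width")
    mask = (1 << subword_bits) - 1
    cycles = word_bits // subword_bits
    carry = 0   # models the carry flip-flop held between cycles
    result = 0
    for cycle in range(cycles):
        shift = cycle * subword_bits
        # One cycle: a subword-wide adder sees one slice of each operand
        # plus the carry latched at the end of the previous cycle.
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> subword_bits  # carry-out latched for the next cycle
    # The final carry-out is discarded, giving wraparound (mod 2**word_bits).
    return result, cycles


def simd_subword_serial_add(xs: list[int], ys: list[int],
                            subword_bits: int) -> tuple[list[int], int]:
    """Apply the same serial schedule to every lane in lockstep."""
    sums = [subword_serial_add(x, y, subword_bits)[0] for x, y in zip(xs, ys)]
    # Lanes operate in parallel, so the vector op costs one lane's cycle count.
    return sums, WORD_BITS // subword_bits


if __name__ == "__main__":
    a, b = 0x12345678, 0x9ABCDEF0
    for width in (1, 2, 4, 8, 16, 32):
        total, cycles = subword_serial_add(a, b, width)
        print(f"{width:2d}-bit subword: sum=0x{total:08X}, cycles={cycles}")
```

Running the sweep prints the same sum for every width while the cycle count falls from 32 (bit-serial) to 1 (word-wide). The model captures only the cycle-count side of the trade-off; the area, timing, and energy effects of a narrower datapath, which the thesis characterizes, are outside its scope.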

ISBN: 9798382810232

Subjects--Topical Terms:

Computer engineering.
Subjects--Index Terms:

Bit-level parallelism

LDR 05690nmm a2200361 4500
001 2403024
005 20241104055845.5
006 m o d
007 cr#unu||||||||
008 251215s2024 ||||||||||||||||| ||eng d
020 $a 9798382810232
035 $a (MiAaPQ)AAI31294692
035 $a AAI31294692
040 $a MiAaPQ $c MiAaPQ
100 1 $a Jackson, Paul J. $3 859815
245 1 0 $a Utilizing Subword Serialization and Parallelism to Design Efficient High-Performance Processors.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2024
300 $a 157 p.
500 $a Source: Dissertations Abstracts International, Volume: 85-12, Section: B.
500 $a Advisor: Wentzlaff, David.
502 $a Thesis (Ph.D.)--Princeton University, 2024.
590 $a School code: 0181.
650 4 $a Computer engineering. $3 621879
650 4 $a Electrical engineering. $3 649834
653 $a Bit-level parallelism
653 $a Microarchitecture
653 $a Parallelism
653 $a Computational workloads
690 $a 0464
690 $a 0544
710 2 $a Princeton University. $b Electrical and Computer Engineering. $3 3689367
773 0 $t Dissertations Abstracts International $g 85-12B.
790 $a 0181
791 $a Ph.D.
792 $a 2024
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=31294692