Dong Hwa University Library

Software-Defined Hardware Without Sacrificing Performance.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Software-Defined Hardware Without Sacrificing Performance.
Author:	Feldman, Matthew.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2021.
Pagination:	108 p.
Note:	Source: Dissertations Abstracts International, Volume: 83-05, Section: B.
Contained By:	Dissertations Abstracts International 83-05B.
Subject:	Debugging.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28827964
ISBN:	9798494452863

Software-Defined Hardware Without Sacrificing Performance.
Feldman, Matthew.

Software-Defined Hardware Without Sacrificing Performance. - Ann Arbor : ProQuest Dissertations & Theses, 2021 - 108 p.

Source: Dissertations Abstracts International, Volume: 83-05, Section: B.

Thesis (Ph.D.)--Stanford University, 2021.

This item must not be sold to any third party vendors.

In recent years, field-programmable gate arrays (FPGAs) have emerged as popular accelerators for certain kinds of compute workloads. FPGAs allow the programmer to define and implement digital circuits specialized for arbitrary computation. They offer performance-per-watt advantages over traditional instruction-based architectures, such as CPUs and GPUs, and are more flexible than custom application-specific integrated circuits (ASICs). This efficiency comes from programming specialized digital circuits directly into the FPGA, which removes the overhead that instruction-based architectures inherently incur in fetching and decoding instructions. Furthermore, computation can be arbitrarily parallelized and pipelined to improve the latency and throughput of an application.

While FPGAs have been adopted in a variety of settings, including Microsoft's datacenters and Amazon's cloud compute services, their widespread adoption is hindered by how difficult they are to program. Traditionally, programmers describe FPGA circuits at the register-transfer level (RTL), using hardware description languages such as Verilog and VHDL. This level of abstraction is cumbersome and makes it difficult to design applications quickly. It typically takes many years of experience with RTL to become proficient at designing good applications. At the same time, today's popular application domains, such as machine learning and graph processing, evolve very quickly and require deep domain-specific knowledge. A programmer who is an expert in a field such as machine learning is rarely also an expert in writing RTL. This mismatch is the motivation for the work in this thesis.

The work presented in this thesis envisions a new class of performance-oriented programmers who wish to design highly efficient applications on FPGAs without requiring intimate knowledge of RTL.

A variety of new high-level languages aim to bridge this gap between performance and abstraction. One approach is high-level synthesis (HLS) tools, which allow the programmer to write code in languages like C, annotated with pragmas that describe how it should map to hardware. While convenient, these languages typically obscure the relationship between the code, its hardware implementation, and its performance. Another approach is domain-specific languages (DSLs), which give the programmer a variety of primitives relevant to a particular domain, such as image processing, or to a computational style, such as streaming. While these languages can generate highly efficient hardware designs, their APIs can be too restrictive for programmers who cannot express their whole application within the domain.

In this work, we introduce tools and optimizations built on top of Spatial, a high-level language for programming FPGAs and other reconfigurable dataflow architectures (RDUs), such as Plasticine. Spatial was designed for performance-oriented programmers and supports a wide variety of domains. First, we describe a new performance debugging tool that automatically generates performance and resource utilization reports for arbitrary Spatial applications. We outline a set of steps that the programmer can follow to optimize an application based on the results of this profiling tool. Because optimizing hardware designs differs fundamentally from optimizing software in several ways, this tool is key to understanding which parts of the source code cause bottlenecks.
We show how this tool can help the programmer achieve up to 22x better performance while improving resource utilization by up to 2.8x on a variety of standard benchmarks.

Then, we introduce a memory partitioning tool that can quickly solve for highly efficient partitioning schemes. The tool uses heuristics and mathematical transformations to generate schemes that can be implemented on an FPGA with low resource utilization and latency. Compared to other state-of-the-art tools, it automatically improves LUT utilization by up to 86% and BRAM utilization by up to 38% on a variety of benchmarks, and almost always eliminates all DSPs from the memory partitioning logic.

Next, we show how these new components in Spatial can be used to design a kernel for a real-world data compression problem. We explored several classes of machine learning kernels and swept the parameter space of each to characterize its ML accuracy, latency, and resource utilization. This work highlights how some classes of kernels map much more efficiently to FPGAs than others while still achieving low latency and high ML accuracy.

Finally, we describe a series of enhancements to the Spatial language and compiler that enable new kinds of applications and allow Spatial to be used in more diverse environments. These include new FPGA backends and compilation modes that allow Spatial to be used as an IP generator, as well as a shell for IPs generated by other frameworks. We also discuss new syntax and primitives added to the language to support increasingly dynamic and complex applications.
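The memory partitioning problem described above can be illustrated with a small sketch. It is not the thesis's algorithm. It assumes a simple cyclic hyperplane scheme, bank = (alpha . x) mod N, and brute-forces the smallest bank count N and coefficient vector alpha for which a set of parallel accesses (given as offsets from a common base address) always lands in distinct banks. The names `bank_of`, `conflict_free`, and `search_scheme` are hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Vec = Tuple[int, ...]


def bank_of(addr: Vec, alpha: Vec, n_banks: int) -> int:
    """Cyclic hyperplane banking (illustrative assumption): bank = (alpha . addr) mod n_banks."""
    return sum(a * x for a, x in zip(alpha, addr)) % n_banks


def conflict_free(offsets: Sequence[Vec], alpha: Vec, n_banks: int) -> bool:
    """True if the parallel accesses, given as offsets from a shared base, hit distinct banks.

    For a purely cyclic scheme, bank(base + o) = (alpha.base + alpha.o) mod N,
    so distinctness depends only on the offsets, not on the base address.
    """
    banks = [bank_of(o, alpha, n_banks) for o in offsets]
    return len(set(banks)) == len(banks)


def search_scheme(offsets: Sequence[Vec], max_banks: int = 16,
                  max_coeff: int = 4) -> Optional[Tuple[int, Vec]]:
    """Exhaustive search for the fewest banks N (then the first alpha found) that avoids conflicts."""
    dims = len(offsets[0])
    # At least one bank per parallel access is required.
    for n_banks in range(len(offsets), max_banks + 1):
        for alpha in product(range(max_coeff + 1), repeat=dims):
            if any(alpha) and conflict_free(offsets, alpha, n_banks):
                return n_banks, alpha
    return None  # no scheme within the search limits


if __name__ == "__main__":
    # A 2x2 window read in parallel, e.g. by an unrolled 2D stencil.
    window = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(search_scheme(window))  # (4, (1, 2))
```

Exhaustive search grows quickly with the number of dimensions and the coefficient range, which is one reason a practical tool relies on heuristics and a hardware cost model instead. The cost of the bank-resolution logic also matters: a power-of-two bank count reduces the modulo to a simple bit-select, whereas a general modulo or multiplication may consume DSPs.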

ISBN: 9798494452863
Subjects--Topical Terms:

Debugging.

Software-Defined Hardware Without Sacrificing Performance.
LDR 06296nmm a2200301 4500
001 2349891
005 20221010063649.5
008 241004s2021 ||||||||||||||||| ||eng d
020 $a 9798494452863
035 $a (MiAaPQ)AAI28827964
035 $a (MiAaPQ)STANFORDwd143cm4382
035 $a AAI28827964
040 $a MiAaPQ $c MiAaPQ
100 1 $a Feldman, Matthew. $3 969728
245 1 0 $a Software-Defined Hardware Without Sacrificing Performance.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 108 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-05, Section: B.
500 $a Advisor: Olukotun, Oyekunle.
502 $a Thesis (Ph.D.)--Stanford University, 2021.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0212.
650 4 $a Debugging. $3 3689317
650 4 $a Optimization techniques. $3 3681622
650 4 $a Cloning. $3 571606
650 4 $a Artificial intelligence. $3 516317
690 $a 0800
710 2 $a Stanford University. $3 754827
773 0 $t Dissertations Abstracts International $g 83-05B.
790 $a 0212
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28827964