National Dong Hwa University Library


Efficient Lossless Compression in and Beyond Columnar Databases.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Efficient Lossless Compression in and Beyond Columnar Databases.
Author:	Jiang, Hao.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2021.
Physical description:	167 p.
Note:	Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
Contained By:	Dissertations Abstracts International 83-06B.
Subject:	Computer science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28648314
ISBN:	9798759973560

Efficient Lossless Compression in and Beyond Columnar Databases. - Ann Arbor : ProQuest Dissertations & Theses, 2021 - 167 p.

Thesis (Ph.D.)--The University of Chicago, 2021.

This item must not be sold to any third party vendors.

Columnar databases have come to dominate the data analysis market because of their superior query-processing performance on big data. However, the sheer size of the data also makes storage and transfer challenging. While practitioners often rely on lossless compression to reduce storage size, database researchers have overlooked compression in row-wise databases, for two primary reasons. First, few compression algorithms are available to row-wise databases: because rows blend fields of all types together, byte-oriented algorithms such as Gzip and Snappy are the only choices. Second, Gzip-like algorithms process data in blocks and must decompress an entire block before any row in it can be accessed. This decompression is CPU-intensive and has a significant impact on query performance. The lack of alternatives and the performance cost have impeded the use of compression in row-wise databases. The rise of columnar databases changes this situation. Storing data in separate columns enables compression algorithms designed for a single data type. Some algorithms also perform record-level compression, allowing queries to skip irrelevant records and execute more efficiently. Compression in columnar databases therefore both reduces storage and creates opportunities to improve query efficiency. Beyond relational databases, we also explore the benefits of compression in key-value stores. Key-value stores are widely used in gaming, IoT, social media, mobile devices, and enterprise applications, and in specific scenarios they can perform far better than relational databases. This thesis proposes novel compression algorithms and system designs that improve storage and query efficiency in columnar databases and key-value stores.
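The contrast drawn in the paragraph above, between block-based byte-oriented compression and encodings designed for a single data type, can be illustrated with a short Python sketch. It is not taken from the thesis: zlib (which implements DEFLATE, the algorithm Gzip uses) stands in for a row-store block compressor, fixed-width bit-packing stands in for a lightweight column encoding, and all function names are hypothetical. Reading one value from the compressed block requires inflating the whole block, whereas a fixed-width packed column lets the reader compute where a row's bits start.

```python
import zlib
from typing import List


def pack_block(values: List[int]) -> bytes:
    """Row-store style (illustrative): serialize a block of values as text
    and compress the whole block with zlib (DEFLATE, as used by Gzip)."""
    return zlib.compress(",".join(str(v) for v in values).encode("ascii"))


def read_from_block(block: bytes, row: int) -> int:
    # The entire block is inflated before a single value can be read.
    return int(zlib.decompress(block).decode("ascii").split(",")[row])


def bit_pack(values: List[int], width: int) -> int:
    """Column-store style (illustrative): store each non-negative integer in
    exactly `width` bits. A Python int stands in for a bit buffer; a real
    engine would use an array of machine words."""
    buffer = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        buffer |= v << (i * width)
    return buffer


def read_packed(buffer: int, width: int, row: int) -> int:
    # Fixed width means the value starts at bit row * width, so only that
    # value is extracted; no neighboring value is decoded.
    return (buffer >> (row * width)) & ((1 << width) - 1)


if __name__ == "__main__":
    values = [5, 3, 9, 12, 0, 7]
    block = pack_block(values)
    packed = bit_pack(values, width=4)  # every value is below 16
    print(read_from_block(block, 3))  # 12, after inflating all six values
    print(read_packed(packed, 4, 3))  # 12, read from bit offset 12
```

The bit width is fixed here for simplicity; it only works because the column holds a single type with a known value range, which is exactly what a row layout mixing all field types does not offer.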
We address three challenges of lossless compression in columnar databases: better encoding algorithms, faster queries on encoded data, and selecting the proper encoding algorithm for each data column. We present PIDS, a novel compression approach for string columns; SBoost, a C++ library for fast queries on encoded data; and CodecDB, an encoding-aware database with data-driven encoding selection. We also explore using compression to accelerate LSM-trees and present CoLoM, a key-value store that uses a columnar layout and lightweight encoding to improve LSM-tree efficiency. We show that these innovations allow columnar databases and key-value stores to outperform their competitors in storage efficiency and query speed.
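One common way to query encoded data, the second challenge named above, is to evaluate predicates directly on dictionary codes. The Python sketch below is a generic illustration under that assumption; it is not PIDS, SBoost, or CodecDB, and the function names are hypothetical. With an order-preserving (sorted) dictionary, an equality or range predicate on strings becomes an integer comparison on codes, so no row is decoded during the scan.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def dict_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Order-preserving dictionary encoding: distinct values are sorted and
    each row stores the position of its value in that sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def select_equal(dictionary: List[str], codes: List[int], target: str) -> List[int]:
    """Row ids whose value equals `target`, found by comparing integer codes."""
    code = bisect_left(dictionary, target)
    if code == len(dictionary) or dictionary[code] != target:
        return []  # target never occurs, so the codes need not be scanned
    return [row for row, c in enumerate(codes) if c == code]


def select_range(dictionary: List[str], codes: List[int], low: str, high: str) -> List[int]:
    """Row ids with low <= value <= high. Because the dictionary is sorted,
    the string range maps to the contiguous code range [lo, hi)."""
    lo = bisect_left(dictionary, low)
    hi = bisect_right(dictionary, high)
    return [row for row, c in enumerate(codes) if lo <= c < hi]


if __name__ == "__main__":
    city = ["Chicago", "Boston", "Chicago", "Denver", "Austin", "Boston"]
    dictionary, codes = dict_encode(city)
    print(dictionary)  # ['Austin', 'Boston', 'Chicago', 'Denver']
    print(codes)       # [2, 1, 2, 3, 0, 1]
    print(select_equal(dictionary, codes, "Boston"))             # [1, 5]
    print(select_range(dictionary, codes, "Boston", "Chicago"))  # [0, 1, 2, 5]
```

Sorting the dictionary costs extra work at encoding time, but it is what lets a string range translate into a single integer interval; an unsorted dictionary would support equality predicates only.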

ISBN: 9798759973560
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Columnar Database

LDR 03639nmm a2200349 4500
001 2344656
005 20220531064618.5
008 241004s2021 ||||||||||||||||| ||eng d
020 $a 9798759973560
035 $a (MiAaPQ)AAI28648314
035 $a AAI28648314
040 $a MiAaPQ $c MiAaPQ
100 1 $a Jiang, Hao. $3 2095841
245 1 0 $a Efficient Lossless Compression in and Beyond Columnar Databases.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 167 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
500 $a Advisor: Elmore, Aaron J.
502 $a Thesis (Ph.D.)--The University of Chicago, 2021.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0330.
650 4 $a Computer science. $3 523869
650 4 $a Artificial intelligence. $3 516317
653 $a Columnar Database
653 $a Compression
653 $a Database
653 $a LSM-tree
690 $a 0984
690 $a 0800
710 2 $a The University of Chicago. $b Computer Science. $3 1674744
773 0 $t Dissertations Abstracts International $g 83-06B.
790 $a 0330
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28648314