Design and Data Mining Techniques for Large-Scale Scholarly Digital Libraries and Search Engines.
Record type:
Bibliographic - electronic resource : Monograph/item
Title/Author:
Design and Data Mining Techniques for Large-Scale Scholarly Digital Libraries and Search Engines.
Author:
Rohatgi, Shaurya.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2023
Pagination:
144 p.
Notes:
Source: Dissertations Abstracts International, Volume: 85-05, Section: B.
Contained By:
Dissertations Abstracts International 85-05B.
Subject:
Internships.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30720673
ISBN:
9798380735773
Thesis (Ph.D.)--The Pennsylvania State University, 2023.
This item must not be sold to any third party vendors.
The exponential growth of digital libraries and the proliferation of scholarly content in electronic formats have made data mining and information retrieval essential tools for effectively managing, organizing, and disseminating knowledge. This thesis provides a comprehensive analysis of the advancements and challenges in these fields, with a focus on mathematical information retrieval from scholarly documents, figure captioning and classification of scientific images, and searching and re-ranking techniques for large-scale scholarly documents. We also explore the future of scholarly search, considering the potential roles of generative artificial intelligence and scientific question-answering systems in these domains. In the initial section of this thesis, we delve into the complex design and implementation aspects involved in building a large-scale digital library. We discuss various challenges and critical design decisions that were made to ensure the library's long-term sustainability and ease of maintenance. Furthermore, we provide an in-depth analysis of the unique and robust components of CiteSeerX, including its advanced crawling, extraction, ingestion, and production-ready capabilities. Our focus then shifts to the investigation of search and re-ranking techniques specifically tailored for large-scale scholarly documents. We delve into various approaches for indexing, searching, and ranking vast collections of scientific literature, proposing inventive methods for optimizing their performance and scalability. After successfully implementing our system and achieving an impressive index of over 15 million academic papers, we explore the numerous potential applications and opportunities that can arise from this extensive collection of scholarly articles. Following this, we conduct a thorough examination of cutting-edge mathematical information retrieval techniques for extracting and processing mathematical expressions from scholarly documents.
We present an exhaustive review of existing approaches, shedding light on their strengths and weaknesses, and propose innovative methods that significantly enhance the accuracy and efficiency of mathematical information retrieval systems. Subsequently, we discuss a subset of CiteSeerX data that is focused on Computational Linguistics (CL) - The ACL Anthology Corpus. We provide the metadata, full-text, and citation graph for the CL domain. This dataset is then analyzed for deeper insights into the evolving direction of the field and the potential applications that can be developed from it. One such application is addressing the challenges of figure captioning and classification of scientific images. We analyze state-of-the-art methods for extracting and processing image data from scholarly documents and propose a groundbreaking approach that effectively combines advanced image processing techniques with cutting-edge machine learning algorithms for highly accurate and reliable figure captioning and classification. Lastly, we discuss the future of scholarly search and the role of generative AI in scientific question answering. We envision a question-answering system that looks at the relevant literature and formulates an answer for the researcher's information need. To this end, we investigate the potential of large language models and search for enabling such capabilities and outline the challenges and opportunities that lie ahead in this exciting domain.
ISBN: 9798380735773
Subjects--Topical Terms:
3560137
Internships.
Subjects--Index Terms:
Electronic formats
LDR     04726nmm a2200385 4500
001     2394773
005     20240429063853.5
006     m o d
007     cr#unu||||||||
008     251215s2023 ||||||||||||||||| ||eng d
020     $a 9798380735773
035     $a (MiAaPQ)AAI30720673
035     $a (MiAaPQ)PennState_22325szr207
035     $a AAI30720673
040     $a MiAaPQ $c MiAaPQ
100 1   $a Rohatgi, Shaurya. $3 3764263
245 1 0 $a Design and Data Mining Techniques for Large-Scale Scholarly Digital Libraries and Search Engines.
260 1   $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300     $a 144 p.
500     $a Source: Dissertations Abstracts International, Volume: 85-05, Section: B.
500     $a Advisor: Giles, C. Lee.
502     $a Thesis (Ph.D.)--The Pennsylvania State University, 2023.
506     $a This item must not be sold to any third party vendors.
590     $a School code: 0176.
650 4   $a Internships. $3 3560137
650 4   $a Digital libraries. $3 567130
650 4   $a Search engines. $3 869493
650 4   $a Anthologies. $3 952710
650 4   $a Information retrieval. $3 566853
650 4   $a Linux. $3 738943
650 4   $a Information technology. $3 532993
650 4   $a Information science. $3 554358
653     $a Electronic formats
653     $a Digital libraries
653     $a Computational Linguistics
653     $a Mathematical information
690     $a 0489
690     $a 0723
710 2   $a The Pennsylvania State University. $3 699896
773 0   $t Dissertations Abstracts International $g 85-05B.
790     $a 0176
791     $a Ph.D.
792     $a 2023
793     $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30720673
Holdings:
Barcode: W9503093
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0