Question Answering and Generation from Structured and Unstructured Data.
Record type: Electronic resource : Monograph/item
Title/Author: Question Answering and Generation from Structured and Unstructured Data.
Author: Chen, Yu.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2020
Description: 144 p.
Notes: Source: Dissertations Abstracts International, Volume: 82-05, Section: B.
Contained by: Dissertations Abstracts International, 82-05B.
Subject: Computer science.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28023627
ISBN: 9798684681172
Chen, Yu.
Question Answering and Generation from Structured and Unstructured Data.
- Ann Arbor : ProQuest Dissertations & Theses, 2020 - 144 p.
Source: Dissertations Abstracts International, Volume: 82-05, Section: B.
Thesis (Ph.D.)--Rensselaer Polytechnic Institute, 2020.
This item must not be sold to any third party vendors.
This dissertation focuses on two different but related natural language processing problems, namely, question answering and question generation. Automated question answering (QA) is the process of finding answers to natural language questions. This problem has been studied since the early days of artificial intelligence and has recently drawn increasing attention. Automatic question answering usually relies on some underlying knowledge source, e.g., a knowledge base (KB), a database, or free-form text from which answers can be gleaned. As such, question answering has numerous applications in areas such as natural language interfaces to databases and spoken dialog systems. We identify three main challenges of question answering. The first challenge is the lexical gap between the questions and the underlying knowledge source. Human language is very flexible: the same question can be expressed in various ways, while the knowledge source may use a canonical lexicon. It is therefore nontrivial to map a natural language question to the knowledge source. The second challenge is the problem of complex reasoning in question answering. Many realistic questions are complex because they require multi-hop reasoning. To answer such complex questions, an agent typically needs to perform a series of discrete symbolic operations such as arithmetic, logical, quantitative, and comparative operations. Making machines perform complex reasoning automatically is fundamentally challenging: even though one can predefine the set of discrete operations an agent may take, the program search space remains very large because of combinatorial explosion. The third challenge is conversational question answering. In real-world scenarios, most questions are asked in a conversational manner, and a question is often asked within a certain conversational context.
In other words, in a conversation, questions are asked sequentially, and a question asked at a given turn may refer back to previous questions or answers. Conversational question answering is significantly more challenging than single-turn question answering, since the conversation history must be taken into account effectively in order to understand the question correctly. Natural question generation (QG) is the task of generating natural language questions from a given form of data such as text, images, tables, and knowledge graphs (KGs). As a dual task of question answering, question generation has many useful applications, such as improving question answering by providing more training data, generating practice exercises and assessments for educational purposes, and helping dialog systems kick-start and continue a conversation with human users. We identify three main challenges of question generation. The first challenge is how to effectively model the context information (e.g., text and KGs). It is extremely important for a QG system to understand the semantic meaning of the context well in order to generate high-quality questions; in particular, it is challenging to model long or large contexts with rich structural information. The second challenge is how to effectively leverage the answer information for question generation. Answer information is crucial for generating relevant and high-quality questions because it can serve as guidance on "what to ask" from the given context; however, most existing methods do not fully utilize it when generating questions. The third challenge is how to effectively optimize a sequence learning model. Cross-entropy loss is widely used for training sequence learning neural networks, but optimizing cross-entropy based training objectives does not always produce the best results on discrete evaluation metrics.
Major limitations of this strategy include exposure bias and the evaluation discrepancy between training and testing. In this dissertation, we propose novel and effective approaches to address the above challenges of the QA and QG tasks. On the QA side, we first propose a modular deep learning approach to automatically answer natural language questions over a large-scale knowledge base. Specifically, we directly model the two-way flow of interactions between the questions and the underlying KB. The proposed model is able to perform multi-hop reasoning in a KB and requires no external resources and very few hand-crafted features. We show that on the popular WebQuestions KBQA benchmark, our model significantly outperforms previous information retrieval based methods while remaining competitive with handcrafted semantic parsing based methods. Then, we present a novel graph neural network (GNN) based model that captures conversational flow in a dialog for the task of conversational machine reading comprehension, where the knowledge source is free-form text. Based on the proposed Recurrent Graph Neural Network, we introduce a flow mechanism to model the temporal dependencies in a sequence of context graphs that represent the conversation history. On three public benchmarks, the proposed model outperforms existing state-of-the-art methods, and visualization experiments show that it offers good interpretability for the reasoning process. On the QG side, we first present a novel bidirectional graph-to-sequence model for QG from KGs. Specifically, we apply a bidirectional graph-to-sequence model to encode the KG subgraph, and we enhance our Recurrent Neural Network (RNN) based decoder with a novel node-level copying mechanism that allows directly copying node attributes from the KG subgraph to the output question.
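The node-level copying mechanism described above follows the familiar pointer-generator pattern: the decoder mixes a generation distribution over the vocabulary with a copy distribution over source items. The sketch below is a minimal plain-Python illustration of that general pattern, not the dissertation's actual decoder; the function names, the gate value `p_gen`, and all scores and tokens are made-up assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def copy_distribution(p_gen, vocab_logits, node_scores, vocab, node_tokens):
    """Mix a generation distribution over the vocabulary with a copy
    distribution over KG-subgraph node tokens (pointer-generator style).

    p_gen        -- gate in [0, 1]: probability of generating vs. copying
    vocab_logits -- decoder scores for each vocabulary word
    node_scores  -- attention scores for each candidate node token
    """
    p_vocab = softmax(vocab_logits)   # P(w | generate)
    p_copy = softmax(node_scores)     # attention over node tokens
    mixed = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
    for tok, p in zip(node_tokens, p_copy):
        # A token appearing both in the vocabulary and in the subgraph
        # accumulates probability mass from both sources.
        mixed[tok] = mixed.get(tok, 0.0) + (1.0 - p_gen) * p
    return mixed

probs = copy_distribution(
    p_gen=0.3,
    vocab_logits=[1.0, 0.5, 0.1],
    node_scores=[2.0, 0.4],
    vocab=["who", "wrote", "novel"],
    node_tokens=["Orwell", "novel"],  # "novel" exists in both sources
)
```

With a low `p_gen`, most of the probability mass goes to copied node attributes such as the entity name, which is how rare KG attributes can appear verbatim in the generated question.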
Both automatic and human evaluation results demonstrate that our model achieves new state-of-the-art scores, outperforming existing methods by a significant margin on two QG benchmarks. Experiments also show that our QG model can consistently benefit the QA task as a means of data augmentation. Then, we propose a reinforcement learning (RL) based graph-to-sequence model for QG from text. Our model consists of a graph-to-sequence generator with a novel Bidirectional Gated Graph Neural Network based encoder to embed the passage, and a hybrid evaluator with a mixed objective combining the cross-entropy loss and the RL loss to ensure the generation of syntactically and semantically valid text. We also introduce an effective Deep Alignment Network for incorporating the answer information into the passage at both the word and contextual levels. Our model is end-to-end trainable and achieves new state-of-the-art scores, outperforming existing methods by a significant margin on the standard SQuAD benchmark.
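The mixed objective mentioned above, combining a cross-entropy term with an RL term, can be sketched as a weighted sum of the two losses. The sketch below uses a self-critical policy-gradient surrogate for the RL term; the function names, the reward values (e.g., a BLEU score for the sampled vs. greedy question), and the weighting factor `gamma` are all assumptions for illustration, not the dissertation's actual implementation.

```python
import math

def cross_entropy_loss(token_probs):
    """Negative log-likelihood of the ground-truth tokens."""
    return -sum(math.log(p) for p in token_probs)

def rl_loss(sampled_reward, baseline_reward, sampled_logprob):
    """Self-critical policy-gradient surrogate: sampled sequences that beat
    the greedy baseline's reward have their log-probability pushed up."""
    return (baseline_reward - sampled_reward) * sampled_logprob

def mixed_objective(token_probs, sampled_reward, baseline_reward,
                    sampled_logprob, gamma=0.5):
    """Hybrid objective: gamma * L_RL + (1 - gamma) * L_CE."""
    return (gamma * rl_loss(sampled_reward, baseline_reward, sampled_logprob)
            + (1.0 - gamma) * cross_entropy_loss(token_probs))

loss = mixed_objective(
    token_probs=[0.5, 0.25],   # model probabilities of the two gold tokens
    sampled_reward=0.8,        # e.g. BLEU of the sampled question
    baseline_reward=0.6,       # e.g. BLEU of the greedy-decoded question
    sampled_logprob=-2.0,
    gamma=0.5,
)
```

The cross-entropy term keeps the output fluent and well-formed, while the reward term directly optimizes the discrete evaluation metric, which is the motivation the abstract gives for the hybrid evaluator.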
ISBN: 9798684681172
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Deep learning
LDR
:08295nmm a2200457 4500
001
2275901
005
20210416102207.5
008
220723s2020 ||||||||||||||||| ||eng d
020
$a
9798684681172
035
$a
(MiAaPQ)AAI28023627
035
$a
AAI28023627
040
$a
MiAaPQ
$c
MiAaPQ
100
1
$a
Chen, Yu.
$3
1260328
245
1 0
$a
Question Answering and Generation from Structured and Unstructured Data.
260
1
$a
Ann Arbor :
$b
ProQuest Dissertations & Theses,
$c
2020
300
$a
144 p.
500
$a
Source: Dissertations Abstracts International, Volume: 82-05, Section: B.
500
$a
Advisor: Zaki, Mohammed J.
502
$a
Thesis (Ph.D.)--Rensselaer Polytechnic Institute, 2020.
506
$a
This item must not be sold to any third party vendors.
590
$a
School code: 0185.
650
4
$a
Computer science.
$3
523869
650
4
$a
Artificial intelligence.
$3
516317
650
4
$a
Language.
$3
643551
650
4
$a
Educational technology.
$3
517670
650
4
$a
Technical communication.
$3
3172863
653
$a
Deep learning
653
$a
Machine learning
653
$a
Natural language generation
653
$a
Automatic question answering
653
$a
Question generation
653
$a
Language processing
653
$a
Canonical lexicon
653
$a
Knowledge base
653
$a
Recurrent Neural Network
653
$a
Reinforcement learning
690
$a
0984
690
$a
0800
690
$a
0643
690
$a
0679
690
$a
0710
710
2
$a
Rensselaer Polytechnic Institute.
$b
Computer Science.
$3
2094828
773
0
$t
Dissertations Abstracts International
$g
82-05B.
790
$a
0185
791
$a
Ph.D.
792
$a
2020
793
$a
English
856
4 0
$u
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28023627
Holdings (1 item):
Barcode: W9427635
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0