
      Exploiting Unlabelled Data for Relation Extraction

      Tran, Thy Thy

      [Thesis]. Manchester, UK: The University of Manchester; 2020.


      Abstract

Information extraction transforms unstructured text into structured form by annotating raw data with semantic information. A crucial step in information extraction is relation extraction, which identifies semantic relationships between named entities in text. The resulting relations can be used to construct and populate knowledge bases, and they support applications such as information retrieval and question answering. Relation extraction has been widely studied under fully supervised and distantly supervised learning, both of which require manually or automatically annotated data. In contrast, the massive amount of freely available unlabelled text remains underused. We therefore focus on leveraging unlabelled data to improve and extend relation extraction, approaching it from three directions: (i) pre-training word representations, (ii) unsupervised learning, and (iii) weak supervision.

In the first direction, we leverage syntactic information for relation extraction. Instead of tuning such information directly on a relation extraction corpus, we propose a novel graph neural model for learning syntactically-informed word representations. The proposed method enriches pretrained word representations with syntactic information rather than retraining language models from scratch, as in previous work. We confirm that the resulting representations are beneficial for relation extraction in two different domains.

In the second direction, we study unsupervised relation extraction, a promising approach because it requires neither manually nor automatically labelled data. We hypothesise that inductive biases are crucial for guiding unsupervised relation extraction, and we employ two simple methods that use only entity types to infer relations. Despite their simplicity, our methods outperform existing approaches on two popular datasets. These surprising results suggest that entity types provide a strong inductive bias for unsupervised relation extraction.

The last direction is inspired by recent evidence that large-scale pretrained language models capture relational facts to some extent. We investigate whether such models can serve as weak annotators. To this end, we evaluate three large pretrained language models by matching sentences against exemplars of each relation: the matching scores indicate how likely a given sentence is to express a relation, and the top-scoring relations are used as weak annotations to train a relation classifier. We observe that pretrained language models are confused by highly similar relations, so we propose a method that models this labelling confusion to correct relation predictions. We validate the proposed method on two datasets with different characteristics, showing that it effectively models the labelling noise from our weak annotator.

Overall, we show that exploiting unlabelled data is an important and promising step towards improving relation extraction, and one that deserves more attention from researchers.
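As a rough illustration of the first direction, the sketch below enriches pretrained word vectors with dependency-parse structure via a single round of graph message passing. This is a minimal sketch, not the thesis's actual model: the embeddings, edges, and weight matrix are random stand-ins, and the single graph-convolution layer is only one plausible form of syntactic enrichment.

```python
# Minimal sketch (not the thesis's model): enrich pretrained word vectors
# with dependency-parse structure via one round of graph message passing.
import numpy as np

rng = np.random.default_rng(0)

n_words, dim = 6, 8                      # toy sentence of 6 tokens
H = rng.normal(size=(n_words, dim))      # stand-in for pretrained embeddings

# Dependency edges as (head, dependent) pairs; treated as undirected here.
edges = [(1, 0), (1, 2), (1, 4), (4, 3), (4, 5)]
A = np.eye(n_words)                      # self-loops keep each token's own signal
for h, d in edges:
    A[h, d] = A[d, h] = 1.0
A /= A.sum(axis=1, keepdims=True)        # row-normalise: mean over neighbours

W = rng.normal(scale=0.1, size=(dim, dim))   # learnable in a real model
syntactic = np.maximum(A @ H @ W, 0.0)       # one ReLU graph-convolution layer

# Enrich rather than replace: concatenate syntactic features onto the
# original vectors, leaving the pretrained model itself untouched.
enriched = np.concatenate([H, syntactic], axis=1)
print(enriched.shape)                    # (6, 16)
```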
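For the second direction, one plausible instantiation of an entity-type inductive bias is to induce a relation cluster directly from the pair of head and tail entity types, with no labelled data at all. The sentences and type labels below are hypothetical, and the thesis's two methods may differ in detail.

```python
# Minimal sketch of an entity-type inductive bias (illustrative only):
# treat the (head type, tail type) pair as the induced relation cluster.
from collections import defaultdict

# Hypothetical examples: (sentence, head entity type, tail entity type).
examples = [
    ("Marie Curie was born in Warsaw.", "PERSON", "LOCATION"),
    ("Alan Turing was born in London.", "PERSON", "LOCATION"),
    ("Google acquired DeepMind.",       "ORG",    "ORG"),
    ("Microsoft bought GitHub.",        "ORG",    "ORG"),
]

clusters = defaultdict(list)
for sentence, head_type, tail_type in examples:
    # The type pair alone decides which relation cluster a pair belongs to.
    clusters[(head_type, tail_type)].append(sentence)

for relation, sentences in clusters.items():
    print(relation, "->", len(sentences), "sentences")
```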
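For the third direction, the sketch below scores a sentence against one exemplar per relation using an off-the-shelf sentence encoder, standing in for the three large pretrained language models evaluated in the thesis. The relation names and exemplar sentences are illustrative assumptions, not the thesis's actual data.

```python
# Minimal sketch of the exemplar-matching weak annotator, with an
# off-the-shelf sentence encoder substituted for the thesis's models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical relations, each represented by a single exemplar sentence.
exemplars = {
    "place_of_birth": "Ada Lovelace was born in London.",
    "employer":       "Ada Lovelace worked for the Analytical Society.",
    "founded_by":     "The company was founded by Ada Lovelace.",
}

sentence = "Grace Hopper was born in New York City."

sent_emb = model.encode(sentence, convert_to_tensor=True)
for relation, exemplar in exemplars.items():
    ex_emb = model.encode(exemplar, convert_to_tensor=True)
    score = util.cos_sim(sent_emb, ex_emb).item()   # matching score
    print(f"{relation}: {score:.3f}")
# The top-scoring relation would serve as a weak label for a classifier.
```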

      Bibliographic metadata

Degree type: Doctor of Philosophy
Degree programme: PhD Computer Science
Publication date: 2020
Location: Manchester, UK
Total pages: 201
Language: en


        Record metadata

Manchester eScholar ID: uk-ac-man-scw:327033
Created by: Tran, Thy
Created: 16th December, 2020, 11:56:28
Last modified by: Tran, Thy
Last modified: 4th January, 2021, 11:27:46
