USING DATA-DRIVEN RESOURCES FOR OPTIMISING RULE-BASED SYNTACTIC ANALYSIS FOR MODERN STANDARD ARABIC

Elbey, Mohamed Nassim

[Thesis]. Manchester, UK: The University of Manchester; 2014.

Abstract

This thesis is about optimising a rule-based parser for Modern Standard Arabic (MSA). Ambiguity is a major problem in NLP systems, and it is even worse in a language such as MSA, because written MSA omits short vowels and for other reasons that are discussed in Chapter 1. Analysis of the original rule-based parser showed that many parses were unnecessary, because many edges were produced but not used in the final analysis.

The first part of this thesis investigates whether integrating a part-of-speech (POS) tagger helps to speed up parsing. This is a well-known technique for Romance and Germanic languages, but its effectiveness has not been widely explored for MSA.

The second part of the thesis applies statistics and machine learning techniques and investigates their effect on the parser. This thesis is not about the accuracy of the parser; it is about finding ways to improve its speed. A new approach is discussed that has not previously been explored in statistical parsing: collecting statistics while parsing, and using them to learn strategies to be applied during the parsing process. The learning process involves all the moves of the parser: moves that lead to the final analysis (good moves) and moves that lead away from it (bad moves). The idea is to learn not only from positive data but also from negative data. The questions to be asked are:
• Why is this move good, so that we can encourage it?
• Why is this move bad, so that we can discourage it?

In the final part of the thesis, the two techniques are combined, integrating a POS tagger and using the learning approach, and the effect of this combination on the parser is examined.

Keyword(s)

ARABIC; NLP

Bibliographic metadata

Type of resource:
Content type:
Form of thesis:
Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science
Publication date:
Location:
Manchester, UK
Total pages:
164
Thesis advisor(s):
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:229824
Created by:
Elbey, Mohamed
Created:
24th July, 2014, 13:19:00
Last modified by:
Elbey, Mohamed
Last modified:
1st December, 2017, 09:07:25
