
TOWARDS AUTOMATIC URDU TEXT
SUMMARIZATION

By
ADEEL IFTIKHAR
Regd. No. 2011-UMDB-11127
Session: 2015-2017
Department of Computer Sciences & Information Technology
Faculty of Sciences & Engineering
University of Azad Jammu and Kashmir, Muzaffarabad.


TOWARDS AUTOMATIC URDU TEXT
SUMMARIZATION
By
ADEEL IFTIKHAR
Regd. No. 2011-UMDB-11127
A Thesis
Submitted in partial fulfillment of the requirement for the degree of
Master of Philosophy
In
Computer Sciences
Session: 2015-2017
Department of Computer Sciences & Information Technology
Faculty of Sciences & Engineering
University of Azad Jammu and Kashmir, Muzaffarabad.

CERTIFICATION
Certified that the contents and form of the thesis entitled “TOWARDS AUTOMATIC URDU TEXT SUMMARIZATION” submitted by Adeel Iftikhar (Registration No. 2011-UMDB-11127) have been found satisfactory for the requirement of the degree.

SUPERVISORY COMMITTEE
Supervisor: Dr. Abdul Majid
Assistant Professor
Department of Computer Sciences & Information Technology, University of Azad Jammu and Kashmir, Muzaffarabad.

______________
Member: Name goes here
Designation
Address
______________
Member: Name goes here
Designation
Address
______________
External Examiner: Name goes here
Designation
Address
______________
___________________
Coordinator
Department of Computer Sciences & Information Technology

_______________
Dean
Faculty of Sciences & Engineering
________________
Director
Advanced Studies & Research

DEDICATION
To my beloved parents, who always picked me up on time, encouraged me to go on every adventure, especially this one, and whose prayers always pave the way to success for me.

CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOTATIONS / ABBREVIATIONS / NOMENCLATURE / ACRONYMS
ACKNOWLEDGEMENTS
ABSTRACT
PUBLICATIONS
INTRODUCTION
1.1. BACKGROUND
1.1.1. Natural Language Processing
1.1.2. An Overview of NLP for Urdu Language
1.1.3. AUTOMATIC TEXT SUMMARIZATION
1.1.3.1. TYPES OF SUMMARIES
1.1.3.1.1. EXTRACTIVE SUMMARIES
1.1.3.1.2. ABSTRACTIVE SUMMARIES
1.2. AUTOMATIC URDU TEXT SUMMARIZATION
1.3. AIMS AND OBJECTIVES
1.4. STRUCTURE OF THESIS
LITERATURE REVIEW
2.1. NATURAL LANGUAGE PROCESSING
2.1.1. PHONOLOGICAL ANALYSIS
2.1.2. MORPHOLOGICAL ANALYSIS
2.1.3. LEXICAL ANALYSIS
2.1.4. SYNTACTIC ANALYSIS
2.1.5. SEMANTIC ANALYSIS
2.1.6. DISCOURSE INTEGRATION
2.1.7. PRAGMATIC ANALYSIS
2.2. AUTOMATIC TEXT SUMMARIZATION
2.2.1. SUMMARIZATION TECHNIQUES
2.2.1.1. EXTRACTIVE SUMMARIZATION
2.2.1.1.1. WORD FREQUENCY METHOD
2.2.1.1.2. CUE PHRASE METHOD
2.2.1.1.3. LOCATION METHOD
2.2.1.1.4. DEEP LINGUISTIC METHODS
2.2.1.1.5. LEXICAL CHAIN CONSTRUCTION
2.2.1.2. ABSTRACTIVE SUMMARIZATION
2.2.1.2.1. SEMANTIC BASED APPROACH
2.2.1.2.2. RULE BASED METHOD
2.3. AUTOMATIC URDU TEXT SUMMARIZATION
MATERIALS AND METHODS
3.1. MODULES OF ANALYTIC HIERARCHY PROCESS SUMMARIZATION
3.2. DATA SETUP
3.3. URDU TEXT SUMMARIZATION WITH ANALYTIC HIERARCHY PROCESS
3.3.1. F1: WORD FREQUENCY
3.3.2. F2: KEYWORDS IN THE SENTENCE
3.3.3. F3: SENTENCE LOCATION
3.3.4. F4: SENTENCE LENGTH
RESULTS AND DISCUSSION
4.1. URDU CORPORA
4.2. EVALUATION METHODOLOGY
4.2.1. EVALUATION MEASURE
4.2.1.1. RECALL
4.2.1.2. PRECISION
4.2.1.3. F-MEASURE
4.3. RESULTS
CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE WORK
5.1. INTRODUCTION ABOUT FOURTH CHAPTER
5.2. REFERENCES
REFERENCES
APPENDIX A
APPENDIX B

LIST OF TABLES
Table 4.1: Categories of the articles used in the Urdu Summary Corpus

LIST OF FIGURES
Figure 2.1: Commonly used methods for text summarization.
Figure 3.1: Steps of AHP (Analytic Hierarchy Process) Urdu text summarization
NOTATIONS / ABBREVIATIONS / NOMENCLATURE / ACRONYMS

ACKNOWLEDGEMENTS
First of all, I would like to thank Almighty Allah, the Most Merciful, for giving me the opportunity, courage and knowledge to complete this dissertation.

I want to express my gratitude and indebtedness to my supervisor, Dr. Abdul Majid, for his ingenious help, his persistent encouragement, and for being patient through all my mistakes. I am grateful for all his contributions of time, ideas and knowledge, which made my research experience productive, exciting and a milestone in my life.

My sincere thanks also go to the members of my thesis supervisory committee, Dr. Ali Abbas, Dr. Sajjad Ahmed Nadeem and Dr._________ , not only for their insightful comments and encouragement, but also for the hard questions which prompted me to widen my research from various perspectives.
I would like to acknowledge the moral and intellectual support given to me by all faculty members and staff members of the Department of CS & IT, and by my M.Phil classmates. It was a very long and, at times, tedious journey; thank you for providing adequate help whenever it was required.

A very special thanks goes to Mr. Adeel Ahmed Abbasi for showing keen interest in answering my queries in spite of his very busy schedule. He not only answered my queries but also offered valuable suggestions. I doubt that I will ever be able to convey my appreciation fully, but I owe him my eternal gratitude.

I especially need to express my gratitude and deep appreciation to Mr. Faizan Ayub, Mr. Shahbaz Saleem and Mr. Arshad Abbasi. They have consistently helped me keep my perspective on what is important in life and shown me how to deal with reality.

Last, but not least, I thank my parents for their unbounded care, love, constant support, cooperation and sacrifice throughout my research work. Their belief and encouragement against all odds brought me to where I stand today.

ABSTRACT
The rapid increase of information available on the internet has made it an essential part of human life. However, many users do not have enough time to read so much information, especially text, so they often resort to reading abstracts and headlines instead. It is not easy for users to summarize these large documents manually. Text summarization is a very useful tool for extracting relevant and precise information from extensive amounts of text. The goal of automatic text summarization is to shrink the source text into a shorter version of itself while still preserving its informational content and overall meaning.
For the past few years, a lot of work has been done in Urdu text processing. However, to the best of my knowledge, no significant work has been done on the summarization of Urdu-language text. The existing approaches summarize Urdu text using a sentence-weight algorithm for a word processor: summarization is achieved by removing the stop words from each sentence and ranking the sentences on the basis of their content words. This approach is not suitable for ranking sentences that are long and contain a high number of stop words.

The objective of this research work is to address the issues of automatic text summarization for the Urdu language. The proposed model will use the analytic hierarchy process as the basis of an evaluation algorithm and will improve the summarization quality of Urdu-language text. The weighting and combination methods will be the two main contributions of the proposed text summarization algorithm.

PUBLICATIONS

Chapter 1
INTRODUCTION
This introductory chapter discusses the significance of Automatic Text Summarization (ATS) in order to provide fundamental information, together with a detailed discussion of Natural Language Processing (NLP) for Urdu.

BACKGROUND
Automatic Urdu text summarization requires a full understanding of NLP and of automatic text summarization. These concepts are discussed in the following subsections.

Natural Language Processing
Natural language processing (NLP) is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

Many fields, such as machine learning, linguistics, cognitive science and other areas of computer science, are combined with NLP to enable computer systems to understand human language. NLP has many applications, including search engines, question answering and machine translation.

Over the last two decades, NLP has become more challenging and has come to rely on the field of machine learning. Statistical methods now dominate NLP and have moved the field forward substantially, opening up new possibilities for exploiting data in developing NLP components and applications. Supervised learning approaches are used in many state-of-the-art natural language techniques. In these approaches, human annotators annotate a text corpus for training. Although this approach has had a significant influence on NLP, it still faces many challenges.

An Overview of NLP for Urdu Language
Urdu NLP is a novel field of this decade. The word Urdu (اردو) is derived from Turkish and means “horde” (lashkar, لشکر). Urdu belongs to the Indo-Aryan, Hindustani branch of the Indo-European language family and developed under Persian, Arabic and Turkish influence, with some sixty-five million speakers. Urdu is one of the popular languages of South Asia and the national language of Pakistan, with about sixteen million speakers in Pakistan alone; it is also a secondary language in many other Muslim and non-Muslim countries, most of which are in Asia. Urdu and Hindi both originated from the dialect of the Delhi region and, minute details aside, the two languages share their morphology. Urdu borrows a great deal of vocabulary from Persian, Arabic and English, while Turkish borrowings are minimal. Quite a number of words that have found a place in the Urdu language, often through Persian, have differently nuanced connotations and usages.

Urdu script is written from right to left, like the Semitic languages, and its letter shapes are similar to those of Arabic, Persian and Pashto. There is no capitalization in Urdu, which makes identifying proper nouns, titles, acronyms and abbreviations a difficult task. Diacritics (vowels) are rarely present in the text, and words are disambiguated with the help of the context of surrounding words. In terms of syntax, Urdu has a relatively free word order with a default Subject-Object-Verb structure. Despite being spoken by millions of people, Urdu is an under-resourced language. A sentence demonstrating Urdu is given below:
اردو پاکستان کی قومی زبان ہے۔
Translation: Urdu is the national language of Pakistan.

This study uses an Urdu-language corpus and seeks to draw attention to the fact that Urdu corpus research is limited in contrast to that for East Asian and European languages.

AUTOMATIC TEXT SUMMARIZATION
Summarization is the process of extracting important information from one or more sources (Chettri & Chakraborty, 2017). It increases the likelihood of finding the main points of a text, so the user spends less time reading and understanding whole documents. Text summarization is one of the typical tasks of text mining. The World Wide Web makes a massive amount of information available, and users are burdened with lengthy text documents. Some individuals make decisions on the basis of reviews they have seen, and with summaries they can make an effective assessment in less time. With the growing volume of information, summarization plays a vital role in saving time. Text summarization is a challenging task that ideally involves deep natural language processing capacities, and in order to simplify the problem, current research is focused on extractive summary generation. A summary can be generated through either an extractive or an abstractive summarization technique. Sentence-based extractive summarization techniques are commonly used in automatic text generation.

TYPES OF SUMMARIES
EXTRACTIVE SUMMARIES
This type of summary is generated by selecting a few sentences from the document: scores are assigned to the important sentences in the document, and the most highly scored sentences are chosen to form the summary. Extractive summarization uses a statistical approach to select important sentences or keywords from the document. It is performed by concatenating several sentences taken exactly as they appear in the input being summarized.

ABSTRACTIVE SUMMARIES
An abstractive summary does not simply reuse the words or phrases of the original document; instead it re-interprets ideas or concepts taken from the original document and presents them in a different form. It is written to convey the main information in the input and may reuse phrases or clauses from it, but the summary is overall expressed in the words of the summary author. This is the ideal goal of abstractive text summarization, but an ideal summarizer remains unavailable because of the extensive natural language processing it requires, especially in terms of the semantic representations involved. The problem is exacerbated in Urdu, as many of the language resources required for this NLP are not currently available. Abstractive summarization is therefore much more complex than extractive summarization (Munot & Govilkar, 2014).

AUTOMATIC URDU TEXT SUMMARIZATION
Urdu is an Indo-Aryan language, widely spoken in South Asia. It is also spoken all over the world due to the large South Asian diaspora. Urdu has more than 100 million speakers around the globe. It is written in a modified Perso-Arabic script from right to left and requires specific rendering to be viewed properly. Normally, it is written in Nastalique, a highly complex writing system that is cursive and context-sensitive.

Determining the informational content of Urdu text is a complex process, as it often involves tasks such as tokenization (Raj & Abdul-Kareem, 2011) and information dissemination (Raj & Abdul-Kareem, 2009), as well as other models that are used to compare information (Raj & Balakrishnan, 2011).
The use of the AHP (Analytic Hierarchy Process) for evaluation and decision-making has been studied by many authors. Despite the differences in the approaches taken, each technique has its merits. AHP is based on pairwise comparisons using ratio scales to indicate summarization accuracy. In this thesis we present the use of AHP for evaluating and selecting sentences based on their weights, while adjusting the method to make it easier to use. It should be noted that the Urdu language differs from the English language both morphologically and semantically. Our method consists of two main parts: one to calculate the weights of the sentences, and another to evaluate those sentences, which should result in an optimal summary of the Urdu text.

AIMS AND OBJECTIVES
The analytic hierarchy process (AHP) was proposed by Saaty (Saaty & Vargas, 2013). AHP was originally applied to uncertain decision problems with multiple criteria, and it has been widely used in solving problems of ranking, selection, evaluation, optimization, and prediction. The AHP method is expressed as a unidirectional hierarchical relationship among decision levels. The top element of the hierarchy is the overall goal of the decision model, and the hierarchy decomposes into more specific criteria until a level of manageable decision criteria is reached (Meade & Presley, 2002). Under each criterion, sub-criteria elements related to that criterion can be constructed. The AHP thus separates complex decision problems into elements within a simplified hierarchical system.

The AHP usually consists of three stages of problem solving: decomposition, comparative judgment, and synthesis of priority. The decomposition stage aims at the construction of a hierarchical network to represent a decision problem, with the top level representing overall objectives and the lower levels representing criteria, sub-criteria and alternatives. In the comparative judgment stage, expert users are requested to set up a comparison matrix at each level of the hierarchy by comparing pairs of criteria or sub-criteria. Finally, in the synthesis of priority stage, each comparison matrix is solved by an eigenvector method (Cami & Amiri, 2013) to determine the importance of the criteria and alternatives. The purpose of the AHP enquiry in this work is to construct a hierarchical evaluation system based on the attributes of the text.
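To make the synthesis stage concrete, the following is a minimal sketch (an illustration under stated assumptions, not the implementation used in this thesis) of deriving priority weights from a pairwise comparison matrix by power iteration toward the principal eigenvector. It assumes NumPy, Saaty's 1-9 ratio scale, and four hypothetical criteria:

```python
import numpy as np

def ahp_weights(matrix, tol=1e-9, max_iter=1000):
    """Derive AHP priority weights from a pairwise comparison matrix
    using the principal-eigenvector (power iteration) method."""
    A = np.asarray(matrix, dtype=float)
    n = A.shape[0]
    w = np.ones(n) / n                  # start from uniform weights
    for _ in range(max_iter):
        w_next = A @ w
        w_next /= w_next.sum()          # normalize so weights sum to 1
        if np.abs(w_next - w).max() < tol:
            break
        w = w_next
    lam_max = (A @ w / w).mean()        # estimate of the largest eigenvalue
    ci = (lam_max - n) / (n - 1)        # consistency index; ~0 means consistent
    return w, ci

# Hypothetical judgments over four criteria on Saaty's 1-9 scale:
# entry [i][j] states how much more important criterion i is than j.
pairwise = [
    [1,   3,   5,   7],
    [1/3, 1,   3,   5],
    [1/5, 1/3, 1,   3],
    [1/7, 1/5, 1/3, 1],
]
weights, ci = ahp_weights(pairwise)
print("priority weights:", weights.round(3), "CI:", round(ci, 3))
```

A consistency index close to zero indicates nearly transitive judgments; in practice the comparison matrix is usually re-elicited when Saaty's consistency ratio exceeds about 0.1.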

A great deal of research exists on automatic text summarization, and various techniques are being developed. Researchers have proposed new techniques using multiple methodologies for automatic text summarization in English and other languages. For the past few years, a lot of work has been done in Urdu text processing, but no suitable work has been presented in the area of Urdu-language text summarization. The only work presented for Urdu is text summarization using a sentence-weight algorithm for a word processor; this work has deficiencies when ranking sentences that are long and contain too many stop words.

The main purpose of this study is to propose, improve and implement a text summarizer for the Urdu language. The other motives of this study are as follows:
Automatic text summarization is a prototypical problem in NLP, and its solution can be applied to other NLP problems as well, such as news article summarization and ranking documents according to a query.

To develop a text summarizer with limited resources for Urdu, a morphologically rich language.

STRUCTURE OF THESIS
The rest of the thesis is organized as follows:
The first chapter gives a brief introduction to the significance of automatic text summarization in order to provide fundamental information, together with a detailed discussion of Natural Language Processing (NLP) for Urdu text summarization.

A literature review of prior work in automatic text summarization is presented in Chapter 2.

Because of the huge number of publications in NLP, it is difficult to survey the field exhaustively; moreover, the dialect differences among the varied languages pose interesting challenges for researchers. Instead, I briefly review the work based on the different techniques used for Automatic Text Summarization (ATS), with a particular focus on a detailed review of Urdu-language text summarizers.

Chapter 3 serves as a brief introduction to text summarization in Urdu with the Analytic Hierarchy Process (AHP), including the individual representation of sentences and the preliminary data setup. This chapter also describes Urdu text summarization with AHP, laying the groundwork for further improvement of summarization accuracy.

Chapter 4 describes the dataset used and reports experiments that judge the worth of applying the Analytic Hierarchy Process (AHP) to Urdu-language documents. It also presents the accuracy rate of summarization and compares the results obtained with AHP against a gold standard and previously available techniques.

Chapter 5 summarizes the main findings of the study and compares the performance of the individual methods based on the experiments done with the Urdu-language corpus. In addition, some possible improvements are suggested that could make the summarizer better in some respects.

Chapter 2
LITERATURE REVIEW
NATURAL LANGUAGE PROCESSING
NLP, which developed in the 1960s, is considered a subfield of linguistics and Artificial Intelligence (AI) (Jurafsky & Martin, 2009). The main objective of NLP is to understand natural language and to study the problems of automatic language generation. Any natural language used by humans is known as a natural language, whereas computer languages such as machine languages, programming languages and artificial languages are not natural languages. NLP is a convenient description for all attempts to use computers to process natural language. It is also an area of artificial intelligence research that attempts to reproduce the human interpretation of language for computer processing. The ultimate goal of NLP is to determine a system of language, words, relations, and conceptual information that can be used by computer logic to implement artificial language interpretation. NLP includes anything a computer needs to understand natural language (written or spoken) and to generate natural language. To build a computational natural language system, both Natural Language Understanding (NLU) and Natural Language Generation (NLG) are needed. NLG systems convert information from computer databases into normal-sounding human language, and NLU systems convert samples of human language into representations that are easier for computer programs to manipulate. Some of the important levels of NLP analysis are as follows:
PHONOLOGICAL ANALYSIS
The phonological level is concerned with the analysis of speech sounds, such as phonemes, and is of little interest in textual IR. The minimal unit of the sound system is the phoneme, which is capable of distinguishing meaning between words. Phonemes combine to form a higher-level unit called the syllable, and syllables combine to form words. The organization of the sounds in a language therefore presents both linguistic and computational challenges for analysis.

MORPHOLOGICAL ANALYSIS
The morphological level deals with units of meaning, i.e. morphemes. It is concerned with the analysis of the different forms of a given word in terms of its prefixes, roots, and suffixes. This level of NLP has traditionally been incorporated into IR systems. Stemming techniques, which reduce words to root forms (stems) for query-document similarity, are an example of this level of processing.
For example, morphologically we can analyze the word “??? ????” into three separate morphemes: the prefix “???”, the root “???” and the suffix “?”.
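As a toy illustration of the stemming idea mentioned above (a sketch with a hypothetical English suffix list, not an Urdu morphological analyzer), suffix stripping collapses related word forms onto a shared stem so that query and document terms can match:

```python
# Toy suffix-stripping stemmer. The suffix list is illustrative, and the
# resulting stems need not be dictionary words; what matters for IR is
# that related inflected forms collapse to the same stem.
SUFFIXES = ["ing", "ed", "es", "s"]  # checked in this (longest-first) order

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

print([stem(w) for w in ["summarizing", "summarized", "summarizes"]])
# all three collapse to the shared stem "summariz"
```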
LEXICAL ANALYSIS
The next higher level is the lexical level, which deals with word-level processing involving the analysis of the structure and meaning of words and part-of-speech tagging. Lexical operations in IR include the elimination of stop words, the generation and use of thesauri for expanding queries, and the handling of abbreviations and acronyms. Part-of-speech tagging is another lexical process now being used in IR. It is a commonly used operation in NLP, but not widely known in traditional IR.

Lexical-level analysis determines both the lexicon that an NLP system will utilize and the nature and extent of the knowledge encoded in that lexicon.

SYNTACTIC ANALYSIS
The syntactic level deals with the grammar and structure of sentences. A sentence can have many possible structures, and identifying the correct one among the alternatives requires higher-level knowledge. Attempts to use syntactic analysis in understanding natural language were based on the assumption that meaning is inherent in the syntactic structure (Park, 1996). Salton and Buckley (1991) discussed how syntactic analysis can constrain the possible meanings of a text. Syntactic-level processing has rarely been used in traditional IR; the identification of phrasal units is one example of this level of processing that has been used. Although sophisticated parsers have been developed, statistical methods such as co-occurrence and proximity have usually been preferred over NLP for phrase identification in IR.
SEMANTIC ANALYSIS
The semantic level is concerned with the meaning of units larger than words, such as clauses and sentences. It involves the use of contextual knowledge to represent meaning. Word-sense disambiguation is a task that requires a semantic level of processing, because a word can be disambiguated only in the context of the larger textual units in which it is used. Due to the sophistication of this level of processing and its need for real-world and domain-specific knowledge, most IR systems have preferred statistical keyword matching to semantic-level processing.
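As a small illustration of the context dependence described above, NLTK's simplified Lesk implementation picks a WordNet sense for a word by overlapping the glosses of its candidate senses with the surrounding sentence (a sketch assuming NLTK and its WordNet corpus are installed; simplified Lesk is known to be noisy):

```python
from nltk.wsd import lesk  # requires nltk with the WordNet corpus downloaded

# The same surface word can receive different WordNet senses depending
# on the sentence it appears in, since Lesk scores each candidate sense
# by its gloss overlap with the context words.
financial = "I deposited the cheque at the bank".split()
geographic = "We sat on the grassy bank of the river".split()
print(lesk(financial, "bank"))
print(lesk(geographic, "bank"))
```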
DISCOURSE INTEGRATION
Discourse-level processing attempts to interpret the structure and meaning of still larger units, for example resolving what the word “it” refers to in the sentence “we got it” at the paragraph and document level, in terms of words, phrases, clusters and sentences.
PRAGMATIC ANALYSIS
The highest level is the pragmatic level, which deals with outside-world knowledge (i.e. knowledge external to the document and/or query). There is no evidence of pragmatic-level analysis in IR; even in AI, research at this level is only experimental in nature. For example, if someone says “the door is open”, it is necessary to know which door “the door” refers to, and what the intention of the speaker is: it could be a pure statement of fact, an explanation of how the cat got in, or a request to the person addressed to close the door.

AUTOMATIC TEXT SUMMARIZATION
Text summarization creates a shorter version of a document that can be easily digested by the user. The most basic form of text summarization generates a summary from a single source document, although it is also possible to summarize multiple documents. The key applications of text summarization are as follows:
News articles: A summary of a news article enables quick inspection. It may also be beneficial to summarize the headings of a large number of associated news articles in order to understand the common theme.

Search engine results: A query on a search engine may return several results that need to be presented on one page. Normally, each heading on that page is followed by a short summary.

Review summarization: Reviewers at sites such as Amazon generate huge numbers of short documents describing their judgments of a specific product. It may be necessary to condense these reviews into a shorter summary.

Scientific articles: Impact summarization is a way of obtaining the most dominant sentences in a specific article. This type of summarization delivers a broad understanding of what the article is about.

Emails: An email thread corresponds to a discourse between two participants. In such cases, it is important to take the communicative nature of the discussion into account during the summarization process.

Improving other automated tasks: A surprising benefit of text summarization is that it sometimes increases the performance of other tasks in text analytics.

SUMMARIZATION TECHNIQUES
The two main types of summarization are extractive and abstractive, which are defined below:
EXTRACTIVE SUMMARIZATION
In extractive summarization, a short summary is created by extracting sentences from the source document without changing the individual sentences in any way. An important step is therefore scoring the importance of the various sentences. Afterwards, a subset of the top-scoring sentences is selected so as to maximize topical coverage and minimize redundancy.

WORD FREQUENCY METHOD
This is a word-distribution method in which two measures, term frequency (tf) and document frequency (df), are calculated for each non-stop word (w) in the document. Term frequency (tf) indicates the number of times a word appears in the text, which measures the salience of the word within that document. Document frequency (df) indicates the number of documents in which the word appears. Thematic words are obtained by combining the two frequencies into the tf-idf measure, which weights a word's term frequency by its inverse document frequency. Once the tf-idf score has been computed for each word, the next step is to count the thematic words per sentence. With this value, the sentences in the input text are ranked, and the highest-scoring sentences are picked to form the summary. Redundancy of information is extremely high in this method. It has been applied to languages such as Punjabi, Bengali, Kannada and Odia.
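A minimal sketch of this ranking scheme (illustrative code, not the thesis's implementation) could look as follows, assuming each document is already split into sentence strings and a stop-word set is supplied:

```python
import math
from collections import Counter

def word_frequency_summary(documents, doc_index, stop_words, k=3):
    """Score each sentence of documents[doc_index] by summing the tf-idf
    values of its non-stop words, then return the k highest-scoring
    sentences in their original order."""
    n_docs = len(documents)

    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for doc in documents:
        vocab = {w for sentence in doc for w in sentence.lower().split()}
        df.update(vocab - stop_words)

    # Term frequency within the target document only.
    doc = documents[doc_index]
    tf = Counter(w for sentence in doc for w in sentence.lower().split()
                 if w not in stop_words)

    def score(sentence):
        return sum(tf[w] * math.log(n_docs / df[w])
                   for w in sentence.lower().split() if w in tf)

    top = set(sorted(doc, key=score, reverse=True)[:k])
    return [s for s in doc if s in top]  # keep document order
```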

CUE PHRASE METHOD
Weights are assigned to text based on the significance of cue phrases: positive weights for phrases such as “verified”, “significant”, “best” and “this paper”, and negative weights for phrases such as “hardly” and “impossible”. Cue phrases are usually genre-dependent. Sentences containing positive cue phrases are candidates for inclusion in the summary. The cue phrase method is based on the assumption that such phrases provide a “rhetorical” context for identifying important sentences; the source abstraction in this case is the set of cue phrases and the sentences that contain them. All of the above statistical features are used by extractive text summarization.
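A sketch of this weighting, using the cue phrases quoted above with arbitrary illustrative weights, could be:

```python
# Cue-phrase scoring: bonus phrases raise a sentence's score, stigma
# phrases lower it. The weights are arbitrary illustrative choices.
CUE_WEIGHTS = {
    "verified": 2, "significant": 2, "best": 1, "this paper": 3,  # bonus
    "hardly": -2, "impossible": -2,                               # stigma
}

def cue_score(sentence):
    s = sentence.lower()
    return sum(w for phrase, w in CUE_WEIGHTS.items() if phrase in s)

print(cue_score("This paper presents a significant improvement."))  # 3 + 2 = 5
```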

LOCATION METHOD
Weights are assigned to text based on its location: whether it appears in the lead, medial or final position in a paragraph, or in a prominent section of the document such as the introduction or conclusion. The leading sentences of a document, the last few sentences, and the conclusion are considered more important and are included in the summary. Hovy and Lin (Hovy & Lin, 1999) and Edmundson (Edmundson, 1969) used this method.
The location method relies on the intuition that headings, sentences at the beginning and end of the text, and text formatted in bold contain information that is important to the summary.
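This intuition can be captured by a simple position-weighting function; the numeric weights below are arbitrary illustrative choices, not values from the thesis:

```python
def location_score(index, n_sentences, lead=3, tail=2):
    """Toy position weighting: lead sentences score highest, the closing
    sentences receive a smaller bonus, and medial sentences a baseline."""
    if index < lead:                        # among the first few sentences
        return 1.0 - index / (2 * lead)
    if index >= n_sentences - tail:         # among the last few sentences
        return 0.6
    return 0.3                              # medial position

print([round(location_score(i, 10), 2) for i in range(10)])
# [1.0, 0.83, 0.67, 0.3, 0.3, 0.3, 0.3, 0.3, 0.6, 0.6]
```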

DEEP LINGUISTIC METHODS
Linguistics is the scientific study of language, which includes the study of semantics and pragmatics. Semantics concerns how meaning is inferred from words and concepts, while pragmatics concerns how meaning is inferred from context. Linguistic approaches are based on considering the connections between words and trying to find the main concept by analyzing them. Abstractive text summarization is based on linguistic methods that involve semantic processing.

LEXICAL CHAIN CONSTRUCTION
The concept of lexical chains was first introduced by Morris and Hirst (Morris & Hirst, 1991). Lexical chains exploit the cohesion among an arbitrary number of related words; they can be computed in a source document by grouping (chaining) sets of words that are semantically related. Identity, synonymy, and hypernymy/hyponymy are the relations that may cause words to be grouped into the same lexical chain. Lexical chains have been applied to information retrieval, grammatical error correction, and, notably, text summarization (Barzilay & Elhadad, 1997; Silber, 2002). In computing lexical chains, the noun instances must be grouped according to the above relations, but each noun instance must belong to exactly one lexical chain. There are several difficulties in determining which lexical chain a particular word instance should join; words must be grouped so that they form the strongest and longest chains.
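The chaining step can be made concrete with a small sketch in Python. The code below greedily assigns each noun instance to the first chain containing a related word; the related() predicate and its toy synonym table stand in for a real thesaurus lookup (identity, synonymy, hypernymy), and a full algorithm would also resolve ambiguous words toward the strongest and longest chains.

# Minimal sketch of greedy lexical-chain construction. SYNONYMS is a
# hypothetical toy resource; real systems consult a thesaurus or WordNet.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile"},
}

def related(w1, w2):
    # identity or (toy) synonymy; hypernymy/hyponymy checks would go here
    return w1 == w2 or w2 in SYNONYMS.get(w1, set())

def build_chains(nouns):
    chains = []  # each chain is a list of word instances
    for noun in nouns:
        for chain in chains:  # each noun instance joins exactly one chain
            if any(related(noun, member) for member in chain):
                chain.append(noun)
                break
        else:
            chains.append([noun])
    return chains

print(build_chains(["car", "driver", "automobile", "road", "vehicle"]))
# -> [['car', 'automobile', 'vehicle'], ['driver'], ['road']]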

Figure 2.1: Commonly used methods for text summarization.

ABSTRACTIVE SUMMARIZATION
Abstractive summarization creates a summary that contains new sentences not present in the original document. In some cases, such methods may reuse phrases and clauses from the original, although the overall text is still considered new. Generating new text is challenging because it requires a language model to produce a meaningful sequence of words, and even then there is no guarantee that the generated summary will contain meaningful sentences. In general, abstractive summarization is much harder, and only a limited amount of work exists on the topic.

It is noteworthy that abstractive summarization requires coherence and fluency. This requires a high level of semantic understanding of the underlying text, which is beyond the capabilities of modern systems. Completely fluent abstractive summarization represents an unsolved problem in artificial intelligence, and most summarization systems are extractive.
SEMANTIC BASED APPROACH
In semantic-based methods, a semantic representation of the document(s) is fed into a natural language generation (NLG) system. These methods focus on identifying noun phrases and verb phrases by processing linguistic data. Different methods using this approach are discussed here.

RULE BASED METHOD
The rule-based method comprises three steps. First, the documents to be classified are represented in terms of their categories, which can come from various domains, so the first task is to sort these. Next, questions are formed based on these categories: for example, for an attack category (among categories such as attacks, disasters, and health), questions such as "What happened?", "When did it happen?", "Who was affected?" and "What were the consequences?" can be posed. Depending on these questions, rules are generated: verbs and nouns with similar meanings are determined, and their positions are correctly identified. The context selection module then selects the best candidate among these, and generation patterns are used to generate the summary sentences.
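As a minimal illustration of such a rule, the Python sketch below matches a hypothetical pattern for the attack category and fills a generation template; the verb list, slot names, and template are invented for illustration and are not taken from any particular system.

import re

# Hypothetical rule for the "attack" category: verbs with similar
# meanings are listed, and a generation pattern turns the matched
# slots (answers to "who?" and "where?") into a summary sentence.
ATTACK_VERBS = r"(?:bombed|attacked|struck)"
RULE = re.compile(rf"(?P<who>\w+) {ATTACK_VERBS} (?P<where>\w+)")

def generate(sentence):
    m = RULE.search(sentence)
    if m:
        return f"{m.group('who')} carried out an attack on {m.group('where')}."
    return None

print(generate("Rebels attacked Kirkuk on Tuesday"))
# -> Rebels carried out an attack on Kirkuk.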

Initial work on text summarization for the English language started almost fifty years ago (Huang, Chu, & Chiang, 2008). In its infancy, text summarization consisted of reading the original document, attempting to understand its contents, and then generating a short document of the content. The automatically summarized text was generated by a machine that assessed the importance of the information within the input document, based on a user's or application's needs (Móro, 2012).
The earliest research on automatic summarization selected sentences from a source document based on term frequencies to measure sentence relevance (Luhn, 1958), sentence positions in a paragraph (Baxendale, 1958), and sentence similarity (Gong & Liu, 2001). Sentences are included in the summary if their words have sufficiently high scores. Most supervised extractive methods used currently focus on powerful machine learning algorithms that can properly combine these features.
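A minimal Python sketch of this frequency-based selection, with a toy English stop-word list, scores each sentence by the summed frequencies of its content words and keeps the top-scoring sentences in document order.

from collections import Counter

STOP_WORDS = {"the", "a", "of", "is", "and", "in", "to"}  # toy list

def sentence_score(sentence, freq):
    # Luhn-style: a sentence is as important as its content words are frequent
    return sum(freq[w] for w in sentence.lower().split() if w not in STOP_WORDS)

def luhn_summary(sentences, k=2):
    freq = Counter(w for s in sentences for w in s.lower().split()
                   if w not in STOP_WORDS)
    top = sorted(sentences, key=lambda s: sentence_score(s, freq),
                 reverse=True)[:k]
    return [s for s in sentences if s in top]  # preserve original order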

Other approaches rely on statistical analysis, generally based on assessing the structure of the text via discourse analysis combined with training algorithms that use human-generated summaries to estimate the importance probabilities of sentences from the source document. These importance probabilities are then used to determine whether a sentence should be included in the summary. The use of Bayesian models in text summarization systems is popular due to their simplicity (Myaeng, Han, & Rim, 2006). Other work (Kupiec, Pedersen, & Chen, 1995) claimed that corpus-trained feature weights increase accuracy, an assertion supported by (Edmundson, 1969). This model handles each sentence individually, so the connections between sentences are ignored.
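A Kupiec-style trainable summarizer can be sketched as a naive Bayes classifier over binary sentence features; the feature vectors and labels below are fabricated purely for illustration.

from sklearn.naive_bayes import BernoulliNB

# Each sentence is a vector of binary features, e.g. [appears in the
# lead paragraph, contains a cue phrase, exceeds a length cutoff];
# labels mark whether a human extract kept the sentence.
X_train = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
y_train = [1, 1, 0, 0, 1]

model = BernoulliNB().fit(X_train, y_train)
# estimated P(include in summary | features) for an unseen sentence
print(model.predict_proba([[1, 0, 1]])[0][1])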
Genetic algorithms can be used to calculate the weights of each sentence in the summary, as shown by (Yeh, Ke, Yang, & Meng, 2005).

Despite their benefits, statistical methods have shortcomings when used for text summarization, such as the need for human supervision when dealing with ambiguous words, misunderstood rhetoric, non-text objects, synonyms, and other context-dependent terms. Nonetheless, statistical approaches to text summarization are still considered useful (McCargar, 2004). Recent research on text summarization has overcome some of the problems of statistical approaches by combining them with other approaches. For example, (M. Tofighy, Kashefi, & Zamanifar, 2011) presented an automatic text summarization system combining a statistical approach with fractal theory to summarize documents.
AUTOMATIC URDU TEXT SUMMARIZATION
After this short description of previous work on automatic text summarization for English and other resource-rich languages, this section discusses the work done on automatic text summarization for the Urdu language.

For the past few years, considerable work has been done on Urdu text processing, but little has been presented in the area of Urdu text summarization. The only work presented for Urdu is text summarization using a sentence weight algorithm for word processors (Burney, Sami, & Mahmood, 2012). This work took the first step toward Urdu text summarization, but it has some deficiencies: it uses only one feature, the number of content words in a sentence, calculates a sentence score on that basis, and includes in the final summary the sentences with the most content words. While ranking sentences in this way, a long sentence containing many stop words can still obtain a high score simply because of its length. However, significant summarization research has been presented for languages similar to Urdu, such as Persian (S. M. Tofighy, Raj, & Javadi, 2013; M. Tofighy, Kashefi, & Zamanifar, n.d.) and Arabic (Imam, Nounou, Hamouda, Allah, & Khalek, 2013).
Chapter 3
MATERIALS AND METHODS
In this chapter, the methodology for automatic Urdu text summarization using the AHP (Analytic Hierarchy Process) is briefly described. Figure 3.1 illustrates the methodology used for this purpose. Finally, Section 3.3 presents the sequence of steps pursued in order to obtain the summarization accuracy results.
MODULES OF ANALYTIC HIERARCHY PROCESS SUMMARIZATION
Text summarization reduces the source text to a shorter version while maintaining its information content and overall sense. Extractive summaries (Babar, 2014) are formulated by extracting key text pieces (sentences or passages) from the text, based on statistical analysis of individual or combined surface-level features such as word/phrase frequency, location, or cue words, in order to locate the sentences to be selected. The "most significant" content is treated as the "most frequent" or the "most favorably positioned" content. Such a tactic avoids any effort at deep text understanding, and these methods are conceptually simple and easy to implement.

A typical extractive text summarization procedure (Gupta & Lehal, 2009) can be divided into two steps: (1) a pre-processing step and (2) a processing step. Pre-processing builds an organized representation of the original text. It usually includes: (a) sentence boundary identification (in English, a sentence boundary is marked by the full stop at the end of the sentence); and (b) stop-word removal, in which common words that carry no semantics and do not contribute significant information to the task are removed. In the processing step, the features influencing the relevance of sentences are decided and calculated, and weights are allocated to these features using a weight-learning procedure. The final score of each sentence is determined using a feature-weight equation, and the top-ranked sentences are extracted for the final summary.
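A minimal pre-processing sketch for Urdu, assuming the Urdu full stop '۔' as the sentence boundary marker and a tiny illustrative subset of an Urdu stop-word lexicon:

# Sentence boundary identification and stop-word removal for Urdu.
URDU_STOP_WORDS = {"کا", "کی", "کے", "اور", "میں", "سے", "ہے"}  # assumed subset

def preprocess(document):
    # split on the Urdu full stop, then tokenize and drop stop words
    sentences = [s.strip() for s in document.split("۔") if s.strip()]
    tokenized = [[w for w in s.split() if w not in URDU_STOP_WORDS]
                 for s in sentences]
    return sentences, tokenized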

Figure 3.1: Steps of AHP (Analytic Hierarchy Process) Urdu Text Summarization
Figure 3.1 shows the scheme of the proposed system. In the first step, the source document is taken as input from a text file.

In the second step, the text is prepared for feature extraction by splitting the whole document into a set of sentences and then tokenizing each sentence. In the third step, the features Word Frequency (F1), Keywords in the Sentence (F2), Sentence Location (F3), and Sentence Length (F4) are extracted from the document for each sentence; the results are combined to obtain the final score of each sentence, and on the basis of that score it is decided whether the sentence is included in the final summary. Finally, the summary is compared with a gold standard to calculate precision, recall, and F-measure and thereby check the accuracy of the system.
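Since the features are combined through the Analytic Hierarchy Process, their relative weights can be derived from a pairwise comparison matrix via its principal eigenvector. The sketch below shows the mechanics; the pairwise judgments in the matrix are illustrative assumptions on Saaty's 1-9 scale, not the values used in this work.

import numpy as np

# Pairwise comparison matrix: entry (r, c) states how much more
# important feature r is judged to be than feature c (F1..F4).
A = np.array([
    [1.0, 2.0, 3.0, 4.0],   # F1 vs F1..F4
    [1/2, 1.0, 2.0, 3.0],   # F2
    [1/3, 1/2, 1.0, 2.0],   # F3
    [1/4, 1/3, 1/2, 1.0],   # F4
])

eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()  # normalized priority vector
print(weights)  # roughly [0.47, 0.28, 0.16, 0.10] for these judgments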

DATA SETUP
Urdu is written in a modified Perso-Arabic script from right to left and requires specific rendering to be viewed properly. Normally, it is written in Nastalique, a highly complex writing system that is cursive and context-sensitive.

Determining the informational content of Urdu text is a complex process, as it often involves tasks such as tokenization (Raj & Abdul-Kareem, 2011) and information dissemination (Raj & Abdul-Kareem, 2009), as well as other models used to compare information (Raj & Balakrishnan, 2011).
The dataset used in this study is the publicly available Urdu Summary Corpus (Humayoun et al., 2014). It contains fifty (50) articles of different types, including news, current affairs, health, sports, science & technology, tourism, religion, and miscellaneous articles. The summaries provided with this dataset are abstractive, which is why they cannot be used as a gold standard here; instead, the documents were given to different people to produce extractive summaries. For the summary writing, a group of volunteers was selected: native speakers of Urdu who were either (1) academicians teaching Urdu in colleges or (2) university students with an interest in Urdu literature. We did not impose any size restriction on the human-written summaries; the writers were asked to produce good summaries, whether large, medium, or small in size. However, a summary must not exceed half the size of its article.
URDU TEXT SUMMARIZATION WITH ANALYTIC HIERARCHY PROCESS
The algorithm for calculating the weight of sentences is listed below:
ALGORITHM-1:
The algorithm consists of three phases:

Phase 1: Sentence scoring according to linear combinations of different measures.
Input: Documents
Output: Scored sentences

Phase 2: Relevant sentence extraction (summary generation).
Input: Number of sentences
Output: Extracted sentences

Phase 3: Evaluation of summary.
Input: Different summaries as "standard summaries" and "peer summaries"
Output: Precision, Recall and F-score

(1) Input the source document.

(2) Pre-processing:
a) Split the document into a set of sentences.
b) Tokenize each sentence of the source document.

(3) Feature extraction: compute the different features from the source document for calculating the sentence score; repeat this step for each sentence.

(4) Summary generation: combine the results of the features to obtain a single score for each sentence. If the score of a sentence is greater than the minimum threshold, include it in the summary; discard it otherwise.

(5) Evaluation of summary: compare the system-generated summary with the human-generated summary to check the accuracy of the algorithm by computing recall, precision, and F-measure.
_______________________________________________________________________
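A compact Python sketch of the whole algorithm follows; the feature weights and minimum threshold are assumed values for illustration, and the feature functions are expected to implement the formulas of the next section.

def summarize(sentences, feature_fns, weights, threshold=0.5):
    # feature_fns: functions mapping (index, sentence) -> score in [0, 1]
    summary = []
    for i, sent in enumerate(sentences):
        score = sum(w * fn(i, sent) for w, fn in zip(weights, feature_fns))
        if score > threshold:  # step (4): keep; discard otherwise
            summary.append(sent)
    return summary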
The most important step in the summarization process is to extract the features: Word Frequency (F1), Keywords in the Sentence (F2), Sentence Location (F3), and Sentence Length (F4). The features in step (3) of the algorithm are computed using the following formulas.
F1: WORD FREQUENCY
The initial work on document summarization began with single-document summarization by (Luhn, 1958) at IBM. Luhn proposed a frequency-based model in which the frequency of a word plays a critical role in deciding the importance of any word or sentence in a particular document. The frequency of each word is calculated after the removal of stop words, i.e., common function words which are often the most frequent words in sentences but have little semantic impact on a sentence's meaning. The most common measure used to calculate the word frequency is given in (1).

$F_1(s) = \frac{\text{Content Words}(s)}{\text{Total Words}(s)} = \frac{\text{Total Words}(s) - \text{Stop Words}(s)}{\text{Total Words}(s)}$    (1)
where Content Words, Total Words, and Stop Words are the respective word counts for a sentence. The sentence weight is then calculated using the following formula:
$W(s) = \sum_{w \in s} \text{freq}(w)$    (2)
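A short sketch of F1 under the reconstruction of (1) above, operating on a tokenized sentence:

def f1_word_frequency(tokens, stop_words):
    # content words are what remains after stop-word removal
    total = len(tokens)
    content = [w for w in tokens if w not in stop_words]
    return len(content) / total if total else 0.0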
F2: KEYWORDS IN THE SENTENCE
Keywords are usually the words with the highest number of occurrences. This feature can be calculated as the ratio of the number of thematic words occurring in the sentence to the maximum number of keywords in any sentence, as in (3).

$F_2(s) = \frac{\text{Thematic Words in } s}{\max_{s'}(\text{Keywords in } s')}$    (3)
F3: SENTENCE LOCATION
This feature is based on the assumption that the first and last sentences of a paragraph are the most important. (Baxendale, 1958) introduced a feature based on sentence position. Although that work was almost entirely manual, the measure was later widely used in sentence scoring. The position score is calculated as in (4), where n is the number of sentences in the paragraph in which the sentence is located and i is the ordinal position of the sentence among the others.
$F_3(s_i) = \max\left(\frac{1}{i}, \frac{1}{n - i + 1}\right)$    (4)
F4: SENTENCE LENGTH
This feature is useful for penalizing sentences that are too short, since such sentences are not expected to belong to the summary. It is calculated as in (5).

$F_4(s) = \frac{\text{Words in } s}{\text{Words in the longest sentence}}$    (5)
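Sketches of F2-F4 following the formulas above; the location formula reflects one plausible reading of Baxendale's position feature, favouring both ends of a paragraph.

def f2_keywords(tokens, keywords, max_keywords):
    # (3): thematic-word hits in this sentence over the maximum
    # keyword count observed in any sentence of the document
    return sum(1 for w in tokens if w in keywords) / max_keywords

def f3_location(i, n):
    # (4): i is the 1-based position, n the paragraph's sentence count
    return max(1 / i, 1 / (n - i + 1))

def f4_length(tokens, longest):
    # (5): penalize short sentences by normalizing against the
    # longest sentence in the document
    return len(tokens) / longest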

Chapter 4
RESULTS AND DISCUSSION
In this chapter, the Urdu dataset (corpus) used for automatic text summarization (ATS) is discussed first. The testing phase and the results achieved using the AHP algorithm are then described, and the performance of the summarization is evaluated. In the end, the results obtained from these experiments are presented and briefly discussed.

URDU CORPORA
Language resources such as corpora are important for various natural language processing tasks. Acquiring an Urdu language corpus is no longer a big problem, since nowadays most documents are written in a machine-readable format and are available on the web. Urdu has millions of speakers around the world, yet it is under-resourced in terms of standard evaluation resources, although Urdu documents of different categories are available on the web. The corpus used in the experiments is the publicly available Urdu Summary Corpus (Humayoun et al., 2014). This benchmark corpus is small, yet it is a pioneering effort in the context of Urdu, and it is distributed freely.
As described in Section 3.2, the dataset has fifty (50) articles of different types, including news, current affairs, health, sports, science & technology, tourism, religion, and miscellaneous articles. Because the summaries provided with the corpus are abstractive, they could not be used as a gold standard; extractive gold-standard summaries were therefore produced by volunteer native speakers of Urdu, either academicians teaching Urdu in colleges or university students with an interest in Urdu literature, with no size restriction other than that a summary must not exceed half the size of its article. The corpus categories are shown in Table 4.1.
Table 4.1: Categories of the articles used in the Urdu Summary Corpus

Category Articles
News 6
Current Affairs 6
Health 6
Sports 10
Science & Technology 10
Tourism 3
Religion 4
Miscellaneous 5
Total 50

EVALUATION METHODOLOGY
Summary evaluation is a very important aspect of text summarization. A series of experiments was performed to quantify the contribution of AHP on the Urdu Summary Corpus (Humayoun et al., 2014), containing 50 articles.
EVALUATION MEASURE
This section explains the evaluation measures used for the automatic text summarization results of each experiment. Evaluation is a complex issue, and many different aspects have to be considered simultaneously in order to evaluate and compare different summarizers (Mittal, Kantrowitz, Goldstein, & Carbonell, 1999). The evaluation of our method is obtained by comparing the experimental summarization results with human summaries produced manually (Jones, 1998). Recall is taken as a measure of the informational components of the original text that are correctly extracted, and precision as a measure of the components of the extracted information that are correct (Amini & Gallinari, 2002).
RECALL
Recall measures the ability of the search to find all of the relevant sentences in the document. It is the number of sentences occurring in both the system and ideal summaries divided by the number of sentences in the ideal summary. In other words, Recall measures how much relevant information the system has extracted from the text; it is thus a measure of the coverage of the system. It is calculated as in (6).

$$\mathrm{Recall} = \frac{\left| S_{\mathrm{system}} \cap S_{\mathrm{ideal}} \right|}{\left| S_{\mathrm{ideal}} \right|} \qquad (6)$$

where $S_{\mathrm{system}}$ and $S_{\mathrm{ideal}}$ denote the sets of sentences in the system summary and the ideal summary, respectively.
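For instance, in a hypothetical case where the ideal summary contains 5 sentences and the system summary reproduces 3 of them, Recall = 3/5 = 0.6.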
PRECISION
Precision measures the ability to retrieve top-ranked sentences that are mostly relevant. It is the number of sentences occurring in both the system and ideal summaries divided by the number of sentences in the system summary. In other words, Precision measures how much of the information that the system returned is actually correct; it is also known as accuracy. It is calculated as in (7).

$$\mathrm{Precision} = \frac{\left| S_{\mathrm{system}} \cap S_{\mathrm{ideal}} \right|}{\left| S_{\mathrm{system}} \right|} \qquad (7)$$
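Continuing the same hypothetical case, if the system summary contains 6 sentences, 3 of which also occur in the ideal summary, Precision = 3/6 = 0.5.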
Note that Recall and Precision are antagonistic to one another: a system that strives for perfection in terms of Precision will usually lower its Recall score, while a system that strives for coverage will get more things wrong, thus lowering its Precision score (Al-Saleh & Menai, 2016). This situation has led to the use of a combined measure, the F-measure, which balances Recall and Precision.

F-MEASURE
The F-Measure is a single measure of performance that takes both Recall and Precision into account. It is the harmonic mean of Recall and Precision; compared to the arithmetic mean, both Recall and Precision need to be high for the harmonic mean to be high. The F-Measure is calculated using the formula in (8).

$$F\text{-}\mathrm{Measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (8)$$
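To make the three measures concrete, the following minimal Python sketch computes sentence-level Recall, Precision, and F-Measure from a system summary and an ideal (expert) summary. It is an illustration only, not the thesis implementation; all names in it are assumed for the example.

def evaluate_summary(system_sentences, ideal_sentences):
    """Compute equations (6)-(8) by sentence overlap.

    Both arguments are lists of sentence strings; exact string match
    is assumed here as a stand-in for sentence identity.
    """
    system = set(system_sentences)
    ideal = set(ideal_sentences)
    overlap = len(system & ideal)                          # sentences in both summaries
    recall = overlap / len(ideal) if ideal else 0.0        # equation (6)
    precision = overlap / len(system) if system else 0.0   # equation (7)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0  # equation (8)
    return recall, precision, f_measure

# Check against the hypothetical numbers above: 3 shared sentences,
# an ideal summary of 5 sentences and a system summary of 6 sentences
# yield recall = 0.6, precision = 0.5, f_measure = 0.6/1.1 ≈ 0.545.

Because the harmonic mean penalizes imbalance, a summarizer with Precision 1.0 but Recall 0.1 obtains an F-Measure of only about 0.18, whereas the arithmetic mean would misleadingly report 0.55.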
This part of the evaluation uses human-generated summaries. The individuals involved in this process are experts in the Urdu language, and the summaries they generate serve as references for determining the number of relevant sentences in a particular system summary. We used several different Urdu news texts to construct the testing corpus.
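At the corpus level, the per-document scores are then averaged. The following hedged sketch of this step reuses the evaluate_summary function above; the corpus layout and the pairing of system and reference summaries are assumptions, as they are not specified here.

def evaluate_corpus(summary_pairs):
    """Average Recall, Precision, and F-Measure over a non-empty list of
    (system_sentences, ideal_sentences) pairs, one pair per document."""
    scores = [evaluate_summary(sys_s, ideal_s) for sys_s, ideal_s in summary_pairs]
    n = len(scores)
    mean_recall = sum(r for r, _, _ in scores) / n
    mean_precision = sum(p for _, p, _ in scores) / n
    mean_f = sum(f for _, _, f in scores) / n
    return mean_recall, mean_precision, mean_f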

RESULTS
This section presents the results obtained from evaluating the proposed methodology for Automatic Urdu Text Summarization using the AHP method.

Chapter 5
CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE WORK

INTRODUCTION ABOUT FOURTH CHAPTER

REFERENCES

Al-Saleh, A. B., & Menai, M. E. B. (2016). Automatic Arabic text summarization: a survey. Artificial Intelligence Review, 45(2), 203–234. https://doi.org/10.1007/s10462-015-9442-x
Amini, M.-R., & Gallinari, P. (2002). The use of unlabeled data to improve supervised learning for text summarization. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval – SIGIR ’02, 6, 105. https://doi.org/10.1145/564376.564397
Babar, S. A. (2014). Improving Text Summarization using Fuzzy Logic & Latent Semantic Analysis, 1(4), 170–177.

Barzilay, R., & Elhadad, M. (1997). Using lexical chains for text summarization. Proceedings of the ACL Workshop on Intelligent Scalable Text Summarization, 17(48), 10–17. https://doi.org/10.3115/1034678.1034760
Baxendale, P. B. (1958). Machine-Made Index for Technical Literature- An Experiment. I.B.M. Journal of Research and Development, 2(4), 354–361. https://doi.org/10.1147/rd.24.0354
Burney, A., Sami, B., & Mahmood, N. (2012). Urdu Text Summarizer using Sentence Weight Algorithm for Word Processors. International Journal of Computer Applications, 46(19), 38–43.

Cami, B. R., & Amiri, A. K. (2013). Applying AHP Technique for Trust Evaluation in the Semantic Web. International Journal of Machine Learning and Computing, 3(Icmlc), 17–20. https://doi.org/10.7763/IJMLC.2013.V3.264
Chettri, R., & Chakraborty, U. K. (2017). Automatic Text Summarization. International Journal of Computer Applications, 161(1), 5–7. Retrieved from https://books.google.cz/books?id=jf2jBAAAQBAJ
Edmundson, H. P. (1969). New methods in automatic extracting. Journal of the Association for Computing Machinery, 16(2), 264–285. https://doi.org/10.1145/321510.321519
Gong, Y., & Liu, X. (2001). Generic text summarization using relevance measure and latent semantic analysis. Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval – SIGIR ’01, 19–25. https://doi.org/10.1145/383952.383955
Gupta, V., & Lehal, G. S. (2009). A survey of text mining techniques and applications. Journal of Emerging Technologies in Web Intelligence, 1(1), 60–76. https://doi.org/10.4304/jetwi.1.1.60-76
Hovy, E., & Lin, C.-Y. (1999). Automated text summarization in summarist. Advances in Automatic Text Summarization, 81–94. https://doi.org/10.3115/1119089.1119121
Huang, C. C., Chu, P. Y., & Chiang, Y. H. (2008). A fuzzy AHP application in government-sponsored R&D project selection. Omega, 36(6), 1038–1052. https://doi.org/10.1016/j.omega.2006.05.003
Humayoun, M., Nawab, R. M. A., Uzair, M., Aslam, S., & Farzand, O. (2014). Urdu Summary Corpus. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), 796–800.

Imam, I., Nounou, N., Hamouda, A., Allah, H., & Khalek, A. (2013). Query Based Arabic Text Summarization, 8491, 2–6.

Jones, K. S. (1998). Automatic summarising: factors and directions, 1–21. https://doi.org/10.1145/375551.375604
Jurafsky, D., & Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Prentice Hall.
Kupiec, J., Pedersen, J., & Chen, F. (1995). A trainable document summarizer. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval – SIGIR ’95, 68–73. https://doi.org/10.1145/215206.215333
Luhn, H. P. (1958). The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2), 159–165. https://doi.org/10.1147/rd.22.0159
McCargar, V. (2004). Statistical approaches to automatic text summarization. Bulletin of the American Society for Information Science and Technology.

Meade, L. M., & Presley, A. (2002). R&D project selection using the analytic network process. IEEE Transactions on Engineering Management, 49(1), 59–66. https://doi.org/10.1109/17.985748
Mittal, V. O., Kantrowitz, M., Goldstein, J., & Carbonell, J. G. (1999). Selecting Text Spans for Document Summaries: Heuristics and Metrics. AAAI-99 Proceedings, 467–473.

Móro, R. (2012). Combinations of Different Raters for Text Summarization. Information Science and Technologies Bulletin of the ACM Slovakia, 4(2), 56–58.

Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17, 21–48. Retrieved from http://dl.acm.org/citation.cfm?id=971740
Munot, N., & Govilkar, S. S. (2014). Comparative Study of Text Summarization Methods. International Journal of Computer Applications, 102(12), 975–8887.

Myaeng, S. H., Han, K. S., & Rim, H. C. (2006). Some effective techniques for naive bayes text classification. IEEE Transactions on Knowledge and Data Engineering, 18(11), 1457–1466. https://doi.org/10.1109/TKDE.2006.180
Park, H. (1996). Inferential representation of science documents. Information Processing and Management, 32(4), 419–429. https://doi.org/10.1016/0306-4573(95)00071-2
Raj, R. G., & Abdul-Kareem, S. (2009). Information dissemination and storage for tele-text based conversational systems’ learning. Malaysian Journal of Computer Science, 22(2), 138–160.

Raj, R. G., & Abdul-Kareem, S. (2011). A pattern based approach for the derivation of base forms of verbs from participles and tenses for flexible NLP. Malaysian Journal of Computer Science, 24(2), 63–72.

Raj, R. G., & Balakrishnan, V. (2011). A model for determining the degree of contradictions in information. Malaysian Journal of Computer Science, 24(3), 160–167.

Saaty, T. L., & Vargas, L. G. (2013). The Analytic Network Process. Decision Making with the Analytic Network Process, 195, 1–40. https://doi.org/10.1007/978-1-4614-7279-7_1
Salton, G., & Buckley, C. (1991). Automatic text structuring and retrieval: experiments in automatic encyclopedia searching. Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '91), 21–30.

Silber, H. G. (2002). Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28(4), 487–496. https://doi.org/10.1162/089120102762671954
Tofighy, M., Kashefi, O., & Zamanifar, A. (2011). Persian text summarization using fractal theory. Communications in Computer and Information Science, 254, 651–662. https://doi.org/10.1007/978-3-642-25483-3
Tofighy, S. M., Raj, R. G., & Javadi, H. H. S. (2013). AHP techniques for Persian text summarization. Malaysian Journal of Computer Science, 26(1), 1–8.

Yeh, J. Y., Ke, H. R., Yang, W. P., & Meng, I. H. (2005). Text summarization using a trainable summarizer and latent semantic analysis. Information Processing and Management, 41(1), 75–95. https://doi.org/10.1016/j.ipm.2004.04.003

APPENDIX A
APPENDIX B