Natural Language Processing
Abstract
Natural Language Processing (NLP) is a set of techniques that lets machines handle language more as people do, and so narrows the gap between humans and machines. In practical terms, NLP makes it possible for people to communicate with machines without difficulty. Numerous NLP applications have been developed in recent years, and many of them are useful in daily life, for example a machine that takes instructions by voice.
Introduction
Natural Language Processing (NLP) is the computerized approach to analyzing text, and it rests on two things: a set of theories and a set of technologies. Because it is a very active area of research and development, there is no single definition that would satisfy everyone, but there are some characteristics that would be part of any knowledgeable person’s definition.
Natural Language Processing is a theoretically motivated range of computational methods for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a variety of tasks or applications.
Several elements of this definition can be described in more detail. First, the imprecise notion of ‘range of computational methods’ is necessary because there are multiple methods or techniques from which to choose to accomplish a particular type of language analysis.
‘Naturally occurring texts’ can be in any language, and can be spoken or written. The only requirement is that they be in a language used by humans to communicate with one another. Also, the text being analyzed should not be specifically constructed for the purpose of the analysis; rather, it should be gathered from actual usage.
The notion of ‘levels of linguistic analysis’ reflects the fact that multiple types of language processing are known to be at work when humans produce or comprehend language. It is thought that humans normally use all of these levels, since each level conveys a different type of meaning. Different NLP systems, however, use different levels, or combinations of levels, of linguistic analysis, and this shows in the differences among NLP applications. It also leads to much confusion among non-specialists as to what NLP really is, because a system that uses any subset of these levels of analysis can be called an NLP-based system. The difference between such systems lies in whether they use ‘weak’ NLP or ‘strong’ NLP.
‘Human-like language processing’ shows that NLP is considered a discipline within Artificial Intelligence (AI). Although the full lineage of NLP depends on a number of other disciplines, NLP strives for human-like performance, so it is appropriate to consider it an AI discipline.
‘For a variety of tasks or applications’ indicates that NLP is not usually considered a goal in and of itself, except perhaps for AI researchers. For others, NLP is the means of accomplishing a particular task. As a result, there are Information Retrieval (IR) systems that use NLP, as well as Machine Translation (MT), Question Answering, and so on.
Goal
The goal of NLP, as stated above, is “to achieve human-like language processing”.
The choice of the word ‘processing’ is deliberate, and it should not be replaced with ‘understanding’. Although the field of NLP was originally referred to as Natural Language Understanding (NLU) in the early days of AI, it is well agreed today that while the goal of NLP is true NLU, that goal has not yet been accomplished. A full NLU system would be able to:
1) Paraphrase an input text
2) Translate the text into a different language
3) Answer questions about the contents of the text
4) Draw inferences from the text
While NLP has made serious inroads into accomplishing goals 1 to 3, NLP systems cannot, of themselves, draw inferences from text, so NLU still remains the goal of NLP.
There are also more practical goals for NLP, many related to the particular application for which it is being used. For instance, an NLP-based IR system has the goal of providing more precise, complete information in response to a user’s real information need. The goal of the NLP system here is to represent the true meaning and intent of the user’s query, which can be expressed as naturally in everyday language as if the user were speaking to a reference librarian. Likewise, the contents of the documents being searched are represented at all their levels of meaning, so that a true match between need and response can be found regardless of how either is expressed in its surface form.
Origins
As with most modern disciplines, the lineage of NLP is mixed, and it still shows strong emphases from different groups whose backgrounds are influenced more by one or another of the contributing disciplines. The key contributors to the discipline of NLP are as follows.
Linguistics
Linguistics focuses on formal, structural models of language and the discovery of language universals. In fact, the field of NLP was originally referred to as Computational Linguistics.
Computer Science
Computer science is concerned with developing internal representations of data and with the efficient processing of these structures.
Cognitive Psychology
Cognitive psychology looks at language usage as a window into human cognitive processes, and has the goal of modeling the use of language in a psychologically plausible way.
Divisions
Although the entire field is referred to as Natural Language Processing, there are in fact two distinct focuses: language processing and language generation.
The first of these, language processing, refers to the analysis of language for the purpose of producing a meaningful representation, whereas the latter refers to the production of language from a representation. The task of natural language processing is equivalent to the role of the reader or listener, while the task of natural language generation is that of the writer or speaker. While much of the theory and technology is shared by these two divisions, natural language generation also requires a planning capability. That is, the generation system needs a plan or model of the goal of the interaction in order to decide what it should generate at each point in an interaction. This article concentrates on the task of natural language analysis, as this is most relevant to Library and Information Science.
Brief History of Natural Language Processing
Research in natural language processing has been going on for several decades, dating back to the late 1940s. Machine translation (MT) was the first computer-based application related to natural language. Although Weaver and Booth started one of the earliest MT projects in 1946, on computer translation based on expertise in breaking enemy codes during World War II, it is generally agreed that it was Weaver’s memorandum of 1949 that brought the idea of MT to general notice and inspired many projects. He suggested using ideas from cryptography and information theory for language translation. Research began at various institutions in the United States within a few years.
Early work in MT took the simplistic view that the only differences between languages lay in their vocabularies and permitted word orders. Systems developed from this perspective simply used dictionary lookup to find appropriate words for translation and reordered the words after translation to fit the word-order rules of the target language, without taking into account the lexical ambiguity inherent in natural language. This produced poor results. The apparent failure made researchers realize that the task was much harder than expected, and that they needed a more adequate theory of language.
However, it was not until 1957, when Chomsky published Syntactic Structures and introduced the idea of generative grammar, that the field gained better insight into whether or how mainstream linguistics could help MT.
During this period, other NLP application areas began to emerge, such as speech recognition. The language processing community and the speech community then split into two camps, with the language processing community dominated by the theoretical perspective of generative grammar and hostile to statistical methods, and the speech community dominated by statistical information theory and hostile to theoretical linguistics.
Owing to developments in the syntactic theory of language and in parsing algorithms, there was so much enthusiasm in the 1950s that people believed fully automatic high-quality translation systems would be able to produce results indistinguishable from those of human translators, and that such systems would be in operation within a few years. This was not only unrealistic given the linguistic knowledge and computer systems then available, but also impossible in principle.
The inadequacies of the existing systems led the ALPAC (Automatic Language Processing Advisory Committee) to issue a report in 1966. The report concluded that MT was not immediately achievable and recommended that it not be funded. This had the effect of halting MT and most work in other applications of NLP, at least within the United States. Although there was a substantial decrease in NLP work during the years after the ALPAC report, there were some significant developments, both in theoretical issues and in the construction of prototype systems. Theoretical work in the 1960s and 1970s focused on the issue of how to represent meaning and on developing computationally tractable solutions. Chomsky introduced the transformational model of linguistic competence in 1965.
However, transformational generative grammars were too syntactically oriented to allow for semantic concerns. They also did not lend themselves easily to computational implementation. As a reaction to Chomsky’s theories and the work of other transformational generativists, the case grammar of Fillmore, the semantic networks of Quillian, and the conceptual dependency theory of Schank were developed to explain syntactic anomalies and provide semantic representations. The augmented transition networks of Woods extended the power of phrase-structure grammar by incorporating mechanisms from programming languages such as LISP.
Alongside theoretical development, many prototype systems were built to demonstrate the effectiveness of particular principles. Weizenbaum’s ELIZA was built to replicate the conversation between a psychologist and a patient, simply by permuting or echoing the user input. Winograd’s SHRDLU simulated a robot that manipulated blocks on a tabletop. Despite its limitations, it showed that natural language understanding was indeed possible for the computer. PARRY attempted to embody a theory of paranoia in a system. Instead of single keywords, it used groups of keywords, and used synonyms if keywords were not found.
Other researchers also made important contributions. McKeown’s discourse planner TEXT and McDonald’s response generator MUMBLE used rhetorical predicates to produce declarative descriptions in the form of short texts, usually paragraphs. TEXT’s ability to generate coherent responses online was considered a major achievement.
In the early 1980s, motivated by the availability of critical computational resources, a growing awareness within each community of the limitations of isolated solutions to NLP problems, and a general push toward applications that worked with language in a broad, real-world context, researchers began re-examining non-symbolic approaches that had lost popularity in earlier days. By the end of the 1980s, symbolic approaches had been used to address many significant problems in NLP, and statistical approaches had been shown to be complementary in many respects to symbolic approaches.
In the last years of the millennium, the field was growing rapidly. This can be attributed to the increased availability of large amounts of electronic text, the availability of computers with increased speed and memory, and the advent of the Internet.
Statistical techniques succeeded in dealing with many generic problems in computational linguistics, for example part-of-speech identification and word sense disambiguation.
Today, NLP researchers are developing next-generation NLP systems that deal reasonably well with general text and account for a good portion of the variability and ambiguity of language.
Natural Language Processing Structure
The main difficulty in building a natural language processing system is finding an effective way to convert a potentially ambiguous input phrase into an unambiguous form that can be used within a computer system. The difficulty increases when the computer must handle input that can be interpreted in several ways.
The transition from an ambiguous phrase to an unambiguous form is called parsing. In NLP, parsing involves replacing groups of words in a sentence with more general symbols. This symbol replacement can be repeated recursively until the sentence has been converted into a form the system accepts. Several parsing techniques have been designed for use in natural language processing. They are as follows.
- pattern matching
- syntactic or grammar-based
- semantic or knowledge-based
- neural-network parsers
Pattern matching
Pattern matching is the oldest and simplest approach to parsing sentences. Parsers using this method search for specific linguistic patterns in a sentence without any knowledge of grammar. When a sentence matches a known pattern, the system can respond in some way, for instance by manipulating the original sentence to produce a new, related sentence.
Syntactic, or grammar-based, parsers, by contrast, use a set of rules that describe how sentences are built in a particular language. These rules are called a grammar. They make it possible to construct a tree that shows exactly how the words in the sentence relate to each other. The leaves of the tree, which hold the actual words of the sentence, are called terminals, while the rest of the tree represents the relationships among the words, and its nodes are called nonterminals.
One of the best-known natural language processing systems that works in the pattern-matching style is ELIZA.
ELIZA was a conversational computer program and an early example of natural language processing that is primitive by modern standards. ELIZA operated by processing users’ responses according to scripts, the best known of which was DOCTOR, a simulation of a psychotherapist. In this mode, ELIZA mostly rephrased the user’s statements as questions and posed them back to the ‘patient’. ELIZA was written by Joseph Weizenbaum between about 1964 and 1966.
In this program, when an input sentence matches a known pattern, the program responds by choosing a pattern from a list of suitable replies for the situation, substituting words from the original sentence as needed for the reply to make sense. The simplicity of this pattern-matching technique makes it easy to implement, but it has no grammatical knowledge, so it is not very good at analyzing sentences unless other components of the natural language processing system compensate for this deficiency.
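The pattern-matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patterns, reply templates, and pronoun table below are hypothetical and are not taken from ELIZA's actual DOCTOR script.

```python
import re

# Hypothetical rules for this sketch: (pattern, reply template).
# They are NOT the rules of the original DOCTOR script.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Pronoun swaps applied to the captured fragment so the reply reads naturally.
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "am": "are",
               "your": "my", "you": "me"}


def reflect(fragment):
    """Swap first- and second-person words in the captured text."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(sentence):
    """Return the reply of the first rule whose pattern matches the sentence."""
    cleaned = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    # No pattern matched: fall back to a neutral prompt.
    return "Please go on."


print(respond("I am sad about my job."))
```

Note that the program has no notion of grammar at all: it only looks for surface patterns and reuses the user's own words, which is exactly the weakness described above.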
Syntactic or grammar-based
Four types of grammar are used in syntactic parsers, each using a different kind of rewrite rule. The easiest to understand is type 3, known as finite-state grammar, which produces only very simple sentences. Type 2, known as context-free grammar, allows somewhat more complex sentences to be built, but the left side of each rewrite rule is restricted to a single nonterminal symbol. Type 1, known as context-sensitive grammar, allows multiple symbols on the left side of each rewrite rule, provided that the right side has at least as many symbols as the left. Finally, the most complex type of grammar is type 0, whose rules follow no particular pattern. Type 0 grammar is the most flexible, but it is also the most difficult to parse.
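The four grammar types can be told apart by the shape of their rewrite rules. The following is a minimal sketch, assuming a convention invented for this example: each rule is a pair of tuples (left side, right side), nonterminals are written in upper case, and type 3 rules are in right-linear form. The function names are hypothetical.

```python
def is_nonterminal(symbol):
    # Convention for this sketch only: nonterminals are upper-case strings.
    return symbol.isupper()


def rule_type(lhs, rhs):
    """Return the most restrictive type (3, 2, 1 or 0) that one rewrite rule fits."""
    if len(lhs) == 1 and is_nonterminal(lhs[0]):
        # Type 3 (right-linear): one terminal, optionally followed by one nonterminal.
        if (len(rhs) in (1, 2)
                and not is_nonterminal(rhs[0])
                and (len(rhs) == 1 or is_nonterminal(rhs[1]))):
            return 3
        # Type 2: a single nonterminal on the left, anything on the right.
        return 2
    # Type 1: several symbols on the left, right side at least as long.
    if len(rhs) >= len(lhs):
        return 1
    # Type 0: no restriction on the shape of the rule.
    return 0


def grammar_type(rules):
    """A grammar is only as restricted as its least restricted rule."""
    return min(rule_type(lhs, rhs) for lhs, rhs in rules)


toy_grammar = [
    (("S",), ("NP", "VP")),   # context-free rule
    (("NP",), ("the", "N")),  # also fits the finite-state shape
    (("N",), ("dog",)),
]
print(grammar_type(toy_grammar))
```

The sketch checks rule shapes only; it does not decide whether an equivalent grammar of a more restricted type exists.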
Syntactic parsing captures the structure of a sentence. Semantic parsing goes a step further by focusing on the meanings inherent in sentences rather than just their structure. Semantic techniques can be further divided into two categories, as follows.
Case grammar
C. Fillmore developed the theory of case grammar. According to Fillmore, every sentence has an underlying representation of its meaning, which contains the verb and the noun phrases related to that verb. The relationships between the noun phrases and the verb are called cases.
Semantic or Knowledge-Based
A semantic grammar is similar to a syntactic grammar in that both use sets of rewrite rules to process sentences. However, rather than using general word classes, a semantic grammar uses domain-specific classes, for instance ship-properties. This makes sentences easier to parse, because specific word classes have fewer replacement possibilities than general word classes, but the same specificity makes it difficult to reuse the rewrite rules in other contexts.
The three approaches to parsing described so far (pattern matching, syntactic, and semantic) depend entirely on sentence-rewrite rules. Some systems instead use knowledge-based approaches, for instance word-expert parsers, which can reach a closer understanding of a sentence by consulting a database that holds the interpretations of words in a particular subject area. This technique helps preserve as much of the original form of a sentence as possible, avoiding the loss of meaning that may occur when natural language is reduced to a limited set of rewrite rules.
Neural-network parsers
The most recent approach to natural language processing relies neither on simple rewrite rules nor on a focused knowledge base, but on a network of neural-like computing units.
Each computing unit can accept a number of inputs, where every input is assigned a confidence value between -1 and 1. The inputs are weighted according to their confidence values, and if certain conditions are met, the computing unit calculates an output value that can serve as input to other computing units. Although an individual computing unit outputs only a small amount of information, calculated as a function of its inputs, complex computation can be achieved by connecting a large number of these units into what is known as a neural network.
A neural network used in a natural language parser contains three different types of computing units:
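A single computing unit of this kind can be sketched as a weighted sum followed by a threshold test. The code below is an illustrative assumption, not a description of any particular parser: the threshold rule, the weights, and the XOR wiring are chosen only to show how a few simple units combine into a computation that no single unit of this kind can perform.

```python
def unit_output(inputs, weights, threshold=0.5):
    """One computing unit: weight each input, sum, and fire (1.0) if the sum reaches the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 if total >= threshold else 0.0


def xor_network(a, b):
    """Three units wired together; all weights lie between -1 and 1."""
    either = unit_output([a, b], [1.0, 1.0], threshold=0.5)   # fires if a or b is on
    both = unit_output([a, b], [1.0, 1.0], threshold=1.5)     # fires only if both are on
    # Output unit: 'either' excites it, 'both' inhibits it.
    return unit_output([either, both], [1.0, -1.0], threshold=0.5)


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_network(a, b))
```

Each unit on its own only compares a weighted sum with a threshold; the interesting behavior comes from feeding the outputs of some units into others.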
- Lexical
- Word-sense
- Case-logical
Lexical
Lexical units are located at the input end of the network and are mapped to the words in the sentence.
Word-sense
The outputs of the lexical units are connected to the word-sense units, which represent the meanings of the words.
Case-logical
Finally, the case-logical units combine the meanings of the words in the sentence to form predicates and objects. Of the parsing approaches described here, the neural network model (NNM) most closely resembles the way humans process language, but it is also the most difficult to implement.
Whichever parsing approach is chosen for a natural language processing (NLP) system, the parsing method must also be considered. Three general parsing techniques are described here.
- Top-down
- Bottom-up
- Deterministic
Top-down parsers
A top-down parser works by matching the input sentence against the most general rewrite rule and working its way down toward the most specific rules. Top-down parsers are easy to implement and update.
The downside is that they may run more slowly than other techniques. If all of the rewrite rules at a given level fail, the parser has to back up one or more levels and try different rules until it succeeds. This may mean re-evaluating rules that have already been tried. Therefore, to optimize the performance of a top-down parser, the rules most likely to succeed should be placed toward the top of the list so that they are considered early in the process.
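The top-down strategy with backtracking can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions: the toy grammar, its vocabulary, and the function names are invented for the example, and the grammar deliberately avoids left recursion, which this simple scheme cannot handle.

```python
# Hypothetical toy grammar: each nonterminal maps to its rewrite rules,
# listed in the order in which the parser will try them.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "Name": [["john"]],
    "V": [["sees"], ["sleeps"]],
}


def expand(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match tokens from `pos`."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rule in GRAMMAR[symbol]:  # try each rewrite rule in turn
        yield from expand_sequence(rule, tokens, pos, [symbol])


def expand_sequence(symbols, tokens, pos, tree):
    """Match a sequence of symbols; exhausting an alternative is the backtracking step."""
    if not symbols:
        yield tuple(tree), pos
        return
    for subtree, next_pos in expand(symbols[0], tokens, pos):
        yield from expand_sequence(symbols[1:], tokens, next_pos, tree + [subtree])


def parse(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    tokens = sentence.lower().split()
    for tree, end in expand("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


print(parse("john sleeps"))
```

For "john sleeps" the parser first tries the rule VP -> V NP, fails to find a noun phrase after the verb, backs up, and succeeds with VP -> V. Reordering the VP rules would avoid that wasted attempt for this sentence, which is the rule-ordering optimization described above.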
Bottom-up parsers
A bottom-up parser starts by processing the lowest-level, most specific rules first, then builds up larger parts along the way. Unlike top-down parsers, which have difficulty processing malformed sentences, bottom-up parsers have a better chance of making sense of them. However, the bottom-up nature of the process can cause one sentence to produce numerous possible parses, and the best one cannot be determined until all the paths have been considered. On the other hand, it eliminates the time wasted by repeated evaluation of rules on a sentence.
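One well-known bottom-up scheme is chart parsing in the CYK style, which fills in small spans first and combines them into larger ones. The sketch below uses it to count how many parses a sentence has, to show how a single sentence can yield several. The lexicon and the binary rules are hypothetical and written for this example only.

```python
from collections import defaultdict

# Hypothetical lexicon and binary rewrite rules (parent, left child, right child).
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]


def count_parses(sentence, start="S"):
    """Count the distinct parse trees of `sentence`, building spans bottom-up."""
    tokens = sentence.lower().split()
    n = len(tokens)
    # chart[(i, j)][A] = number of ways category A can span tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):  # lowest level: single words
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):      # then ever larger spans
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point inside the span
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]


print(count_parses("I saw the man with the telescope"))
```

The example sentence has two parses because "with the telescope" can attach either to the verb phrase or to "the man". Each span is analyzed only once and then reused, which is where the saving over repeated rule evaluation comes from.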
Deterministic parsers
A deterministic parser works in a way similar to a bottom-up parser, but instead of exploring all possible paths through the whole tree, it looks ahead to determine which node to follow upward. This deterministic approach gives a speed advantage over the ordinary bottom-up method, but the look-ahead algorithm can select the best path to follow only on the basis of syntactic information.
One major use of natural language processing is in building a natural language query system (NLQS), which can serve as an interface to a database management system (DBMS). Normally, a user who wishes to query a database has to learn the query language used by the database management system, and also has to learn how the database is structured. This learning curve is greatly reduced when a natural language query system is used as the front end to the database, translating English sentences directly into the query language understood by the back-end database management system. Because users can put queries to the system as they would ask questions of another person in English, they spend less time learning the details of the database management system and more time getting results out of it.
Apart from the issues that affect natural language processing systems in general, natural language query systems raise some additional problems. First, the system should be able to recognize many-to-one mappings, where several different words can be used to identify the same database field. For example, several synonyms may all refer to one field. In addition, the system should be able to deal with the ambiguity caused by one-to-many mappings, where one word can refer to a number of different fields in the database. Such a word is ambiguous because it may refer to any of those fields.
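Both kinds of mapping can be illustrated with a small lookup table. Everything in this sketch is hypothetical: the words, the field names, and the imaginary employee table they refer to were made up for the example.

```python
# Hypothetical vocabulary for an imaginary employee table.
WORD_TO_FIELDS = {
    "salary": {"salary"},
    "pay": {"salary"},       # many-to-one: three words, one field
    "wage": {"salary"},
    "name": {"first_name", "last_name"},  # one-to-many: ambiguous word
    "surname": {"last_name"},
}


def resolve(word):
    """Return (field, None) if the word maps to exactly one field,
    otherwise (None, sorted list of candidate fields)."""
    fields = WORD_TO_FIELDS.get(word.lower(), set())
    if len(fields) == 1:
        return next(iter(fields)), None
    return None, sorted(fields)


print(resolve("pay"))    # a synonym resolves to a single field
print(resolve("name"))   # an ambiguous word returns its candidates
```

A real query system would need some way to settle the ambiguous case, for instance by asking the user or by using the rest of the sentence as context; this sketch only detects it.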
At present, natural language processing remains an immature field of computer science. However, improvements and innovations in computer hardware and software are supporting the development of better natural language processing systems.
In the future, it is expected that computers will be controlled through intelligent user interfaces, by which the user will be able to interact with many different types of programs using typed or spoken sentences in English, Urdu, or another natural language.
Research and future
A great deal of research is being done on natural language processing to build better systems that are more human-like and can understand simple instructions. Many research organizations are working on different NLP projects. Some of them are listed below.
Microsoft Natural Language Processing Group
The objective of this group is to design and build a computer system that will analyze and generate natural languages. The system takes input text and moves it through various stages of linguistic processing, from lexical and morphological analysis through syntax and semantics, and finally pragmatics and discourse. The approach makes heavy use of the information available in online dictionaries and other written works, from which the group is able to extract a rich knowledge base that can be used in the more advanced stages of machine understanding. The project has widened its scope by developing parallel systems in several languages. The languages covered are Chinese, Urdu, Arabic, English, French, German, Japanese and Spanish.
Canon Natural Language Processing Group
The Canon NLP group's activities focus on the exploitation and further development of its language-independent continuous speech recognition technology, and on the research and development of large-vocabulary speech understanding software for advanced spoken-language systems.
Canon is also interested in exploring applications of NLP techniques in a number of other areas, for example information retrieval, in order to find markets for its technologies and make use of them.
Conclusion
Hence it is clear that Natural Language Processing plays a vital role in new human-machine interfaces. When we look at products built on NLP technology, we see that they are highly advanced and very helpful. Yet there are numerous limitations that call for further improvement and development of NLP-based systems. For instance, the language we speak is highly ambiguous, which makes it very difficult to understand and analyze. Moreover, with so many languages spoken around the world, it is extremely hard to design a system that is 100% accurate. These problems become more complex when we consider different people speaking similar languages, or even the same language, in different styles. That is why much of the research on speech recognition concentrates on these areas. Information retrieval would improve if it gave more accurate results for different searches, which requires intelligence to search and classify the results. Such intelligent systems are being tested now, so we can expect to see improved applications of NLP in the future.