Part Of Speech Tagging Or POS Tagging

What is Part-of-Speech (POS) tagging?

POS tagging is the process of reading a text (corpus) in a language and assigning a part-of-speech tag, such as noun, pronoun, verb, or adverb, to each word. It is also called grammatical tagging or word-category disambiguation.

What are the challenges in Part-of-Speech (POS) tagging?

Identifying the part of speech of a word is challenging because the same word can take different tags in different sentences, depending on context. For example, “book” is a noun in “I read a book” but a verb in “Please book a ticket”.

What are the different POS Tagging Techniques?

There are four main POS tagging techniques:

  1. Lexical Based Methods
  2. Rule-Based Methods
  3. Probabilistic Methods
  4. Deep Learning Methods

Lexical Based Methods: Assign each word the POS tag that occurs most frequently with that word in the training corpus.
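The most-frequent-tag idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a production tagger: the training corpus is a tiny hand-tagged toy, and the names `train_most_frequent_tag` and `tag_lexical`, as well as the choice of "NN" as the default tag for unknown words, are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Toy hand-tagged training corpus (hypothetical data, for illustration only).
TRAINING_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("run", "NN"), ("is", "VBZ"), ("fun", "NN")],
    [("they", "PRP"), ("run", "VBP"), ("daily", "RB")],
    [("we", "PRP"), ("run", "VBP"), ("fast", "RB")],
]


def train_most_frequent_tag(corpus):
    """Map each word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_lexical(tokens, lexicon, default_tag="NN"):
    """Tag each token by dictionary lookup; unknown words get default_tag."""
    return [(token, lexicon.get(token.lower(), default_tag)) for token in tokens]


lexicon = train_most_frequent_tag(TRAINING_CORPUS)
# "run" appears once as NN and twice as VBP, so it is always tagged VBP.
print(tag_lexical(["they", "run", "daily"], lexicon))
```

Note that this approach ignores context entirely: "run" receives the same tag in every sentence, which is exactly the limitation the other techniques try to address.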

Rule-Based Methods: Assign POS tags based on rules. For example, we can have a rule that says a word ending in “ed” or “ing” should be tagged as a verb. Rule-based techniques can be combined with lexical based approaches to tag words that do not appear in the training corpus but do appear in the test data.
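A rough sketch of how suffix rules can back up a word lookup is shown here. The rule list, the small `KNOWN_WORDS` lexicon, and the function name `tag_with_rules` are all hypothetical choices for this example; real rule-based taggers use far larger and more carefully ordered rule sets.

```python
# Ordered (suffix, tag) rules; the first matching suffix wins. Illustrative only.
SUFFIX_RULES = [
    ("ing", "VBG"),  # gerund / present participle
    ("ed", "VBD"),   # past tense
    ("ly", "RB"),    # adverb
    ("s", "NNS"),    # plural noun
]

# Hypothetical lexicon standing in for words seen in a training corpus.
KNOWN_WORDS = {"the": "DT", "dog": "NN", "king": "NN"}


def tag_with_rules(tokens, lexicon=KNOWN_WORDS, default_tag="NN"):
    """Look each word up in the lexicon; fall back to suffix rules if unseen."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in lexicon:
            tag = lexicon[word]
        else:
            tag = next(
                (rule_tag for suffix, rule_tag in SUFFIX_RULES if word.endswith(suffix)),
                default_tag,
            )
        tagged.append((token, tag))
    return tagged


print(tag_with_rules(["the", "king", "jumped", "quickly"]))
```

The lexicon is consulted first on purpose: "king" ends in "ing", so the suffix rule alone would mislabel it as a verb form. The rules only apply to words the lexicon has never seen, such as "jumped" and "quickly".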

Probabilistic Methods: Assign POS tags based on the probability of a particular tag sequence occurring. Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs) are two probabilistic approaches to POS tagging.
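To make the tag-sequence idea concrete, here is a small sketch of Viterbi decoding over a toy HMM. All probabilities are made up for illustration rather than estimated from data, the tag set is reduced to three tags, and the `UNSEEN` constant is an assumed smoothing value. A practical implementation would typically work with log probabilities to avoid underflow on long sentences.

```python
# Toy HMM parameters (made-up probabilities, for illustration only).
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS_P = {  # TRANS_P[previous_tag][tag]
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT_P = {  # EMIT_P[tag][word]
    "DT": {"the": 0.9, "a": 0.1},
    "NN": {"dog": 0.5, "walk": 0.3, "park": 0.2},
    "VB": {"walk": 0.6, "dog": 0.1, "park": 0.3},
}
UNSEEN = 1e-6  # small probability for word/tag pairs missing from EMIT_P


def viterbi(tokens):
    """Return the most probable tag sequence for tokens under the toy HMM."""
    if not tokens:
        return []
    # best[i][tag] = (probability of the best path ending in tag at i, previous tag)
    best = [{t: (START_P[t] * EMIT_P[t].get(tokens[0], UNSEEN), None) for t in TAGS}]
    for word in tokens[1:]:
        column = {}
        for tag in TAGS:
            prob, prev = max((best[-1][p][0] * TRANS_P[p][tag], p) for p in TAGS)
            column[tag] = (prob * EMIT_P[tag].get(word, UNSEEN), prev)
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


for sentence in (["the", "dog", "walk"], ["the", "walk"]):
    print(list(zip(sentence, viterbi(sentence))))
```

With these toy numbers, "walk" is tagged VB after the noun "dog" but NN directly after the determiner "the", which shows how the tag sequence, not the word alone, drives the decision.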

Deep Learning Methods: Neural networks, such as Recurrent Neural Networks (RNNs), can also be used for POS tagging.

To apply POS tagging to a text with NLTK, we perform two steps:

  1. Tokenize the text with word_tokenize
  2. Apply pos_tag to the resulting tokens

EXAMPLE:

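# word_tokenize and pos_tag rely on NLTK data packages (installed with
# nltk.download); the exact package names depend on the NLTK version.
from nltk import pos_tag, word_tokenize

text = "Great! Prem and his friends are the only brilliant boys with highest score in class."

# Lowercasing is optional; it can make proper nouns such as "Prem" harder to tag.
text = text.lower()

# Step 1: split the text into word tokens
tokenized_text = word_tokenize(text)
print(tokenized_text)

# Step 2: assign a POS tag to each token; the result is a list of (word, tag) pairs
print(pos_tag(tokenized_text))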
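pos_tag returns a list of (word, tag) pairs, and by default the tags come from the Penn Treebank tagset. The sketch below shows one way to group such pairs by tag and print a short description of each tag. The `sample_tagged` list is a hand-written sample in the same format, not actual tagger output, and `TAG_MEANINGS` covers only a handful of tags chosen for this example.

```python
from collections import defaultdict

# A few Penn Treebank tags and their meanings (a partial list).
TAG_MEANINGS = {
    "NN": "noun, singular",
    "NNS": "noun, plural",
    "VBP": "verb, non-3rd-person singular present",
    "JJ": "adjective",
    "CC": "coordinating conjunction",
    "PRP$": "possessive pronoun",
}

# Hand-written sample in the (word, tag) format; NOT actual tagger output.
sample_tagged = [
    ("prem", "NN"), ("and", "CC"), ("his", "PRP$"),
    ("friends", "NNS"), ("are", "VBP"), ("brilliant", "JJ"),
]


def group_by_tag(tagged):
    """Collect the words that share each tag, preserving their order."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


for tag, words in group_by_tag(sample_tagged).items():
    print(f"{tag:5} {TAG_MEANINGS.get(tag, 'unknown'):40} {words}")
```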
