Thursday, October 18, 2012

Simple Bi-gram finder (top 10 by frequency) in Python

Problem: Capture the bi-grams (pairs of adjacent words) in a text document and report the 10 most frequent.
Input: Test_File.txt


I had to put forth more effort as saving to go to go to go  to go  to go  to go  to go  to go  to go  to go  to go in my retirement . I had to go to school to get ticket to go to movie tomorrow  to go  to go .


Output:

(('to', 'go'), 15)
(('go', 'to'), 13)
(('I', 'had'), 2)
(('had', 'to'), 2)
(('forth', 'more'), 1)
(('retirement', '.'), 1)
(('to', 'put'), 1)
(('tomorrow', 'to'), 1)
(('to', 'movie'), 1)
(('movie', 'tomorrow'), 1)

Python Code:
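from collections import Counter

# Raw string so the backslashes in the Windows path are taken literally
f = open(r'C:\Python27\Test_File.txt')
data = f.readlines()
f.close()

freq = Counter()
for line in data:
    words = line.split()

    # Second iterator over the same words, advanced by one position
    nextword = iter(words)
    next(nextword, None)

    # Pair each word with its successor and count the pairs
    freq.update(zip(words, nextword))

for item in freq.most_common(10):
    print item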
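
The script above targets Python 2 and reads from a fixed path. As a minimal sketch of the same pairing idea for Python 3, the version below works on an in-memory string. The function name top_bigrams and the shortened sample sentence are illustrative assumptions, not part of the original script.

```python
from collections import Counter


def top_bigrams(text, n=10):
    """Return the n most frequent adjacent word pairs in text."""
    words = text.split()
    # zip(words, words[1:]) pairs every word with the word that follows it
    return Counter(zip(words, words[1:])).most_common(n)


sample = "I had to go to school to get ticket to go to movie"
for pair, count in top_bigrams(sample, 3):
    print(pair, count)
```

Slicing with words[1:] copies the list, which is fine for small inputs; the iterator trick in the script above avoids that copy.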

Sunday, October 14, 2012

Python for Feature Vector Generation

This post has two Python scripts that serve as a starting point for almost any NLP-based text mining solution: one finds the 500 most frequent words in a dataset, and the other generates a feature vector over that word set.

Most frequent 500 bag-of-words (BoW) terms (500_freq_words.py):

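from string import punctuation
from operator import itemgetter

N = 500      # number of most frequent words to keep
words = {}   # word -> number of occurrences

# Lowercase every whitespace-separated token and strip surrounding punctuation.
# The raw string keeps the backslashes in the Windows path literal.
words_gen = (word.strip(punctuation).lower()
             for line in open(r"C:\Python27\All_Cleaned_Comment_Loyality_Change.txt")
             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

# Sort by frequency, highest first, and keep the top N
top_words = sorted(words.iteritems(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print "%s %d" % (word, frequency)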
-------------------------------------------------------------------------
Once you have generated the 500 most frequent BoWs from a dataset, you can store them in a file using the shell redirection operator (>). You can then use that file for the vector generation below.
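
As an alternative to redirecting the output, the word list can be written from Python directly. The following is a rough sketch for Python 3. The function names top_words and save_top_words are hypothetical, and skipping tokens that consist only of punctuation is an addition that the script above does not make.

```python
from collections import Counter
from string import punctuation


def top_words(lines, n=500):
    """Return the n most frequent (word, count) pairs across lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = token.strip(punctuation).lower()
            if word:  # skip tokens that were only punctuation
                counts[word] += 1
    return counts.most_common(n)


def save_top_words(lines, path, n=500):
    """Write the top n words to path, one word per line."""
    with open(path, "w") as out:
        for word, _count in top_words(lines, n):
            out.write(word + "\n")
```

Writing one word per line, without the count, gives a file in which each line is exactly a word, which is what the comparison in the vector generation script expects.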

Vector Generation:

from string import punctuation
import sys

# The document to vectorize is the last command-line argument
arg = sys.argv[-1]

words = {}        # word -> number of occurrences in the document
total_words = 0   # total number of tokens in the document

# Count each normalized word and capture the total number of words
f = open(arg)
for line in f:
    for word in line.split():
        word = word.strip(punctuation).lower()
        words[word] = words.get(word, 0) + 1
        total_words += 1
f.close()

# Read the lines from the top 500 words list
f_500 = open(r'C:\Python27\Top_500_words.txt')
data_500 = f_500.readlines()
f_500.close()

# Loop over the most frequent 500 list and print one feature per word:
# its occurrences per 1000 words of the document. A word that is absent
# from the document is given a count of 1, as in the original else branch.
for lines_500 in data_500:
    # Only the first field is used, so "word frequency" lines also work
    fields = lines_500.split()
    if not fields:
        continue
    frequency = words.get(fields[0], 1)
    print ', %f' % ((frequency / float(total_words)) * 1000),

# End the row with the name of the document
print ', ' + arg
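
The normalization in the script above can be tried on a small example. This is only a sketch for Python 3, and it assumes the intended result is one value per word of the top-500 list, with a count of 1 for words the document lacks. The function name feature_vector and the tiny vocabulary are made up for illustration.

```python
def feature_vector(doc_counts, total_words, vocabulary):
    """Per-1000-word rate for each vocabulary word, in vocabulary order."""
    vector = []
    for word in vocabulary:
        # A word absent from the document is given a count of 1
        count = doc_counts.get(word, 1)
        vector.append(count / float(total_words) * 1000)
    return vector


# A document of 4 words in which "good" occurs 3 times and "service" once
doc_counts = {"good": 3, "service": 1}
print(feature_vector(doc_counts, 4, ["good", "bad", "service"]))
```

Because the vector follows the order of the vocabulary, every document yields the same number of features in the same positions, which is what a downstream classifier needs.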

Tuesday, March 06, 2012

Why is analyzing the sentiment of customer feedback IMPORTANT for business?

Customer Experiences on Products: Most of the time, when I buy a small to medium electronic product, I log in to amazon.com and place an order. The fun part is that, without ever touching the product, I am satisfied with my purchases most of the time (85% or above). Do you think the pictures posted on the site, showing the product from various angles, are what trigger the purchase? Certainly not. The most reliable part of such a marketplace is the review comments. Those opinions give a broader basis for evaluating a product because they are written by consumers who have already used it and who are, most of the time, careful in writing their reviews, unless the reviews are tangled up with spam (which is very rare).

Moving on from tangible product reviews to intangible product reviews (for example, customer feedback for a financial company): in this case, organizations are interested in capturing the feedback and analyzing it to extract the sentiment (opinion) drivers from the VoC (voice of the customer) in order to achieve these important goals.



Goals:


  • Making direct customer outreach

  • Identifying needs more effectively by understanding specific customer requirements

  • Very effective targeted marketing communications aimed specifically at customer needs

  • Separating customers through segmentation

  • Most important of all, a more personal approach and the development of new or improved products and services in order to win more business in the future

Achieving the above helps enhance customer retention and satisfaction, which ensures that your good reputation in the marketplace continues to grow over time.

Once your business starts to look after its existing customers effectively, efforts can be concentrated on finding new customers and expanding your market. The more you know about your customers, the easier it is to identify new prospects and increase your customer base.

Figuring out how to achieve all of this automatically seems a challenging task in itself. But if there were an automatic process that could capture the sentiments of those customers, it would be a dream come true: you would know your customers and their needs more closely, and that may lead to building the strongest customer relationships.

This year, on May 8 in NYC, the Sentiment Analysis Symposium brings together academia and companies (user perspectives, vendor perspectives, media) who practice sentiment analysis or are aware of the value it drives. If you would like to attend, you can register as early as possible (March 9) for a discount.

Link: http://sentimentsymposium.com/
Feel free to reach me at: sobhan [ DOT ] hota [ AT ] fmr [ DOT ] com

Tuesday, January 03, 2012

Sentiment-driven words from Top-10 SMO Prefix (-ve) Features [Present/Absent vector]

awf: awful, awfully, awfulness
bad: bad-ass, bad-asses, bad-boy, bad-enough-to-be-good, bad-guy, bad-guys, badalamenti's, badalucco, badass, baddass, baddeley, badder, baddie, baddies, badge, badger, badgers, badgers', badges, badham, badies, badlands, badly, badly-dubbed, badly-written, badmouth, badness, badu
dia: diabetic, diablo, diabolical, diabolically, diabolique, diagnosed, diagnosis, diago, dial, dialect, dialectic, dialects, dialog, dialogue, dialogue-, dialogue-driven, dialogue-laden, dialogue-those, dialogueless, dialogues, dialouge, dialouge/short, dials, diametric, diametrically, diamond, diamonds, dian, diana, diane, dianne, diaper, diapers, diaphragm, diaries, diarist, diarrhea, diary, diasappointing, diatribe, diatribes, diaz, diaz's
fem: female, female-fearing, females, fembots, feminine, feminism, feminist, femke, femme, femme-butch-femme
neg: negate, negated, negates, negating, negative, negatively, negatives, negativity, negin, negin's, neglect, neglected, neglectful, neglecting, neglects, negligees, negligence, negligent, negligible, negotiable, negotiate, negotiates, negotiating, negotiation, negotiations, negotiator, negotiators
pai: paid, paige, pain, pained, painful, painfully, painkiller, painkillers, painless, pains, painstaking, painstakingly, paint, paint-by-numbers, painted, painted--you, painter, painterly, painting, paintings, paints, pair, pair's, paired, pairing, pairings, pairs, pais
poo: pooch, pooches, poodle, poodles, poof, pooh, pooh-bah, pool, poolboy, poolman, pools, poolside, poombah, poon, poop, poopie, poopies, poor, poor-rich, poor-taste, poorer, poorly, poorly-developed, poorly-done, poorly-employed, poorly-integrated, poorly-motivated, poorly-paced, poorly-shown, poorly-staged, poorly-written
sav: savage, savaged, savagely, savagery, savages, savannah, savant, save, save-the-earth, save-the-world-from-aliens-or-environment-, saved, saves, saville, saving, savings, savini, savior, saviors, savor, savoring, savory, savour, savoured, savoy, savviness, savvy
uni: unicorns, unidentifiable, unidentified, unified, uniform, uniformly, uniforms, unifying, unimaginable, unimaginative, unimaginatively, unimaginatve, unimitible, unimpeachable, unimportant, unimpressed, unimpressive, uninfected, uninformed, uninhabitable, uninhabited, uninhibited, uninitiated, uninsightful, uninspired, uninspiring, uninsured, unintelligble, unintelligent, unintelligently, unintelligible, unintended, unintentional, unintentionally, uninterest, uninterested, uninteresting, uninterestingly, uninterrupted, unintriguing, unintrusive, uninvited, uninviting, uninvolivng, uninvolved, uninvolving, union, union's, unions, uniqe, unique, uniquely, uniqueness, unironic, unisol, unison, unispiring, unit, unite, united, unitentionally, unites, units, unity, universal, universal's, universality, universally, universe, universes, university, university's
veh: vehemence, vehemently, vehicle, vehicle's, vehicles, vehicular
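
A listing like the one above can be produced by grouping a vocabulary by prefix. The sketch below, written for Python 3, assumes that each feature is a three-letter prefix and that a document's vector marks a prefix as present (1) or absent (0). The function names group_by_prefix and presence_vector are hypothetical, and how the top 10 prefixes were selected is not shown here.

```python
from collections import defaultdict


def group_by_prefix(vocabulary, prefixes, length=3):
    """Map each prefix to the sorted vocabulary words that start with it."""
    groups = defaultdict(list)
    wanted = set(prefixes)
    for word in vocabulary:
        prefix = word[:length]
        if prefix in wanted:
            groups[prefix].append(word)
    return {prefix: sorted(words) for prefix, words in groups.items()}


def presence_vector(document_words, prefixes, length=3):
    """1 if any word in the document starts with the prefix, else 0."""
    seen = {word[:length] for word in document_words}
    return [1 if prefix in seen else 0 for prefix in prefixes]
```

The word lists show why prefix features are noisy: "neg" collects "negative" as well as "negotiate", and "uni" collects "unimpressive" as well as "universe".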