Tuesday, December 27, 2011

Sentiment driven words from Top-10 SMO Prefixes (+ve) Features [Present/Absent vector]

Positive Features and all words from corpus:

bef: befall, befalls, befitting, before, before---and, before--almost, before--round, before--sometimes, beforehand, befouled, befriend, befriended, befriending, befriends, befuddled

due: duel, dueling, duelling, duels, dues, duet, duets, duetting

esp: espadrilles, especially, espionage, espionnage, espn, esposito, esposito's, espoused, espouses, espousing

gal: gal's, gala, galactic, galactica, galahad, galapagos, galas, galaxies, galaxy, gale, gale's, gales, galeweathers, galileo, gall, gallagher, gallagher's, gallant, gallantly, gallantry, galleries, gallery, gallifrey, gallo, gallo's, gallons, gallop, galloping, gallows, gallup, galore, galoshes, galpal, gals, galvanizing

hyp: hype, hyped, hyped-up, hyper, hyper-capitalism, hyper-color, hyper-colorized, hyper-drive, hyper-erratic, hyper-kinetic, hyper-real, hyper-silly, hyper-tense, hyperactive, hyperbolic, hyperdrive, hyperjump, hyperkinetic, hypernaturally, hyperreal, hypersleep, hyperspeed, hyperviolent, hypnosis, hypnotherapist, hypnotic, hypnotically, hypnotised, hypnotises, hypnotism, hypnotist, hypnotize, hypnotized, hypochondriac, hypocrisy, hypocrite, hypocrites, hypocritical, hypodermic, hypothalamii, hypothalamuses, hypotheitically, hypothesis, hypothetical, hypothetically, hypsy

lig: ligaments, light, light-hearted, light-heartedness, light-mood/serious-mood, light-plane, light-saber, light-weight, lighted, lighten, lightening, lighter, lighters, lightest, lightheaded, lighthearted, lightheartedness, lighting, lightly, lightness, lightning, lightning-fast, lightning-lit, lights, lightsaber, lightsome, lightweight, lightyear, lighweight

mod: mode, model, model's, model-citizens, model-friends, model-turned-actress, model-wife, modeled, modeling, modelled, modelling, models, modem, moderate, moderately, moderately-successful, moderation, modern, modern-day, modernisation, modernist, modernity, modernization, modernize, modernized, modernizing, modes, modest, modestly, modesty, modicum, modified, modifier, modify, modine, modine's, modulated, modulation, module, modus

per: perado, perceive, perceived, perceives, perceiving, percent, percentage, percentages, perceptible, perception, perceptions, perceptive, perceptively, perceptiveness, perch, perched, perches, percolating, percussion-heavy, percussive, percy, perdy, perdy's, perennial, perennially, perez, perf, perf's, perfect, perfect--you, perfected, perfecting, perfection, perfectionist, perfectly, perfectly-assembled, perfectly-groomed, perfects, perfekt, perfomances, perform, performace, performaces, performance, performance-of-which-he-should-be-ashamed, performances, performances-, performances--always, performed, performence, performences, performer, performers, performig, performing, performs, perfs, perfuctory, perfume, perfume-drenched, perfunctory, perhaps, peri, pericles, pericles', peril, perilous, perilously, perils, perimeter, period, period's, period-piece, periodic, periodical, periodically, periods, peripheral, periphery, periscope, perish, perishes, perjuring, perk, perkily, perkiness, perking, perkins, perkins', perks, perkuny, perkus, perky, perky/cute, perlich, perlman, permanant, permanence, permanent, permanently, permanetly, permeate, permeated, permeates, permeating, permission, permissive, permissiveness, permit, permits, permitted, permutation, permutations, pernilla, peron, perpetrated, perpetrator, perpetrator's, perpetrators, perpetual, perpetual-motion, perpetually, perpetually-changing, perpetuate, perpetuated, perpetuates, perpetuation, perplexed, perplexes, perplexing, perpretrators, perps, perreault, perrier, perrineau, perry, perry's, persay, persecuted, persecution, perseverance, persevere, persia, persian, persist, persistence, persistent, persistently, persists, persnickety, person, person's, persona, personable, personage, personal, personal--but, personalities, personality, personality-challenged, personality-impaired, personalized, personalizes, personally, personas, personification, personified, personifies, personify, personnel, 
persons, perspective, perspectives, persson, persuade, persuaded, persuades, persuading, persuasion, persuasions, persuasive, persuasively, persues, pertain, pertaining, pertains, perth, pertinent, perturbed, pertwee, pertwee's, perused, perusing, pervaded, pervades, pervasive, pervasiveness, perverse, perversely, perversion, perversions, pervert, perverted, pervious

pos: pose, posed, poseidon, poseidon's, poses, posess, posessed, posesses, posessing, posession, poseurs, posey, posh, posing, posit, positing, position, positioned, positioning, positions, positive, positively, positives, positronic, posits, poslethwaite, posse, posses, possesion, possess, possessed, possesses, possessing, possession, possessions, possessive, possessors, possibilites, possibilities, possibility, possible, possiblities, possiblity, possibly, possum, post, post-, post-_scream_, post-apocalypse, post-apocalyptic, post-break-up, post-cannibal, post-chasing, post-cold, post-credit, post-death, post-feminist, post-forensic, post-industrial, post-marriage, post-modern, post-movement, post-operative, post-party, post-post-feminist, post-production, post-prologue, post-psychedelia, post-revolutionary, post-salary, post-secondary, post-shower, post-snl, post-there's, post-torture, post-traumatic, post-twin, post-vietnam, post-w, post-war, post-watergate, post-world, post-wwii, posta, postage, postal, postapocalyptic, postcard, posted, poster, posterior, posters, posthlewaite, posthumous, posting, postino, postlethwaite, postman, postman's, postmodern, postmodernism, postponed, postponement, posts, postulate, postulated, posture, posturing, posturings

won: won t, won't, won92t, won=92t, wonder, wonder-bred, wonderbra, wondered, wonderful, wonderful--i, wonderfully, wondering, wonderland, wonderment, wonderous, wonders, wonders', wonderully, wondrous, wondrously, wong, wong's, wonie, wonka, wonsuk, wont

Prefix (of length=3) on Presence/Absence Vector makes big leap in sentiment accuracy on movie reviews





After analyzing relative-frequency features for BoWs, bigrams, trigrams, prefixes (length = 3) and a few of their combinations, I found that a line (i.e., a hyperplane) can separate positive from negative sentiment in movie reviews reasonably well. The top-performing features were prefixes and BoWs (and their combinations).

This time I tried a vector that represents presence (1) or absence (0) for BoWs (the 500 most frequent) and for prefixes (all of them). The prefixes reached 97.2% accuracy (the accuracy stayed the same when I converted all 0s to 0.5 to avoid sparseness). I also captured the top 10 prefixes for both class labels.
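As a rough illustration of the presence/absence encoding described above, here is a minimal Python sketch. The tokenizer (lowercase alphabetic runs), the choice to skip words shorter than three letters, and all function names are my assumptions for illustration; they are not the code used for the experiment.

```python
import re

def prefixes(text, n=3):
    """Set of n-letter prefixes of the words in a review.

    Assumption: words are lowercase alphabetic runs, and words shorter
    than n letters are skipped.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:n] for w in words if len(w) >= n}

def build_vocabulary(reviews, n=3):
    """All prefixes seen in the corpus, in sorted order."""
    vocab = set()
    for review in reviews:
        vocab |= prefixes(review, n)
    return sorted(vocab)

def presence_vector(review, vocabulary, n=3, absent=0.0):
    """1.0 where a prefix occurs in the review, `absent` elsewhere.

    Pass absent=0.5 to mimic the "replace 0 with 0.5" variant.
    """
    found = prefixes(review, n)
    return [1.0 if p in found else absent for p in vocabulary]

# Two toy reviews (hypothetical, not from the corpus).
reviews = ["A wonderful, positive film", "Poor acting and a bad plot"]
vocabulary = build_vocabulary(reviews)
vectors = [presence_vector(r, vocabulary) for r in reviews]
```

Each review becomes one row of 0/1 values, one column per prefix, which is the shape a linear classifier such as SMO expects.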





Positive:



-0.7018 pos,-0.1363 per,-0.1271 hyp,-0.1158 mod,-0.1116 lig,-0.106 gal,-0.104 due,-0.0995 bef,-0.099 won,-0.0984 esp

Negative:
1.5197 neg,0.2067 poo,0.1712 pai,0.1336 bad,0.1308 awf,0.1196 sav,0.117 dia,0.116 uni,0.1157 fem,0.1133 veh

Currently analyzing the concordance lines corresponding to these top features from both class values.
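For anyone who wants to look at such concordance lines themselves, a keyword-in-context listing for a prefix could be sketched roughly like this in Python. The function name, the context width, and the rule that a "word" may contain letters, digits, apostrophes and hyphens are all my assumptions.

```python
import re

def concordance(text, prefix, width=30):
    """Keyword-in-context lines for every word that starts with `prefix`."""
    pattern = r"\b" + re.escape(prefix) + r"[\w'-]*"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start, end = match.span()
        left = text[max(0, start - width):start]   # context before the word
        right = text[end:end + width]              # context after the word
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Toy sentence (hypothetical, not from the corpus).
lines = concordance("A possibly positive film", "pos", width=10)
for line in lines:
    print(line)
```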

Tuesday, December 20, 2011

Prefixes are playing an important role: Movie Review Classification



I captured the first three letters (i.e., the prefix) of each word and ran SMO on the movie review corpus; the result was promising, with 71.7% accuracy from prefixes alone. I then took the top 500 prefixes (same length = 3 as before) and combined them with the 500 most frequent BoWs (which had already proven the winner); this time the accuracy moved up to 76%. So combining BoWs with these prefixes gave some boost. It would be nice to see how these features are used in the real reviews. The plan is also to run the corpus with features that are only 0/1 (absent/present) for the most frequent BoWs and prefixes. Stay tuned for more results (and, importantly, features)!
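The combination step can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the tokenizer, skipping words shorter than three letters when taking prefixes, using relative frequency as the feature value, and every function name here are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Assumption: words are lowercase alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())

def prefix_items(words, n=3):
    """n-letter prefixes; words shorter than n letters are skipped."""
    return [w[:n] for w in words if len(w) >= n]

def most_frequent(reviews, extract, k=500):
    """The k most frequent items (words or prefixes) over all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(extract(tokenize(review)))
    return [item for item, _ in counts.most_common(k)]

def combined_vector(review, top_words, top_prefixes):
    """Relative frequencies of the top words, followed by the top prefixes."""
    words = tokenize(review)
    total = len(words) or 1          # avoid dividing by zero on empty reviews
    word_counts = Counter(words)
    prefix_counts = Counter(prefix_items(words))
    return ([word_counts[w] / total for w in top_words]
            + [prefix_counts[p] / total for p in top_prefixes])

# Toy corpus (hypothetical); k would be 500 on the real corpus.
reviews = ["good good film", "bad film"]
top_words = most_frequent(reviews, list, k=2)
top_prefixes = most_frequent(reviews, prefix_items, k=2)
vector = combined_vector(reviews[0], top_words, top_prefixes)
```

The two feature groups are simply concatenated, so the classifier sees one vector of up to 1000 values per review.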

Sunday, December 18, 2011

Sentiment Analysis (Movie Review) from a Couple of Feature Sets


As observed, BoWs performed consistently. Bigrams helped a bit in the MostFrequent (500) test, but could not push the accuracy much further. Trigrams are informative, but still were not a booster for this task.

Gender in Shakespeare via Style

Did Shakespeare use different language styles for the male and female characters in his plays? Did those styles resemble, in some way, the language used by real men and women in formal texts? Did he use a different language style in his early vs. late plays? Would these findings change the thinking of humanities scholars who use Shakespearean plays as an example in their studies? If the findings capture male and female character usage, can they also help capture gender innuendoes such as cross-dressing?

For answers to these questions, please browse through these papers.

Papers:
http://www.csdl.tamu.edu/~furuta/courses/06c_689dh/dh06readings/DH06-082-088.pdf
http://knol.google.com/k/gender-in-shakespeare-automatic-stylistic-analysis-of-shakespeare-s-characters

Saturday, December 17, 2011

Sentiment Analysis: SMO Output on Movie Reviews from Bag of Words Features

To observe sentiment analysis in practice, I ran SMO on a Cornell movie review corpus and got reasonable accuracy (73.3%). These are the top positive/negative words, sorted by the weights SMO assigned.

Good/best/great/very/quite are on the positive side, whereas bad/mad/poor/better/mess are on the negative side.
Words for female gender (daughter/women/girl) appear among the top negative sentiments, whereas their male counterparts (man/son) are among the top positive sentiments.
Adjectival usage is found more in positive sentiments.
Relationship holders (pronouns) are observed in positively reviewed sentiments.

Among proper nouns, Spielberg is on the positive side and Batman on the negative side.
Verbs were observed more in negative sentiments.
Negations (no/nothing) are observed in negative sentiments.


Positive Sentiments      Negative Sentiments
-3.6362  and             5.0594  bad
-3.0145  per             2.9062  there
-2.9843  great           2.8591  any
-2.5268  most            2.7724  could
-2.1792  well            2.6392  only
-2.1392  one             1.9204  mad
-2.0488  also            1.8398  look
-1.9767  than            1.8177  off
-1.874   man             1.6521  then
-1.8239  very            1.6086  on
-1.8116  real            1.4662  act
-1.8103  am              1.4573  point
-1.7231  life            1.3847  just
-1.6721  col             1.352   suppose
-1.6385  from            1.3375  attempt
-1.6225  you             1.2757  ass
-1.5916  other           1.2522  spawn
-1.5394  his             1.2505  character
-1.4975  star            1.2262  enough
-1.4512  de              1.2143  wes
-1.448   fu              1.1934  this
-1.4436  is              1.1367  eve
-1.4409  being           1.133   give
-1.3563  best            1.1041  better
-1.3558  aliens          1.0884  been
-1.3555  as              1.0864  kill
-1.3025  many            1.0843  west
-1.2927  while           1.0832  mars
-1.2152  horror          1.0514  nothing
-1.2135  back            1.0485  women
-1.1946  ever            1.0331  made
-1.1703  out             1.0173  no
-1.1662  com             1.0028  director
-1.1562  son             0.9838  poor
-1.1542  chan            0.9758  daughter
-1.14    dark            0.9635  eddie
-1.1375  won             0.9604  seagal
-1.1032  quit            0.9587  about
-1.1019  movies          0.9556  such
-1.0899  quite           0.9241  god
-1.0403  by              0.9097  brother
-1.0192  world           0.9     try
-0.9972  take            0.8979  big
-0.9967  hunt            0.8331  thriller
-0.9871  hollywood       0.8286  batman
-0.9781  perfect         0.8286  all
-0.977   trek            0.8238  pot
-0.9739  beau            0.8018  thrill
-0.9695  good            0.8001  least
-0.9481  the             0.7886  talent
-0.945   music           0.786   actors
-0.9434  three           0.7653  bat
-0.9139  always          0.7577  here
-0.9084  day             0.7571  girl
-0.8948  fiction         0.7463  given
-0.8943  again           0.7453  boring
-0.8903  disney          0.7094  mess
-0.8781  spielberg       0.7094  couple
-0.8696  les             0.6888  write
-0.8685  home            0.6849  interesting
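
A ranking like the one above can be built from any word-to-weight mapping. The small Python sketch below assumes the sign convention seen in the table (negative weights point to the positive class, positive weights to the negative class); the function name and the way the weights are stored in a dict are my assumptions, and the four sample weights are copied from the table.

```python
def top_features(weights, k=10):
    """Split a {word: weight} mapping into the two ranked columns.

    Assumption: negative weights indicate positive sentiment and
    positive weights indicate negative sentiment, as in the table.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1])
    # Most negative weights first: strongest positive-sentiment words.
    positive = [(w, word) for word, w in ranked[:k] if w < 0]
    # Largest weights first: strongest negative-sentiment words.
    negative = [(w, word) for word, w in reversed(ranked[-k:]) if w > 0]
    return positive, negative

weights = {"bad": 5.0594, "great": -2.9843, "and": -3.6362, "there": 2.9062}
positive, negative = top_features(weights, k=2)
for (pw, pword), (nw, nword) in zip(positive, negative):
    print(pw, pword, nw, nword)
```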

Monday, December 12, 2011

Batch Script to Execute Java program for multiple files

for /f "delims=" %%a in ('dir /b C:\Sentiment_Analysis\src\com\***\sa-mov\file\*.txt') do java MovieReviewsCollector "%%a"