Sunday, June 18, 2006

Introducing a Classification Problem

The problem of automatically determining the gender of the author of a literary document (e.g. fiction or non-fiction), an electronic document (e.g. text collected from e-mail), or an informal text (e.g. essays) has been studied before, with reported accuracy in the range of 70-80%. These works have shown that noticeable differences exist between male and female authors, and that simple lexical and syntactic features are enough to discriminate author gender. The aim here is to extend this research to automatically determining the gender of literary characters (e.g. in plays) from the playwright’s word use. The difference from the earlier work is that there real men and real women are the authors of the documents, whereas in plays a character’s gender is conveyed through the playwright’s word usage. The first playwright who comes to mind is Shakespeare...

Questions:
Did Shakespeare use a different style to present character gender? How accurately did he discriminate between male and female characters? What are the top features for male and for female characters? Are these features similar to or different from the features used by real men and real women? Did this distinction change over time, that is, do Early and Late Shakespeare present character gender in the same way or differently? These are the first questions in understanding gender in the Renaissance period.

We have used computational methods to address a few of the above research questions. This proposal has been accepted at Association for Computers in Humanities (ACH-06) and at the Third Mid-West Computational Linguistics Colloquium-2006. The accepted paper is given below.

http://l2r.cs.uiuc.edu/~cogcomp/mclc/finalPapers/hotasob_AT_iit.edu__Hota-MCLC06-Full-Paper.pdf
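
To give a feel for what the "top features" question above means in practice, here is a minimal sketch in Python. It is not the method from the paper: it simply ranks words by a smoothed log-ratio of their relative frequencies in speeches labelled male versus female. The function names (`tokenize`, `top_features`), the `M`/`F` labels and the parameters `k` and `alpha` are all assumptions made for this illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def top_features(speeches, k=3, alpha=1.0):
    """Rank words by a smoothed log-ratio of relative frequency between classes.

    speeches: list of (label, text) pairs, where label is 'M' or 'F'
              (hypothetical labels for male and female characters).
    k:        number of top words to return per class.
    alpha:    add-alpha smoothing constant (an assumption, not from the paper).
    """
    counts = {"M": Counter(), "F": Counter()}
    for label, text in speeches:
        counts[label].update(tokenize(text))

    vocab = set(counts["M"]) | set(counts["F"])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    v = len(vocab)

    def log_ratio(word):
        # Positive values lean male, negative values lean female.
        p_m = (counts["M"][word] + alpha) / (totals["M"] + alpha * v)
        p_f = (counts["F"][word] + alpha) / (totals["F"] + alpha * v)
        return math.log(p_m / p_f)

    # Sort from most female-leaning to most male-leaning; ties break on the word.
    ranked = sorted(vocab, key=lambda w: (log_ratio(w), w))
    return {"F": ranked[:k], "M": ranked[::-1][:k]}
```

To use it, pass a list of `(label, speech)` pairs, one per character speech, and read off the `k` most male-leaning and most female-leaning words. A real experiment would of course need the actual play texts, a held-out test set and a proper classifier on top of such features.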

The current focus is to understand the PoS and lemma features for Shakespeare. For this we are turning to the Nameless Shakespeare edition from Northwestern University, Chicago, IL.
More next time.