Wednesday, May 03, 2006
Get In There
So I finally heard the news I had been expecting for the past couple of months: our paper was accepted to CoLing/ACL. For anyone who doesn't know, this conference brings together two of the largest (if not the two largest) computational linguistics conferences around. A super conference, if you will. This year it is being hosted in Sydney, which will be very nice. It will technically be winter over there, but no doubt it'll still be nicer than much of the summer over here.
I shall say more about the contents of the paper when I get a chance to post a link to it (I'm blogging from home right now, and I'd like to see the reviewers' comments first). The title gives a fair idea of what it's about, though: "Whose thumb is it anyway? Classifying author personality from weblog text." The title refers to the fact that many of the early papers in sentiment analysis described their movie review data (for example) as good or bad: thumbs up or thumbs down. Sentiment analysis is very popular at the moment, and a good many of the talks and posters at the recent symposium dealt with detecting and classifying sentiment in weblogs. Our main argument, though, and others have suggested this too, is that one person's four-star review is another person's two (Pang & Lee, 2005).
Think about it.
If women are more comfortable expressing feelings and emotions than men, surely the language they use to review something will be different. The same can be said for people with different personality types. Consider these totally made-up examples, entirely believable if you've read a lot of blogs:
- Wow! That was incredible! The acting was fantastic, the dialogue was sharp and polished and it held me from the off. Brilliant. Superb! I enjoyed that so very very much. Film of the year. Amazing.
- Went to the cinema tonight. Film was really good. Had a good night.
Imagine these were written by the same person. It's obvious they liked the first film more than the second. Imagine, however, that the first was written by an expressive, highly Open Extravert and the second by a less Open Introvert. It is possible that these two people saw the same film, that both liked it very much, and that, if asked to give it a score, both would give it full marks. In any direct comparison between the two, though, the second is far less positive. If you were to take into account the natural style of the two authors, however, it might be possible to rank the second as high as the first, giving a much more accurate picture of how much a product is liked (or, of course, disliked).
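To make the idea of "taking the author's natural style into account" a bit more concrete, here is a minimal, hypothetical sketch. It is not the method from our paper: it simply assumes each author has a history of raw positivity scores (the numbers below are invented, on an arbitrary 0 to 1 scale) and expresses a new review's score relative to that author's own baseline as a z-score.

```python
from statistics import mean, pstdev


def author_relative_score(raw_score, author_history):
    """Return how far raw_score sits above (or below) this author's usual
    positivity, measured in standard deviations of their past scores.

    Hypothetical helper for illustration: author_history is assumed to be a
    list of raw positivity scores from the author's earlier reviews.
    """
    mu = mean(author_history)
    sigma = pstdev(author_history)
    if sigma == 0:
        # No spread in the history: fall back to the plain difference from the mean.
        return raw_score - mu
    return (raw_score - mu) / sigma


# Invented histories: the expressive Extravert is usually gushing,
# the Introvert usually restrained.
extravert_history = [0.6, 0.8, 0.9, 0.7]
introvert_history = [0.1, 0.2, 0.15, 0.25]

# Raw scores for the two reviews above (also invented).
extravert_review = 0.9
introvert_review = 0.3

print(round(author_relative_score(extravert_review, extravert_history), 2))  # 1.34
print(round(author_relative_score(introvert_review, introvert_history), 2))  # 2.24
```

On raw scores the Extravert's review looks three times as positive, but relative to each author's usual style the Introvert's "really good" is the stronger endorsement. A real system would need a far better estimate of an author's baseline than a handful of past scores, which is where classifying personality from text could help.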
Well anyway, that is just a brief bit about why we are classifying text by author personality. I'll hopefully get the paper up soon, and I shall definitely say more about what we did. I also hope to tell you more about a little project that's recently been confirmed.