Thursday, October 27, 2005
This thesis describes a linguistic investigation of individual differences in online personal diaries, or 'blogs'. There is substantial evidence of gender differences in language (Lakoff, 1975), and to a lesser extent of the linguistic projection of personality (Pennebaker & King, 1999). Recent work has investigated these latter differences in the area of computer-mediated communication (CMC), specifically e-mail (Gill, 2004).
This thesis employs a number of analytic techniques, both top-down (dictionary-based) and bottom-up (data-driven), in order to explore personality and gender differences in the language of blogs. A corpus was constructed by asking authors to submit a month of text and complete a sociobiographic questionnaire. The corpus consists of over 400,000 words and five-factor personality data (Buchanan, 2001) for 71 subjects.
The thesis begins by framing blogs in the context of other genres, both CMC and traditional, in order to show the distinctiveness of the genre. Top-down content analysis techniques are then employed to investigate the relationship between personality and linguistic features. A number of features correlate with each trait, but upon regression, very little variance is explained.
Bottom-up techniques are more successful. The corpus was stratified into high, low and neutral personality groups to identify distinctive collocations for each. Returning to the raw personality scores, it becomes clear that even a small amount of n-gram context helps account for much more variance in personality. A measure of contextuality/formality (Heylighen & Dewaele, 2002) shows that authors considered high in Agreeableness pay more attention to differences between their extra-linguistic context and that of their audience.
Attention then turns to gender, where the same methods are applied to investigate gender differences in language. A number of previous findings are confirmed in the blog corpus. In addition, women are found to write more in their blogs than men. More generally, using the British National Corpus, it is shown that men are more formal, except in the most formal of genres (academic writing) where there is no difference.
The study concludes by confirming that both gender and personality are projected by language in blogs; furthermore, approaches which take the context of language features into account can be used to detect more variation than those which do not.
It's Like Riding a Bike
Yes, I am still here, but I've been writing again. I'm not yet updating my thesis...though it pains me to admit that I have in fact noticed small errors in it. I am making note of these in preparation for my viva.
No, for the last week or so, along with working for pay, I have been writing a small paper for a weblog symposium I hope to attend. The deadline is tomorrow so I've been a little preoccupied. The deadline was originally too close to my submission date for me to prepare anything, but a colleague pointed out that it had been extended, so I can have a good go.
There has also been some interest in my thesis. Of course, I can't expect people to download a 300-page thesis to find out what it is about, so I've added the abstract to the introduction page, and I'm going to post it here for all to see.
Tuesday, October 11, 2005
The Language of Weblogs: A study of genre and individual differences
That's right, it's thesis time. I'm sure that after all the wait, reading about my progress, reading that I'd submitted, it was somewhat of an anticlimax not to get to see the finished product. My apologies. I had meant to prepare some posts on the content, summarise my results for you, make it more digestible. However, until such time as I manage this, I present for your reading pleasure... my thesis.
Wednesday, October 05, 2005
It's Been One Week
It's been a week now since I submitted my thesis and I thought I really ought to post so that people know I'm still here. Indeed, with the viva to come, I will still be working on things bloggish. In fact, it would be nice to continue beyond that. What I'd like to find is someone interested in individual differences and language, possibly in blogging, with an aim to putting together a research proposal for further work. If there is anyone out there interested in working in this field, wherever you may be in the world, drop me a mail, I would love to hear from you.
So, the first observation I can make is that the intense diligence that comes with the final months of the PhD doesn't seem to stop. On Wednesday I submitted after 4 years' work, the last 10 months of it certainly very hard work. I deserved a couple of days off. However, every time I catch myself doing nothing in front of the TV, I feel guilty for not working. I shouldn't be relaxing so much; I have a thesis to finish. No wait, I already did. My wife is the same: she keeps thinking she needs to send me back to work, but she really doesn't.
Still, despite the guilt, I did manage to have a relaxing few days. Obviously I am back now...gotta earn a crust...but it really does feel good to be done, at least for now. I can now spend time with my wife in the evenings and weekends. I can enjoy all the things we cut back on so I could work more...TV, games, fun...I highly recommend finishing to all postgrad students :D