Wednesday, July 13, 2005
Contextuality of Blogs, 1 - Some Background
Weblogs are a growing, amorphous entity on the internet, and they are increasingly coming to the attention of academics. The work I am most interested in concerns language, and the web is increasingly being considered as a resource for linguistic study. There has been previous work on computer-mediated communication (CMC) formats, but this work most closely follows a study of email.
Previous work on the implicitness of language led Heylighen and Dewaele to develop a linguistic measure of contextuality. The idea is that certain kinds of words require more context than others in order to be understood unambiguously.
Consider, for example, pronouns. If I were to say, "he loves it", you would not know who "he" is or what "it" is. You need more context in order to fully understand: "The President of the United States just bought a new bike. He loves it."
Using this classification, a formula was created that sums the relative frequencies of parts of speech and yields a single score, the F-measure of a text. A low score means that a text is very contextual, while a high-scoring text is termed formal.
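To make the scoring concrete, here is a minimal sketch in Python. The post itself does not spell out the formula; the sketch assumes the form usually given for Heylighen and Dewaele's measure, F = (noun + adjective + preposition + article - pronoun - verb - adverb - interjection + 100) / 2, with each term a percentage of all words in the text. The function name, the tag names, and the input format (a dictionary of part-of-speech counts) are my own choices for illustration, and a real study would get the counts from a part-of-speech tagger.

```python
# Sketch of the F-measure, assuming the commonly cited form of the formula.
# Tag names and the dict-of-counts input are illustrative assumptions.

FORMAL_TAGS = ("noun", "adjective", "preposition", "article")
CONTEXTUAL_TAGS = ("pronoun", "verb", "adverb", "interjection")


def f_measure(tag_counts):
    """Return the F-score (0 to 100) for a text given its POS tag counts.

    tag_counts maps a tag name to the number of words with that tag.
    Tags outside the two groups (e.g. conjunctions) still count toward
    the total number of words, but do not enter the sum directly.
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("cannot score an empty text")

    def percent(tag):
        # Relative frequency of this tag as a percentage of all words.
        return 100.0 * tag_counts.get(tag, 0) / total

    formal = sum(percent(tag) for tag in FORMAL_TAGS)
    contextual = sum(percent(tag) for tag in CONTEXTUAL_TAGS)
    return (formal - contextual + 100.0) / 2.0


# "He loves it": two pronouns and a verb, nothing from the formal group.
print(f_measure({"pronoun": 2, "verb": 1}))
```

A text made up only of pronouns, verbs, adverbs and interjections lands at the contextual end of the scale (a score at or near 0), a text made up only of nouns, adjectives, prepositions and articles lands at the formal end (100), and an even mix of the two groups scores 50.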
I have collected a corpus of blogs (the personal diary kind), along with data about the authors. I have used the F-measure for two investigations:
- First, I wanted to know whether the F-measure is any good. So I took a selection of genres from the British National Corpus (BNC) and scored them alongside my blog corpus and the previous email corpus. This let me see whether the genres were plausibly placed along a contextual/formal scale.
- Secondly, I am interested in individual differences within the blog corpus. I wanted to know how personality and gender might result in different F-scores.
This is the introduction to the work I am about to present and have published. The results of my work will follow here shortly. For those interested in the full details of the work, you can find a PDF of the paper here.