Thursday, July 14, 2005
Contextuality of Blogs, 2 - Blogs as Genre
In the linguistic research community, a great deal is made of genre, and many studies try to identify the differences between genres. Most studies of language, like this one, draw their material from one or more genres. I chose blogs as my area of study, so it is interesting to ask whether blogs can be considered a distinct genre in their own right. Intuitively, they seem at least to be a cross between diaries and personal homepages.
In a number of interesting papers, Herring et al. have investigated, amongst other things, quite what defines a genre. They make a very convincing argument that weblogs can in fact be classed as a genre, placing them on a continuum between static HTML homepages and ever-changing newsgroups.
The first analysis I conducted with the F-measure described in the last post compared my sample of blogs with samples taken from other genres. I chose a selection from the BNC (the British National Corpus, a collection of 100 million words of spoken and written English) that covered both spoken and written texts, with the written texts including both scientific and fiction writing.
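The last post described the F-measure, so it is not restated in full here. As a reminder, the following is a minimal Python sketch of how a score per file and an average per genre could be computed. It assumes the F-measure is Heylighen and Dewaele's formality score, F = (noun% + adjective% + preposition% + article% - pronoun% - verb% - adverb% - interjection% + 100) / 2, where each term is that word class's percentage of all words in the text, so F runs from 0 (most contextual) to 100 (most formal). The function names `f_score` and `genre_averages`, the coarse tag labels, and the input format (each file already reduced to a list of part-of-speech tags) are all hypothetical; real BNC files use their own tagset, which would first have to be mapped onto these eight classes.

```python
from collections import Counter
from statistics import mean

# Hypothetical coarse word classes; a real tagset (e.g. the BNC's) must be mapped onto these.
FORMAL = ("noun", "adjective", "preposition", "article")
CONTEXTUAL = ("pronoun", "verb", "adverb", "interjection")


def f_score(tags):
    """Formality score (assumed Heylighen & Dewaele form) for one tagged text.

    `tags` holds one coarse part-of-speech label per word token.
    Returns a value from 0 (most contextual) to 100 (most formal).
    """
    if not tags:
        raise ValueError("cannot score an empty text")
    counts = Counter(tags)
    total = len(tags)

    def pct(word_class):
        # Frequency of a word class as a percentage of all tokens.
        return 100.0 * counts[word_class] / total

    formal = sum(pct(c) for c in FORMAL)
    contextual = sum(pct(c) for c in CONTEXTUAL)
    return (formal - contextual + 100.0) / 2.0


def genre_averages(files):
    """Average F-score per genre.

    `files` maps a file name to a (genre, tags) pair.
    """
    scores_by_genre = {}
    for genre, tags in files.values():
        scores_by_genre.setdefault(genre, []).append(f_score(tags))
    return {genre: mean(scores) for genre, scores in scores_by_genre.items()}


if __name__ == "__main__":
    # Made-up toy input, not data from this study.
    toy = {
        "email1.txt": ("email", ["pronoun", "verb", "adverb", "interjection"]),
        "blog1.txt": ("blog", ["article", "noun", "preposition", "article", "noun", "verb"]),
    }
    print(genre_averages(toy))
```

With the made-up input above, the "email" file consists only of contextual word classes and scores 0.0, while the "blog" file scores about 83.3. Words outside all eight classes count towards the total but towards neither sum, which pulls a text's score towards 50.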
I calculated the F-score for every file; the average for each genre is given in the table below. Remember that a low score means the language is more contextual, while a high score means it is more formal.
As you can see, there are a number of very plausible differences and similarities: on the whole, spoken genres are more contextual than written ones; professional letters are more formal than personal ones; and university essays are more formal than school essays, while scoring similarly to academic publications. These quite understandable results suggest that our technique, the F-measure, does detect a real aspect of how language differs between genres.
Now we turn to our own texts, the emails and the blogs. Emails are, understandably, similar to both personal letters and text from a mailing list. Interestingly, blogs are only as formal as school essays. However, they are significantly more formal, or less contextual, than emails. This is understandable for a number of reasons:
- Audience: the emails were written to close friends, but a blog can be read by anybody, so its author does not know everyone who will read it and cannot expect readers to know everything about them.
- Space: since the blog author may not know the reader, they cannot assume that they share knowledge of the same things. They may have to give more description of people, places and activities than someone writing to a friend.
This part of my study aimed to show that, on at least one aspect of language, blogs differ from other genres, and that the F-measure is a useful tool for showing this. In the next post, I will discuss how the F-measure can show differences in formality/contextuality between individual authors.