Thursday, November 06, 2003
Tuesday's Work Update
I never did repost my brilliant words from Tuesday, and since I've just had a meeting, I might as well do it now.
Progress has been slow because I was ill at the weekend, but I am still working my way through my data, classifying each blog as YES, NO or MAYBE. Things have taken a turn for the worse lately, as the percentage of NOs and MAYBEs is rising. I am currently at about 60% YESs, which is quite low, but I have a feeling it will end closer to 70%. If half the MAYBEs then become YESs, that will be a decent amount of data.
YESs and NOs are easy to deal with: blogs are either good enough or they aren't. The problems, and the time, come with the MAYBEs. It all comes down to my definition of the diary/journal style I'm looking for, along with the ratio of diary/journal posts to posts that purely link to and discuss the outside world.
We have a couple of options for dealing with these MAYBEs:
1 - So far, my judgements have been based purely on eyeballing the data, but the final decision on the MAYBEs will have to be more scientific. One option is to set an exact threshold at which a blog becomes a YES or a NO. This could be based on word count or post count, and it would require marking the text within each blog as either personal or comment/link. We could say, for example, that a blog must be at least 50% personal to get in.
2 - We could also continue to rely on human judgements, but not just mine. We would get several raters (at least two) to read the MAYBE blogs, along with a couple of YES and NO blogs for guidance, and have each of them decide independently whether each blog is a YES or a NO. We then compare the ratings to reach a decision for each blog.
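Option 1 boils down to a simple threshold rule. The Python sketch below assumes each post has already been marked up as personal or comment/link; the `Post` class and the `personal_share` and `classify` functions are hypothetical names, and the example posts are made up. It shows how the word-count and post-count versions of the rule can disagree on the same blog:

```python
from dataclasses import dataclass


@dataclass
class Post:
    label: str  # hypothetical mark-up: "personal" or "comment/link"
    text: str


def personal_share(posts, by="words"):
    """Fraction of a blog marked personal, measured by word count or post count."""
    if by == "words":
        sizes = [len(p.text.split()) for p in posts]
    elif by == "posts":
        sizes = [1 for _ in posts]
    else:
        raise ValueError("by must be 'words' or 'posts'")
    total = sum(sizes)
    if total == 0:
        return 0.0
    personal = sum(size for p, size in zip(posts, sizes) if p.label == "personal")
    return personal / total


def classify(posts, threshold=0.5, by="words"):
    """YES if at least `threshold` of the blog is personal, otherwise NO."""
    return "YES" if personal_share(posts, by) >= threshold else "NO"


# Made-up example blog: one 10-word personal post, two link posts (6 and 4 words).
blog = [
    Post("personal", "Went to the doctor today and felt awful all afternoon"),
    Post("comment/link", "Interesting article on the election here"),
    Post("comment/link", "Another link worth reading"),
]

print(personal_share(blog), classify(blog))                              # 0.5 YES
print(personal_share(blog, by="posts"), classify(blog, by="posts"))      # 0.333... NO
```

The toy blog is exactly 50% personal by words but only one post in three by count, so the choice of measure matters as much as the threshold itself.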
In the meeting we decided to lean towards a variation of option 1. For now, all MAYBEs where this is the issue will be included. I will begin to mark up all the data, including marking each post as either personal or comment/link. Once this is complete we can look at the statistics (word count, post count, etc.), which should let us decide on a suitable threshold.
It is also best to keep as much data as you can: once you throw data away it is gone for good, and you cannot slip it back into the study later. But if you keep it and at some point realise you can't use it, it is easy to remove.
Once the data is marked up, it may even be possible to include all the MAYBEs but leave their comment/link text out of the analysis.
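Keeping every blog but analysing only its personal text could look something like this minimal Python sketch. It assumes the same hypothetical personal vs comment/link mark-up; `Post`, `personal_only` and the example blogs are illustrative inventions, not part of the study:

```python
from dataclasses import dataclass


@dataclass
class Post:
    label: str  # hypothetical mark-up: "personal" or "comment/link"
    text: str


def personal_only(blogs):
    """Keep every blog, MAYBEs included, but drop its comment/link posts."""
    return {
        name: [post.text for post in posts if post.label == "personal"]
        for name, posts in blogs.items()
    }


# Made-up example data keyed by blog name.
blogs = {
    "maybe_blog": [
        Post("personal", "Long day at work, then a quiet night in"),
        Post("comment/link", "Here is a link to a news story"),
    ],
    "yes_blog": [Post("personal", "Back from holiday and catching up")],
}

print(personal_only(blogs))
```

No blog is thrown away, which fits the keep-as-much-data-as-you-can approach: the comment/link posts are only filtered out at analysis time and remain in the marked-up data.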
A lot to think about for next week's meeting.