Friday, January 30, 2004
Guess who's back!!
First up, to everyone who expected me back last week, I whole-heartedly apologise. I'm sorry if you thought this was another of the many blogs to fall by the wayside. Not at all, I just got lazy. Sorry Sorry Sorry.
Second, this is a work blog, and not for personal use, but I feel it would be remiss of me not to tell you my news. I got engaged. :D Sometime this summer I plan to marry the most amazing girl I have ever met, I intend to spend the rest of my life with her, I'm completely utterly madly in love, and I've never been so happy. :D :D :D
Third, finally, some work. It took me a few days to get back into the swing of things, but I had my comeback meeting as previously advertised. Basically, it was to recap where I was, and to discuss the plan for the coming months. Ultimately, it was a discussion about how I really need to be doing more work, and making a lot more progress.
I fully acknowledge that I've not been the most conscientious PhD. I do waste a lot of time. And the less work I do, the more it gets me down that I'm not doing enough work. Which in turn...well, it's a vicious cycle which I'm sure we are all familiar with. So my supervisor made a couple of suggestions. The obvious one was milestones. I need to set stronger deadlines, and stick to them. Another suggestion was to help me manage my time better: time logs. The idea is, on the hour, write down what you were doing for the majority of the previous hour in 15 minute intervals. This lets you see where you are wasting time, and lets you do something about it.
Along with this, I am also aware that I need to get in earlier and stay later. Work more. Work better.
So on Monday, I implemented the new plan. Task for the week: get all the blog data from my email account into my filespace, and filter it ready for tagging. This involves getting each subject's data, compiling it to one file, and running some processing scripts on it. Simple, right?
Wrong. The scripts are essentially to strip the HTML out of the file, but also use it to encode the file with some XML tags that I have chosen. Basically, my next big task is tagging the content of the text, but I have also chosen to tag structural data. This includes things like the use of italics, and links. This information I can get automatically from the HTML code, which in the long term is going to be faster than doing it by hand later on.
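For anyone curious what "strip the HTML but keep the structure" might look like, here is a rough sketch in Python. It is not my actual script. The tag names `ital` and `link`, and the function name `html_to_tagged`, are made up for illustration, and it only handles italics and links.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class StructuralTagger(HTMLParser):
    """Drop HTML markup, but keep italics and links as XML tags.

    The output tag names (<ital>, <link>) are hypothetical placeholders.
    """

    ITALIC_TAGS = {"i", "em"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ITALIC_TAGS:
            self.parts.append("<ital>")
        elif tag == "a":
            # An anchor without an href gets an empty attribute value.
            href = dict(attrs).get("href") or ""
            self.parts.append("<link href=%s>" % quoteattr(href))
        # Every other tag is silently dropped.

    def handle_endtag(self, tag):
        if tag in self.ITALIC_TAGS:
            self.parts.append("</ital>")
        elif tag == "a":
            self.parts.append("</link>")

    def handle_data(self, data):
        # Re-escape the text so the output stays valid XML content.
        self.parts.append(escape(data))


def html_to_tagged(html):
    """Return the text of `html` with only italics and links kept as tags."""
    parser = StructuralTagger()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p>I <i>love</i> <a href="http://example.com">this</a> &amp; that</p>'
    print(html_to_tagged(sample))
```

One caveat: if the original HTML has badly nested tags, the output will be badly nested too, so a real script would need to check for that.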
It's not as easy as it looks. I wrote a number of scripts, and spent longer than I expected getting them working. And then, rather than running 4 or 5 scripts on each file, I decided to put them together in a pipeline. I'd never done that before. But eventually, with help, it worked.
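The pipeline idea itself is simple enough to sketch: each script becomes a stage that takes text in and hands text on to the next one. The version below is in Python and is only an illustration. The three stages and the `blog` wrapper tag are invented for the example and are not the scripts I actually wrote.

```python
import re
from functools import reduce


def collapse_whitespace(text):
    """Squash runs of spaces and tabs into a single space."""
    return re.sub(r"[ \t]+", " ", text)


def strip_blank_lines(text):
    """Trim each line and drop the ones that end up empty."""
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


def wrap_in_root(text):
    """Wrap the whole file in a single (hypothetical) root element."""
    return "<blog>\n" + text + "\n</blog>"


def make_pipeline(*stages):
    """Chain the stages so the output of one feeds the next, in order."""
    def run(text):
        return reduce(lambda acc, stage: stage(acc), stages, text)
    return run


if __name__ == "__main__":
    pipeline = make_pipeline(collapse_whitespace, strip_blank_lines, wrap_in_root)
    print(pipeline("a   b\n\n c "))
```

The nice thing about doing it this way is that each file goes through one call rather than 4 or 5 separate runs, and a new stage can be slotted in without touching the others.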
Then I started on the data. Then I started on the problems. Some people do their weblogs by hand, but these days most people use templates, or software like Blogger. And it turns out that each of them uses HTML in a slightly different way, or formats the file differently. So, I spent quite a lot of time early on tweaking my scripts to handle this. Then I was encountering structural elements I hadn't thought of, and they had to be dealt with.
So my week started slower than I intended it to. But once we got rolling, I just rolled and rolled. Data was pouring through thick and fast. Some people write a lot, and that's great. Some people don't write so much, but do you know what, that's great too. By the time of my meeting yesterday (that's right, weekly to make sure I was actually making progress) I was well on course to finish at the end of the week, as predicted.
There were a number of issues to discuss in the meeting that I had come upon in the week. First, there were a couple of subjects who submitted data to me, but told me they had edited it. I can understand that people will have a reason for this, and I must respect this, and even though blogs are generally publicly readable, it would be unethical for me to retrieve the original unedited data for myself. It does mean though that I cannot include them in my study, because I cannot use data that I know is inaccurate. To fairly assess elements of text as they relate to individuals, I need the whole text. It may affect my results if I were to include data that has been edited, and so I cannot knowingly do this. It only affects a couple of subjects though, so it is not too damaging.
And apparently it looks good to discard data, for valid reasons, because it shows you have standards.
The other issue was the time logging. Clearly it worked because my work was going well, but not quite as intended. The idea is, if you write down in detail what you do every day, you can see where you are wasting time. You may not know that you actually spend half the day reading forums, blogs, news sites and web comics, and the other half on email. However, the very idea of writing down what I was doing, knowing that I was then going to show it to my supervisor, made me fully aware of what I was doing, and made me slack off a whole lot less. The whole accountability of it was very refreshing.
We also discussed my plans for the future, but I'll discuss them more in time. The best part of the meeting for me was that although my supervisors were pleased with my progress, they were most pleased that I was pleased with my progress.
So I am now at the end of the week, and I've finished my week's task. I honestly cannot tell you how good it feels. As I said in our meeting, it really does help making progress. You begin to see results, you see there is a point to it, that you are getting there. I haven't felt this good about my PhD in a long time. This is one of the best weeks I've had in quite some time. Of course, the effect that being engaged to a beautiful, smart and funny girl has on my state of mind cannot be overestimated. :D