Monday, December 22, 2003
A very brief post. The yearly holiday season is upon us. So whilst I've been beavering away on Perl scripts for stripping files (it went well) I've been getting tied up in holiday dooberries. Not only that, but this is my last post...no, I'm not giving up...for some time.
As you well know, I'm not one for discussing my personal life, but I will say it's been a tough year. I'm very much looking forward to the next stage of my life, and I am starting it with a holiday. Not just over Christmas, but beyond into the new year.
In fact, the next you'll hear from me will probably be after my come back meeting on Thursday 22nd January.
"But blogging is on-line...you can do it anywhere!" I'm sure you are insisting. And you'd be right, only since I won't be working, and this blog is about my work, I can't promise anything.
Anyway, time to dash. Thanks for reading my blog, you've been a great help, and I hope you will all come back in the new year.
Thursday, December 11, 2003
Notes from a normal Joe
So, let's see if I can remember all the things I wanted to say.
First, I was in Blackwells, and saw that Salam Pax, the Baghdad Blogger, has a book out. It seems to be a transcription of his actual blog for about a year, which is of great interest to anyone interested in the evolution of the 'blogosphere', or whatever your preferred term is. Diaries have been published for years...but this might be the first journal blog to be published. That's cool. I also need to look more closely at it to see how they dealt with the printing of links.
What else? Well, I know how to indent my post text, as you can see (I hope). However, since my title is still part of the text it is also indented. I'm not quite sure what to do about that right now, so I've taken the rather simple, post-specific path of using the <BLOCKQUOTE> tag. I'll have a look once I post, but I'm sure there must be a better way of doing it.
Oh, this was of vague interest: a couple more people have done my experiment this last week. Apparently they both found me by searching for blog research. Which, of course, I do. It all came as a bit of a surprise.
I think there was more, but I'm quite excited to have a look at the layout of this post, so I'll be back later with more.
Tuesday, December 09, 2003
An ickle update
So let's try out this whole small update thing. Guess whose computer messed up yesterday.
That's right. Mine.
It might have been my own fault for once however. I was using a graphics package, and it kept giving me an error message which basically said:
cannot open swap file, something bad will happen soon
I mean, what is that supposed to mean?! Who is going to believe an error message like that? Well, I should have.
Things seem to be ok now, bar the fact that I've lost my toolbar. Not quite sure where it's gone, but since everything else is working (or appears to be) I'll leave that in the capable hands of our support team.
So I'm working on the XSL transforms. And I'm going to try and programme sensibly. I'm going to test each function as I write it, rather than writing the whole thing and going through a major debugging phase.
So I was having troubles with the < and > characters. Turns out I have to remember to use the mark-up code for them: &lt and &gt respectively. Just like I did in previous posts here. However, as you'll see, they look a little weird:
< some text > (&lt some text &gt)
This is because when I write them with no spaces, a la &ltsome text, the 'some' gets lumped in with the lt and confusion ensues.
However, as I try and type &lt into <oXygen/>, I get as far as the &, and it brings up a completion menu, which contains the very elements I need. Not only that, but they are followed by semicolons, which would appear to deal with the space problem:
<some text> (&lt;some text&gt;)
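As an aside, the same escaping can be done by a program rather than by hand. Here is a minimal sketch in Python (picked purely for illustration, it has nothing to do with <oXygen/>), using the standard library's xml.sax.saxutils module on the sample string from above:

```python
from xml.sax.saxutils import escape, unescape

raw = "<some text>"

# escape() replaces &, < and > with their entity references,
# each one terminated by a semicolon.
escaped = escape(raw)
print(escaped)

# unescape() reverses the substitution.
restored = unescape(escaped)
print(restored)
```

Because every entity ends in a semicolon, there is no need for a space to separate it from the text that follows.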
So now my very simple stylesheet is valid. However, when I try to apply it to a valid XML file, I'm told the software failed to parse the stylesheet. But it's valid... :(
Friday, December 05, 2003
Smaller and smaller and smaller
In my browser, I reckon my font is too big. Apparently it was 15px, according to my template. I made it 12px. It looks better but then there was a whole lot of white space. Turns out the template also defines line height, which I have now dropped from 22px to 15px. It looks better, neater, more concise. Don't you think?
I still feel it's missing something of course. It doesn't feel right. Maybe I've just seen too many blogs these past few months...
The long and the short of it
We can all see that recently I've not been posting much. It is also plainly evident, however, that when I do post I post quite a lot.
This is mainly because I am too busy and/or forget to update the blog. So when I do, I have a lot to catch up on.
So I have a question: do you prefer to read frequent short update posts, or irregular but lengthy ones?
Lots Done, Lots Still To Do
So obviously it's been a busy week, yeah. I've done quite a bit, learned an amount and come closer to getting procedures finalised. Biggest lesson of the week: things ALWAYS take longer than you think they will.
So last week's meeting was a technical meeting. I met with my second supervisor to discuss the software I had been looking at, that she had been involved in:
< longstory >
The software was of no use.
< /cutshort >
It just wasn't really appropriate for what I wanted it for. However, coincidentally (and there will be a few of those in this post) a friend of my office mate's had been in visiting the previous day and had downloaded an XML toolkit, <oXygen/>. So we thought we'd take a look at that. Lo and behold, they do a Linux version. Champion.
Download. Install. Load.
It really was that simple. Initial impressions are that it certainly looks nice. And it seemed to have this option whereby if I highlighted a block of text, I could choose to surround it in a tag, selected from a list. This is exactly what I want. Some tagging will be automatic (the more the merrier) but I will need to do much of it by hand. This click 'n' pick technique is just what I'm looking for: I thought by hand meant literally that, traversing each text and hand coding every tag.
So, the next problem: how to extend the list of available tags to include those I wish to use. I perused the User's Manual, but it was just that: it told you how to use the tool, the menus, the options, but not what you actually had to do to make things work. It was assumed you knew that part. Rats.
Then, by coincidence, a new office mate showed up. There wasn't really anywhere for him to be, but we had a chat anyway (I did my Masters with him). Turns out, he's done a bit of XML. So he knows all about what is involved. I need two things:
- In order to do the automatic tagging, I need to perform an XSL Transform. This is a series of stylistic operations which will allow me to find all the HTML tags I need, and turn them into my form of tagging. XSLTs work on XML files, which is handy because that's what I need out, but it's also what I can put in: <oXygen/> can import HTML files, resulting in XHTML, and tidy can output in that format (XHTML effectively being HTML in XML form).
- I need a Document Type Definition, or DTD. A DTD defines the building blocks of an XML file: the grammar of your tag structure. It turns out that once I create this file, I can create an XML file in <oXygen/> using said file, and there are my tags ready to be used. Wicked.
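To make the first of those two things a little more concrete, here is a rough sketch of the kind of tag-for-tag rewrite an XSL Transform performs. It is written in Python with the standard library rather than in XSLT itself, so treat it as a stand-in for the idea and not the real thing. The tag names in TAG_MAP (post, para, emph) and the sample XHTML are invented for illustration; they are not taken from the actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XHTML tags to project-specific tags.
TAG_MAP = {"div": "post", "p": "para", "em": "emph"}

def retag(xhtml: str) -> str:
    """Rename every element found in TAG_MAP; leave the rest untouched."""
    root = ET.fromstring(xhtml)
    for element in root.iter():
        element.tag = TAG_MAP.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

# Real XHTML carries a namespace; this sample omits it to keep things short.
sample = "<div><p>A <em>very</em> brief post.</p></div>"
result = retag(sample)
print(result)
```

The input has to be well-formed XML for this to work at all, which is the same reason the HTML needs converting to XHTML before a transform can be applied to it.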
So off I set, using the links I've made above, to teach myself the fundamentals of XSLT and DTD writing. Obviously, with the transform coming first in the process, I got on with writing my DTD.
Armed with my new knowledge, I went for my meeting yesterday. It was disappointing that number 1 supervisor didn't show up, but that's obviously the prerogative of busy senior academics. Number 2 supervisor and I, for some reason, given that we each had our own offices and the coffee room was real close, opted to sit outside number 1's office. That's right, we had a meeting in the corridor.
But what a good meeting it was. We looked over my DTD so far, and discussed what more should go in it, and what the XSLT should cover. We settled on the fact that I should keep absolutely as much surface data as I can, and then remove all remaining HTML, because as much as I wanted to keep it, we couldn't actually come up with a reason why. The only argument to be made was that it would help with the tagging, but if I have the HTML file open in a browser while tagging the stripped down XML file, I can use that to help.
I've still not entirely settled on one aspect of my DTD, but all that affects is the later extraction (another XSLT) of the raw plain text from the file for the linguistic analysis. And of course, there is a chance I will discover some feature in a blog that I've not yet thought of, and I will need to evolve my DTD, but that shouldn't be too much of a problem.
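The extraction step, pulling the raw plain text back out of a tagged file, would be another XSLT in the actual workflow. As a stand-in, here is a minimal Python sketch of the same idea using only the standard library. The tagged sample and its tag names are invented for illustration.

```python
import xml.etree.ElementTree as ET

def plain_text(xml_source: str) -> str:
    """Drop all tags and return only the text, with whitespace normalised."""
    root = ET.fromstring(xml_source)
    # itertext() yields every text fragment in document order.
    text = "".join(root.itertext())
    return " ".join(text.split())

# Hypothetical tagged post; the element names are assumptions.
tagged = (
    "<post><title>An ickle update</title> "
    "<para>Guess whose <emph>computer</emph> messed up.</para></post>"
)
text = plain_text(tagged)
print(text)
```

Because only the text nodes are kept, the choice of tag names makes no difference to what comes out, which fits with the unsettled part of the DTD only mattering at this later stage.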
Well, that's pretty much the work I've been doing this past week, as far as I can remember. There are a couple more things I'd like to say, but I'm going to put them in separate posts, for reasons that shall become clear.