Wednesday, August 01, 2007
An Interesting Corpus
MySpace removes 29,000 accounts of sex offenders. Rearrange those words in countless ways and you have the myriad headlines that surfaced across the internet last week. I don't pretend to know all the details, and I'm certainly not going to get into any discussion of second chances or anything like that. No, I come at this story from purely objective academic interest.
There is increasing interest in detecting undesirables on the internet. I've seen adverts here, in the UK and the US, which essentially make it clear that internet chat rooms, for example, are monitored. You've all seen the adverts, with adults speaking in a kid's voice. It IS a dangerous world out there, and people do need to be careful online.
It seems that, to take a stand and be seen to be doing something, MySpace deleted the profiles of known sex offenders. Whether they identified these people because they used their real names, listed suspicious interests, or were caught out by some other cunning plan is not clear. What they have done in the process, however, is create a ready-made corpus of profiles of undesirables. This is exactly the kind of data that academics would love to get their hands on. If you want to be able to detect something, you need to have concrete examples of it, and it is hard to go out and find such data yourself. I doubt that this "corpus" will ever become publicly available, but they have it there, ready to go. Imagine what you could do with this.
And imagine you will have to, because I'm not giving you any of my ideas.