OK, that is not news, but I am paraphrasing TechCrunch’s coverage of the AOL Research data release.
For a while, AOL Research put data on 20 million web searches by 650,000 of their subscribers up for download. The link was taken down fairly quickly, but once information is released it is very hard to take it back. I am sure you can find a mirror or torrent if you look.
Because it is data on a selection of logged-in AOL users, it contains a continuous record of their searches over time (March to May 2006). With a record of searches over a period of time, you can start to make some assumptions about the user or the household and, depending on what the user has searched for, you can sometimes identify them.
Many commenters on Digg don’t seem to see it as a problem, but then maybe their search history does not make it look like they are searching for information on their family tree, information for English teachers in a conservative US state, and the websites of a local church, chamber of commerce, and Rotary chapter in the same state, in between searching for MySpace, cheerleaders, preteen sex, and strap-on sex toys. AOL has kindly replaced these people’s screen names with sequential integers, but I am guessing that if you went to that church, chamber of commerce, or Rotary chapter, you would be able to pick out an English teacher with that surname.
Maybe he made all those searches and deserves to be found out. Maybe he shares one internet connection with his son. Maybe his next-door neighbour steals his WiFi. In any case, I expect that the free AOL CD he picked up a while ago might have suddenly become pretty expensive.