Archive for the 'Software Engineering' Category

OSCON 2008: SNAP - PHP Taint Tool

Wednesday, July 23rd, 2008

Here are the slides for my talk today at OSCON.

Keep the disclaimer from the start of the slides at the front of your mind:

This tool is fragile and not ready to be called alpha quality
It is definitely not ready to be useful on large programs
We will release it under an OSI license … soon

SNAP Presentation (PDF)

Is Computer Science Dead?

Tuesday, March 13th, 2007

Mainstream media are still keen to swallow the line that “real soon now” computer specialists will be redundant because fourth generation languages are so clever that clever people are not needed any more.

This fatuous pap by Neil McBride from De Montfort University (Rated by the Guardian’s University Guide as the 83rd best University in all of England) gives them the sound bites they need.

“Now vastly complex applications for businesses, for science and for leisure can be developed using sophisticated high-level tools and components,” he prattles. “Computer science curricula are old, stale and increasingly irrelevant.”

Towards the end of his article it all becomes clear. “Here at De Montfort I run an ICT degree, which does not assume that programming is an essential skill. The degree focuses on delivering IT services in organisations, on taking a holistic view of computing in organisations, and on holistic thinking.”

I have never grasped the point of that kind of course. So you cater to people who want an IT career, but don’t have the core skills of the discipline? Why on earth do these people want to work in IT? Is there not some occupation they could find where they might be capable of grasping the essential skills?

He loves the car/software analogy. “Like cars, a limited number of people are interested in their construction, more live by supporting and maintaining them; most of us accept them as a black box, whose workings are of no interest but which confer status, freedom and convenience.”
Sure, the car industry needs many, many black box buyers, a moderate number of mechanics, a few engineers and designers, and very few theoretical purists. All industries, including computing, do.

How many fresh graduates do you think the automotive industry needs who take “a holistic view of” cars, but think understanding how an engine works is not “an essential skill”? Not very many, I’ll wager.

The death of computer science is not just a fairy tale, it is also an enduring fairy tale. I am in the process of moving house, and cracked open an old book on its way to the bin. Understanding Computer Science Advanced Concepts by Ray Bradley, Hutchinson Education, 1987 was a high school textbook. He refers to the then current computers (late 1980s) as the fourth generation of computers. I don’t think that terminology has endured.

Under a heading “The Future” he writes “The development of the fifth generation machines promises to be the most significant yet. This is because of a fundamental re-think in the basic design of the machine. For example it should be possible to communicate with the machine in a natural language such as English. […] It should be possible for users to define their problems to the machine and for the machine to then develop the programs to solve them.”

That is not exactly how I recall computing in the 1990s panning out.

The death of computer science was a fairy tale in 1987, and 20 years later it is still a fairy tale. More powerful computers are not replacing programmers any more than calculators are replacing accountants or power tools are replacing carpenters.

What is considered a hard problem in computing changes over time, but each era still has its hard problems that need smart people with a deep understanding of the fundamentals to solve.

Neil McBride

I ♥ register_globals

Tuesday, March 13th, 2007

I am aware that there are some things so shocking that you are not supposed to say them in polite company. “Hitler had some good ideas”, “Tori Spelling is really pretty” and “I think I look really good in a beret” are all ideas so confronting that they are best kept to yourself, regardless of how strongly you believe them.

I have a similarly shocking sentiment that I feel I have to share.

I really like register_globals in PHP.

There, I’ve said it. I can go away and order my I ♥ register_globals shirt now.

I (heart) register_globals

Sure, choosing to mingle untrusted user data and internal variables is a bad idea. Sure, if you are too lazy to initialise important variables with a starting value it gives you one extra way to shoot yourself in the foot. Sure, polluting global scope with form variables is going to be a mess in a larger app.

There remains something to be said for simple, elegant, readable ways to shoot yourself in the foot. PHP, like any reasonably complete programming language, provides a whole host of other ways, so removing one is not particularly useful.

I used to teach PHP to beginners as a first programming language. I have introduced a few thousand complete novices to programming via PHP.

With register_globals on, this example is a short step from the “Hello World!” example:

<?php
if($name)
{
 echo "Hello $name";
}
else
{
 echo
  '<form>
   Enter your name: <input type="text" name="name">
   <input type="submit">
  </form>';
}
?>

It flows nicely from a “Hello World!” example. It can introduce variables and control structures if you did not provide an even softer introduction to them. It can be turned into an example with a practical use without making the code more complex.

This version may not look very different to you:

<?php
if($_REQUEST['name'])
{
 echo "Hello {$_REQUEST['name']}";
}
else
{
 echo
  '<form>
   Enter your name: <input type="text" name="name">
   <input type="submit">
  </form>';
}
?>

To an experienced eye, the two versions are almost identical. The second requires a little more typing, but nothing to get excited over.

To a complete beginner though, the second is a couple of large leaps away from the first. To understand the second version, somebody has to understand arrays, and PHP string interpolation. Both of these are important topics that they will have to come to in their first few hours of programming, but without register_globals, they stand in the way of even the most trivial dynamic examples.

I miss being able to assume register_globals as default behaviour. It made the initial learning curve far less steep. It made little examples cleaner and more readable. Like most safety measures, turning it off does not really protect people who are determined to get themselves into trouble anyway. People who don’t understand the reasons behind the change just run extract() or some code of their own to pull incoming variables out anyway. The user-submitted comments in the manual used to be full of sample code for doing exactly that.
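For anyone who has not seen the extract() workaround in the wild, here is a minimal sketch of the sort of thing I mean. The function name and the $request array are made up for illustration (the array stands in for $_REQUEST so the script runs from the command line), and the uninitialised $is_admin flag is my own hypothetical example of the foot-shooting involved.

```php
<?php
// Hypothetical helper: imports request data the way register_globals did,
// but only into this function's local scope.
function greet_with_extract(array $request): string
{
    // Every incoming key becomes a local variable of the same name.
    extract($request);

    $greeting = isset($name) ? "Hello $name" : 'Hello stranger';

    // The script never initialises $is_admin itself, so whatever the
    // visitor sent in the request decides the outcome here.
    $role = !empty($is_admin) ? 'admin' : 'guest';

    return "$greeting ($role)";
}

// Stand-in for $_REQUEST; a real page would pass the superglobal.
$request = ['name' => 'Alice', 'is_admin' => '1'];

echo greet_with_extract($request), "\n";
echo greet_with_extract([]), "\n";
```

The first call lets the visitor promote themselves simply by adding is_admin=1 to the query string, which is exactly the trap that turning register_globals off was meant to close.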

Oh, but just a side note to all beret-wearing, white supremacist Tori Spelling fans: just because I am willing to speak up for one unpopular cause does not mean I am interested in yours. Sorry.

Pair Programming

Wednesday, March 8th, 2006

This pair programming study was being passed around at work, which always sends a chill down my spine.

In case you are not familiar with this confronting practice:

The official definition of pair programming is two programmers working together, side by side, at one computer collaborating on the same analysis, design, implementation, and test. In other words, consider it like two programmers using one pencil.

The main thrust of the paper is that, working as a pair, each programmer achieved 227% of the average level of output in the organisation. Fascinating stuff, but not without its difficulties. I am not convinced that lines of code is a meaningful way to measure programmer productivity. I suspect that the average programmer might easily double their output by this measure just by being told that the lines of code they commit are being tallied.

By this measure I should have been paying to come to work this month. My main task has been refactoring about 6000 lines of code into about 2000 lines of code. I appear to have been doing negative amounts of work.

But that is all a side note. I was much more fascinated by the effect of pair programming on error rates. The summary data is helpfully provided as an image of a table. Errors decreased by three orders of magnitude.

Which sounds great, but think for a minute what that means. The author chooses not to disclose the normal error rate for solo programmers in the organisation, except to say that it was in line with the industry at the time. One data point given though is that a 10000 line task delivered by pairs had only two coding errors and one design error. That sounds pretty good to me … but if the pairs were producing 0.001 times the errors of solo programmers as the study claims, then solo programmers would have been expected to deliver 2000 coding errors and 1000 design errors in the same 10000 line task.
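To make that arithmetic explicit, here is a small PHP sketch. The variable names are mine, and reading “three orders of magnitude” as a factor of exactly 1000 is an assumption based on the study's claim, not a figure the author breaks down.

```php
<?php
// Figures quoted from the study's 10000 line task delivered by pairs.
$lines = 10000;
$pairCodingErrors = 2;
$pairDesignErrors = 1;

// Assumption: "three orders of magnitude" means a factor of exactly 1000.
$factor = 1000;

// Scale the pair figures back up to what solo programmers would imply.
$soloCodingErrors = $pairCodingErrors * $factor;
$soloDesignErrors = $pairDesignErrors * $factor;

printf(
    "Implied solo coding errors: %d (one per %d lines)\n",
    $soloCodingErrors,
    intdiv($lines, $soloCodingErrors)
);
printf(
    "Implied solo design errors: %d (one per %d lines)\n",
    $soloDesignErrors,
    intdiv($lines, $soloDesignErrors)
);
```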

One coding error per five lines of code, and one design error per ten lines of code seems highly improbable to me, unless each pair consisted of one programmer doing 80% of the work and one rhesus monkey who without supervision would have just bashed keys randomly. Today of course, people would probably assume the output produced by the monkey was a valid Perl program, but as Perl had not been invented when the experiment was conducted I guess they spotted them as errors.

I remain unconvinced, and cannot think of pair programming without thinking of the PairOn chair … with apologies to Cenqua, who I stole the image from, and Herman Miller, from whom they stole that icon of the dot com bubble, the Aeron.

PairOn

Avoid Not Using Double Negatives if you Don’t Want Digg Readers to Not Misunderstand What You are Not Telling Them Not To Do.

Tuesday, February 14th, 2006

This Top 10 list of bad programming advice has some very defensible ideas, and some sections where the author seems to have missed the point on how conventional wisdom became conventional wisdom.

Digg commenters are not always the most insightful bunch, so the fact that the article copped a pasting there made me want to like it, but it has two main problems. The nested double negatives make it very hard to read, and for most of the advice you would be at least as boneheaded to dogmatically never apply it as to dogmatically always apply it.

Can somebody explain the “a square is a rectangle” problem to me? To my mind, “a square is a rectangle” is a basic fact, not a problem. Maybe I need to learn to think outside the box more.

From the article:

People who think in such parallels are likely to find themselves confused if they run into the “a square is a rectangle” problem. In math, squares may well be subclasses of rectangles but making square inherit from rectangle is plainly wrong.

Why is it plainly wrong?

Paper Acceptance - Waterfall2006

Thursday, February 9th, 2006

My paper Applying A Waterfall Methodology to Web Development has been accepted for Waterfall2006.

I have been hoping to hear Alistair Cockburn (pronounced “Jones”) speak silently about cube farms for some time.

Alistair Cockburn


Waterfall 2006 CFP Open

Tuesday, February 7th, 2006

If you are not doing anything on April 1st, and think adopting a Pig Latin naming convention could help your code progress up the Job Security Index for Software Measurement, take a look at:

Waterfall 2006