I am not a rock star
I am not a rock star. I am a computer programmer. I think I’m quite a good one.
You are not a rock star either.
387,000 matches to that query. Can we all just… I don’t know… grow up please?
Mutter… grumble… chunter… I’m 40 you know!
Updates
I have it on reliable authority that James O’Kelly is a Ruby on Rails Rockstar that would make a great addition to any team!
Joined up thinking: why your resources want links
Remember the good old days? The days before Google? The days before Altavista? The days when a 14k4bps modem was fast? Did I say good old days?
In those days, the web had to be discoverable ‘cos it sure as hell wasn’t searchable. The big, big enabling technology of the web was the humble <a href='http://somewhereelse.com'>Go somewhere else</a>. Placing the links right there in the body of the document turned out to be exactly the right thing to do.
And it continues to be the right thing to do. Consider the two pieces of YAML below the fold.
Patterns and principles
Recently I’ve been thinking about the way that patterns on different scales interact with each other. If you read Christopher Alexander’s A Pattern Language, the first pattern in the book is Independent Regions, which are talked about within the context of a World Government, so it seems like a huge pattern. And it is, sort of, but it’s scale invariant – it applies at the level of countries, but it also applies to states, cities, neighbourhoods, streets, houses and arguably even rooms within those houses.
Or maybe it emerges from the patterns that apply at those scales. As Kipling has it:
As the creeper that girdles the tree trunk, the law runneth forward and back;
For the strength of the pack is the wolf, and the strength of the wolf is the pack.
Can we see similar scale invariant patterns in our programming practice? Of course we can, but we tend to call them principles. Programmers reading this will, I hope, be familiar with “The DRY (Don’t Repeat Yourself) Principle”, which I first came across by that name in Hunt and Thomas’s The Pragmatic Programmer. It’s such a fundamental pattern that it suffuses every pattern in Beck’s Smalltalk Best Practice Patterns, but isn’t actually expressed as a pattern there.
There are other macro patterns; one that I’m starting to appreciate more and more is:
Fail Fast
Fail Fast is the principle that, when something starts going wrong, you shouldn’t cover it up, but raise the issue as quickly as possible, hopefully to a level where it can be dealt with. As a pattern it informs almost every activity:
- I want to go and photograph the Angel of the North. I plan to be at the angel half an hour before sunrise to get that magical pre dawn golden light. But the weather forecast says tomorrow will be entirely overcast, so I scratch that plan and decide to shoot some still life stuff in diffuse window light instead.
- I’m working on adding something to the work site and I realise that I’m not going to get it done in time, so I take it to the boss immediately. We work out how to reduce the scope of the change so that we’ll still have something useful, but which can/should be extended in a future iteration.
- A low level method gets some data it didn’t expect and doesn’t know how to deal with, so it throws an exception and attaches what it knows about the problem – hopefully something up the caller chain will have enough information to deal with the problem.
- I’m looking for a new house. I check the details to make sure there is room for our (huge) dining table, which means the room needs to be at least 18 feet long. If the dining room isn’t big enough, I’m probably not going to like the rest of the house either, so I can reject houses quickly as I’m looking at details.
All reasonably obvious applications of the pattern, I hope you’ll agree.
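The low level method case can be sketched in Ruby. This is only an illustration: UnexpectedRecord, parse_price and import are made-up names, and the record layout is an assumption. The low level code raises as soon as it sees data it can't handle and attaches the offending record, and the caller, which has more context, decides what to do with it.

```ruby
# Hypothetical example: every name here is invented for illustration.
class UnexpectedRecord < StandardError
  attr_reader :record

  def initialize(message, record)
    super(message)
    @record = record # attach what we know about the problem
  end
end

# Low level: can tell the data is bad, but has no idea what to do about it.
def parse_price(record)
  raw = record[:price]
  unless raw.to_s =~ /\A\d+(\.\d+)?\z/
    raise UnexpectedRecord.new("price is not numeric: #{raw.inspect}", record)
  end
  Float(raw)
end

# Higher up the caller chain: enough context to deal with the failure.
def import(records)
  good = []
  rejected = []
  records.each do |ea|
    begin
      good << parse_price(ea)
    rescue UnexpectedRecord => e
      rejected << [e.record, e.message]
    end
  end
  [good, rejected]
end
```

The point is that parse_price never guesses: it either returns a number or hands the whole problem upwards.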
Where it gets fun is when Fail Fast affects other patterns. For instance, there’s a pattern for choosing the name of an each block’s parameter. It says that you should always use the same name, usually each or ea. Many people rebel against the idea: surely it’s better to reflect the parameter’s type or rôle, or something. And they’re right, sort of. However, those are rules for naming method parameters (type suggesting) and temporary variables (rôle suggesting or ‘explaining’). If an each block gets long enough that you want to give the parameter a ‘better’ name, then it’s time to give the block a name too. Pull the body of the block out into a method, ideally on the parameter’s class (which spares you the headache of naming the parameter – it’s called self) and, if you’re using ActiveSupport or something like it, you can replace the block with &:method_name.
By naming your block parameters this way, you’re applying the Fail Fast pattern. Your block becomes obviously ugly far sooner than it would if you gave it a more suggestive name, and getting ugly fast is often a good strategy. Similarly, if you’re stuck with an old fashioned for loop, call your iterator i and when the body of the loop gets unwieldy, replace it with a method or function call that takes the counter and (probably) a Collecting Parameter as arguments.
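Here is a small sketch of that refactoring. The Invoice class and its fields are hypothetical. The first version keeps the logic in blocks with the generic ea parameter. The second pulls the block bodies out into methods on the parameter's class and passes them with &:method_name (Symbol#to_proc is built into current Rubies, so ActiveSupport isn't needed for it any more).

```ruby
# Hypothetical domain: Invoice and its attributes are made up for illustration.
Invoice = Struct.new(:customer, :amount, :paid) do
  # What used to be a block body, now named, with the parameter called self.
  def reminder
    "Dear #{customer}, you owe us #{format('%.2f', amount)}"
  end

  def overdue?
    !paid
  end
end

invoices = [
  Invoice.new("Alice", 12.5, false),
  Invoice.new("Bob", 99.0, true),
]

# Before: the blocks are getting long enough to look ugly.
before = invoices.reject { |ea| ea.paid }
                 .map { |ea| "Dear #{ea.customer}, you owe us #{format('%.2f', ea.amount)}" }

# After: the blocks have names, and there is no parameter left to name.
after = invoices.select(&:overdue?).map(&:reminder)
```

Both versions produce the same list of reminders; the second just says what it is doing.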
Fail Fast is why I’m using Haml more and more in personal work. Haml recasts HTML in a YAML like structure, doing away with all the line noise involved in closing tags and letting me concentrate on the structure and content of the page. In an ERB template, it’s all too easy to fall into the trap of writing complex logic in view code where it doesn’t belong. In Haml, that gets ugly quickly, which makes me factor the logic into helpers. Then, because generating markup in a helper is a pain in the arse, it’s easier to set things up so that the conditional logic simply selects which candidate partial to render. At the end of the process, the template, helpers and partials are working together, but each element is doing one thing and one thing only, and that makes for more comprehensible code. At least, it makes it more comprehensible to me.
Smalltalk people have been doing this sort of thing forever. A common complaint from new Smalltalkers is that the code editor isn’t very capable compared to, say emacs, or vi, or whichever IDE the newbie is used to. Seasoned Smalltalkers will reply that, if you’ve reached the point where you wish you had a more capable editor, the method you’re working on is probably too big. Limited text editing capability is just another way of failing faster, getting to the point where the code is telling you, loudly, that it needs to be better factored.
When getting to ugly hurts
When a pattern or programming language starts to get ugly fast if you start down a dodgy road, the programmer wins. But sometimes the wrong sort of code gets ugly. When I’m asked why I don’t code in Perl 5 any more (Ruby ‘til 6 is still my motto) I usually reply that “I got fed up of unrolling @_.”
For those unfortunates who are unfamiliar with Perl 5, Perl subroutines are odd in that they don’t have named parameters. Almost every method ends up beginning like:
sub some_method {
    my $self = shift;
    my($other, $thing) = @_;
    ...
}
There are arguments about whether or not to use shift to pull $self off the front of the parameter array. Some folk argue for my($self, $other, $thing) = @_; as the One True Way, but they are heathens and should be shunned... or rather, it really comes down to taste and local coding standards.
The problem with this style of argument passing is that you have to do it for every bloody method. One or two lines of precious vertical space are always lost to unrolling the argument list. Vertical space is precious. Losing one or two lines of space for every method is fine when your methods are long, but well factored methods are anything but long. When your method bodies are usually 3 or 4 lines long, that repeated chunk of code is adding 25-30% to your line count, and those added lines are almost pure repetition. The temptation was always to let that method get a little bit longer, swallowing the extra complexity rather than waste another few precious lines on doing the same damned thing again. Perl 6’s implicit self and named arguments are, on the face of it at least, only minor improvements, but they’re the sort of improvements that make all the difference.
What did I miss?
I’m sure you’ve got pet examples of this pattern, things that I’ve overlooked or never thought of. Tell us about it – comment here or blog it. Let’s all start failing earlier and winning bigger.
The authentication tarpit
At work, we’re looking at adding the Atom Publishing Protocol in a few places where it makes sense. APP’s got a lot going for it – the spec is a great example of how to design a Resourceful API and is worth reading even if it’s not an immediately good fit for your application.
But…
It’s one of the givens of good application security that you don’t store passwords in clear text and you do your level best not to send them over the wire in cleartext. That way, if someone pinches your user database, they should have their work cut out for them if they want to find out what your password is (because, unless you’re very good or are using something like 1password, you probably use the same password for lots of different websites).
A decent authentication protocol should ensure that the clear text of the password is never sent over the wire and doesn’t need to be stored in the clear on the server. Also, it shouldn’t be subject to replay attacks. One way to do this is to use SSL for authenticated sessions and rely on that protocol’s encryption to solve the problem of the password going over the wire in the clear. Or you could use the standard HTTP Digest authentication method. The basic trick with systems that don’t send the plain password over the wire works a little like this:
Alice and Bob have agreed a secret password and a hashing algorithm.
Alice wants to prove to Bob that a particular request comes from her, so she comes up with a unique ‘nonce’ string. She then concatenates this string with the agreed secret, and generates a digest string using the agreed hashing algorithm. She attaches the nonce string and the resulting digest to her request. When Bob receives the request, he concatenates the nonce string with the agreed secret, runs it through the hashing algorithm and, if he gets the same digest value as the one attached to the request, then it’s very probable that Alice is the real requester.
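The exchange between Alice and Bob might look something like the following Ruby sketch. The choice of SHA-256, the request layout and the method names are all assumptions made for illustration, and a real implementation would want a constant-time comparison. Bob also remembers the nonces he has already accepted, which is one simple way of refusing replayed requests.

```ruby
require 'digest/sha2'
require 'securerandom'

SECRET = 'flapdoodle' # agreed between Alice and Bob in advance (illustrative)

# Alice's side: pick a fresh nonce and digest it together with the secret.
def sign_request(secret)
  nonce = SecureRandom.hex(16)
  { nonce: nonce, digest: Digest::SHA256.hexdigest(nonce + secret) }
end

# Bob's side: redo the calculation and compare. seen_nonces is Bob's record
# of nonces he has already accepted, so a captured request can't be replayed.
def genuine?(request, secret, seen_nonces)
  return false if seen_nonces.include?(request[:nonce])
  expected = Digest::SHA256.hexdigest(request[:nonce] + secret)
  return false unless expected == request[:digest]
  seen_nonces << request[:nonce]
  true
end
```

Note that the secret itself never appears in the request, only the nonce and the digest derived from it.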
There are variations with different protocols, of course, but the general rule is to send a set of inputs and a result that can only be derived from those inputs by someone who knows the agreed secret. It’s the sort of thing your bank does when you phone them: “Can I have your postcode? Surname and initial? What’s the first letter of your password? And the last letter? Your memorable address?” The theory is that only you know how to answer those last 3 questions, but at no point are you required to say “My password is ‘flapdoodle’” loudly and clearly in a crowded restaurant. Also, it ensures that the bank employee doesn’t get to see your whole password either. Where the banks fall down is when they phone you and immediately try to take you through the security questions without doing anything to prove that they are who they claim to be.
The HTTP Digest protocol has a neat little wrinkle in its hashing algorithm. Instead of generating the digest directly from a combination of the user identifier [1], password and nonce string, it generates an intermediate digest from the user identifier and password, and then uses the same hashing algorithm to calculate a digest from this intermediate result and the nonce string. This means that the server can store the intermediate result instead of the plaintext password.
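A sketch of that two stage calculation in Ruby, using the original form of HTTP Digest without the later qop extensions. The helper names are mine, not part of any library. The intermediate value (usually called HA1) is the only thing the server needs to keep.

```ruby
require 'digest/md5'

# Join the parts with colons and take the MD5 hex digest, as HTTP Digest does.
def md5(*parts)
  Digest::MD5.hexdigest(parts.join(':'))
end

# The intermediate digest: this is what the server can store instead of
# the plaintext password.
def ha1(username, realm, password)
  md5(username, realm, password)
end

# The digest sent with a request (no-qop form): it combines the stored
# intermediate value, the nonce, and a digest of the method and URI.
def digest_response(stored_ha1, nonce, http_method, uri)
  ha2 = md5(http_method, uri)
  md5(stored_ha1, nonce, ha2)
end
```

The client starts from the password and the server starts from its stored HA1, but both arrive at the same response for a given nonce.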
The problem with implementing the Atom Publishing Protocol is that one of the client apps that we want to support, Nokia’s LifeBlog mobile app, only supports the adaptation of the WSSE UsernameToken authentication protocol recommended by Mark Pilgrim. There’s lots to like about this protocol, especially the way it allows CGI based servers to take control of authentication without needing access to Apache’s .htaccess or requiring mod_digest to be installed. However, the design of the protocol is such that there’s no way to avoid storing the user’s password in plain text on the server. Which we really, really, really don’t want to have to do.
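To see why, here is a rough Ruby sketch of the UsernameToken digest: Base64 of the SHA-1 of the nonce, the creation timestamp and the password concatenated together. The method names and the token layout are assumptions, and header parsing and nonce encoding are left out. Because the password goes into the hash raw, alongside values that change on every request, the server can only check the digest if it has the raw password to hand.

```ruby
require 'digest/sha1'

# PasswordDigest = Base64(SHA1(nonce + created + password))
# pack('m0') is Ruby's built-in Base64 encoding without line breaks.
def wsse_password_digest(nonce, created, password)
  [Digest::SHA1.digest(nonce + created + password)].pack('m0')
end

# Hypothetical server side check: it needs the plaintext password.
def valid_token?(token, plaintext_password)
  wsse_password_digest(token[:nonce], token[:created], plaintext_password) == token[:digest]
end

token = {
  nonce: 'd36e316282959a9ed4c89851497a717f',
  created: '2007-12-16T12:00:00Z',
}
token[:digest] = wsse_password_digest(token[:nonce], token[:created], 'flapdoodle')
```

Compare this with HTTP Digest, where the password is hashed on its own first and so there is something other than the plaintext worth storing.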
Mutter. Grumble. Chunter. Bloody WS-* – biting the big one again.
[1] The user identifier is a combination of a username and a ‘realm’, a little like the way that email addresses often take the form username@domain.
Martin Fowler's big mouthful
Martin Fowler is writing a book about Domain Specific Languages and, because you could never accuse Martin of a lack of ambition, he’s trying to write it in a reasonably (implementation) language agnostic fashion.
It’s fairly easy to write an implementation language agnostic book about old school DSLs, what used to be called little languages – there’s a fairly well established literature and theory to do with lexing, parsing and interpreting. These are all about algorithms, and algorithms are implementation language neutral by their very nature.
Where Martin has his work cut out for him is trying to talk about what he calls ‘internal DSLs’ and what I’ve been calling ‘pidgins’. These are the sorts of languages where you don’t write a lexer or parser but instead build a family of objects, methods, functions or whatever other bits and pieces your host language provides in order to create a part of your program that, while it is directly interpreted by the host language, feels like it’s written in some new dialect.
The Lisp family of languages can be said to be all about this. A good ‘bottom up’ lisp programmer will shape a language to fit the problem space, essentially building a new lisp which makes it easy to solve the problem at hand. Lisp’s minimal syntax, powerful macros and the way it blurs the boundary between code and data really support this style.
Once you move from Lisp to more ‘syntaxy’ languages, things get hairier. As Martin himself says:
Another issue with book code is to beware of using obscure features of the language, where obscure means for my general reader rather than even someone fluent in the language I’m using. [...] this is much harder for a DSL book. Internal DSLs tend to rely on abusing the native syntax in order to get readability. Much of this abuse involves quirky corners of the language. Again I have to balance showing readable DSL code against wallowing in quirk.
He’s dead right. When I’m thinking about writing a pidgin in Ruby for instance, my first thought is usually to start with some kind of tabula rasa object which I can use to instance_eval a block. That lets me start to shape my language by lexically scoping the change:
in_pidgin do
  ...
end
But, though it’s easy to illustrate what I’d do with my tabula rasa, the implementation is somewhat tricky, and the tricks needed are unique to Ruby.
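For a flavour of the idea, here is a minimal sketch on a current Ruby, where BasicObject provides the blank slate for free. This is my illustration, not the implementation the post has in mind, and the Pidgin class, its column vocabulary and the __columns accessor are all invented. The block passed to in_pidgin is evaluated with a fresh Pidgin as self, so bare method calls inside it become words of the little language.

```ruby
# A blank slate: BasicObject has almost no methods of its own, so nearly
# every name is free for the pidgin's vocabulary. (Hypothetical class.)
class Pidgin < BasicObject
  def initialize
    @columns = []
  end

  # One 'word' of the language: record a column declaration.
  def column(name, options = {})
    @columns << [name, options]
  end

  # Accessor with an ugly name so it is unlikely to clash with the vocabulary.
  def __columns
    @columns
  end
end

# Evaluate the block with a fresh Pidgin as self, then return what it built.
def in_pidgin(&block)
  pidgin = Pidgin.new
  pidgin.instance_eval(&block)
  pidgin.__columns
end

columns = in_pidgin do
  column :title, type: 'text', default: 'Untitled post'
  column :body,  type: 'text'
end
```

Outside the in_pidgin block, column means nothing, which is the lexical scoping of the change in action.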
That sort of construct’s not really available to someone trying to write a pidgin in Java or Perl. In Perl, there are other odd corners of the language that can be abused to good effect. Dynamic scoping can let you ‘inject’ methods into a block even though there’s no Perl equivalent to instance_eval, or you can do some quite staggering things with the otherwise really annoying Perl function prototypes. For instance, here’s part of a Jifty definition of a persistent object:
column title =>
    type is 'text',
    label is 'Title',
    default is 'Untitled post';
column body =>
    type is 'text',
    label is 'Content',
    render_as 'Textarea';
Doesn’t look much like Perl, does it? But it’s parsed and executed by perl with no source filters or eval STRING in sight. And there’s no unsightly :symbols scattered about the place either, come to that.
These things all work by making the language do something unexpected, and generally, the way to do that is by knowing your host language inside out and playing with it. One of Damian Conway’s more inspired moments in recent years was List::Maker, in which the good doctor managed to find a corner of Perl where he could wedge a proper old school, complete with full on parser to build the AST, Little Language right in the heart of Perl without it looking like he was taking a plain old string and interpreting it. So, having found this odd little corner, he proceeded to implement a remarkably neat tool for building complex lists that are beyond the capabilities of Perl’s .. operator.
@odds = <1..100 : N % 2 != 0 >;
@primes = <3,5..99 : is_prime(N) >;
@available = <1..$max : !allocated{N} >
You may not think that’s all that sexy, but, and trust me on this, it’s just gorgeous. Yet more proof that Damian Conway is an (evil) genius.
Frankly, once you’ve seen the best of the pidgins available in Perl, some of the highly praised ‘DSLs’ in Ruby start to look a bit ordinary. Ruby makes a great deal of stuff that a pidgin breeder needs to do really easy. In Perl it’s often rather hard with a huge amount of hoopage to deal with. But some of the things that are hard in Perl are impossible in Ruby.
Anyhoo… coming back to my point. I do find myself wondering if Martin’s bitten off more than he can chew in attempting to write a book that covers implementing pidgins without getting bogged down in the nitty gritty of individual languages. The problem he’s facing is that different languages don’t just have different quirks, they have different idioms too. What reads naturally in the context of a Ruby program will read very weirdly in, say Java or a lisp. Any patterns of implementation beyond broad (but important) strokes like “Play to your host language’s strengths” will surely end up as language specific patterns. Designing and implementing a good pidgin is hard. Doing it effectively means getting down and dirty with your host language and its runtime structures. And that’s not the sort of thing you can cover effectively in a language agnostic book.
Martin, if you’re reading this, good luck. I think you’re going to need it. I look forward to being proved wrong.
Getting to grips with Javascript
I’ve been busily adding AJAX features to the work website, and I got bored of writing Form handlers. I got especially bored of attaching similar form handlers to lots of different forms on a page, so I came up with something I could attach to document.body and then plug in handlers for different form types as I wrote them.
So, I wrote FormSender and set up my event handler like so:
FormSender.onSubmit = function (e) {
    if (canDispatch(e)) {
        // Take over from the browser's normal form submission
        YAHOO.util.Event.stopEvent(e);
        YAHOO.util.Connect.setForm(e.target);
        YAHOO.util.Connect.initHeader('Accept', 'application/javascript, application/xml');
        YAHOO.util.Connect.asyncRequest(e.target.method.toUpperCase(), e.target.action,
                                        callbackFor(e));
    }
};
// One listener on the body catches submit events from every form on the page
jQuery(document.body).each(function () {
    YAHOO.util.Event.addListener(this, "submit", FormSender.onSubmit);
});
The obvious first cut of canDispatch is jQuery(e.target).hasClass('ajax'), but there was a snag. We had two sorts of forms on our pages, forms built using form_for(..., :class => 'ajax') and forms built using button_to(..., :class => 'ajax'), and they attached their classes in different places. In the form_for case, the class was on the form tag, but in the button_to case, it was on the generated form’s submit field. One option would be to monkey patch button_to, or roll my own ajax_button_to, but I ended up writing canDispatch like so:
function canDispatch(e) {
    // Without the return, the function would always yield undefined
    return jQuery(e.target).find(':submit').andSelf().hasClass('ajax');
}
This uses jQuery to build a list of the form, and its submit button, and then checks to see if any member of that list has the class ‘ajax’.
So, we can now tell if the source of a submit event is a form we should be doing AJAX dispatch with. The next trick is to work out what needs to be done with the results of sending the form. One option is the Prototype trick of simply evaluating the returned javascript, but it often makes sense to keep the behaviour clientside and just have the server return a datastructure. I decided that the way to do this would be by adding a second class to a form which used a non-default handler, and then keep a hash of callback constructors keyed by class. This made callbackFor look like:
function callbackFor(e) {
    var candidates = candidateClasses(e.target);
    // The first class with a registered constructor wins
    for (var i = 0; i < candidates.length; i++) {
        if (FormSender.callbacks[candidates[i]]) {
            return new FormSender.callbacks[candidates[i]](e);
        }
    }
    // Nothing matched: fall back to the default handler
    return new FormSender.callbacks.ajax(e);
}
candidateClasses is, again, a little more complex than I’d like, by virtue of the differences between button_to and form_for, but still reasonably straightforward, thanks to jQuery:
function candidateClasses(element) {
    // The chain has to start on the same line as return; otherwise
    // semicolon insertion turns this into a bare 'return;'
    return jQuery(element).find(':submit').andSelf()
        .filter('.ajax').attr('className')
        .replace(/ajax/, '').trim().split(/ +/);
}
jQuery gets the form and its submit button, then selects the tag that has the ‘ajax’ class and pulls out the full className string. The replace gets rid of ‘ajax’, trim chops any useless whitespace off either end, and split(/ +/) turns it into an array of classnames. The replace -> trim -> split pipeline has the feel of something that must already exist in some DOM interface somewhere, but I’m not sure where, so I rolled my own.
Once we have a list of classes it’s easy to just cycle through the candidates until we find one that matches a callback constructor, falling back to the default where nothing matches.
For completeness, I’ll show you my current default handler, which I expect to be extending to deal with a couple more media types and, in the case of the failure handler, more failure statuses.
FormSender.callbacks.ajax = function (e) {
    var form = e.target;
    this.scope = form;
};
FormSender.callbacks.ajax.prototype.success = function (o) {
    switch (o.getResponseHeader['Content-Type'].replace(/;.*/, '')) {
    case 'application/javascript':
    case 'application/x-javascript':
    case 'text/javascript':
        eval(o.responseText);
        break;
    default:
        YAHOO.log("Can't handle AJAX response of type " + o.getResponseHeader['Content-Type']);
    }
};
FormSender.callbacks.ajax.prototype.failure = function (o) {
    switch (o.status) {
    case 401:
        Authenticator.loginThenSubmit(this);
        break;
    default:
        switch (o.getResponseHeader['Content-Type'].replace(/;.*/, '')) {
        case 'application/javascript':
        case 'application/x-javascript':
        case 'text/javascript':
            eval(o.responseText);
            break;
        default:
            YAHOO.log("Can't handle AJAX failure response of type " + o.getResponseHeader['Content-Type']);
        }
    }
};
You’ll notice a reference to Authenticator.loginThenSubmit in the 401 handler, but that’s something I’ll save for another day.
A note on namespacing
Although I’ve been showing the various FormSender helper functions as if they were in the global namespace, in the real code they’re wrapped in a function call:
var FormSender = (function () {
    var candidateClasses = function (element) {...};
    var callbackFor = function (e) {...};
    ...
    var onSubmit = function (e) {...};
    return {onSubmit: onSubmit, callbacks: {}};
})();
I love the (function () {...})() pattern – it’s a great way of keeping your paws out of the global namespace until you really, really need to.
FormSender Benefits
Aside from the obvious benefit of drastically reducing the number of onSubmit event handlers registered with the browser, I found that using FormSender has simplified some of my response handlers. For instance, one form would get a chunk of html back from the server and would use that to replace the div that contained the form. But the new div also contained a form that needed to have Ajax behaviour, so a chunk of the handler code was concerned with reregistering onSubmit handlers for the new form (or forms). No fun. By switching to a single, body level, form handler, that problem simply disappears – so long as the new forms have the right class, they automatically get the appropriate behaviour. Result.
Obviously, FormSender is unobtrusive javascript, which is nice, and its pluggable nature means it’s easy to extend just by writing new response handlers and registering them with the FormSender object.
Future Directions
One obvious extension to FormSender is to pull out the meat of the onSubmit method into the callback object to allow for forms that don’t simply send themselves to the server. Another is to wrap my head around the workings of Javascript’s object model to make it easy to build handlers that don’t duplicate the behaviour of the default handler through the medium of copy and pasting code…
Your comments please?
I’m still very new to Javascript as a programming language and I’m sure I’m doing plenty of boneheaded things here. Please let me know if there’s things I can do to improve this, or point me at any libraries that already cover this ground.
The secret of comedy is...
... timing. You either have it or you don’t.
Does this count as good timing?
- Finish up some improvements to the way Typo sweeps cached pages
- Announce Typo 5.0
- Go down with a horrible cough and cold that leaves you exhausted and incapable of hacking
- Discover that the ‘improvements’ in Typo’s cache sweeping can, occasionally, cause it to wipe the entire Typo installation directory
- Stagger out of bed. Attempt to fix problem
- Release Typo 5.0.1
- Discover that the fix doesn’t work
- Bleargh!
- Let your co-maintainer deal with the fall out before releasing Typo 5.0.2 which does fix the cache sweeper
- We hope
- Recover enough to write a blog entry
Okay folks, Typo 5.0.2 is out and it appears to be working. I’m running it here, and I’ve had no problems so far. I’ve still got the cold, but it’s nowhere near as horrible as it was (went to bed at 5pm on New Year’s Eve, woke up at 11am on New Year’s Day – first time I’ve missed the turning of the year in ages).
Typo 5 is out - and more on the future
Right, we’ve cut a Typo 5 gem and it’s on rubyforge and heading to various mirrors I hope. Frédéric’s writing the release notification which will be appearing on Typosphere Real Soon Now.
It’s been a surprisingly tricky process – we’re now requiring Rails 2.0.2 because the workings of view_paths have changed in a way which means we can’t quite make themes work with both Rails 2.0 and 2.0.2, and working with the edge seems like the more sensible proposition. If you’re on the bleeding edge, you should find that you get the right Rails via svn:externals anyway.
Typo futures
Meanwhile, I’ve been playing with stet and I’ve come to the conclusion that, although there’s mileage to be had in a radically slimmed down approach to the way Typo works, I’m better off simply removing the misfeatures from Typo and building from there – there’s a surprising amount of stuff that needs to be done in a competent blogging engine that Typo gets right – starting again would be throwing the baby out with the bathwater I think.
However, this does mean that if you’re following the Typo SVN trunk, you’ll be seeing a reduction in features in the short term. We’ll be copying the current trunk to a 5-0-stable branch before we start with the featurectomies though, so if you’re just after bugfixes, you’ll be better off there.
Multiblogging
We’re aiming to have multiblogging in the next release, but we’re rethinking the how of it. Right now, the ‘Blog’ object adds a bunch of complexity to code that would be much happier simply assuming that it has the database to itself. So we’re going to look at switching to a database per blog approach, that way our core code can pretty much forget about the complexities of multiblogging, and (at least initially) anyone who wants multiblogging can get there by monkeying with configuration files – of course, we intend to add a web based admin interface once things settle down and we know how things are going to work.
Caching
Caching is always a bugbear in any Typo installation. Because we want to be installable on the widest possible range of hosts, we can’t rely on the presence of handy tools like ‘memcached’. Also, some of our users are operating under some fairly severe memory and process constraints, so it makes sense to have the webserver serve static files as much as possible. Meanwhile, tools like Evan Weaver’s Interlock are pointing the way towards seriously effective fragment caching. I shall be looking into implementing something that conforms to the interlock interface, but which can use an arbitrary cache backing store for fragments and maintain a full page cache. It’ll be interesting to find out if this is doable…
Atom Publishing Protocol
ActionWebService is going to go away – it’s already in the ousted branch of the rails SVN repository, and including it in Typo to support the various different admin APIs is getting painful. So, we’re going to preempt it. We won’t be getting rid of the various XMLRPC APIs until the pain becomes too great, but we are going to be concentrating on implementing, and strongly favouring, the Atom Publishing Protocol.
Feeds for everything
In particular, we’ll be adding atom feeds for all sorts of administrative data as a means of enabling people to write external tools for, say, spam protection, comment moderation and notification tasks. Right now, there’s a great deal of computation happening on the server side every time someone, say, comments on a post – in the kind of resource limited environments some people are running Typo in, that’s too much work. Switching to a feed + APP approach should help enormously with resource utilization.
Speaking of resources…
Using the server to render article previews is… suboptimal. Expect to see a javascript based preview system akin to the one I use for comments here.
Rails 2.0 and the Future of Typo
So, if you’ve been watching the Typo tree, you’ll see there’s been a fair amount of activity on it since Rails 2.0 got released. There’s a new default theme replacing the rather creaky ‘azure’, and a fair amount of work on getting our code compatible with the current state of Rails. As we work on this, it becomes apparent that Typo’s code is getting horribly brittle. I have said before that there’s been several places where we’ve zigged before Rails zagged, and we’re paying the price for that. It doesn’t help that our test coverage is distinctly ropy either – and I’m probably guiltier than most for letting things get into that state.
So, our goal is to get what we have cleaned up and working with Rails 2 before releasing Typo 5.0. Once that’s done, that line of code will go into maintenance mode – there are still plenty of bugs to fix and documentation to write, but I’m afraid that extending that base is becoming too much of a chore.
Which is why I have a new path in my local svk repository, //stet. I’m using this for experimental development of a new, slimmed down blogging engine that will be, first and foremost, a capable Atom Publishing Protocol host. Things like spam processing will be removed from the core of the application, but we’ll provide a suite of webservice clients that will consume the ‘unmoderated feedback’ webfeed and use APP to either approve or delete the feedback as appropriate.
Theming (at least initially) will probably be confined to Javascript and CSS changes, and I’m even thinking of exposing the sidebars as Atom collections – certainly I expect that, in the first cut, sidebars will be static – if you want content that looks dynamic you’ll have to do it via javascript.
My initial goal is to slim things down as far as I possibly can – I want to build a blogging engine that can cope with the tight memory constraints of shared hosting by off loading much of the heavy lifting to client boxes. After all, I have far more processing capability available to me on the laptop I’m typing this on than the slice of Site5’s hosting infrastructure that’s actually running the blog. By making things small and static, I also hope to wring good performance numbers out of the tool as well – expect aggressive page caching at the very least.
Another important goal is easy migration of Typo databases. I expect to be writing models and controllers from the ground up, but converting the database should just be a matter of running a migration.
Experimental
Of course, stet’s currently very experimental – about the only things actually written so far are a couple of routing plugins, which should help radically simplify our routes.rb (expect an article’s url to change from /articles/2007/12/16/rails-20-and-the-future-of-type to /article/2007/12/16/rails-20-and-the-future-of-typo, but with a redirect in place to cater for the old style urls). I may have grandish plans for the thing, but I could equally discover that I’m off up a blind alley, in which case you can expect me to return to the current Typo codebase with a few more lessons learned.
ActiveResource?
I remain unconvinced by ActiveResource as a technology. I agree with the authors of RESTful Web Services – good webservices are joined up. They take full advantage of what could be described as the defining technology of the world wide web, the URL based hyperlink, to knit resources together in a discoverable fashion. An ActiveResource based webservice may well be a good HTTP citizen, but it’s still not really ‘webby’ enough for my taste. Which means the Atom Publishing Protocol will remain my friend for most of the things I hope to do with stet. It may be harder to write a good APP server, but I’m convinced that it’s a much better interface for clients, and you should always favour ease of use over ease of implementation. If nothing else, we’re aiming to have more users than developers. Many more.
Comprehensible sorting in Ruby 3
Here’s a problem I first came across when I was about 13 and helping do the stock check at the family firm. The parts department kept all their various spare parts in racks of parts bins. Each bin was ‘numbered’ with an alphanumeric id. We had printouts of all the bin numbers along with their expected contents and we’d go along the racks counting the bins’ contents and checking them off against the printout. What confused me at the time was the way the printouts were organized. Instead of the obvious ordering, “A1, A2, A3, ..., A99”, the lists were ordered like “A1, A10, A11, ..., A2, A20, A21, ...”. After a bit of thought I realised that the computer was sorting the numeric bits of the bin numbers as if they were just sequences of strange letters. A bit more thought made me realise why, post computerisation, people were starting to use bin numbers like “A01, A02, ...”. Computers were more important than people so, in order to make sorting things easier, just add spurious leading 0s to make the number field a fixed width and Robert’s your parent’s brother.
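The effect is easy to reproduce. Here is a minimal sketch in Ruby (the bin ids are made up for illustration) of how plain string comparison orders the ids, and how zero padding papers over the problem:

```ruby
# Hypothetical bin ids, not taken from any real stock list.
bins = %w[A1 A2 A3 A10 A11 A20 A21]

# String#<=> compares character by character, so "A10" lands before "A2".
string_sorted = bins.sort
puts string_sorted.join(', ')

# Zero padding gives every id the same width, so string order and
# numeric order agree again.
padded = bins.map { |id| id.sub(/\d+/) { |digits| digits.rjust(2, '0') } }
padded_sorted = padded.sort
puts padded_sorted.join(', ')
```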
27 years later and computers are still crap at sorting things in a sensible fashion. Back before Moore’s Law was really kicking in, I suppose it was excusable, but surely we’ve moved past that now.
Over on the labnotes blog, there’s an example of some Ruby code that attempts to do ‘human’ sorting:
    module Enumerable
      def sensible_sort
        # Split into alternating text and digit runs; compare digit runs as integers.
        sort_by { |key| key.split(/(\d+)/).map { |v| v =~ /\d/ ? v.to_i : v } }
      end
    end

It’s okay, as far as it goes. It certainly solves the parts bin problem I outlined above, but it’s not ideal. For example, you might expect ['-1', '1', '1.02', '1.1'].sensible_sort to leave the order unchanged, but what you actually get is ‘1, 1.1, 1.02, -1’. Not ideal. Let’s rewrite sensible sort as
    module Enumerable
      def sensible_sort
        # Capture signed decimals with an optional exponent. Float() converts the
        # captured runs; anything it rejects stays a string.
        sort_by { |k| k.split(/([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)/).map { |v| Float(v) rescue v } }
      end
    end

That ugly regular expression should match a far wider selection of string representations of numbers. Certainly our ‘bad’ list is now sorted correctly.
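To see why this works, it helps to look at the keys that sort_by ends up comparing. A small sketch, assuming a hypothetical helper called sort_key and a cut-down pattern (sign and fraction only, no exponent) to keep it short:

```ruby
# Cut-down pattern for illustration: optional sign, digits, optional fraction.
NUMBER = /([-+]?\d+(?:\.\d+)?)/

# Hypothetical helper: the key that sort_by would compare for one string.
def sort_key(string)
  string.split(NUMBER).map { |piece| Float(piece) rescue piece }
end

list = ['1.1', '-1', '1.02', '1']
list.each { |item| puts "#{item.inspect} => #{sort_key(item).inspect}" }

sorted = list.sort_by { |item| sort_key(item) }
puts sorted.inspect
```

Because split keeps the captured matches, each key alternates between a text piece and a number, so the arrays compare text with text and number with number.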
But what about “a-1”, “a-2”? Using the implementation above, they’d get sorted as “a-2, a-1”, which can’t be right, can it? Let’s extend it a bit more and make sure we only worry about the ‘+’ and ‘-’ if they’re at the beginning of a line or preceded by whitespace.
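    module Enumerable
      def sensible_sort
        # A sign only counts at the start of a line or after whitespace. Ruby's ^
        # already matches at every line start, so no regex flags are needed.
        sort_by { |k| k.to_s.split(/((?:(?:^|\s)[-+])?\d+(?:\.\d+)?(?:[eE]\d+)?)/).map { |v| Float(v) rescue v } }
      end
    end

And that works fine, until you find that “B” sorts before “a”. Let’s catch that as well: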
    module Enumerable
      def sensible_sort
        # Downcase the text runs so that case no longer affects the order.
        sort_by { |k| k.to_s.split(/((?:(?:^|\s)[-+])?\d+(?:\.\d+)?(?:[eE]\d+)?)/).map { |v| Float(v) rescue v.downcase } }
      end
    end

Yay!
Oh, wait a minute, what about version numbers? How should we sort, say “perl 5.8.0” and “perl 5.10.0”? The 5.8.0 form should definitely come first… Hmm…
How about
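    module Enumerable
      def sensible_sort
        # A fraction only counts when its whole digit run is taken and it is not
        # followed by another '.' or an exponent marker, so "5.8.0" splits at its
        # first dot.
        sort_by { |k| k.to_s.split(/((?:(?:^|\s)[-+])?\d+(?:\.\d+(?:[eE]\d+)?(?![eE.\d]))?)/).map { |v| Float(v) rescue v.downcase } }
      end
    end

How far down does this thing go?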
I just noticed that “.1” sorts after “1”. Time for another tweak…
    module Enumerable
      def sensible_sort
        # Also accept numbers that start with a bare '.', such as ".1".
        sort_by { |k| k.to_s.split(/((?:(?:^|\s)[-+])?(?:\.\d+|\d+(?:\.\d+(?:[eE]\d+)?(?![eE.\d]))?))/).map { |v| Float(v) rescue v.downcase } }
      end
    end

But that doesn’t work with version numbers like “.8.2”, “.10.2”…
Time passes… Thorin sits down and sings about gold
I was planning on giving an extension of the regex that caught this issue as well, but I’m afraid I’ve stumped myself – I can’t do it with a single regular expression unless I can use a fixed width lookbehind assertion, and the Ruby regex engine I’m using doesn’t support those the way Perl’s does. Of course, it’s still possible to fix it, but doing so will take more thought than I have available to me at this time on a Sunday morning. And all this is before we get onto making sure that “1/2” sorts between “0” and “1”. And phone numbers. After all, “01915551238” is ‘obviously’ the same as “0191 555 1238” and “0191 555-1238”, so they should end up next to each other in the sorted list.
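One way to sidestep the single-regex limit for the dotted case might be to split first and classify afterwards. What follows is only a sketch under assumptions, not the fix: the names NUMERIC_RUN, numeric_key and sensible_key are made up for illustration, signs and exponents are left out entirely, and any run containing two or more dots is assumed to be a version number.

```ruby
# Hypothetical pattern: digits with any number of dotted parts, or a run
# of dotted parts that starts with a bare '.'. No signs, no exponents.
NUMERIC_RUN = /(\d+(?:\.\d+)*|(?:\.\d+)+)/

# Every numeric run becomes an array of numbers, so that decimals and
# version numbers can still be compared with each other.
def numeric_key(token)
  if token.count('.') >= 2
    # Assumed to be a version number: compare it part by part.
    token.split('.').map(&:to_i)
  else
    # A plain integer or decimal; give a leading '.' its implied zero.
    [(token.start_with?('.') ? "0#{token}" : token).to_f]
  end
end

# split with a capture group alternates text, match, text, match, ...
# so the odd positions are always the numeric runs.
def sensible_key(value)
  value.to_s.split(NUMERIC_RUN).each_with_index.map do |piece, index|
    index.odd? ? numeric_key(piece) : piece.downcase
  end
end

versions = ['perl 5.10.0', '.10.2', 'perl 5.8.0', '.8.2']
puts versions.sort_by { |v| sensible_key(v) }.inspect
```

Classifying after the split keeps the regex dumb and moves the decision about what a dotted run means into ordinary code, which is easier to extend than another layer of lookahead. Fractions and phone numbers are still untouched.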
It looks like this is a ‘three pipe problem’ after all. I shall probably return to this…
