I just completed some SubText hacking and the results should be visible now. In addition to upgrading to version 1.9.5 (which should officially release here shortly), I implemented a new feature: Tag Clouds. I'd had enough of my feature envy from all the cool kids who had them, so I went and rolled my own. If you're at my actual site (as opposed to a feed reader), the Tag Cloud is off on the right.

This was a non-trivial feature to add. In an email conversation on the SubText developers' list a couple of months ago, Phil Haack (semi-benevolent project dictator) indicated that he planned on tags being first-class objects, meaning that they'd have their own table and a cross-reference table linking them to posts. They'd also be allowed to show up in other public interfaces (as they'd pretty much have to). The main point from the majority of developers in that discussion was that they wanted Tags to be done well and fully, with support for all the different tag providers out there.

Like RSS and other Internet wonders, tags are fundamentally simple in concept. A tag is simply any hyperlink with a "rel" attribute of "tag". Something like this:

<a href="http://technorati.com/tags/competence" rel="tag">competence</a>

The biggest gotcha in this setup is that officially, the "name" of the tag is determined by the link's URL (the last segment of the href path) and not by the text of the link. So if your link were like this:

<a href="http://technorati.com/tags/competence" rel="tag">incompetence</a>

The tag is officially "competence" even though it displays "incompetence".
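To make that rule concrete, here is a minimal sketch in Python (not the SubText code, which is C#). The function name tag_name_from_href is made up for illustration, and the trailing-slash and percent-decoding handling are my assumptions about sensible behavior, not something taken from SubText.

```python
from urllib.parse import urlsplit, unquote


def tag_name_from_href(href: str) -> str:
    """Return the tag named by a rel="tag" link: the last path segment of its URL.

    Hypothetical helper for illustration only; the link text is never consulted.
    """
    path = urlsplit(href).path
    # Drop empty segments so a trailing slash (".../tags/competence/") is ignored.
    segments = [segment for segment in path.split("/") if segment]
    # Decode percent-escapes such as %23; return "" when the URL has no path.
    return unquote(segments[-1]) if segments else ""


print(tag_name_from_href("http://technorati.com/tags/competence"))
```

Whatever the link displays, the function only ever looks at the URL, which is exactly why the "incompetence" link above still counts as "competence".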

That means the core of the feature is scanning posts on insert and update to catch any tags they contain, then adding or updating the cross-reference entries that tie those tags to the post. All I can say is thank heavens for RegEx. It's a pain in the pinky toe to understand, learn, or debug, but you just can't beat it for parsing text. For the curious, and to open myself to ridicule, I'll give you the expressions I used (both case insensitive).

To find a link: <a(?<element>.*?href=[\"'](?<url>.*?)[\"'].*?)>.*?</a>

To figure out if the link is a tag: rel=[\"']tag[\"']

Oh, be careful there if you use or copy those expressions; this was done in C#, so the quote characters are escaped using the \ character. Those backslashes belong to the C# string literal, not to the regular expression itself.
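For anyone who wants to play with the two expressions outside of C#, here is a rough Python translation. It is only a sketch of the scanning step, not the SubText implementation: the names LINK_RE, REL_TAG_RE, and find_tag_urls are invented for this example, the named groups use Python's (?P<name>...) spelling, and the C# backslash escapes are dropped.

```python
import re
from typing import List

# Python spellings of the two expressions above; both are case insensitive.
LINK_RE = re.compile(
    r"""<a(?P<element>.*?href=["'](?P<url>.*?)["'].*?)>.*?</a>""",
    re.IGNORECASE,
)
REL_TAG_RE = re.compile(r"""rel=["']tag["']""", re.IGNORECASE)


def find_tag_urls(post_html: str) -> List[str]:
    """Return the href of every link in post_html whose attributes include rel="tag"."""
    urls = []
    for match in LINK_RE.finditer(post_html):
        # "element" holds everything between "<a" and the closing ">" of the open tag.
        if REL_TAG_RE.search(match.group("element")):
            urls.append(match.group("url"))
    return urls


post = (
    '<p>See <a href="http://example.com/about">about</a> and '
    '<a href="http://technorati.com/tags/competence" rel="tag">incompetence</a>.</p>'
)
print(find_tag_urls(post))
```

The usual regex-on-HTML caveats apply: the lazy .*? can run past the end of an anchor that has no href, and without a dot-matches-newline option a link split across lines will be missed, so treat this as a starting point rather than a parser.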

Anyway, it's done now, and I'd appreciate hearing about any trouble you run into. I already caught one choke: a tag with a "." in it. Fortunately that one just took a RegEx change in web.config so the tag display handler would pick it up. I'll create another post later detailing how to add a tag cloud to an arbitrary skin once this upgrade gets added to the SubText project officially.