(archive 'newLISPer)

November 14, 2007

Tyrannosaurus Regex: Markdown for newLISP

Filed under: newLISP — newlisper @ 11:38

If you’ve ever tried to write prose (or even poetry) in raw HTML, you’ll know how awkward it can be to fight your way through all those angle brackets and tags. Just emphasizing a four-letter word requires an additional seven or more keystrokes. So it’s no surprise that there are a number of alternative techniques for writing for the web. One of the emerging standards in this area is John Gruber’s Markdown, which he introduces thus:

Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).

Markdown is both a set of punctuation conventions and a script that converts the marked-up plain text. The original script is a 1400-line Perl file, carefully written and thoroughly documented. At its heart is a scary collection of carnivorous regular expressions, so fierce that other parts of the document have to be protected to avoid being devoured.
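To make the "protection" idea concrete, here is a minimal sketch, written in Python rather than newLISP or Perl, and not a port of either markdown.pl or markdown.lsp. It assumes a toy dialect with only `code` spans and *emphasis*. The function name `render_inline` and the NUL-delimited placeholder format are made up for the illustration. Code spans are lifted out and replaced by placeholders before the emphasis regex is let loose, then put back afterwards.

```python
import re


def render_inline(text):
    """Toy converter: handles only `code` spans and *emphasis*."""
    protected = []  # rendered code spans, held aside from later regexes

    def stash(match):
        # Swap the code span for a placeholder the emphasis regex cannot match.
        protected.append("<code>" + match.group(1) + "</code>")
        return "\x00%d\x00" % (len(protected) - 1)

    # 1. Protect code spans first, so their asterisks are out of reach.
    text = re.sub(r"`([^`]+)`", stash, text)
    # 2. Now the emphasis regex can run over what is left.
    text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
    # 3. Restore the protected spans in place of their placeholders.
    return re.sub(r"\x00(\d+)\x00", lambda m: protected[int(m.group(1))], text)


print(render_inline("Use `a*b*c` for *speed*"))
# prints: Use <code>a*b*c</code> for <em>speed</em>
```

Without step 1, the emphasis pattern would match the asterisks inside the code span and turn `a*b*c` into `a<em>b</em>c`. The sketch ignores HTML escaping, nesting, and everything else a real converter has to deal with.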

Versions of Markdown have been written in other languages too – there’s a partial list at the Markdown wiki. As well as Perl, PHP, and Python, you’ll find Ruby, Java, JavaScript, Common Lisp, Nu, Lua, and Haskell implementations.

Looking over these it’s interesting to see which of them have chosen to emulate the ‘Tyrannosaurus Regex’ of the Perl original, and how many have decided to write something more sophisticated, such as a language parser (or whatever a better alternative to those ugly regexen might consist of!). Very few have tried!

The different versions are intriguing. There’s Haskell, looking impressive in the form of a complete document translation system called Pandoc. Then there’s Lua, cleverly avoiding the Perl-compatible regex approach (which it doesn’t have) by running scanning algorithms instead. The Python and Common Lisp versions are seriously heavy-duty software development projects, with dependency graphs, class definition files, extensive test suites, bug fix lists, release schedules, and heaven knows what else. I really admire these guys’ attention to detail and their ability to take almost infinite care over the smallest things. They do things properly.

However, I’ve managed to avoid all kinds of complications by just coding it up in newLISP. I’m not enough of a programmer to be able to develop an alternative to the Regexosaurus method, so the newLISP version looks more like Perl than Lisp. But in a way that’s one of the really cool things about newLISP – you can write in any style you want.

One of the things I found useful when I was writing my newLISP version of Markdown was an interactive test application I built using the newLISP-GS server. The window has three panes: plain text with Markdown comments, raw Markdown HTML output, and rendered HTML. You type into the left pane, and the other two panes update as you type. Surprisingly, considering the nature of the process, newLISP-Markdown is fast – it can keep up with my typing, with just a bit of flickering when the text grows.

I’ve been using markdown/newLISP for a few months now, and it’s also the comment-processor for this blog. It’s currently operating at a reasonable level of compatibility with John Gruber’s Perl original. There are a few places where the output doesn’t agree with his, and these can be considered failures for the purposes of running the Markdown Test suite (which you can read about here). At present, 7 of the 23 test files produce output that differs from the HTML generated by markdown.pl:

  • Backslash escapes.text
  • Code Blocks.text
  • Images.text
  • Links, inline style.text
  • Links, reference style.text
  • Markdown Documentation – Syntax.text
  • Ordered and unordered lists.text

Some of these are minor ‘edge’ cases, such as URLs with parentheses in them, or line breaks in the middle of link definitions. But they should be fixed – one day. Occasionally I spend a few minutes trying to find and fix a bug – in fact I found one today.

If you fancy joining in and helping me complete the project, you’d be very welcome!

You’ll find the code for markdown-interactive.lsp on the downloads page, and the markdown.lsp file itself is also there.
