(archive 'newLISPer)

November 26, 2007

What’s the difference?

Filed under: newLISP — newlisper @ 16:40

Seen on a mailing list today:

Hi. I’m trying to figure out if it is at all possible to compare two files, and then output the list of missing lines in one file to a separate file.

Say, I have two documents, each containing a long list of items. The names of the documents are Document FULL and Document PARTIAL. I sort both alphabetically and do a difference between them.

I’d like to capture the list of items (rows) that exist in Document FULL but don’t exist in Document PARTIAL. How would I go about this?

This looks like a newLISP one-liner. But I hate one-liners for their ugliness and forced nature. So here’s a multi-line version:

(print "in the first but not the second: "  
(difference 
    (parse 
        (read-file 
            (string 
                (env "HOME" ) "/list1.txt" )) "\n"  0) 
    (parse 
        (read-file 
            (string 
                (env "HOME" ) "/list2.txt" )) "\n"  0)))

November 18, 2007

Colour me orange

Filed under: newLISP — newlisper @ 12:53

Not relevant for the archive version

You might be surprised to learn that – despite the bright orange tint everywhere on this site – I’m a bit of a monochromatic minimalist at heart, preferring dark text on light backgrounds and avoiding bright colours on web pages. However, I’ve decided that it’s time that newLISP code on this site should be displayed in colour rather than in dull grey. (I was partly inspired by Cyril’s excellent work on the vim syntax module, which you can read about on the newLISP forum, and by the colour schemes available in the newLISP editor, named after composers.)

Up to now, I’ve been using Lutz’ syntax.cgi program for colouring the code in the downloads section. I thought it would be cool to adapt this to use CSS styling, rather than use the old-school tags. And then I also noticed that this syntax program didn’t always process every file perfectly – there are a few known problems mentioned in the file itself. So I thought I’d have a go at a new version of the program.

I’ve set it up so that there are four different CSS styles you can use for newLISP source code. These are selected by one of four classes:

  • c for comments
  • k for keywords
  • s for strings
  • p for parentheses

So the HTML code for marking up a function definition looks like this:

<span class="p">(</span><span class="k">define</span> <span class="p">(</span>encode-backslash-escapes t<span class="p">)</span> <span class="c">;</span>
which is a lot of text for a simple task (it was even longer until I abbreviated the class names). I’ve installed the syntax processor to paint the files on the downloads page, and it’s working quite well so far. I admit that painting code this way isn’t very quick. But I bet Michelangelo often said that about painting the Sistine Chapel.
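For what it’s worth, the wrapping itself is trivial. Something like this hypothetical helper – a sketch only; wrap-span isn’t the name of anything in the actual script – produces the markup shown above:

; a sketch: wrap-span is a hypothetical name, not a function in syntax.lsp
(define (wrap-span class-name text)
    (string {<span class="} class-name {">} text {</span>}))

(wrap-span "k" "define")
;-> <span class="k">define</span>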

A more interesting problem, though, is how to integrate the code painting with Markdown, which I use for writing the text and which is also running as a comment processor. The problem is that Markdown doesn’t provide any syntax or convention for specifying which language a piece of code is written in. The only convention available is to indent text in the source file by at least four spaces. Such text is then formatted, with its white space preserved, inside pre and code tags. You use this convention for all code listings – HTML, CSS, and any other language – and for anything else that relies on formatting defined by white space.

Obviously we don’t want to run a newLISP formatting script on a block of HTML code. But there doesn’t seem to be a convention for specifying a language, so there are two choices: detect the language using some form of scanning (or guessing), or stick an indicator at the start of the code-block to specify which language to assume.

I’ve taken the easy way out. The convention I’ve used in the latest newLISP version of Markdown is simple: if you want to display a code block but you don’t want it processed by the newLISP code painter, put an exclamation mark (!) at the start, on a line of its own. Like this:

! 
(def-inline-matcher 'link
 (make-inline-scanner
  '(:sequence
    brackets
    (:greedy-repetition 0 nil :whitespace-char-class)
    #\(
    (:register (:greedy-repetition 0 nil (:inverted-char-class #\))))
    #\)))
#'link-match)

So that bit of Common Lisp won’t be painted in colour. (To show the exclamation mark, I used a second one, but followed it with a space so that it didn’t get processed.) But this bit of newLISP will:

(define (process-source source-code-segments)
  (let ((result {})
       )
  ; work through segment list
  (dolist (pair source-code-segments)  
    (set 'start (last (first pair)))
    (set 'end   (last (last pair)))
    ; put any white space back in
    (while (< cursor start) (print (cursor 1 Txt)) (inc 'cursor))
    (set 'type  (first (first pair)))
    (set 'source-string (slice Txt start (- (+ end 1) start)))
    (cond 
      ((= type 0)
         (push (highlight-keywords source-string) result -1))
      ((= type 4)
        (push  (string {} (escape-html source-string) {}) result -1))
      (true
         (push  (string {} (escape-html source-string) {}) result -1)))
    (set 'cursor (+ end 1)))
   result
  )
)

It’s a user-friendly solution. Most of the code examples here are in newLISP, and presumably that’s true of most of the comments as well, so it’s easier to mark the blocks you don’t want painted than the ones you do.
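Inside the Markdown processor this comes down to a test on the first line of each code block. Here’s a minimal sketch of the idea – escape-html and paint-newlisp are hypothetical names standing in for whatever the real script calls:

; a sketch only: escape-html and paint-newlisp are stand-in names
(define (process-code-block block-text)
    (if (starts-with block-text "!")
        ; a leading ! on its own line: strip it and leave the block unpainted
        (escape-html (replace {^![ \t]*\n} block-text "" 0))
        ; otherwise run the block through the newLISP painter
        (paint-newlisp block-text)))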

The main syntax painting code is in syntax.lsp (the syntax.cgi file loads this and builds an HTML page). I’ve relied heavily on code by newLISP guru Fanda and newLISP creator Lutz. The heart of the script is Fanda’s routine that scans newLISP source and records the character positions where the mode (code, string, or comment) changes. Then this list is used to rebuild a copy of the source in which the different sections are enclosed in tags, and the white space gaps are copied over from the original.

This still requires some testing, and some additions (I haven’t included the single-character operators yet, because I’m not sure what form of escaping some of them will need). Please let me know of any problems or improvements. And if you can work out how to choose and apply your own colour schemes as well, please share.

Comment from newlisper

During testing, I’ve noticed a few problems with this syntax-painting module. One to watch is that consecutive strings that are not separated by a space are not processed correctly. For example:

(println {like}{this})

won’t be handled correctly. It’s as if the scanner doesn’t get the chance to switch from string mode back to code and then to string again. The solution at the moment is to not write strings like that!
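In other words, until that’s fixed, put a space between them:

(println {like} {this})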

Another – more cosmetic – problem is that some of the reserved words with question marks aren’t picked up. For example:

(number? list list?)

should all be matched. The regex that matches them needs more work.
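Allowing an optional trailing question mark in the pattern would probably do it – this is just a sketch, not the module’s actual regex:

; an illustrative pattern only
(find-all {[a-z\-]+\??} "(number? list list?)")
;-> ("number?" "list" "list?")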

Comment from Fanda

Hello newlisper!

I am happy that you are using my code processing script :-)

The error you are talking about has been fixed. I removed the ‘group’ function and rewrote ‘group-types’. Download it from:
http://www.intricatevisions.com/source/newlisp/code.lsp

Fanda

PS: I don’t know about speed up yet ;-)

Comment from cormullion

Thanks Fanda! I’m now trying to put your new work into my new work! I’m using your group function as the main tokenizer function for my formatter…

Comment from Fanda

The ‘group’ function is obsolete, because its functionality was added to ‘explode’:

(explode '(1 2 3 4 5))
((1) (2) (3) (4) (5))
(explode '(1 2 3 4 5) 2)
((1 2) (3 4) (5))
(explode '(1 2 3 4 5) 2 true)
((1 2) (3 4))

The ‘true’ flag works the opposite way round from the one in ‘group’ (if I remember rightly).

Fanda

PS: You can fix my English grammar mistakes since you are reviewing comments :-)))) [joking]

November 14, 2007

Tyrannosaurus Regex: Markdown for newLISP

Filed under: newLISP — newlisper @ 11:38

If you’ve ever tried to write prose (or even poetry) in raw HTML, you’ll know how awkward it can be to fight your way through all those angle brackets and tags. Just emphasizing a four-letter word requires an additional seven or more keystrokes. So it’s no surprise that there are a number of alternative techniques for writing for the web. One of the emerging standards in this area is John Gruber’s Markdown, which he introduces thus:

Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).

Markdown is both a set of punctuation conventions and a script that converts the marked-up plain text. The original script is a 1400-line Perl file, carefully written and thoroughly documented. At its heart is a scary collection of carnivorous regular expressions, so fierce that other parts of the document have to be protected to avoid being devoured.

Versions of Markdown have been written in other languages too – there’s a partial list at the Markdown wiki. As well as Perl, PHP, and Python, you’ll find Ruby, Java, Javascript, Common Lisp, nu, Lua, and Haskell implementations.

Looking over these, it’s interesting to see which of them have chosen to emulate the ‘Tyrannosaurus Regex’ of the Perl original, and how many have decided to write something more sophisticated, such as a language parser (or whatever a better alternative to those ugly regexen might consist of). Very few have tried the latter!

The different versions are intriguing. There’s Haskell, looking impressive in the form of a complete document translation system called Pandoc. Then there’s Lua, cleverly avoiding the Perl-compatible regex approach (which it doesn’t have) by running scanning algorithms instead. The Python and Common Lisp versions are seriously heavy-duty software development projects, with dependency graphs, class definition files, extensive test suites, bug fix lists, release schedules, and heaven-knows what else. I really admire these guys’ ability to take almost infinite care over the smallest details. They do things properly.

However, I’ve managed to avoid all kinds of complications by just coding it up in newLISP. I’m not enough of a programmer to be able to develop an alternative to the Regexosaurus method, so the newLISP version looks more like Perl than Lisp. But in a way that’s one of the really cool things about newLISP – you can write in any style you want.

One of the things I found useful when I was writing my newLISP version of Markdown was an interactive test application I built using the newLISP-GS server. The window has three panes: plain text with Markdown comments, raw Markdown HTML output, and rendered HTML. You type into the left pane, and the other two panes update as you type. Surprisingly, considering the nature of the process, newLISP-Markdown is fast – it can keep up with my typing, with just a bit of flickering when the text grows.

I’ve been using markdown/newLISP for a few months now, and it’s also the comment-processor for this blog. It’s currently operating at a reasonable level of compatibility with John Gruber’s Perl original. There are a few places where the output doesn’t agree with his, and these can be considered failures for the purposes of running the Markdown Test suite (which you can read about here). At present, 7 of the 23 test files produce output that differs from the HTML generated by markdown.pl:

  • Backslash escapes.text
  • Code Blocks.text
  • Images.text
  • Links, inline style.text
  • Links, reference style.text
  • Markdown Documentation – Syntax.text
  • Ordered and unordered lists.text

Some of these are minor ‘edge’ cases, such as URLs with parentheses in them, or line breaks in the middle of link definitions. But they should be fixed – one day. Occasionally I spend a few minutes trying to find and fix a bug – in fact I found one today.

If you fancy joining in and helping me complete the project, you’d be very welcome!

You’ll find the code for markdown-interactive.lsp on the downloads page, and the markdown.lsp file itself is also there.

November 11, 2007

Looks OK to me

Filed under: newLISP — newlisper @ 18:10

I recently came across this deftly argued defence of Lisp’s use of parentheses, from François-René Rideau’s blog. Here’s an extract:

Firstly, the nested parenthesized syntax is actually not bad at all, as far as understanding goes. It is actually so simple that anyone can grasp it and master it in 20 minutes. Parentheses might be tricky to balance by hand, but since the 1970s at least, interactive development environments match parentheses visually for you, so you never have to worry about counting them. Moreover, properly indenting code, which may be done automatically by editors, makes code quite readable. Using Lisp with improper user interfaces meant for other languages may be tedious, but this is hardly a technical claim against Lisp.

By comparison, it took me many weeks to initially master the intricacies of the C syntax: the operator precedence, the semi-colon as terminator not separator, the confusion between assignment and comparison, the overloading of the parenthesis and comma syntax (precedence overriding, function calling, or sequencing of expressions?), the misleading (and in C++ ambiguous) similarity between declarations and statements, the trap of nested braceless if-else constructs, the trickiness of declaring type of pointers to functions, the weird limitations of the preprocessor and its macros, etc.

This interesting post doesn’t get round to addressing the aesthetic visual appeal of programming languages. Even writing that phrase sounds odd, so I expect it’s not an area which has been explored very much. Or perhaps I’ve just never thought about it much.

I think that Lisp code – more precisely newLISP code – can be pleasing to the eye. I don’t claim to be a programmer, so I have no problem about valuing a programming language for its appearance, rather than for its technical abilities or deficiencies. Once you’ve overcome your initial reaction to the omnipresent parentheses, it’s the relative scarcity of other punctuation, a more English-like vocabulary, a less algebraic feel, and a flexible approach to layout that give a relaxed and uncluttered space in which to work. At least, that’s how it feels to me. But it’s a lot to do with what you’re used to, and your own tastes and experiences. It’s a personal thing.

For example, there’s no doubt that many people would find the following easy to read and comfortable to work with. It’s a short extract from a Python script, and it should have been indented 22 characters in from the left, I think, but that’s another story:

if title:
   title_str = ' title="%s"' \
       % title.replace('*', g_escape_table['*']) \
              .replace('_', g_escape_table['_']) \
              .replace('"', '&quot;')
else:
   title_str = ''
if is_img:
   result = '<img src="%s"%s' % (url, title_str)
elif start_idx >= anchor_allowed_pos:
   result_head = '<a href="%s"%s>' % (url, title_str)
   result = '%s%s</a>' % (result_head, link_text)
   curr_pos = start_idx + len(result_head)
   anchor_allowed_pos = start_idx + len(result)
   text = text[:start_idx] + result + text[match.end():]
else:
   curr_pos = start_idx + 1
continue

Good stuff – compact, expressive, powerful, I would guess. But I find it visually more taxing than a similar passage from a newLISP script:

(if alt-text
     (replace {"} alt-text {&quot;} 0))          
(if title 
     (begin 
         (replace {"}  title {&quot;} 0)
         (replace {\*} title (lookup {*} escape-table) 0)
         (replace {_}  title (lookup {_} escape-table) 0)))           
(replace {\*} url (lookup {*} escape-table) 0)
(replace {_} url (lookup {_} escape-table) 0)          
(string 
      {<img src="} url {" alt="} alt-text {" />})

I don’t really know why, though. Perhaps it’s because, after all this time, I’ve started to visually ‘tune out’ (odd phrase) the effect of the parentheses.

Let’s try another one. What about this? See if you can guess the language.

source = try $ do 
  char '('
  optional (char '<')
  src <- many (noneOf ")> \t\n")
  optional (char '>')
  tit <- option "" linkTitle
  skipSpaces
  char ')'
  return (removeTrailingSpace src, tit)

linkTitle = try $ do 
  skipSpaces
  optional newline
  skipSpaces
  delim <- char '\'' <|> char '"'
  tit <- manyTill anyChar (try (char delim >> skipSpaces >>
                notFollowedBy (noneOf ")\n")))
  return $ decodeCharacterReferences tit

I don’t know this language at all, but I slightly prefer it to the Python, and I prefer the newLISP more than either.

Is it worth putting in a Perl extract for comparison?

$alt_text =~ s/"/&quot;/g;
    if (defined $g_urls{$link_id}) {
        my $url = $g_urls{$link_id};
        $url =~ s! \* !$g_escape_table{'*'}!gx;
        $url =~ s!  _ !$g_escape_table{'_'}!gx;
        $result = "<img src=\"$url\" alt=\"$alt_text\"";
        if (defined $g_titles{$link_id}) {
            my $title = $g_titles{$link_id};
            $title =~ s! \* !$g_escape_table{'*'}!gx;
            $title =~ s!  _ !$g_escape_table{'_'}!gx;
            $result .=  " title=\"$title\"";
        }
        $result .= $g_empty_element_suffix;
    }
    else {
        $result = $whole_match;
    }

They say that it’s no use arguing about matters of taste. But discussion is OK, so – what’s your opinion? Does it matter what code looks like? What’s your favourite?

Comment from m i c h a e l

As an artist (a bricoleur to be exact), I’m particularly sensitive to the aesthetic qualities of a language.

newLISP is a beautiful language. And that beauty is more than skin-deep. Its simplicity is what makes it beautiful, and newLISP is simple through-and-through.

I feel another motto coming on:

newLISP: Simply beautiful :-)

m i c h a e l

November 2, 2007

Macro economics

Filed under: newLISP — newlisper @ 17:53

My previous attempt at writing macros (Rabbit season) for function tracing worked well, but I should have paid a bit more attention to the business of ‘variable capture’. This happens when symbols inside and outside a Lisp macro have the same name. To illustrate, here’s a simple macro, which adds a new looping construct to the language that combines dolist and do-while. A loop variable steps through a list while a condition is true:

(define-macro (dolist-while)
  (letex (var  (args 0 0)
          lst  (args 0 1)
          cnd  (args 0 2)
          body (cons 'begin (1 (args))))
    (let (y)
      (catch 
        (dolist (var lst)
          (if (set 'y cnd) body (throw y)))))))

You call it like this:

(dolist-while (x (sequence 20 0) (> x 10))
  (println {x is } (dec 'x 1)))

;->
x is 19
x is 18
x is 17
x is 16
x is 15
x is 14
x is 13
x is 12
x is 11
x is 10

The problem here is that you can’t use a symbol called y as the loop variable, although you can use x or anything else. Put a (println y) statement in the loop to see why:

(dolist-while (x (sequence 20 0) (> x 10))
    (println {x is } (dec 'x 1))
    (println {y is }  y))

;->
x is 19
y is true
x is 18
y is true
x is 17
y is true
...

If you try to use y, it won’t work:

(dolist-while (y (sequence 20 0) (> y 10))
  (println {y is } (dec 'y 1)))

;->
y is 
value expected in function dec : y

– in fact we can see from the output above that y holds a true/nil value, so it can’t be decremented. The problem is that y is already used by the macro, even though it’s in the macro’s own let expression.

Luckily in newLISP there’s an easy fix to this problem, and it’s something that I got slightly wrong in the earlier post. What you do is enclose the macro in a context, and make the macro the default ‘function’ in that context:

(context 'dolist-while)

(define-macro (dolist-while:dolist-while)
  (letex (var  (args 0 0)
          lst  (args 0 1)
          cnd  (args 0 2)
          body (cons 'begin (1 (args))))
    (let (y)
      (catch 
        (dolist (var lst)
          (if (set 'y cnd) body (throw y)))))))

(context MAIN)

This can be used in the same way, but without any problems:

(dolist-while (y (sequence 20 0) (> y 10))
      (println {y is } (dec 'y 1)))

;->
y is 19
y is 18
y is 17
...

By the way, another way to define this ‘context’ macro is by using the context prefix throughout, instead of switching to the context:

(define-macro (dolist-while:dolist-while)
  (letex (dolist-while:var  (args 0 0)
          dolist-while:lst  (args 0 1)
          dolist-while:cnd  (args 0 2)
          dolist-while:body (cons 'begin (1 (args))))
    (let (dolist-while:y)
      (catch 
        (dolist (dolist-while:var dolist-while:lst)
          (if (set 'dolist-while:y dolist-while:cnd) 
              dolist-while:body 
              (throw dolist-while:y)))))))

But it’s easier to do it with a context switch!

Timing

I tried this ‘redefine the define’ style of macro for something else – a version of define that records the time taken by each function call. Here’s the context:

(context 'timing)

(set 'timings '())

(define (finish)
   (push (list 'finish (now)) timings -1))

(define-macro (timing:timing farg) 
  (set (farg 0) 
    (letex (func   (farg 0) 
            arg    (rest farg) 
            arg-p  (cons 'list 
              (map (fn (x) (if (list? x) (first x) x)) 
                   (rest farg)))
            body   (cons 'begin (args))) 
           (lambda arg 
             (push (list 'func (now)) timings -1)
             body    ; body of function
             ))))

(context MAIN)

(constant (global 'newLISP-define) define)
(constant (global 'define) timing)

To run a script with this simple timing system, you load the context before you run:

(load {timing.lsp})

then stop timing at the end with

(timing:finish)

After that, the list timing:timings contains a record of your script’s functions, with the time when each function was started.

(MAIN:generate-html (2007 11 1 9 43 0 349070 305 5 0 0))
(MAIN:div-header (2007 11 1 9 43 0 349291 305 5 0 0))
(MAIN:div-header-content (2007 11 1 9 43 0 349310 305 5 0 0))
(MAIN:div-left (2007 11 1 9 43 0 349364 305 5 0 0))
(MAIN:list-posts-by-title (2007 11 1 9 43 0 349381 305 5 0 0))
(MAIN:select-query (2007 11 1 9 43 0 349398 305 5 0 0))
(MAIN:list-most-commented-posts (2007 11 1 9 43 0 354983 305 5 0 0))
(MAIN:list-most-fetched-posts (2007 11 1 9 43 0 371015 305 5 0 0))
...
(MAIN:show-badges (2007 11 1 9 43 0 407344 305 5 0 0))
(MAIN:div-footer (2007 11 1 9 43 0 407409 305 5 0 0))
(MAIN:div-footer-content (2007 11 1 9 43 0 407426 305 5 0 0))
(MAIN:finish (2007 11 1 9 43 0 407499 305 5 0 0))

(This is a list produced by a run of the index.cgi script that builds this web page.) I tried to make the extra code inserted into each function as small as possible – (push (list 'func (now)) timings -1) seemed pretty minimal. I think (now) is the only newLISP function that can produce microsecond timings – and perhaps they’re not precise to the microsecond either.
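For reference, the microseconds are the seventh item (index 6) of the list that (now) returns – year, month, day, then hour, minute, second, microseconds, and so on:

(set 'n (now))
;-> (2007 11 1 9 43 0 349070 305 5 0 0)

(n 6)    ; the microseconds field
;-> 349070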

Now that you’ve got this list, analysis is easy. You’ll probably want to produce a list of durations:

(define (now-list-to-seconds now-list)
  ; convert the hours/minutes/seconds/microseconds fields of a
  ; (now)-generated list to microseconds
  (apply add 
    (map mul '(3600000000 60000000 1000000 1) (select now-list 3 4 5 6))))

(define (make-interval t-list)
  ; the list must be sorted into time order, with earliest first
  (sort t-list (fn (x y) (< (last x) (last y))))
  ; add duration
  (set 'results '())
  (dotimes (i (- (length t-list) 1))
    (set 's (now-list-to-seconds (last (t-list i))))
    (set 'e (now-list-to-seconds (last (t-list (+ i 1)))))
    (push (list (first (t-list i)) (sub e s)) results -1))
  results)

to produce this data, still in evaluation rather than duration order, but showing something like the duration of each function call, in microseconds:

((MAIN:generate-html 221)
 (MAIN:div-header 19)
 (MAIN:div-header-content 54)
 (MAIN:div-left 17)
 (MAIN:list-posts-by-title 17)
 (MAIN:select-query 5585)
 (MAIN:list-most-commented-posts 16032)
 (MAIN:list-most-fetched-posts 4079)
 (MAIN:list-most-recent-comments 17505)
 (MAIN:div-center 24)
 (MAIN:search-posts 32)
 (MAIN:select-query 7901)
 (MAIN:search-comments 16)
 (MAIN:select-query 6510)
 ...

Already I can see that the functions that take longest are the ones that run a complex SQLite query and format the results.

A list of the most frequently called functions is easy too:

(sort 
   (unique 
      (map (fn (x) 
             (list x (count (list x) (first (transpose data)))))
           (first (transpose data))))
   (fn (a b) (> (last a) (last b))))

;->
((select-query (3))
 (div-footer-content (1))
 (div-footer (1))
 (show-badges (1))
 ...

Only a few more lines (not shown here) are required to produce a list of functions organized by the percentage of the total evaluation time they occupy:

34.22%   MAIN:select-query
29.96%   MAIN:list-most-recent-comments
27.44%   MAIN:list-most-commented-posts
 6.98%   MAIN:list-most-fetched-posts
 0.38%   MAIN:generate-html
 0.12%   MAIN:div-footer-content
 0.11%   MAIN:show-badges
 0.10%   MAIN:show-search-form
 0.09%   MAIN:div-header-content
...
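For the curious, here’s a sketch of what those few lines might look like, with data as the list of name/duration pairs built by make-interval. It’s a reconstruction, not the code that produced the table above:

; a sketch, not the original code: data holds (name duration) pairs
(set 'total (apply add (map last data)))

; add up the durations recorded against each distinct function name
(set 'totals '())
(dolist (name (unique (map first data)))
    (push (list name (apply add 
                       (map last (filter (fn (p) (= (first p) name)) data))))
          totals -1))

; print each function's share of the total time, largest first
(dolist (pair (sort totals (fn (a b) (> (last a) (last b)))))
    (println (format "%6.2f%%   %s" 
                     (mul 100 (div (last pair) total)) 
                     (string (first pair)))))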

The function that runs for the most time is select-query. That’s the main interface to the SQLite library, so if I wanted to make the script run faster, I might start by looking at this function, and the other two query/format functions, to see if I could tighten up the code.

Obviously, all this should be taken very much as a rough guide rather than a precise audit. For one thing, I’m not sure that the microseconds returned by now are as precise as they look. More seriously, running the same script repeatedly shows enough variation to indicate that much depends on what else the computer’s doing at that moment.

Another problem is that the time intervals are measuring elapsed real time, rather than allocated CPU time. Still, the data could provide some clues for further investigation.

Comment from don Lucio

Using newLISP to write a profiler for newLISP programs :-) – meta programming is where Lisp really shines.
