(archive 'newLISPer)

December 2, 2007

Formatting

Filed under: newLISP — newlisper @ 23:55

This week I decided to have another go at the syntax colouring and code formatting, and this time to do the job more thoroughly.

The first part of the task is to convert the newLISP source code into a list of elements, or tokens. So, given a fragment of code like this:

(cond
    ((= start end)
        (push (list "white-space" source-string) token-list -1))

the resulting list of lists looks like this:

("left-paren" "(")
("code" "cond")
("left-paren" "(")
("left-paren" "(")
("code" "=")
("code" "start")
("code" "end")
("right-paren" ")")
("left-paren" "(")
("code" "push")
("left-paren" "(")
("code" "list")
("quoted-string" "\"white-space\"")
("code" "source-string")
("right-paren" ")")
("code" "token-list")
("code" "-1")
("right-paren" ")")
("right-paren" ")")

This format is a bit wordy, but it's readable and useful when things go wrong. The module that does the tokenizing, Tokenize, draws on code from newLISP guru Fanda.
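To make the token format concrete, here is a minimal tokenizer sketch in Python. It is not the Tokenize module or Fanda's code. It assumes only double-quoted strings (no {...} or [text] strings), treats ";" as the start of a comment (the "comment" token type is hypothetical), drops whitespace, and lumps everything else into "code" tokens.

```python
def tokenize(source: str) -> list[tuple[str, str]]:
    """Split newLISP-style source into (type, text) pairs; whitespace is dropped."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if c.isspace():
            i += 1
        elif c == "(":
            tokens.append(("left-paren", c))
            i += 1
        elif c == ")":
            tokens.append(("right-paren", c))
            i += 1
        elif c == '"':
            # Scan to the closing quote, skipping backslash escapes like \"
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)  # include the closing quote (or stop at end of input)
            tokens.append(("quoted-string", source[i:j]))
            i = j
        elif c == ";":
            # Hypothetical comment token: runs to the end of the line
            j = source.find("\n", i)
            j = n if j == -1 else j
            tokens.append(("comment", source[i:j]))
            i = j
        else:
            # Symbols, numbers, quoted symbols like 'len: stop at space, paren, quote or ;
            j = i
            while j < n and not source[j].isspace() and source[j] not in '()";':
                j += 1
            tokens.append(("code", source[i:j]))
            i = j
    return tokens
```

Run on the cond fragment above, this yields the same nineteen pairs as the list shown, with the quoted string keeping its quote characters.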

The second part of the task is to take this list and produce reasonably formatted output. This is quite difficult: it's surprising how many subtle layout 'rules' you follow without thinking. Automating all of them was too hard, so I quickly decided to treat this script as a rough first-draft tidier rather than a final-draft polisher.

Say, for example, you found this poorly formatted piece of newLISP lying in the trash:

(define (process_sof marker , len
precision height width components) (set  ' len (read_2_bytes)) ( set 'precision (
read_1_byte)) (set ' height
(read_2_bytes)) (set '
width (read_2_bytes)) (set ' components
(read_1_byte)) (unless (= len (+ 8 (*
components 3)    )  ) 
(throw "process_sof: bogus length"))
(return (list width height))) 

The formatter renders it like this (and adds colours for HTML display):

(define (process_sof marker , len precision height width components)
    (set 'len
        (read_2_bytes))
    (set 'precision
        (read_1_byte))
    (set 'height
        (read_2_bytes))
    (set 'width
        (read_2_bytes))
    (set 'components
        (read_1_byte))
    (unless (= len
            (+ 8 (* components 3)))
        (throw "process_sof: bogus length"))
    (return
        (list width height)))

which is at least more readable, and a good starting point for applying your own choice of style. Indenting at every new level of parentheses is probably too rigorous: you usually want a new indent to mark the start of a new structure, not the middle of a set expression, as happens here.
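A formatter of this 'indent at every level' kind can be sketched over the same (type, text) pairs. This Python version follows its own simple, hypothetical rules rather than the script's: every nested list starts a new line indented four spaces per level (except directly after another "("), and a token that follows a closing parenthesis starts a new line at the current depth. Its output therefore differs in places from the formatter output above.

```python
def format_tokens(tokens, indent="    "):
    """Lay out (type, text) tokens, breaking the line before each nested list.

    Hypothetical rules: one indent per open parenthesis; a token that follows
    a closing parenthesis also starts a new line at the current depth.
    """
    lines = []
    current = ""
    depth = 0
    last_kind = None

    def flush():
        # Finish the current line (if it has content) and start an indented one
        nonlocal current
        if current.strip():
            lines.append(current)
        current = indent * depth

    for kind, text in tokens:
        if kind == "left-paren":
            # Break unless the line is empty or ends with "(" (as in "((")
            if current.strip() and not current.endswith("("):
                flush()
            current += text
            depth += 1
        elif kind == "right-paren":
            current += text
            depth -= 1
        else:
            if last_kind == "right-paren":
                flush()
            if current.strip() and not current.endswith("("):
                current += " "
            current += text
            if kind == "comment":
                flush()  # a comment runs to the end of its line
        last_kind = kind
    flush()
    return "\n".join(lines)
```

For example, the tokens of (set 'len (read_2_bytes)) come out as "(set 'len" followed by "    (read_2_bytes))", the same shape as the formatter output above.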

While we're outputting the various elements of the source, we can apply any sort of formatting. For HTML, we can wrap each element in tags carrying CSS classes. The advantage of first producing a list of tokens is that each element can be processed with more control. For example, we could distinguish between the different types of element (primitives, strings, numbers, arrays) using code like this:

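;; cell type of the token's value: second field of dump, masked to its low four bits
(set 'type-number
    (& 15 (nth 1 (dump
                (eval-string k)))))

;; choose different markup for each cell type
(case type-number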
    (0 (set 'result
            (list {} code-element-string {})))
    (1 (set 'result
            (list {} code-element-string {})))
    (2 (set 'result
            (list {} code-element-string {})))
    (3 (set 'result
            (list {} code-element-string {}))) ...

It would be easy to modify this to produce different output for all functions that belong to a more general category, such as arithmetic or loop functions.
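The same per-token dispatch, grouped into broader categories, can be sketched in Python. The category table, the CSS class names and the number pattern are all hypothetical; the newLISP code above asks the interpreter for each value's cell type instead of matching the token text.

```python
import html
import re

# Hypothetical category table: a few newLISP primitives grouped by purpose.
CATEGORIES = {
    "arithmetic": {"+", "-", "*", "/", "%", "inc", "dec"},
    "loop": {"dolist", "dotimes", "for", "while", "until", "do-while"},
    "binding": {"define", "set", "setq", "let"},
}

# Decimal integers and floats, optionally signed, with an optional exponent.
NUMBER = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")


def css_class(kind, text):
    """Pick a CSS class name for one (type, text) token."""
    if kind == "quoted-string":
        return "string"
    if kind == "comment":
        return "comment"
    if kind in ("left-paren", "right-paren"):
        return "paren"
    if NUMBER.match(text):
        return "number"
    for category, words in CATEGORIES.items():
        if text in words:
            return category
    return "symbol"


def highlight(kind, text):
    """Wrap one token in a span whose CSS class names its category."""
    return f'<span class="{css_class(kind, text)}">{html.escape(text)}</span>'
```

Adding a new category means adding one entry to the table; in a full formatter, highlight would be applied to each token as the laid-out lines are written.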

You could also add some checking and testing routines to the script. For example, the superfluous parenthesis in the following code:

(define (process_sof marker , len precision height width components)
(set 'len
    (read_2_bytes))
(set 'precision
    (read_1_byte))
(set 'height)
    (read_2_bytes)))

produces a question mark:

(define (process_sof marker , len precision height width components)
    (set 'len
        (read_2_bytes))
    (set 'precision
        (read_1_byte))
    (set 'height)
    (read_2_bytes))
)
?

although sadly it doesn't notice the misplaced parenthesis that closes (set 'height) too early; it only notices that there is one too many by the end. Well, it's early days yet!
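A minimal balance check over the token list shows why the misplaced parenthesis goes unnoticed: counting depth only reveals a problem once it drops below zero, which in the example above happens at the very last token. This Python sketch (the function name and messages are hypothetical) reports the first unmatched ")" or the number of unclosed "(".

```python
def check_balance(tokens):
    """Return (index, message) for the first paren problem, or None if balanced."""
    depth = 0
    for index, (kind, _text) in enumerate(tokens):
        if kind == "left-paren":
            depth += 1
        elif kind == "right-paren":
            depth -= 1
            if depth < 0:
                # More closes than opens so far: this token has no partner
                return index, "unexpected ')'"
    if depth > 0:
        # Ran out of tokens with lists still open
        return len(tokens), f"{depth} unclosed '('"
    return None
```

Locating the misplaced parenthesis itself would need more than counting, for example comparing the token structure against the indentation the author used.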

Comment from Lutz

What happened to the beautiful blue ‘Santa Claus’ theme, which was on this site yesterday? I liked that :-)

Comment from cormullion

Still here! :)
