(archive 'newLISPer)

October 25, 2008

Project Nestor: part 1

Filed under: newLISP — newlisper @ 09:42
Tags:

Most of my newLISP projects are unfinished and nameless, but my current project, still an early prototype, and still far from being finished, has already managed to acquire a name. It’s called Nestor. I needed to call it something, and since it deals with nested lists quite a lot, this was the first name that sprang to mind.

For those of you with a classical education, and/or fans of the film Troy, Nestor was the old soldier in Homer’s Iliad who advised King Agamemnon of Mycenae. To be honest, though, I was thinking more of Captain Haddock’s butler, Nestor, who runs Marlinspike Hall. After some desperate reverse acronymization, I managed to make NESTOR stand for “newLISP Source Translator Or Reducer.

The general aim is to convert newLISP source code into s-expressions, to extend the classic Lisp principle of allowing a rich interplay between code and data. Although newLISP has functions for converting text into code (read-expr), and for evaluating code passed as text data (eval and eval-string), information is lost in the process. And I’m hoping to preserve the two aspects of source code that are usually discarded: the comments and the white space (ie the formatting). Although these are solely for the benefit of the human reader and writer of code, I think they’re an important part of the process, and should be treated as such rather than always discarded.

My first attempt at a recursive descent parser for an LL(0) grammar worked OK. In plain English, that simply meant that the read function scanned the source input from left to right, and called itself recursively to look for the next token. On large source files, though (over 3000 lines), it did require a stack larger than the newLISP default of 2800 elements (so it would have to be run with a larger stack setting), and it wasn’t very quick either. I found it hard to control the nesting of the output as the various lists within lists were encountered in the source. And recursion, elegant though it undoubtedly is, has always made me scratch my head a little.

For the next version I went for a more iterative approach, and used a plain list as a stack, controlling the nesting by maintaining a stack pointer in a list:

(define (Stack:Stack i)
  (push i stack sp)
  (setf (sp -1) (+ 1 (sp -1))))

(define (Stack:inc-nest)
  (push 0 sp -1)
  (push '() stack sp)
  (inc 'nest-level))

(define (Stack:dec-nest)
  (pop sp -1)
  (setf (sp -1) (+ 1 (last sp)))
  (dec 'nest-level))

In the Stack default function, which is used to add items to the stack, push is putting an item on the stack at the location determined by the stack pointer sp, which is just a list of index numbers. The nesting can then be easily controlled by adding or removing indices to sp.

The main scanner works its way through the source, and calls the relevant function. For example, an opening parenthesis is easy:

(define (open-paren-token)
  (Stack:inc-nest)
  (Stack (list "open-paren" "(")))

but some of the others are more complicated, such as the number scanner, which is recursive:

(define (read-number-scanner list-so-far)
  (let ((next-char (peek-char)))
    ;; if next-char is a digit then call self recursively
    (if (and (char-numeric? next-char) next-char)
          (read-number-scanner (cons (get-next-char) list-so-far))
          (reverse list-so-far))))

The top-level function, Nestor:read, converts the following newLISP source:

; get stdin POST method parameters if present
;
(set 'inline (read-line))
(if inline
  (set 'params (get-vars inline)))

to this list:

("comment" "; get stdin POST method parameters if present")
("comment" ";")
  ("open-paren" "(")
  ("symbol" "set")
  ("whitespace" "IA==")
  ("quote" "'")
  ("symbol" "inline")
  ("whitespace" "IA==")
    ("open-paren" "(")
    ("symbol" "read-line")
    ("close-paren" ")")
  ("close-paren" ")")
("whitespace" "Cg==")
  ("open-paren" "(")
  ("symbol" "if")
  ("whitespace" "IA==")
  ("symbol" "inline")
  ("whitespace" "IAoJ")
    ("open-paren" "(")
    ("symbol" "set")
    ("whitespace" "IA==")
    ("quote" "'")
    ("symbol" "params")
    ("whitespace" "IA==")
      ("open-paren" "(")
      ("symbol" "get-vars")
      ("whitespace" "IA==")
      ("symbol" "inline")
      ("close-paren" ")")
    ("close-paren" ")")
  ("close-paren" ")")

It’s wordy, isn’t it?! Perhaps I’ll switch to abbreviations or code numbers one day, but at the moment I find the longer names really useful. And the format isn’t going to be used for permanent data storage, just for temporary processing. The whitespace is stored in base64 encoding – again, this may not be the best choice, but for now it works OK.

Now that I’ve got this newLISP expression, or nl-expression, perhaps the most obvious and important function is the logical inverse of Nestor:read, Nestor:nlx-to-text, which converts the nl-expression back to plain source form. Fortunately it’s easy, and a natural application for a recursive function:

(define (nlx-to-text nlxp)
  (dolist (i nlxp)
  (if (atom? (first i))
     (begin
       (cond
         ((= (first i) "symbol")
              (write-buffer buff (string (last i))))
         ((= (first i) "open-paren")
              (write-buffer buff (string  {(})))
         ((= (first i) "close-paren")
              (write-buffer buff (string  {)})))
         ((= (first i) "whitespace")
              (dostring (s (base64-dec (last i)))
                (write-buffer buff (string (char s)))))
      ; and so on with other element types
      ))
     ; not an atom, recurse
     (nlx-to-text i))))

When this completes, the buffer buff should contain the de-nested source code.

When I tested this approach against the official newLISP installation, I was pleasantly surprised that all the modules converted to and from nl-expressions without error. In other words, given some newLISP in source-text:

(= source-text (Nestor:nlx-to-text (Nestor:read source-text)))

usually returns true.

However, there are problems. For one thing, I’m embarassed that the presence of one or more Unicode characters will kill the translator stone dead. I’ve no idea what the cause is yet, but I hope it’s fixable.

There are other things on my to-do list that will be done soon: nested braces in strings, for example. But there are some aspects of the language that I probably won’t be able to handle at all. The newLISP interpreter is a strange and powerful beast, and my own scripting is never going to be able to match it:

1.2.3e4e5                          ; three tokens: 1.2 .3e4 e5 ?
(println"hi there")                ; no space?
string}                            ; weird symbol or string typo?
(set '[ you're (not) joking ] 0)   ; wacky symbol names in brackets?

That first one was aptly described as a monster! Does anyone use these newLISP constructions in their code?

The first application for Nestor is to re-introduce the syntax colouring of newLISP source code that appeared about a year ago on this site and disappeared during the revamp in March this year. If you’re seeing newLISP code in colour then it’s working. More on that another day.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: