(archive 'newLISPer)

July 14, 2006

List, list, O list!

Filed under: newLISP — newlisper @ 14:12

Lisp lovers love lists. In newLISP you certainly get plenty of list practice. You use lists for your data, and you use them for your code. Perhaps one of the good things about Lisp is that you don’t have to learn much in the way of syntax, other than the ubiquitous parenthesis-enclosed sequence of elements – the list.

It has to be said, though, that working with lists isn’t always easy. You can control the structure of your own code, and you also have some say in shaping the data structures you devise. But there are some lists that you just have to accept the way they are.

One problem with Lisp lists is that the individual parentheses aren’t in any way numbered or differentiated, so you can’t tell at a glance which parenthesis matches which, or how deep in the list structure you are. If you don’t use a powerful newLISP editor that knows how to format and analyse lists automatically, you’ll have to rely on your current editor’s trusty ‘Balance’ command and find your own way around the maze of parentheses.

Consider, for example, XML files. XML may well be the solution to all humanity’s file format problems, but I don’t find it that user-friendly. How can we use newLISP to hack our way through the thicket of angle brackets like this:

<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="0.92">
<channel>
    <docs>http://backend.userland.com/rss092</docs>
    <title>newLISP Fan Club</title>
    <link>http://www.alh.net/newlisp/phpbb/</link>
    <description>Friends and Fans of newLISP </description>
    <managingEditor>dooright101@yahoo.com</managingEditor>
    <webMaster>dooright101@yahoo.com</webMaster>
    <lastBuildDate>Mon, 10 Jul 2006 22:40:56 GMT</lastBuildDate>
<item>
    <title>newLISP and/or Whatever Else ::
    RE: humble contribution:</title>
    ...?

One idea I had is to replace the parentheses with numbers indicating the nesting depth. It renders the XML like this:

➀ rss
 ➁ @
  ➂ version  0.92 ❸❷
 ➁ channel
  ➂ docs  http://backend.userland.com/rss092 ❸
  ➂ title  newLISP Fan Club ❸
  ➂ link  http://www.alh.net/newlisp/phpbb/ ❸
  ➂ description  Friends and Fans of newLISP ❸
  ➂ managingEditor  dooright101@yahoo.com ❸
  ➂ webMaster  dooright101@yahoo.com ❸
  ➂ lastBuildDate  Mon, 10 Jul 2006 22:40:56 GMT ❸
  ➂ item
    ➃ title  newLISP and/or Whatever Else ::
     RE: humble contribution: ❹
    ...

(Here’s a picture in case I’ve mucked up the Unicode: [image])

Here’s the newLISP code that produced it:

#!/usr/bin/newlisp
(define
  (start-list)
  ; open parenthesis
  (print "\n"
     (dup " " level)
     (char
        (+
          (int
             (append "0x"
                (string 2780)) 0 16) level)))
  (inc 'level))

(define
  (close-list)
  ; close parenthesis
  (dec 'level)
  (print
     (char
        (+
          (int
             (append "0x"
                (string 2776)) 0 16) level))))

(define
  (process-xml l)
  (dolist
     (e l)
     ; if is used rather than unless: in later newLISP versions
     ; unless no longer takes an else branch
     (if
        (list? e)
        (begin
          (start-list)
          (process-xml e)
          (close-list))
        (print { } e { }))))

(xml-type-tags nil nil nil nil)

(set 'the-data
  (xml-parse
     (read-file
        (main-args 2))
     (+ 1 2 4 8 16)))

(set 'level 0)
(start-list) (process-xml (first the-data)) (close-list)
(exit)

(Sorry for the unusual formatting of the above newLISP code – I’m still working on my autoformat routines, and the current version is a little indent-happy today.)
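The two marker functions boil down to adding the nesting level to a Unicode code point. Here’s a minimal sketch of just that step. The name marker and its parameters are my own for illustration, not part of the script above, and it assumes a UTF-8 enabled newLISP build.

```newlisp
;; marker, base and depth are hypothetical names used only in this sketch
(define (marker base depth)
  ; char turns a code point number into a one-character string
  (char (+ base depth)))

(println (marker 0x2780 0))   ; circled digit one, as printed by start-list at level 0
(println (marker 0x2776 2))   ; negative circled digit three, as printed by close-list at level 2
```

Writing the base as a hex literal such as 0x2780 avoids building the "0x2780" string and converting it with int.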

The hard work here is done by newLISP’s xml-parse function. You can adjust the knobs of this and the xml-type-tags function to get exactly the output you want. The start-list and close-list functions print Unicode circled digits in place of parentheses. The script doesn’t always work perfectly, but it sometimes gives a few clues as to the structure of an unfamiliar XML file.
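To get a feel for those knobs without fetching a real file, here’s a minimal sketch that parses a tiny XML string with the same settings as the script. The string in sample-xml is made up for illustration. As I understand the xml-parse options, 1 drops whitespace-only text, 2 drops empty attribute lists, 4 drops comments, 8 turns tag names into symbols, and 16 produces SXML-style (@ ...) attribute lists.

```newlisp
;; sample-xml is a made-up string, purely for illustration
(set 'sample-xml {<feed version="1"><title>Hello</title><!-- note --></feed>})

;; switch off the default "TEXT", "CDATA", "COMMENT" and "ELEMENT" type tags
(xml-type-tags nil nil nil nil)

;; same option sum as the script above
(set 'parsed (xml-parse sample-xml (+ 1 2 4 8 16)))
(println parsed)
```

Dropping 8 or 16 from the sum and running it again is a quick way to see what each option contributes.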

The xml-parse function produces a newLISP list that normally looks like this when unencumbered with Unicode digits:

((rss
     (@
        (version "0.92"))
     (channel
        (docs "http://backend.userland.com/rss092")
        (title "newLISP Fan Club")
        (link "http://www.alh.net/newlisp/phpbb/")
        (description "Friends and Fans of newLISP ")
        (managingEditor "dooright101@yahoo.com")
        (webMaster "dooright101@yahoo.com")
        (lastBuildDate "Mon, 10 Jul 2006 22:40:58 GMT")
        (item
          (title "newLISP and/or Whatever Else ::
           RE: humble contribution:")
...

I believe this is called SXML, a Lisp-like way of looking at an XML structure.

Since this is a newLISP-friendly format, we can work with it using familiar newLISP tools. Let’s load an XML file of the newLISP forum’s RSS news feed and convert it to SXML:

#!/usr/bin/newlisp
(xml-type-tags nil nil nil nil)
(set 'sxml-data
  (xml-parse
     (get-url {http://www.alh.net/newlisp/phpbb/rss.php})
     (+ 1 2 4 8 16)))

The SXML version of the XML newsfeed is now in the sxml-data symbol.

A good tool to start with is ref, which looks for the first occurrence of something in a list, regardless of how deep it’s buried. For example, where’s the first news ‘item’?

(ref 'item sxml-data)
;-> (0 2 8 0)

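newLISP reports that there are four numbers in the ‘address’ of the first news item, (0 2 8 0), so the first occurrence of item in the SXML list is four levels down. The first three numbers step into the lists marked ➀, ➁ and ➂ in the Unicode-numbered version above, and the final 0 picks out the first element of that ➂ list, which is the symbol item itself. Each number is a 0-based position within its list, so the 8 means the ninth element of channel.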

Once we have a working address, we can access the information there either by passing the address to the nth function or by using implicit indexing. nth has the advantage of being more readable (to my eyes), but implicit indexing is faster and more versatile, so I’ve learnt to use that. To use implicit indexing to extract information, follow the symbol with the index numbers, either in a list or rattling loose:

(sxml-data 0 2 8 0)
;-> item

(nth 0 2 8 0 sxml-data)     ; same thing
;-> item

(sxml-data '(0 2 8 0))      ; but implicit indexing accepts a list too
;-> item

(sxml-data (ref 'item sxml-data))
;-> item

That last example shows that we don’t have to obtain the index numbers first: we can get the address and extract the data in one operation.
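If you’d like to try this without fetching the feed, here’s a small sketch using a hand-built list of the same shape. The list in mini is invented for illustration and is much shorter than the real data, so its address for item differs from the (0 2 8 0) above.

```newlisp
;; mini is a hand-built stand-in for the parsed feed, not real data
(set 'mini '((rss (@ (version "0.92"))
                  (channel (title "Fan Club")
                           (item (title "first post"))
                           (item (title "second post"))))))

(set 'addr (ref 'item mini))
(println addr)                  ; (0 2 2 0)

;; follow the address one number at a time with implicit indexing
(println (first (mini 0)))      ; rss
(println (first (mini 0 2)))    ; channel
(println (first (mini 0 2 2)))  ; item
(println (mini 0 2 2 0))        ; item
```

Each extra number steps one list deeper, and the last one selects an element rather than a list.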

However, we don’t want to find just the ‘item’ keyword: we want the whole of the data for a news item. This is easy: just cut off the last number of the four digit address (the ‘house number’), and you’ll get the level above, which refers to the whole ‘street’:

(sxml-data (chop (ref 'item sxml-data)))
;->

(item
  (title "newLISP and/or Whatever Else :: RE: humble contribution:")
  (link "http://www.alh.net/newlisp/phpbb/viewtopic.php?p=6755#6755")
  (description "Author
  ...
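The same ‘street’ trick on a small hand-built list, as a sketch you can run offline. The data in mini and the example.com link are invented for illustration.

```newlisp
;; mini is a hand-built stand-in for the parsed feed, not real data
(set 'mini '((rss (channel (title "Fan Club")
                           (item (title "first post")
                                 (link "http://example.com/1"))))))

(set 'addr (ref 'item mini))    ; address of the symbol item
(println addr)                  ; (0 1 2 0)
(println (chop addr))           ; (0 1 2)

;; the shortened address selects the whole list that contains item
(set 'whole-item (mini (chop addr)))
(println whole-item)
```

chop returns a shortened copy and leaves addr itself untouched, so the full address is still available afterwards.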

Another useful function for SXML is lookup. The structure of most of the ‘item’ section we’ve just seen looks like a keyword/value structure known as an association list: ((title "title") (link "link") …), and this can be explored by functions such as lookup:

(set 'an-item (rest (sxml-data (chop (ref 'item sxml-data)))))
(lookup 'title an-item)
;->     newLISP and/or Whatever Else :: RE: humble contribution:
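Here’s a short sketch of lookup next to its relative assoc, using a made-up association list rather than the real feed item:

```newlisp
;; an-item here is invented sample data in the same keyword/value shape
(set 'an-item '((title "first post")
                (link "http://example.com/1")
                (description "hello")))

(println (lookup 'link an-item))     ; http://example.com/1
(println (assoc 'link an-item))      ; (link "http://example.com/1")
(println (lookup 'author an-item))   ; nil
```

lookup hands back just the value, assoc hands back the whole pair, and a missing key gives nil rather than an error.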

Although ref finds only the first occurrence, it’s easy to loop through all the items using a dolist on the level above ‘item’, which is ‘channel’.

(set 'channel (sxml-data (chop (ref 'channel sxml-data ))))
(dolist (news-item channel)
    (and
        (list? news-item)
        (find 'item news-item)
        (println (lookup 'title news-item)))
        ...)
;->
newLISP and/or Whatever Else :: RE: humble contribution:
newLISP and/or Whatever Else :: RE: humble contribution:
newLISP and/or Whatever Else :: RE: Ruby-style iterators
newLISP and/or Whatever Else :: RE: Ruby-style iterators
newLISP and/or Whatever Else :: RE: Miscellaneous questions, suggestions
newLISP and/or Whatever Else :: RE: Ruby-style iterators

The and construction helps to avoid possible errors – if any of the tests or searches fail, that item is skipped.
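Another possible route, assuming your newLISP version provides ref-all, is to collect the address of every ‘item’ in one go and then use the chop trick on each. This is only a sketch on invented data in mini, not the approach used in the loop above:

```newlisp
;; mini is a hand-built stand-in for the parsed feed, not real data
(set 'mini '((rss (channel (title "Fan Club")
                           (item (title "first post"))
                           (item (title "second post"))))))

;; ref-all returns the address of every match, not just the first
(set 'addresses (ref-all 'item mini))
(println addresses)             ; ((0 1 2 0) (0 1 3 0))

(dolist (addr addresses)
  ; chop gives the enclosing (item ...) list, rest drops the item symbol
  (println (lookup 'title (rest (mini (chop addr))))))
```

This finds items at any depth, whereas the dolist version only looks at the direct children of channel.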

We’ve managed to extract the stuff we want without specifying any index numbers directly. This makes life easier, and we don’t have to count parentheses too much.

To finish off this post, here’s a slightly different example of the same technique. This script displays recent changes to a Wikipedia entry. The data is in Atom rather than RSS format, but I was able to use the same tools to analyse the (S)XML file:

#!/usr/bin/newlisp
;; first, a utility function to tidy a string
(define
  (cleanup str)
  (let
     (replacements
     ;; numeric character references are assumed here in decimal form
     '(  ({&amp;amp;}       {&})
          ({&amp;}          {&})
          ({&gt;}           {>})
          ({&lt;}           {<})
          ({&nbsp;}         { })
          ({&apos;}         {'})
          ({&quot;}         {"})
          ({&#40;}          {(})
          ({&#41;}          {)})
          ({&#58;}          {:})
          ("\n"             "")))
     (map
        (fn
          (f)
          (replace
             (first f) str
             (last f))) replacements)
     (join
        (parse str {<.*?>} 4) "")))

;; get the Atom source for the Wikipedia page
(set 'xml {http://en.wikipedia.org/w/index.php?title=Zodiac&action=history&feed=atom})

;; convert to SXML
(xml-type-tags nil 'cdata '!-- nil)
(set 'atom-data
  (xml-parse
     (get-url xml)
     (+ 1 2 8 16)))
(set 'entries (atom-data 0))

;; print header
(println {Wikipedia: }
  (lookup 'title
     (rest entries)) { }
  (date
     (date-value) 0 "%Y-%m-%d %H:%M:%S"))

;; now do each entry, getting the updated details and title information
(dolist
  (e entries)
  (and
     (list? e)
     (find 'entry e)
     (println
        (lookup 'updated
          (rest e)) { }
        (cleanup
          (lookup 'title
             (rest e))))))
(exit)

In this script, I’ve added a simple ‘get rid of all those encoded entities’ function, called cleanup, but the actual code that compiles a summary of the XML feed is short and simple.
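The two ideas inside cleanup can be tried on their own. Here’s a minimal sketch on a made-up string: parse splits the text wherever the tag pattern matches and join glues the pieces back together, while replace edits the string in place.

```newlisp
;; str holds a made-up sample, for illustration only
(set 'str {<b>Fish &amp; chips</b> &lt;today&gt;})

;; strip anything that looks like a tag: split on the pattern, rejoin the rest
(set 'str (join (parse str {<.*?>} 4) ""))

;; replace changes str in place, so each call builds on the previous one
(replace {&amp;} str {&})
(replace {&lt;} str {<})
(replace {&gt;} str {>})

(println str)   ; Fish & chips <today>
```

Note that this sketch strips tags before decoding the entities, the reverse of the order in cleanup. Done the other way round, the decoded <today> would itself match the tag pattern and be removed.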

Here’s what a couple of these little newLISP news tickers look like when they’re running on my computer desktop:

5 Comments »

  1. Hi, newLISPer.

     So if I understand the newLISP indexing approach, it works something like this:

     0.0. rss
     0.1. @
     0.1.1. version 0.92
     0.2.0. channel
     0.2.1. docs http://backend.userland.com/rss092
     0.2.2. title newLISP Fan Club
     0.2.3. link http://www.alh.net/newlisp/phpbb/
     0.2.4. description Friends and Fans of newLISP
     0.2.5. managingEditor dooright101@yahoo.com
     0.2.6. webMaster dooright101@yahoo.com
     0.2.7. lastBuildDate Mon, 10 Jul 2006 22:40:56 GMT
     0.2.8.0. item
     0.2.8.1. title newLISP and/or Whatever Else :: RE: humble contribution:

     So a reference to 0 2 would return all 0.2.* items from this outline.

     I got confused by the starting 0. The SXML produced has an extra parenthesis around it, like ((rss … Is that right?

     -Noah

    Comment by Noah, July 18, 2006 @ 08:22

  2. It certainly looks that way. I couldn’t find much of note about SXML to read, but it seems like the whole data structure is the first item of a list.

     I’m still getting confused by the 0-based numbering as well… :-)

    Comment by newlisper, July 18, 2006 @ 08:31

  3. OK, well, looking at the documentation, it shows this:

     (set 'pList '(a b (c d () e)))
     (push 'x pList '(2 2 0)) → x
     pList → (a b (c d (x) e))
     (ref 'x pList) → (2 2 0)

     So I think I’m wrong, the zero is there because the text rss is the first item in the top-level list, and the second item in the top-level list is the att:val pair version="0.92", and the third item is the channel text.

     So the parentheses work like (rss(… and not ((rss(… which would yield an actual index of 0 0 2 8 0 for the item text.

     Is that right? I think that’s right, but I’m still confused. This is one of those embarrassing things for me that even after years of turning it over in my head, I’ll probably still get it wrong sometimes. Argh.

     -Noah

    Comment by Noah, July 18, 2006 @ 08:54

  4. Although the xml-parse function returns the sxml structure as item 1 of a list of 1, so you’ll still get ((rss at the beginning…

    Comment by newlisper, July 18, 2006 @ 13:25

  5. OK:

     0 – the first element of the top list
     2 – the third element of the first element of the top list
     8 – the ninth element of the first element of the top list
     0 – the first element of the ninth element of the first element of the top list.
     0 2 8 – the ninth element of the third element of the top list.

     So the indexing scheme will only index lists and sublists enclosed within a single root element (list).

     -Noah

    Comment by Noah, July 18, 2006 @ 20:47

