(archive 'newLISPer)

November 14, 2010

Mystic rose

Filed under: newLISP — newlisper @ 17:26

Draw lines joining all the vertices of a polygon. We used to draw these patterns at school.

mystic-rose1.png

It’s fun to see them cropping up on the internet now. We called them ‘mystic roses’, although Wikipedia is curiously silent on the topic, and I haven’t yet had time to track down the significance of the name.

A quick attempt at coding it up persuaded me that there’s really no short and effective alternative to simple iteration for these sorts of tasks.

    (set 'number-of-points 155 'rad 190 'pi 3.141592653)
    ; make points list
    (for (point 0 (- number-of-points 1))
            (push (list  (mul rad (cos (div (mul 2 pi point) number-of-points)))
                                     (mul rad (sin (div (mul 2 pi point) number-of-points)))) points-list))
    ; draw lines connecting points
    (set 'b (- (length points-list) 1))
    (set 'n 0)

    (for (i 0 b)
    (for (j (+ i 1) b 1 (>= j (length points-list)))
       (inc n)
       (draw canvas-name (points-list i) (points-list j) (apply amb primary-colors))))

(I’ve not shown the code for the draw function that draws the lines onto an HTML5 canvas. See canvas.lsp in the official newLISP distribution for help.)
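The pairing logic can be tried on its own, without a canvas. The following is only a sketch: `vertex-pairs` is a hypothetical helper that isn’t part of the script above, and it collects the index pairs instead of drawing them. Because the inner loop always starts at or below its end value, it needs no break condition, and for n points it should yield n(n-1)/2 lines.

```newlisp
; vertex-pairs is a hypothetical helper, not part of the original script.
; It returns every pair (i j) with i < j for n vertices numbered 0..n-1.
(define (vertex-pairs n)
  (let ((pairs '()))
    ; guard: with fewer than 2 points, 'for' would count backwards
    (when (> n 1)
      (for (i 0 (- n 2))
        (for (j (+ i 1) (- n 1))
          (push (list i j) pairs -1))))
    pairs))

(println (vertex-pairs 4))
;-> ((0 1) (0 2) (0 3) (1 2) (1 3) (2 3))

(println (length (vertex-pairs 155)))
;-> 11935
```

So the 155-point rose needs 11935 lines, which is 155 × 154 / 2.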

I did try to make it work without the obvious for loops but I gave up. It can’t get much simpler than this, can it? Can you persuade apply or map or series to do it in less code? Why recurse when you can iterate?

Nevertheless, I’m slightly unhappy with the lack of something I can’t quite put my finger on. Why is it always necessary to be subtracting or adding 1 to loop terminators, or checking for overflows, when I’m writing these sorts of scripts? I’m slightly jealous of the Mathematica definition which looks coolly elegant and slightly forbidding at the same time.

Ah well, it’s fun playing with the colours and alpha values. By cranking up the values and inserting some randomness, you get some quite pleasing effects.

mystic-rose2.png

And this 155-a-gon (what is its Greek name?) starts to take on some mysterious textures of its own:

mystic-rose3.png

October 9, 2010

egnellahC nilperG ehT

Filed under: newLISP — newlisper @ 22:19

I spent a happy half hour today trying to take the Greplin Challenge.

Unfortunately, I managed to complete only the first level. It wasn’t so much that the problem was hard. It was more that I couldn’t find an elegant solution that satisfied me.

The problem is to find the longest substring of a string that is the same in reverse. In other words, to find the longest palindrome.

Here’s my attempt. If the string is in ‘source-string’:

    (for (character 0 (dec (length source-string)))
            (set 'end-of-word
                    (find (source-string character) source-string nil
                            (+ character 1)))

            (when end-of-word
                    (set 'word (slice source-string character
                                    (+ 1 (- end-of-word character))))
                    (when (= (lower-case word)
                                    (reverse (lower-case word)))
                            (push word palindromes))))

    (first (sort palindromes
                    (fn (a b) (> (length a ) (length b)))))

As you can see, it’s not exactly a one-liner. But at least it avoids a completely ‘brute-force’ technique. It looks for likely palindromes rather than testing every possible substring. Apart from that, it just puts all the palindromes in a list and finds the longest later. I’m sure you can produce a more elegant solution!

It might be a few days before I get round to level 2. But don’t wait for me – have a go!

Update

I realised later that this algorithm isn’t very good. Yes, it finds the correct answer for the particular test case on the Greplin challenge, but it doesn’t always find the longest palindrome. For example, if the string contains the substring “hahah”, then the longest palindrome found will be “hah” rather than “hahah”. That’s because I used find to locate the matching end character for each start character. To obtain more accurate results, I’d need to check for every subsequent matching end character in the rest of the string, not just the next one.

Now a brute force technique starts to look attractive. Instead of being intelligent and looking for a prospective end character for each start character, let’s just try every possible substring and test it for palindromicity. Hammer time! Yes, it’s painful to write such brutal code, but sometimes it’s just easier:

(for (begin-char 0 (- (length source-string) 2))
        (for (end-char begin-char (dec (length source-string)))
                (set 'word (slice source-string begin-char 
                    (+ 2 (- end-char begin-char))))
                (when  (= (lower-case word) (reverse (lower-case word)))    
                        (push word palindromes))))
(first
        (sort palindromes
                (fn (a b) (> (length a ) (length b)))))
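The same brute-force idea can be wrapped in a function that remembers only the longest palindrome seen so far, rather than collecting them all and sorting afterwards. This is a sketch under my own assumptions: the name `longest-palindrome` and its local variables are hypothetical, and only substrings of two or more characters are considered, as in the loops above.

```newlisp
; longest-palindrome is a hypothetical name; a sketch of the brute-force
; search that tracks the best candidate instead of building a list.
(define (longest-palindrome str)
  (let ((best "") (len (length str)) (word ""))
    ; guard: with fewer than 2 characters, 'for' would count backwards
    (when (> len 1)
      (for (i 0 (- len 2))
        (for (j (+ i 1) (- len 1))
          ; substring from index i to index j inclusive
          (set 'word (slice str i (+ 1 (- j i))))
          (when (and (> (length word) (length best))
                     (= (lower-case word) (reverse (lower-case word))))
            (set 'best word)))))
    best))

(println (longest-palindrome "xhahahy"))
;-> hahah
```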

Finally, a teaser for you. Notice those two calls to lower-case? What would happen if you factored them out and placed a single call to lower-case in the line above?

October 1, 2010

Tree fun

Filed under: newLISP — newlisper @ 17:47

I stumbled across a neat little piece of code this week, over at Learning Clojure. It’s a simple program that draws fractal trees. I don’t use Clojure here, so I wanted to make a newLISP version that I could draw some trees with. It looked easy enough to translate the algorithm, and, because I didn’t understand the graphical code, I knocked up a simple and, I hope, equivalent-ish version, using the newLISP-GS Java graphics environment. Here’s a screenshot of the luxuriant growth generated so far:

fractal-tree-1.png

And here’s the code.

    #!/usr/bin/env newlisp

    (load (append (env "NEWLISPDIR") "/guiserver.lsp")) 

    (constant 'PI 3.141592653589793)

    (define (deg->rad d)
     (mul d (div PI 180)))

    (define (draw-tree angle x y len branch-angle depth)
        (if (> depth 0)
            (letn ((new-x (sub x (mul len (sin (deg->rad angle)))))
                         (new-y (sub y (mul len (cos (deg->rad angle)))))
                         (new-length (fn () (mul len (add 0.75 (random 0.01 0.1)))))
                         (new-angle  (fn (op) (op angle (mul branch-angle (add 0.75 (random 0 3)))))))
                (gs:draw-line 'T (int x) (int y) (int new-x) (int new-y) '(0.1 0.9 0.1))
                ; for slow drawing: (sleep 10) (gs:update)
                (gs:update)
                (draw-tree (new-angle add) new-x new-y (new-length) branch-angle (- depth 1))
                (draw-tree (new-angle sub) new-x new-y (new-length) branch-angle (- depth 1)))))

    (define (render w h max-depth branch-angle)
        (letn ((init-length (div (min w h) 2)))
            (gs:delete-tag 'T)
            (draw-tree 0 (/ w 2) h init-length branch-angle max-depth)))

    ; handlers

    (define (update)
         (gs:set-text 'depth-label (string {depth: } *depth))
         (gs:set-text 'branch-angle-label (string {branch-angle: } *branch-angle))
         (render *width *height *depth *branch-angle))

    (define (depth-slider-handler id value)
        (set '*depth (int value))
        (update))

    (define (branch-angle-slider-handler id value)
        (set '*branch-angle (int value))
        (update))

    ; global variables 
    (set '*width 150 '*height 150 '*depth 12 '*branch-angle 10)

    (gs:init)
    (gs:frame 'f 50 50 530 670 "tree")
    (gs:set-border-layout 'f)
    (gs:set-resizable 'f true)

    ; canvas
    (gs:canvas 'tree-canvas)
    (gs:set-size 'tree-canvas 530 450)
    (gs:set-background 'tree-canvas 0.2 0.1 0.1)
    (gs:set-stroke 1)

    ; controls
    (gs:panel 'controls )
    (gs:set-grid-layout 'controls 3 1)

    ; sliders and labels
    (gs:slider 'depth-slider 'depth-slider-handler "depth" 3 15 *depth) 
    (gs:label 'depth-label (string "depth: " *depth) "left")

    (gs:slider 'branch-angle-slider 'branch-angle-slider-handler "branch-angle" 3 18 *branch-angle) 
    (gs:label 'branch-angle-label (string "branch-angle: " *branch-angle) "left")

    ; go for it
    (gs:add-to 'controls 'depth-slider 'depth-label 'branch-angle-slider 'branch-angle-label) 
    (gs:add-to 'f 'tree-canvas "center" 'controls "south")
    (gs:set-translation 200 300)
    (gs:set-visible 'f true)
    (render *width *height *depth *branch-angle)
    (gs:listen)

I like the way that the declaration section of the draw-tree function defines two anonymous functions, new-length and new-angle, which generate new values for the next recursive call. Notice that I had to switch to floating-point arithmetic…it’s easy to forget when looking at other languages.
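The floating-point point is easy to demonstrate. In newLISP the operators +, -, * and / work on integers and truncate any floating-point arguments, whereas add, sub, mul and div keep the fractions. A small illustration (the numbers are picked only for the example):

```newlisp
; integer operators truncate floating-point arguments
(println (+ 0.75 0.1))     ;-> 0
(println (/ 7 2))          ;-> 3

; floating-point operators keep the fractional part
(println (add 0.75 0.1))   ;-> 0.85
(println (div 7 2))        ;-> 3.5
```

With the integer operators, a scaling factor such as 0.75 would collapse to 0 and the branches would never shrink gracefully.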

Fractal trees are cool, and it’s easy to add a few controls to the script so that you can design different types of tree by adjusting the parameters.

It would be simple enough to add parameters for changing line width, adjusting the colour of the branches, and so on.

fractal-tree-2.png

John’s original post was focused on Clojure performance, but that doesn’t interest me here, not being a Clojurer. I’ve never considered the newLISP graphics system to be a speed demon, but that’s probably because it always takes a couple of seconds to start up the Java server, on my machine at least. I wouldn’t expect interpreted trees to grow as fast as the compiled ones!

As for Clojure: I’m very pleased to see more members of the Lisp family, and Clojure looks impressive and exciting. However, as a non-programmer and amateur scripter I’ve tried and failed to make progress in learning it so far. But there’s always tomorrow.

July 12, 2010

Balance those parentheses

Filed under: newLISP — newlisper @ 21:50

I called this site “unbalanced parentheses” not just because the phrase had a Lisp-oriented slant but also because it hinted at a lightweight, off-the-wall attitude to the subject, rather than a serious analytical perspective. I spent all of 10 seconds thinking of the name while staring at a registration form with a mind suddenly gone blank, and it probably wasn’t the perfect choice.

Looking at the logs recently, I noticed that some visitors actually appear to be looking for help balancing their unbalanced parentheses! I suspect, though, that they’re not newLISPers, and probably not even Lisp users (who presumably don’t need much help in that department anyway, and have their own opinions about editing code – see Greg’s post for an interesting discussion). It occurred to me to try and write something about the subject. And it’s high time I wrote something – anything – for this site.

The parenthesis is normally employed in written language to indicate something less than essential or subordinate – material that can be omitted without major damage to the sentence. In mathematics and programming, though, parentheses are more likely to be used for organizing selected items into groups or larger entities, and also for controlling the order of evaluation. Like legs, or trousers, they nearly always go around in pairs, so that a left, opening, parenthesis should always be accompanied by a matching right, closing, parenthesis following not far behind.

You know all this, of course.

The problems start when you start using lots of pairs of parentheses. Take this piece of simple mathematics:

(a * (b + ((c * d) / e) - f)

As every schoolboy and schoolgirl knows, to check parentheses in your algebra homework, you read from left to right and count out loud, going up one whenever you see an opening parenthesis, and down one for each closing one. Thus the counting for the above expression would sound like this:

"1 a * 2 b + 3 4 c * d 3 / e 2 - f 1"

and, since you started at 0 but ended on 1, you can conclude that your parentheses are unbalanced. Too many lefts, not enough rights, in this case.
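The counting-out-loud method translates almost word for word into code. Here is a minimal sketch; `paren-depth` is a hypothetical name, and it makes no attempt to cope with strings or comments:

```newlisp
; paren-depth is a hypothetical helper: count up for "(" and down for ")"
; and return the final count. 0 means the same number of each.
(define (paren-depth txt)
  (let ((depth 0))
    (dolist (c (explode txt))
      (cond
        ((= c "(") (inc depth))
        ((= c ")") (dec depth))))
    depth))

(println (paren-depth "(a * (b + ((c * d) / e) - f)"))
;-> 1
```

A positive result means too many lefts; a negative one means too many rights.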

Like me, you can probably do a simple example like this without counting, just by looking at it and mentally matching lefts and rights. But you don’t want to have to do this for long Lisp expressions, or expressions that span a number of lines. Your preferred text editor should help you match the parentheses – each editor offers a few great tools, but I’ve found that few offer everything you want.

A little newLISP script can do the job automatically:

(set 'source-code-list (explode (read-file (nth 2 (main-args)))))
(set 'nest -1)
(dolist (i source-code-list)
  (cond
     ((= i "(")     (inc nest)
                    (print "\n" (dup "  " nest) i))
     ((= i ")")     (dec nest)
                    (print i "\n" (dup "  " nest)))
     ((= i "\n")    (print ""))
     (true          (print i))))

which, when given a piece of source code like this:

(define (mandelbrot)
    (for (y -2 2 0.02)
        (for (x -2 2 0.02)
            (inc counter)
            (set 'z (complex x y) 'c 126 'a z)
            (while (and
                     (< (abs (:rad (set 'z (:add (:mul z z) a)))) 2)
                     (> (dec c) 32)))
            (print (char c)))
        (println)))

outputs something like this:

(define
  (mandelbrot)

  (for
    (y -2 2 0.02)

    (for
      (x -2 2 0.02)

      (inc counter)

      (set 'z
        (complex x y)
       'c 126 'a z)

      (while
        (and
          (<
            (abs
              (:rad
                (set 'z
                  (:add
                    (:mul z z)
                   a)
                )
              )
            )
           2)

          (>
            (dec c)
           32)
        )
      )

      (print
        (char c)
      )
    )

    (println)
  )
)

It’s a curious and spacious layout not to everyone’s taste, but the parentheses line up vertically, so it’s easy to see what’s going on. You can get even closer to the simple counting idea by replacing the parentheses with snazzy Unicode symbols, producing results like this (which won’t look right if your browser isn’t Unicode-friendly):

➀define
  ➁mandelbrot❷

  ➁for
    ➂y -2 2 0.02❸

    ➂for
      ➃x -2 2 0.02❹

      ➃inc counter❹

      ➃set 'z
        ➄complex x y❺ 'c 126 'a z❹

      ➃while
        ➄and

          ➅
            ➆dec c❼ 32❻❺❹

      ➃print
        ➄char c❺❹❸

    ➂println❸❷❶

using code like this:

(set 'level 0)

(define (open-list)
  (print "\n"
    (dup "  " level)
    (char (+ (int (append "0x" (string 2780)) 0 16) level)))
  (inc level))

(define (close-list)
  (dec level)
  (print (char (+ (int (append "0x" (string 2776)) 0 16) level))))

(dolist (c source-code-list)
    (cond
        ((= c "(")  (open-list))
        ((= c ")")  (close-list))
        (true       (print c))))

This technique is useful for analysing big SXML lists, too.

All this is fun, but you might be thinking that there’s a much quicker way to simply find out whether the parentheses are balanced. Why not just count them?

(define (parenthesis-count txt)
     (count '("(" ")") (explode txt)))

(if (apply = (set 'r (parenthesis-count test-code)))
             (println "good code! " r)
             (println "bad code!  " r))

which returns something like:

bad code!  (22 23)
; or
good code! (22 22)

The world, or at least your source code, might well be harmonious and balanced when there are equivalent numbers of left and right parentheses.

But… you’re too smart to be easily fooled by this glibness. You know better than I do that none of the code written thus far will operate perfectly on itself, let alone on the sort of code that crazy newLISPers can come up with. Obviously, the parentheses inside the strings are going to upset the counting. And when source code is formatted with comments and documentation markup, it’s unlikely that these simple tricks are going to give accurate results. I’ll have to analyze source code more carefully.
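As a first step in that direction, the counter can be taught to skip the most common troublemakers. The following is a sketch only, under stated assumptions: `paren-balance` is a hypothetical name, and it understands just double-quoted strings (with backslash escapes) and semicolon comments. Strings delimited by braces or [text] tags, and comments starting with #, are not handled.

```newlisp
; paren-balance is a hypothetical helper. It ignores parentheses that
; appear inside "double-quoted" strings and after a ; comment marker.
(define (paren-balance txt)
  (let ((depth 0) (in-string nil) (in-comment nil) (escaped nil))
    (dolist (c (explode txt))
      (cond
        ; inside a comment: wait for the end of the line
        (in-comment (if (= c "\n") (set 'in-comment nil)))
        ; the character after a backslash is skipped
        (escaped    (set 'escaped nil))
        ; inside a string: look for an escape or the closing quote
        (in-string  (cond
                      ((= c "\\") (set 'escaped true))
                      ((= c "\"") (set 'in-string nil))))
        ((= c "\"") (set 'in-string true))
        ((= c ";")  (set 'in-comment true))
        ((= c "(")  (inc depth))
        ((= c ")")  (dec depth))))
    depth))

(println (paren-balance {(print "(") ; ((}))
;-> 0
```

The simple count would report four lefts and two rights for that line; this version sees only the one real pair.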

In my view, one of the few features that newLISP currently lacks is the ability to read its own code into its own nested list format. The powerful list referencing functions such as ref-all and set-ref are unable to operate on source code stored in strongly hierarchical lists or S-expressions, simply because there’s no obvious way to convert source to that form. I use a temporary workaround, in the form of Nestor (a utility that you’ll find mentioned on these pages, although the code is unlikely to work on more recent versions of newLISP). Nestor also adds the colours to the source listings you see here, by converting raw code into an intermediate form that can then be converted to HTML with lots of CSS SPAN tags to colourize parenthesis-enclosed strings.

Here’s what a piece of source code looks like after Nestor’s manipulations:

((("open-paren" "(")
  ("symbol" "define")
  (("open-paren" "(")
   ("symbol" "mandelbrot")
   ("close-paren" ")"))
  (("open-paren" "(")
   ("symbol" "for")
   (("open-paren" "(")
    ("symbol" "y")
    ("integer" -2)
    ("integer" 2)
    ("float" "0.02")
    ("close-paren" ")"))
   ; ...

Now I can ask for all opening parentheses and obtain accurate references to them. I’ve put the tokenized and deformatted source in ‘s-list’:

(set 'op-refs (ref-all '("open-paren"  "(") s-list))
(set 'cp-refs (ref-all '("close-paren" ")") s-list))

op-refs
;-> 
((0 0)
 (0 2 0)
 (0 3 0)
 (0 3 2 0)
 (0 3 3 0)
 (0 3 3 2 0)
 (0 3 3 3 0)
 (0 3 3 4 0)
 (0 3 3 4 4 0)
 ;...

cp-refs
;->
((0 2 2)
 (0 3 2 5)
 (0 3 3 2 5)
 (0 3 3 3 3)
 (0 3 3 4 4 4)
 (0 3 3 4 11)
 ;...

The parentheses can now be counted simply and with more confidence, knowing that strings and comments are not going to give false positives:

(length op-refs)
;-> 22
(length cp-refs)
;-> 22

And now it’s possible to see each expression separately. I can work through every reference to an open parenthesis, and for each one, chop the end of the address off to get a reference to the expression as a whole:

(dolist (a-ref (sort op-refs (fn (x y) (> (length x) (length y)))))
    (output-code (s-list (chop a-ref))))

;->
(:mul z z)
(:add (:mul z z) a)
(set 'z (:add (:mul z z) a))
(:rad (set 'z (:add (:mul z z) a)))
(abs (:rad (set 'z (:add (:mul z z) a))))
; ...

This particular code sorts the expressions according to their depth (the most deeply nested first) and then displays them. It’s kind of like how newLISP evaluates expressions, I fancy. Each of these expressions could be checked for unbalanced parentheses or dubious syntax, too.
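The ref-all and chop combination is easier to follow on a toy list. This sketch uses a hand-built, hypothetical list called `tiny` in the same tokenized shape as Nestor’s output, standing for the expression (f (g)); it doesn’t use Nestor or output-code.

```newlisp
; tiny is a hand-made stand-in for tokenized source: (f (g))
(set 'tiny
  '((("open-paren" "(")
     ("symbol" "f")
     (("open-paren" "(") ("symbol" "g") ("close-paren" ")"))
     ("close-paren" ")"))))

; every opening parenthesis, as a list of index paths
(set 'refs (ref-all '("open-paren" "(") tiny))
(println refs)
;-> ((0 0) (0 2 0))

; chopping the last index gives the path of the enclosing expression
(println (tiny (chop (refs 1))))
;-> (("open-paren" "(") ("symbol" "g") ("close-paren" ")"))
```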

Alternatively, it’s possible to examine particular constructions occurring in code. For example, I could look at each use of ‘set’, to check for dodgy assignations:

(set 'setrefs (ref-all '("symbol" "set") s-list))
(dolist (setref setrefs)
   (output-code (s-list (chop setref))))

;->
(set 'z (complex x y) 'c 126 'a z)
(set 'z (:add ( :mul z z) a))

I can also chain these types of queries together, or look for one inside another:

(set 'references-to-while
    (ref '("symbol" "while") s-list match))
(set 'references-to-set
    (ref '("symbol" "set") (s-list (chop references-to-while)) match))
(output-code
    (nth (chop (append (chop references-to-while) references-to-set)) s-list))

;->
(set 'z (:add (:mul z z ) a))

However, I think I’m now barking up the wrong tree. It’s relatively easy to find out whether your parentheses are balanced – but it can be much harder to notice when one or more of them are in the wrong position. This is, perhaps, a paradoxical result of Lisp’s powerful yet simple syntax. Here’s a typical example of what I mean:

(define (test a b)
   (let ((c a)
         (d b)
         (result 0))
     (set 'result (+ c d)))
   result)

(test 2 2)

;-> 
nil

I was hoping to see 4, but I see nil instead. The parentheses are balanced, and the syntax is correct. But one of the parentheses is in the wrong place, and I doubt whether any script or tool could easily identify which one or tell you where it should be located. (Perhaps the mistake is too stupid for that!) Can you see the mistake? And can you imagine a tool or editor that could detect or prevent it happening?

March 2, 2010

newLISP tackles global warming

Filed under: newLISP — newlisper @ 21:50

At Armagh Observatory, they’ve been keeping records of the weather since the late 18th century, and detailed records since 1843. The datasets, which are freely available to all, via their web site, are considered to be high quality and very useful, suffering from few of the problems that bedevil other sets of weather observations. There are hardly any problems with gaps in the records (which would have to be filled in by a technique that scientists call ‘sparse data infill’ and which I would call ‘making stuff up’). There are no nearby airport runways or industrial complexes that could upset the microclimates. And, because the information is published openly, there’s little chance that modifications or corrections can be applied without anyone noticing.

Armagh itself is a smallish town in Northern Ireland, less than 1000 miles south of the Arctic Circle although bathed in the warm currents of the Gulf Stream. The name is familiar to most UK residents mainly as the place where, during the 1970s and 1980s, it was possible to witness gun, bomb, and grenade attacks in the streets, symptoms of the long-running conflict between Catholic and Protestant extremists. This was the time of “the Troubles”, as they became known.

armagh-graph

I decided to attempt some simple analysis of one of the sets of weather records that the Armagh Observatory has posted on its website, using newLISP as my magnifying glass and explorer’s machete.

I started at http://climate.arm.ac.uk/calibrated/airtemp/index.html, and downloaded the ‘corrected daily maximum temperature’ file.

(set 'source-file
    (get-url {http://badc.nerc.ac.uk/browse/badc/armagh/data/air_temperature/corrected_daily_max_temp/tccmax1844-2004.txt}))

I parsed this into separate lines.

(set 'raw-data (parse source-file "\n" 0))

Looking at the result, there are three different types of line to process. After the initial comment lines, there are either year indicators or space-separated lists of values (in degrees Centigrade) for every month for a particular day. For example, the line starting with “1” contains the maximum recorded temperatures for the first of January, February, March, etc. up to and including the first of December. Lines starting with 29 to 31 contain a few odd-looking entries such as “-999” that indicate that there was no such day for certain months.

("Daily maxmimum temperature at Armagh Observatory compiled and calibrated by John Butler and"
 "Ana Garcia Suarez and Alan Coughlin, Armagh Observatory, August 2003 "
 "Reference: Meteorological Data recorded at Armagh Observatory: Volume II - Daily, Monthly and"

" 1844"
"     1    1.9    6.2    6.8   16.3   19.2   18.7   17.7   16.7   24.8   15.6   10.3    6.6"
"     2   -0.2    5.1    6.4   12.0   22.7   16.7   16.7   18.8   23.4   16.1    9.2    6.0"

"    31    5.3 -999.0   13.3 -999.0   20.0 -999.0   15.3   22.3 -999.0   12.6 -999.0    6.3" ...)

To parse the raw data I used the following code:

(let ((year 0) (values '()) (monthly '()))
    (dolist (line raw-data)
      (cond
        ((and (< (length line) 6)
                        (>= (int line 0 10) 1844)
                        (<= (int line 0 10) 2004))
        ; a year start?
            (set 'year (int line 0 10)))
        ; a data row?
        ((nil? (find "[A-Za-z]" line 0))
            (set 'values (map float (parse line)))
            (when values
               (set 'day (first values))
               (push (rest values) monthly -1)
               ; after 31, start a new month
               (when (= day 31)
                   (push (cons year (transpose monthly)) yearly -1)
                   (set 'monthly nil)))))))

As usual, my code is a quick hack; ungraceful yet yielding a practical solution. It runs just once, because the resulting data are collected in the yearly list, which is then more efficiently accessed when saved in a file and reloaded as needed:

(save {/Users/me/armagh-data.lsp} 'yearly)

; later: 

(load {/Users/me/armagh-data.lsp})

The data list yearly holds all the data in a simple hierarchical list structure. I used the transpose function to ‘flip’ the monthly data lists as they were being processed, thus converting the unusual “first day of every month” order into what seemed a more reasonable chronological “month by month” sequence. The data is now in the following form:

(
    (1844
      (1.9 -0.2 6.7 11.1 11.7 8.9 6.1 6.9 10.3 8.2 8.9  ...
      (6.2 5.1 4.8 6.2 6.2 6.2 4.5 4.5 5.2 4.8 6.7 8.4  ...
      (6.8 6.4 8.2 6.6 3.4 5.7 7.9 11.5 10.4 9.3 8.7 6. ...
      (16.3 12 11.2 10.8 10.7 12.1 14 14.6 17.9 15.7 13 ...
      (19.2 22.7 16.8 16 19.9 17.9 17.1 16.9 15.4 14.4  ...
      (18.7 16.7 20.1 20.8 18.6 20.2 19.1 20.1 19.1 18  ...
      (17.7 16.7 19.2 19.3 16.9 16.8 19.2 19.5 18.6 18. ...
      (16.7 18.8 17.2 18.4 19.9 15.3 16.4 16.8 16.8 18. ...
      (24.8 23.4 20.9 21.6 21.4 20.9 20.4 16.9 18.2 15. ...
      (15.6 16.1 17.3 15.1 13.4 13.3 11.8 14.8 14.9 15. ...
      (10.3 9.2 9.4 8.6 9.4 8.3 8.3 9.4 10 9.4 7.3 8.9  ...
      (6.6 6 6.3 5.6 4.2 5.4 6 2.4 3.5 3.7 3.8 3.2 2.1  ...
    (1845
      (5 5.8 6.7 9.7 10.2 8.3 6.7 7.8 9.4 10.6 6.7 6.1  ...
      (2.6 6.7 9.2 5.9 7.5 4.5 1.7 4.5 7.5 7.3 5.3 9.5  ...
      (9 9 9.2 6.5 4 2.9 5.5 7.9 8.6 9.7 5.7 4.1 1.8 1. ...
      (15.4 12.9 15.9 15.4 9.6 13.4 15.1 11.2 10.1 9.6  ...
      (16 14.4 12.7 12.7 12.7 10.4 10.4 11.8 12.4 13.9  ...
      (19.9 19.7 14.6 15.8 16.3 16 16.1 16 16.6 18 24.2 ...
      (16.1 18.8 14.7 17.9 17.7 18.1 19.6 18.3 16.4 18. ...
      (17.7 17.3 17.8 18.9 19.2 18.7 18.9 19.5 16.7 18. ...
      (17.6 18.8 18.7 18.2 17.2 15.1 16.5 19.1 20.6 18. ...
      (13.6 12.9 13.4 10.4 11.3 13.4 11.8 12 12.3 12.3  ...
      (12.8 11.9 12.6 11.7 13.9 15.1 11.7 10.9 9.7 11 1 ...
      (8.2 5.7 2.6 7.9 6.6 6 7.1 9.9 8.7 10.4 6.6 5.1 6 ...
    (1846

and so on.
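The effect of transpose can be seen on a cut-down example. Here I take just the first three monthly values from the first two day rows of 1844 (the variable name `day-rows` is only for illustration):

```newlisp
; two day rows, each holding values for January, February, March
(set 'day-rows '((1.9 6.2 6.8)
                 (-0.2 5.1 6.4)))

; after transposing, each sublist holds one month's values, day by day
(println (transpose day-rows))
;-> ((1.9 -0.2) (6.2 5.1) (6.8 6.4))
```

The three sublists match the starts of the January, February, and March lists for 1844 shown above.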

To access the data for a specific year, I can use ref:

(define (data-for-year y)
   (let ((year-ref (ref y yearly)))
       (rest (yearly (chop year-ref)))))

such that:

(data-for-year 1845)

returns a list of 12 lists. And to access data for a specific month, I can use this:

(define (data-for-month month-number year-data)
  ; month-number:  Jan = 1, not Jan = 0 :)
   (clean (fn (n) (< n -30)) (year-data (- month-number 1))))

So I can, for example, find data for February 1900 like this:

(data-for-month 2 (data-for-year 1900))
    ;->
(3.4 3.3 2.8 4.4 3.4 3.3 3.9 1.6 4.9 1.1 2.8 2.4 4.1 4.4 7.9 7.7 7.2 6.3 3.8 6.2 5.7 9.9 11.7 12.8 9.3 6.8 5.1 5.6)

The clean function removes those spurious -999s for February 29, February 30, and February 31.

Another simple function:

 (define (average lst)
  (div (apply add lst) (length lst)))

calculates averages, so I can ask for the average (maximum) temperature:

(average (data-for-month 2 (data-for-year 1900)))
 ;-> 5.421428571

To find years with high average maximum temperatures, I can run the following code:

(for (year 1844 2004)
     (set 'year-data (data-for-year year))
     (set 'month-average nil)
     (for (month 1 12)
          (push (average (data-for-month month year-data)) month-average))
     (push (list year (average month-average)) year-average))

(sort year-average (fn (a b) (> (last a) (last b))))

with the following results:

((1959 14.20715118)
 (1846 14.12936444)
 (1949 14.11140297)
 (1921 13.9449232)
 (1857 13.88233359)
 (1945 13.87162442)
 (2003 13.84484575)
 ...

and so on. If you google the year 1846, you’ll find references to the typhoid outbreak that followed the long hot summer, which claimed upwards of 30 000 lives. But that summer of 1959 has been described as a ‘benchmark’ summer; that special kind of warm and dry golden summer that ageing Brits like to look back on nostalgically when faced with another of the wet and miserable days that they usually see so many of in July and August.

A picture is worth a thousand words

The plain numeric data is interesting, but I like exploring using visual techniques too.

My first attempts at visualizing the data used the HTML5 Canvas, but for this task I ended up switching back to PostScript. Both have their advantages: Canvas can do semi-transparency, but PostScript can’t; PostScript can produce PDFs which are high resolution, and easily converted to other formats. Bear with me for a moment while I define a short library of PostScript functions.

(define (ps str)
  (write-line *buffer* str))

(define (render filename)
  (let ((fname (string (env "HOME") "/Desktop/" filename)))
     (write-file fname
                (string "%!PS-Adobe-3.1"
                        "\n"
                        "%%Creator: newLISP "
                        (sys-info -2)
                        "\n"
                        *buffer*
                        "%%showpage" "\n"))
     ; on MacOS X, this runs the PostScript to PDF converter 
     ; and opens the resulting PDF file in Preview
     (exec (string "open " fname))
     ; on other platforms, the PostScript file needs to be 
     ;converted with, eg, GhostScript...
     ))

(define (ps-prolog)
  (ps (string {%} (date)))
  ; hack for big paper size
  (ps (string {
%%BeginFeature: *PageSize Default
<> setpagedevice
%%EndFeature
  }))
  ; default font
  (ps {/Helvetica-Bold findfont 12 scalefont setfont})
  (ps {1 setlinewidth}))

(define (transform-x x)
   (mul x-scale (add x x-offset)))

(define (transform-y y)
   (mul y-scale (add y y-offset)))

(define (line-to x y)
  (ps (format "%f %f lineto\n" (transform-x x) (transform-y y))))

(define (move-to x y)
  (ps (format "%f %f moveto\n" (transform-x x) (transform-y y))))

(define (begin-path)
  (ps (format {newpath })))

(define (fill-path)
  (ps (format {fill })))

(define (stroke-path)
  (ps (format {stroke })))

(define (set-fill-colour r g b)
  (ps (format {%f %f %f setrgbcolor} r g b)))

(define (set-stroke-colour r g b)
  (ps (format {%f %f %f setrgbcolor} r g b)))

(define (dot x y r)
  (begin-path)
  (ps (format "%f %f %f 0 360 arc " (transform-x x) (transform-y y) r))
  (fill-path))

(define (text x y d)
  (move-to x y)
  (ps (format {(%s) show} (string d))))

I usually just copy and paste these from script to script, adapting them when necessary to the task in hand. There’s a hack in the ps-prolog function to force a very wide paper size. This graph is going to be huge – I’m going to attempt to plot every single data point on this graph.

For colours, I define a simple colour map that goes from blue (for the coldest values) to red (for the hottest):

(define (generate-colour-list)
    (for (i 0 1 0.001)
        (push (list i 0 (sub 1 i)) colour-map -1)))

and a function that takes a temperature value from between -10 and 35 (they’re safely below and above the range of values in the dataset) and selects an entry in the colour map. (I shouldn’t hard-wire all these values in, should I?!)

(define (colour-for-temp t)
   (colour-map (int (mul (length colour-map) (div (add 10 t) 45)))))
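One way to avoid hard-wiring the range might look like the sketch below. It’s an assumption-laden variant rather than the function used for the graph: `colour-index` is a hypothetical name, the map size and the temperature limits are passed in as arguments, and the result is clamped so that a value at or beyond the upper limit still selects a valid entry.

```newlisp
; colour-index is a hypothetical helper: map temperature t in the
; range lo..hi onto an index into a colour map with n entries.
(define (colour-index t n lo hi)
  (let ((idx (int (mul n (div (sub t lo) (sub hi lo))))))
    ; clamp to the valid range 0..n-1
    (int (max 0 (min (- n 1) idx)))))

(println (colour-index -10  1001 -10 35))   ;-> 0
(println (colour-index 12.5 1001 -10 35))   ;-> 500
(println (colour-index 35   1001 -10 35))   ;-> 1000
```

The colour would then be looked up with something like (colour-map (colour-index t (length colour-map) -10 35)).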

Now I’m ready to initialize:

(define (init)
  (set '*buffer* {})
  (generate-colour-list)
  (ps-prolog)
  (set  'x-offset 10
        'y-offset 25
        'x-scale 4
        'y-scale 6
        'x-step 0.02
        'start-year 1844
        'end-year 2004
        'graph-width 2000
        'graph-height 400))

and – finally – I’m ready to start drawing.

(init)

; legend
(set-fill-colour 0.0 0.0 0.0)
(text 10 44 {Maximum temperatures recorded at Armagh Observatory, Degrees Centigrade})
(text 10 42 {Monthly averages shown as grey circles})
(text 10 40 {Daily maximum temperatures marked as small dots})
(text 10 38 {Records shown in red/blue})

(set-stroke-colour 0.7 0.7 0.7)

; x-axis
(begin-path)
(move-to 0 0)
(line-to graph-width 0)
(stroke-path)

; y-axis
(begin-path)
(move-to 0 -5)
(line-to 0 (- graph-height 30))
(stroke-path)

; y rules 
(for (y -5 35 5)
     (begin-path)
     (set-stroke-colour 0.9 0.9 0.9)
     (move-to 0 y)
     (line-to graph-width y)
     (stroke-path)
     (set-fill-colour 0 0 0)
     (text -7 y y))

The way I’ve set this up, I’m actually adding the graph’s ‘furniture’ by specifying vertical coordinates in degrees Celsius! It’s weird, but it kind of makes sense to say “I want this line to appear at the -5° line”.

The graph is drawn in two passes, partly because there’s no other way to get layering. In the first pass, I’m drawing grey circles to indicate the average maximum temperature for each month. In the second pass, I go back and plot each day’s value as a dot:

; first pass; draw monthly average maximums as grey dots
(set 'x-pos 0)
(for (year start-year end-year)
      (set 'year-data (data-for-year year))
      (for (month 1 12)
           (set-fill-colour 0.7 0.7 0.7)
           (set 'monthly-average (average
                (data-for-month month year-data)))
           (set 'y-pos monthly-average)
           (dot  x-pos monthly-average 3)
           (inc x-pos (mul x-step (length
                (data-for-month month year-data)))))
      (inc x-pos x-step))

; second pass; draw daily maximums
(ps {/Helvetica-Bold findfont 15 scalefont setfont})
(set 'x-pos 0 'hottest-day-so-far 25 'coldest-day-so-far 0)
(for (year start-year end-year)
    ; year markings
    (when (= 0 (% year 5))
       ; sometimes, 5 yearly ticks
       (set-stroke-colour 0.8 0.8 0.8)
       (begin-path)
       (move-to x-pos -10)
       (line-to x-pos -12)
       (stroke-path)
       ; year: black
       (set-fill-colour 0 0 0)
       (text x-pos -14 year))

    ; always, 1 year ticks
    (set-stroke-colour 0.8 0.8 0.8)
    (begin-path)
    (move-to x-pos -10)
    (line-to x-pos -11)
    (stroke-path)

    ; get data for year
    (set 'year-data (data-for-year year))
    (for (month 1 12)

         ; every day
         (dolist (daily-temp (data-for-month month year-data))
            (set 'hottest-day-so-far (max hottest-day-so-far daily-temp))
            (set 'coldest-day-so-far (min coldest-day-so-far daily-temp))
            ; lower alpha for this set
            (cond
                ((= hottest-day-so-far daily-temp)
                      (set-fill-colour 1 0 0)
                      (dot x-pos daily-temp 3))
                ((= coldest-day-so-far daily-temp)
                      (set-fill-colour 0 0 1)
                      (dot x-pos daily-temp 3))
                (true
                      (apply set-fill-colour (colour-for-temp daily-temp))
                      (dot x-pos daily-temp 0.5)
                      ))
            (inc x-pos x-step)))
    ; next year
    (inc x-pos x-step))

; finish drawing
(ps {showpage})
(render "graph.ps")

The result is a PostScript file that is easily converted to PDF by the Mac’s pstopdf program (or by Ghostscript or Adobe Distiller). It is too wide for publication in a typical journal, and strains at the edges of typical web pages, but it’s easy to explore the PDF using Preview or Acrobat Reader.

The PDF is here: armagh-graph

One thing I noticed (apart from the apparent lack of much significant change during the period) is that those averages seem oddly unrepresentative of the actual daily weather. There’s no disputing the arithmetic (I hope). And yet, for example, look at the summers of 1870 or 1995: daily temperatures look hot, with many above 25°, but the monthly averages appear cooler.
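A small worked example shows how this can happen. The numbers below are invented for illustration (they are not from the Armagh dataset), and mean is a stand-in for whatever averaging function the script uses: a month with eight hot days still averages out well below 25° once the milder days are counted.

```newlisp
; mean and sample-month are illustrative, with made-up values
(define (mean lst)
  (div (apply add lst) (length lst)))

; 8 days at 27 degrees and 23 days at 17 degrees
(set 'sample-month (append (dup 27 8) (dup 17 23)))

(println "days above 25: "
  (length (filter (fn (x) (> x 25)) sample-month)))
(println "monthly average: " (mean sample-month))  ; roughly 19.6
```

The handful of hot days dominates the eye on the plot, but carries only about a quarter of the weight in the average.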

Every picture tells a story

This type of graphic visualization of a dataset is designed to tell a story visually, the story being, presumably, the one told by the surrounding text. However, it’s possible for the author to make the storytelling even more effective by tweaking aspects of the presentation. For example, authors can change or move the axes, adjust the horizontal or vertical scales, introduce colours to influence perception, combine two or more datasets as if they were one, start or stop the graphing at a more ‘telling’ point in the data, or even combine two or more graphs on a single display to imply connections. For more on this fascinating subject, see the books and writings of Edward Tufte.

I don’t have a scientific story to tell, here. Rather, I’m telling a meta-story. I’ve made a number of small mistakes and inappropriate design decisions in this post (some deliberate, or at least, some I’m aware of, others are accidental). But, given the published and freely downloadable weather data, the code listed here, and – of course – the excellent free and open source newLISP language, it should be possible for anyone to retrace my steps, find my mistakes, and present a more credible or compelling view of the same dataset. I like to think that, if I was a scientist, I’d be happy for that to happen. But how many of us are either willing or able to do the same forensic work for other – and probably much more important – graphical visualizations of specific datasets? Currently the world of climate science seems to be in disarray, with allegations of data tampering, fraud, and much else besides. It appears that we should all be a lot more critical of the way information is presented to us.

Addendum

Thanks to Kazimir, this post was mentioned at Hacker News – and this site had 8000+ more page views than usual that day… :) I was pleasantly surprised that quite a few of the commenters seemed to have understood that this wasn’t literally an attempt at serious scientific analysis, despite the teasing headline. It was merely an observation or two from a non-scientific layman about the process of drawing pictures and conclusions from sets of numbers, while at the same time exploring some simple graph drawing using newLISP (rather than proprietary software packages such as Excel or Mathematica).

I’m intrigued by the heavy data compression that’s going on in many of the graphs I see on the net. A huge amount of juicy data is squeezed until dry enough to be summarized by a single wavy line. In this particular example I tried to avoid dropping any data points, but the resulting graph is, I freely admit, too hard to ‘read’. When you’re far enough away to see it, you can’t see anything worth seeing.

Kazimir also suggested that I add yearly averages. Easy enough to do, although I’m reluctant to cloud the graph with even more information:

And about this ‘average’ thing: talking about the average temperature for ‘a decade’ begs the question of why starting with a year ending with 0 is any better than starting with any other number. For example, why not compare the decade 1853-1862 with 1927-1936? And for that matter, why are we using intervals of 10 years, rather than 5 or 14?

But there’s an easy way to answer this particular conundrum:

(for (x 3 20)
    (for (y 0 (- (length yearly-averages) 1))
        (set 's (slice yearly-averages y x))
        (if (= (length s) x)
            ; record the starting year of the window and its average
            (push (list (first (first s)) (average (map last s))) results)))
    (println x { } (first (sort results (fn (a b) (> (last a) (last b))))))
    (set 'results nil))

    3 (2002 13.74500614)
    4 (2001 13.60712154)
    5 (2000 13.58417734)
    6 (1999 13.56492782)
    7 (1998 13.51859569)
    8 (1997 13.53881651)
    9 (1995 13.44358625)
    10 (1995 13.47582925)
    11 (1994 13.41562109)
    12 (1993 13.33442216)
    13 (1992 13.29284527)
    14 (1846 13.32015829)
    15 (1845 13.2821639)
    16 (1989 13.29859146)
    17 (1988 13.28216522)
    18 (1987 13.24236811)
    19 (1986 13.17982611)
    20 (1985 13.13376373)

which tells us that, for this data set at least, the warmest periods between 3 and 13 years long, and between 16 and 20 years long, all fell between 1985 and 2004, but the warmest 14 and 15 year long periods were in the 1840s and 1850s. Change the sorting function to <, and the coldest (or, to be more precise, the least warm, on average) periods were all in the late 1890s. Tentatively, I could suggest that – at Armagh – there was a slight dip in average temperatures towards the turn of the 20th century, and that average temperatures now are equalling and surpassing the early Victorian readings by a tenth of a degree or so. But I said I was trying not to do any science here, so that’s enough of that.
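To make the switch between warmest and coldest easier to try, the search could be wrapped in a function that takes the comparison as an argument. This is only a sketch: best-window, mean and the tiny sample list are invented for illustration, and it assumes the data is a list of (year average) pairs.

```newlisp
; mean, best-window and sample are hypothetical names with made-up data
(define (mean lst)
  (div (apply add lst) (length lst)))

; return (start-year average) for the n-year window that sorts first under cmp
(define (best-window data n cmp)
  (let (results '())
    (if (<= n (length data))
      (for (start 0 (- (length data) n))
        (let (s (slice data start n))
          (push (list (first (first s)) (mean (map last s))) results -1))))
    (if results
      (first (sort results (fn (a b) (cmp (last a) (last b))))))))

(set 'sample '((2000 10) (2001 12) (2002 14) (2003 11)))
(println (best-window sample 2 >))  ; warmest two-year window starts in 2001
(println (best-window sample 2 <))  ; coldest two-year window starts in 2000
```

Passing > or < directly avoids editing the sort function each time, and the length check means a window longer than the data simply returns nil.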

January 10, 2010

Tweeting frequency

Filed under: newLISP — newlisper @ 18:00
Tags:

Not functional in the archive version.

I enjoyed the blog post by taoeffect/itistoday/greg at Tao Effect Blog; a good story, well told, and full of enthusiasm (increasingly a scarce commodity in online communities). I noticed that there was a slight increase in activity in the newLISP corner of the Twitterverse as a result: up from one or two twitters a week to 12 in one day.

I thought it would be nice to graph the tweet frequency. I added a short function to the Dragonfly Twitter module to draw a simple bar chart – see it here. However, I’m not too impressed with the effect. I’m not convinced that the choice of bar graph is correct either.

What I’m looking for is some kind of ‘bar code’ type of graph, where each vertical line represents a point in time to mark each tweet, and any increase in frequency shows up as a cluster of lines closer together. I don’t know what this type of graph is called, or how to produce it using Google’s chart API, though. If it’s not possible, I’ll think about drawing one using the HTML5 Canvas element. Help wanted!

Update: I wrote a new graph plug-in, using the HTML canvas. It looks more like the bar-code thing now. Opera doesn’t draw the text, but it’s OK in Firefox, Safari, and Chrome.
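The plug-in itself isn’t shown here, but the general idea can be sketched in a few lines. Everything in this example is an assumption made for illustration: the function name barcode-js, the list of times (any increasing numbers, such as seconds), and a JavaScript variable ctx holding a canvas 2D context. Each time is scaled to an x position and drawn as a full-height vertical line, so bursts of tweets show up as clusters.

```newlisp
; barcode-js, times, width and height are hypothetical names;
; the generated JavaScript assumes a 2D canvas context called ctx
(define (barcode-js times width height)
  (letn ((t0 (apply min times))
         (span (max 1 (sub (apply max times) t0)))
         (js "ctx.beginPath();\n"))
    (dolist (tm times)
      ; scale each time to a pixel column between 0 and (width - 1)
      (let (x (int (mul (- width 1) (div (sub tm t0) span))))
        (set 'js (append js
          (format "ctx.moveTo(%d, 0); ctx.lineTo(%d, %d);\n" x x height)))))
    (append js "ctx.stroke();\n")))

(println (barcode-js '(0 10 11 12 100) 200 40))
```

The three times close together (10, 11, 12) land on neighbouring columns, which is the clustering effect described above.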

twitter.png

December 12, 2009

Seasonal greetings

Filed under: newLISP — newlisper @ 09:17
Tags:

Not functional in the archive version.

seasonal-greetings.png

Seasonal greetings from Unbalanced Parentheses Headquarters!


(for (i 0 255)
    (push (list (rand 250) (rand 250) (rand 250) (max 0.3 (random 0 1))) colour-map -1))

(define (greet text x y)
    (local (colour)
        (set 'colour (colour-map (rand 100)))
        (write-buffer page (string
            (format {a_context.fillStyle = 'rgba(%d, %d, %d, %f)'; } colour) "\n"
            (format {a_context.font = '%dpx sans-serif'; } (max 10 (rand 24))) "\n"
            (format {a_context.fillText ('%s', %d, %d); } text x y) "\n"))))

(set 'page {})

(define (make-canvas nm width height)
    (write-buffer page
        (format (string (char 60) {canvas id="%s" width="%d" height="%d"} (char 62) (char 60) {/canvas} (char 62)) nm width height)))

(seed (date-value))

(set 'width 800 'height 1200)

(make-canvas "a" width height)

(write-buffer page
    (string
        (char 60)
        {script type="text/javascript" language="javascript" charset="utf-8"}
        (char 62)
        {var a_canvas = document.getElementById("a"); } "\n"
        {var a_context = a_canvas.getContext("2d"); } "\n"
        {a_context.font = "bold 12px sans-serif"; } "\n"
        "\n"))

(dotimes (i 200)
    (greet (amb "Hyvää joulua ja onnellista uutta vuotta"
                "Joyeux Noël et bonne année"
                "Fröhliche Weihnachten und ein gutes neues Jahr"
                "聖誕節同新年快樂"
                "Nollick Ghennal as Blein Vie Noa"
                "圣诞节快乐"
                "Buon Natale e felice anno nuovo"
                "Veselé vánoce a šťastný nový rok"
                "God jol og godt nyttår"
                "Linksmų Kalėdų ir laimingų Naujųjų Metų"
                "Crăciun fericit şi un An Nou Fericit"
                "Natale hilare et annum faustum"
                "Merry Christmas"
                "Zorionak eta urte berri on"
                "Gleðileg jól og farsælt nýtt ár"
                "(println {Happy newLISPing})"
                "Seasons Greetings"
                "शुभ क्रिसमस"
                "Καλά Χριστούγεννα!"
                "Bonan Kristnaskon kaj feliĉan novan jaron"
                "Prettige kerstdagen en een Gelukkig Nieuwjaar!"
                "Sretna Nova godina!"
                "Rõõmsaid Jõule ja Head Uut Aastat"
                "Nollaig chridheil agus bliadhna mhath ùr"
                "Nadolig llawen a blwyddyn newydd dda"
                "明けましておめでとうございます"
                "Priecīgus Ziemassvētkus un laimīgu Jauno gadu"
                "Веселого Різдва і з Новим Роком"
                "С наступающим Новым Годом")
           (rand (- width 100)) (rand (+ height 200))))
(write-buffer page (string (char 60) {/script} (char 62)))
page

This post uses the HTML 5 Canvas, and should work properly on recent standards-compliant browsers such as Safari, Firefox, and Google Chrome. The Opera browser can’t handle this, which surprised me. As for Internet Explorer … I suspect you won’t see anything. Also, even if the canvas works well, there’s still the problem of all those Unicode fonts. We haven’t completely left behind the early days of the web, when every other page had a “Best viewed in browser X” banner.

The image is generated afresh each time you load the page, so the colours and positions of the various greetings are different each time. This is because the image is generated by embedded newLISP code in the HTML database which is evaluated only at browse time. The only tricky part of the operation is to make sure the code survives being translated by Markdown into HTML, then being uploaded via xmlrpc to be stored in the newLISP database ready for being processed by Dragonfly.

If you want a challenge, see how many different languages you can identify (without cheating)!

November 17, 2009

newLISP Bayesian Comment Spam Killer

Filed under: newLISP — newlisper @ 19:35
Tags:

This post describes the newLISP Bayesian Comment Spam Killer. It won’t kill Bayesian comments – although it might – but it tries to kill spam comments on blogs, using Bayesian analysis.

The story starts after the aspiring commenter clicks the Submit button on the comment form, and after the CGI script or web framework has extracted the information from the commenter’s posted submission. To make things easy, here are some declarations that get me quickly to the same position:

(set 'comment-date "20091114T163223Z")
(set 'storyid "projectnestorpart1")
(set 'comment "Very nice site!")
(set 'commentator "svQrVW a href=\"http://asdfhh.com/")
(set 'commentator-uri "svQrVW a href=\"http://asdfhh.com/")
(set 'ip-address '("94.102.60.174"))

The first thing to do is to save this information in a file. There are many ways to do this, but I like to save data in newLISP format wherever possible, because it saves time and effort when reading it back in:

; make a suitable path name
(set 'path (string {/Users/me/blog/comments/}
    storyid "-" comment-date ".txt"))
; save as association list
(set 'comment-list
     (list
        (list 'comment-date comment-date)
        (list 'storyid storyid)
        (list 'commentator commentator)
        (list 'comment comment)
        (list 'ip-address ip-address)
        (list 'status "spam")
        (list 'commentator-uri commentator-uri)))
(save path 'comment-list)

A few weeks after opening a comments form to the intelligent citizens of cyberspace, there will be hundreds of little newLISP files in the directory, containing all kinds of comment. Each file looks something like this:

(set 'Comments:comment-list '(
  (Comments:comment-date "20091114T163223Z")
  (Comments:storyid "projectnestorpart1")
  (Comments:comment "svQrVW a href=\"http://asdfhh.com/ etc etc ")
  (Comments:commentator "svQrVW a href=\"http://asdfhh.com/")
  (Comments:commentator-uri "svQrVW a href=\"http://asdfhh.com/")
  (Comments:ip-address ("94.102.60.174"))
  (Comments:status "spam")
  ))

I’ve added a status tag to each one, with the default value of “spam”. That means that every comment so far is considered spam. That’s not good (although very close to the actual truth), so I must also manually alter any genuine comments and tag them as “approved”. That’s a vital task, and for a while I did it by hand, until the collection of comments was large enough for me to trust the Bayesian analysis to do it automatically.
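Retagging a comment amounts to replacing one pair in the association list. Here is a minimal sketch of that step on an in-memory list; set-status and the sample data are invented for illustration, and in practice the list would be read in with load and written back with save.

```newlisp
; set-status and sample are hypothetical names with made-up data
(define (set-status lst new-status)
  ; replace the whole (status ...) pair, then hand back the modified copy
  (setf (assoc 'status lst) (list 'status new-status))
  lst)

(set 'sample '((commentator "Kazimir")
               (comment "Nice post")
               (status "spam")))

(set 'sample (set-status sample "approved"))
(println (lookup 'status sample))
```

Because newLISP passes the list by value, the function works on its own copy, so the result has to be assigned back to the original symbol.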

Once I’ve got a reasonable collection of comments, I’m ready to start building the Comment Spam Killer.

(context 'Comments)

A little macro I’ve been using recently provides a modified append:

(define-macro (extend)
  (setf (eval (args 0)) (append (eval (args 0)) (eval (args 1)))))

This accepts a symbol holding a list, and a list, and adds the elements in the list at the end of the symbol’s current elements.

I want somewhere to store the analysis:

(define MAIN:spam-corpus)

This function extracts a list of the words used in all the comments:

(define (build-word-lists dir)
    (dolist (nde (directory dir {^[^.].*txt}))
       (if (directory? (append dir nde))
         ; directory, recurse
         (build-word-lists (append dir nde "/"))
         ; file: read info and make a list of its contents
         (letn  ((file            (string dir nde))
                 (comment-list    (load file))
                 (commentator     (lookup 'commentator comment-list))
                 (comment         (lookup 'comment comment-list))
                 (comment-status  (lookup 'status comment-list))
                 (commentator-ip  (lookup 'ip-address comment-list))
                 (commentator-uri (lookup 'commentator-uri comment-list))
                 (word-list '()))
              (extend word-list (parse commentator       "[^A-Za-z]" 0))
              (extend word-list (parse comment           "[^A-Za-z]" 0))
              (extend word-list (parse commentator-uri   "[^A-Za-z]" 0))
              ; sometimes ip addresses are stored in a list...
              (if (list? commentator-ip)
                  (dolist (i commentator-ip) (extend word-list (list i))))
              (cond
                  ((= comment-status "approved")
                      (extend genuine-comments (clean empty? word-list)))
                  ((= comment-status "spam")
                      (extend spam-comments (clean empty? word-list))))))))

And the two lists can be turned into a Bayesian-ready dictionary with:

(bayes-train spam-comments genuine-comments 'MAIN:spam-corpus)

The resulting spam-corpus is a context that provides two numbers for each word in the comments. Here’s an informative extract:

 ;...
 ("prepended" (0 2))
 ("prescription" (36 0))
 ("present" (0 1))
 ("presepe" (3 0))
 ("pretty" (0 1))
 ("price" (2 0))
 ("primari" (3 0))
 ("primaria" (6 0))
 ("primitive" (0 1))
 ("princessdc" (2 0))
 ("print" (0 2))
 ("printing" (4 0))
 ("println" (0 5))
 ("prior" (0 1))
 ("priors" (0 3))
 ;...

The contents of the spam context hold a list of words and the number of times that each word occurs in the first category, the spam comments, or the second category, the genuine comments. The apparent discrepancy between print and printing is easily resolved once you look at the original comments – something to do with custom T-shirt printing, whereas print was twice mentioned in a piece of newLISP code in a comment.

(define (analyse-comment file)
    (letn ((comment-list    (load file))
           (commentator     (lookup 'commentator comment-list))
           (comment         (lookup 'comment comment-list))
           (comment-status  (lookup 'status comment-list))
           (commentator-ip  (lookup 'ip-address comment-list))
           (commentator-uri (lookup 'commentator-uri comment-list))
           (word-list '())
           (spam-comments '())
           (genuine-comments '()))
        (extend word-list (parse commentator       "[^A-Za-z]" 0))
        (extend word-list (parse comment           "[^A-Za-z]" 0))
        (extend word-list (parse commentator-uri   "[^A-Za-z]" 0))
        (if (list? commentator-ip)
            (dolist (i commentator-ip) (extend word-list (list i))))
        ; clean is non-destructive, so keep its result
        (set 'word-list (clean empty? word-list))
        (set 'spam-score (bayes-query word-list 'MAIN:spam-corpus))))

which returns a pair of numbers for each comment: the probabilities that it belongs in the first (spam) or second (genuine) category.

It’s now easy to decide whether to reject a comment based on the two numbers returned by this function. The example I started with manages to score (1 0), a clear indication that this apparently harmless phrase is, when considered as part of a comment as a whole, usually a comment from a spammer.
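One way to make that decision, as a sketch only: the function name spam? and the 0.9 threshold are assumptions chosen for illustration, not values from the original script. It takes a score in the (spam genuine) form described above and rejects the comment when the spam probability is high enough.

```newlisp
; spam? and its default threshold are hypothetical
(define (spam? score (threshold 0.9))
  ; the first number is the probability of the spam category
  (>= (first score) threshold))

(println (spam? '(1 0)))          ; the example comment: rejected
(println (spam? '(0.2 0.8)))      ; mostly genuine: accepted
(println (spam? '(0.6 0.4) 0.5))  ; a stricter threshold rejects this one
```

A cautious threshold lets borderline comments through for manual review, which suits a corpus that is still small.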

If you’re wondering where the comments form is on this site – well, there isn’t one; I decided against using up disk space storing hundreds of unwanted comments!

October 21, 2009

Syntax matters

Filed under: newLISP — newlisper @ 22:22
Tags:

My fellow newLISP blogger Kazimir observes in his poll analysis that one of the main reasons why people don’t use Lisp is the syntax:

Lisp syntax is the most important reason for majority of people who do not use Lisp.

The thing I don’t understand, though, is what exactly is wrong with Lisp syntax that makes people avoid the language. Perhaps that’s another research opportunity.

Here’s a simple scripting problem. Given a text file containing a series of random paragraphs, separated by percent signs, sort them so that they are ordered according to the first significant word in each paragraph. That is, words such as “a” and “the” shouldn’t affect the sort order.

Here’s one solution:

#!/usr/bin/perl -w

my %sort_buckets;
my %exclusions;

my $file_to_sort = '/path/to/file_to_sort';
my $sorted_file = '/path/to/sorted_file';

while (<DATA>) {
    chomp;
    $exclusions{$_}++;
}

open $in, "<", $file_to_sort
    or die "Can't open file: $!";

$/ = "%\n";

while (<$in>) {
    my $line = $_;
    my @words = split " " => $line;
    my $sort_key = '';
    for (0..$#words) {
        if ($exclusions{lc($words[$_])}) {
            next;
        } else {
            $sort_key = join "" => map { lc($_) } @words[$_..$#words];
            last;
        }
    }
    $sort_buckets{$sort_key} = $line;
}

close $in;

open $out, ">", $sorted_file
    or die "Can't open file: $!";

foreach (sort keys %sort_buckets) {
    print $out $sort_buckets{$_};
}

close $out;

__END__
a
the
this
that
you
when
is
may
be
if
and

I’m not sure I know what’s going on there, but it’s a typical practical (and probably quick) solution from a Perler. For comparison, here is a newLISP version:

#!/usr/bin/env newlisp

(set 'common-words
  '("a" "the" "this" "that" "you" "when" "is" "may" "be" "if" "and"))

(define (remove-common text)
    (difference
       (find-all {[a-zA-Z]+} (lower-case text))
        common-words))

(define (compare text1 text2)
    (< (remove-common text1)
       (remove-common text2)))

(write-file {/path/to/sorted-file.txt}
    (join (sort
             (parse (read-file {/path/to/file_to_sort.txt}) "%")
             compare)
     "%"))

It’s not clear to me why this syntax is considered unappealing, or indeed how it could be improved. I’ve written about this before, so obviously I haven’t sought hard enough for an answer.

Can we measure some aspect of these two files? How about looking at the use of alphabetic characters and parentheses. For the Perl example:

Parentheses: (7 7)
Braces: (11 11)
Brackets: (2 2)
Alphabetic: 403
Non-alphabetic: 642
ratio 0.6277258567

And the newLISP example:

Parentheses: (17 17)
Braces: (3 3)
Brackets: (1 1)
Alphabetic: 256
Non-alphabetic: 168
ratio 1.523809524
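Counts like these could be gathered with a few regular expressions. This is a sketch rather than the script that produced the figures above: count-matches and syntax-stats are invented names, and it assumes that every character that isn’t a letter (spaces and line breaks included) counts as non-alphabetic.

```newlisp
; count-matches and syntax-stats are hypothetical helpers
(define (count-matches pattern text)
  (length (find-all pattern text)))

(define (syntax-stats text)
  (letn ((alpha (count-matches {[a-zA-Z]} text))
         ; assumption: everything that is not a letter is non-alphabetic
         (other (- (length text) alpha)))
    (list
      (list 'parentheses (count-matches {\(} text) (count-matches {\)} text))
      (list 'braces (count-matches "\\{" text) (count-matches "\\}" text))
      (list 'brackets (count-matches {\[} text) (count-matches {\]} text))
      (list 'alphabetic alpha)
      (list 'non-alphabetic other)
      (list 'ratio (div alpha other)))))

(println (syntax-stats "(set 'x (+ x 3))"))
```

To measure a whole script, the argument would be the result of read-file on the source file.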

The preponderance of the Lisp parentheses is apparent. But otherwise, the Lisp example is plainly much more alphabetic, more readable, more civilized. It’s just possible that the non-mathematical syntax, and the lack of familiar and friendly “x = x + 3” forms (as learned by schoolchildren from 10 years and upwards) are enough to disarm and perplex the non-Lisper.

September 29, 2009

newLISP at Wikibooks: volunteers wanted

Filed under: newLISP — newlisper @ 18:09
Tags:

I’ve moved the Introduction to newLISP over to Wikibooks, and you can find it here:

http://en.wikibooks.org/wiki/Introduction_to_newLISP

You can find a link to the Print version on that page, which gives a complete view of the entire document, suitable for printing to PDF or saving as HTML.

I’ve never really looked at Wikibooks before, having spent much more time using Wikipedia. Wikibooks is an attempt to build an open-source library of text books. At present there are nearly 40,000 books available to browse or edit, and I’ve found some quite interesting books there, although there are many others that didn’t look interesting (or that needed serious editing).

Although I have a few reservations about the presentation of texts in Wikibooks (I like more control over the visual appearance of documents, and the output conversions could be better), I think the Wikibooks approach is worth trying for this particular document. Like Wikipedia and Wikibooks themselves, it’s an experiment. I’m hoping that the contributions of others in the newLISP and Wikibooks communities will improve the document and make it more useful.

To take part, all you have to do is visit the home page and have a go at editing. One thing that really needs doing at the moment is a general check of anything where my lack of Windows knowledge was all too obvious.
