(archive 'newLISPer)

January 30, 2006

Super strings: the basics of newLISP strings

Filed under: newLISP — newlisper @ 14:09


Strings are one of the basic building blocks of all programming languages. newLISP has many easy to use and powerful string handling tools, and you can easily add more tools to your toolbox if your particular needs aren’t met.

Here’s a quick guided tour of newLISP’s ‘string orchestra’. It’s also an extract from the book about newLISP I’m writing, so don’t be dismayed by the length of this post. Relax, this is a gentle journey rather than a steep climb!

Strings in newLISP code

You can write strings in three ways:

  • enclosed in double quotes
  • embraced by curly brackets
  • marked-up by markup codes

like this:

(set 's "this is a string")
(set 's {this is a string})
(set 's [text]this is a string[/text])

Use the first method for strings with fewer than 2048 characters or if you want to include escaped characters, such as \n and \t, or code numbers (\046).

(set 's "this is a string \n with two lines")
(println s)
this is a string
with two lines

Double-quote characters must be escaped with backslashes, as must a backslash.
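Here’s a small sketch of that escaping rule. The example text is made up for illustration; it shows the same string written once with double quotes (and backslashes) and once with braces, which are described next:

```newlisp
; hypothetical example text: both the quotes and the backslash need escaping
(set 'quoted "she said \"hello\" and saved it in C:\\temp")
(println quoted)
; the same text in braces needs no escaping at all
(set 'braced {she said "hello" and saved it in C:\temp})
(println (= quoted braced))
```

Both forms should produce exactly the same string, so the comparison is expected to succeed.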

Use the second method, braces (‘curly brackets’), for strings shorter than 2048 characters when you don’t want any escaped characters to be processed:

(set 's {strings can be enclosed in "quotation marks" \n } )
(println s)
;-> strings can be enclosed in "quotation marks" \n

This is a really useful way of writing strings, because you don’t have to worry about putting backslashes before every quotation character, or backslashes before other backslashes. However, don’t include a closing brace before the end of the string (you can’t escape them – by which I mean you can’t ‘escape’ them). You can nest pairs of braces inside a braced string though.
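As a quick illustration of nesting (the string itself is invented for this example), a balanced pair of braces inside a braced string is kept as ordinary text:

```newlisp
; the inner pair is balanced, so it becomes part of the string
(set 'nested {outer {inner} text})
(println nested)
(println (length nested))
```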

I like to use braces, not only because they face the right way (which plain quotation marks don’t), but also because text editors can balance and match them.

The third method, using [text] and [/text] markup tags, is intended for longer text strings running over many lines, and is used automatically by newLISP when it outputs long strings. Again, you don’t have to worry about which characters you can and can’t include – you can put anything you like in, with the obvious exception of “[/text]”!

(set 'novel (read-file {my-latest-novel.txt} ))
;->
[text]
It was a dark and "stormy" night...
The End.
[/text]

If you want to know the length of a string, use length:

(length novel)
;-> 575196

Half a million characters or so doesn’t seem to bother newLISP too much.

Making strings

A lot of functions, such as the file reading ones, return strings or lists of strings for you. If you want to build a string from scratch, one way is to start with the char function. This converts the supplied number to the equivalent character string with that code number. (It can also reverse the operation, converting the supplied character string to its equivalent code number.)

(char 33) ;-> "!"
(char "!") ;-> 33
(char 955)
;-> Unicode lambda character
(char 0x2318)
;-> Unicode Place of Interest Sign character 2318

These last two examples are available when you’re running the Unicode-capable version of newLISP. Since Unicode is hexadecimally inclined, I’ve used the hexadecimal number, which char can convert to a string. (I haven’t attempted to get them displayed in this post!)

You can use char to build strings in other ways:

(join (map char (sequence (char "a") (char "z"))))
;-> "abcdefghijklmnopqrstuvwxyz"

This applies the char function to a list of integers generated by sequence, so producing a list of strings. This list can be converted back to a single string by join, which turns a list into a string. join can also take a separator when building strings:

(join (map char (sequence (char "a") (char "z"))) "-")
;-> "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"

Similar to join is append, which works directly on strings:

(append "con" "cat" "e" "nation")
;-> "concatenation"

but even more useful is string, which turns any collection of numbers, lists, and strings into a single string.

(string ''(sequence 1 10) { produces '} (sequence 1 10) "\n")
;-> "'(sequence 1 10) produces '(1 2 3 4 5 6 7 8 9 10)\n"

Notice that even the parentheses around the lists are included in the string.

The string function, combined with the various string markers such as braces and markup tags, is a good way to include the values of variables inside strings:

(set 'x 42)
(string {the value of } 'x { is } x)
;-> "the value of x is 42"

dup makes copies:

(dup "spam" 10)
;-> "spamspamspamspamspamspamspamspamspamspam"

And date makes a date:

(date)
;-> "Wed Jan 25 15:04:49 2006"

or you can give it a number of seconds since 1970 to convert:

(date 1230000000)
;-> "Tue Dec 23 02:40:00 2008"

String surgery

Now you’ve got your string, there are plenty of functions for operating on it. Some of these are ‘destructive’ functions – they change the string permanently, possibly losing information for ever, whereas others are ‘constructive’, producing a new string and leaving the old one unharmed.

reverse is destructive:

(set 't "a hypothetical one-dimensional subatomic particle")
(reverse t)
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"

Now t has changed for ever. However, the case-changing functions aren’t destructive, producing new strings without harming the old ones:

(set 't "a hypothetical one-dimensional subatomic particle")
(upper-case t)
;-> "A HYPOTHETICAL ONE-DIMENSIONAL SUBATOMIC PARTICLE"
(lower-case t)
;-> "a hypothetical one-dimensional subatomic particle"
(title-case t)
;-> "A hypothetical one-dimensional subatomic particle"


If you know which part of a string you want to extract, use one of the following constructive functions:

(set 't "a hypothetical one-dimensional subatomic particle")
(first t)
;-> "a"
(rest t)
;-> " hypothetical one-dimensional subatomic particle"
(last t)
;-> "e"
(nth 2 t) ; the first character has index 0
;-> "h"

There’s a useful shortcut: follow the string with a number:

(t 2)
;-> "h"

slice gives you a new slice of an existing string, counting either from the beginning (positive integers) or from the end (negative integers), for a given number of characters:

(slice t 15 13)
;-> "one-dimension"
(slice t -8 8)
;-> "particle"

There’s an easier way to do this, too, by putting the required start and length before the string in a list:

(15 13 t)
;-> "one-dimension"
(0 14 t)
;-> "a hypothetical"

If you don’t want a continuous run of characters, but want to cherry-pick some of them for a new string, use select followed by a sequence of character index numbers:

(set 't "a hypothetical one-dimensional subatomic particle")
(select t 3 5 24 48 21 10 44 8)
;-> "yosemite"
(select t (sequence 1 49 12)) ; every 12th character starting at character 1
;-> " lime"

which is good for finding secret Da Vinci-style coded messages buried in text…

If you just want to swap two characters, use the destructive function swap:

(set 'typo {teh})
(swap 2 1 typo)
;-> "the"

Changing strings

trim and chop are both constructive string-editing functions that work from the ends of the original strings inwards:

(chop t) ; defaults to last character
;-> "a hypothetical one-dimensional subatomic particl"
(chop t 9) ; chop 9 characters off
;-> "a hypothetical one-dimensional subatomic"

trim removes strings from the ends of a source string:

(set 's "      centred       ")
(trim s) ; defaults to removing spaces
;-> "centred"
(set 's "------centred------")
(trim s "-")
;-> "centred"
(set 's "------centred********")
(trim s "-" "*")
;-> "centred"

There are two approaches to changing characters inside a string. Either use the index numbers of the characters, or specify the substring you want to change.

Using index numbers

Use indexing with the nth-set and set-nth functions. nth-set and set-nth are twin character assassins – destructive functions for changing strings. They look the same, but nth-set returns just the part of the string that was destroyed, and set-nth returns the modified string. nth-set is quicker.

(set 't "a b c")
;-> "a b c"
(set-nth 0 t "xyz")
;-> "xyz b c"
(set 't "a b c") ; start again with a fresh string
(nth-set 0 t "xyz")
;-> "a"
t
;-> "xyz b c"

To remember which does which, consider that set-nth starts with “s” and returns the string, whereas nth-set starts with “n” and returns only the nth character. (If this doesn’t work for you, remember them another way!)

Changing substrings

If you don’t want to – or can’t – deal with index numbers or character positions, use replace, a powerful destructive function that does all kinds of useful operations on strings. Use it in the form:

(replace old-string source-string replacement)


(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" t "theor")
;-> "a theoretical one-dimensional subatomic particle"

replace is usually destructive, but if you want to use replace or another destructive function constructively, without affecting the original string, enclose the string in a string function call:

(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" (string t) "theor")
;-> "a theoretical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

The use of string creates a new string that gets operated on by replace. The original string t is unaffected.
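The same trick should work with other destructive functions. Here’s a minimal sketch using reverse, with a made-up word, on the assumption that reverse behaves destructively in your version of newLISP as described earlier:

```newlisp
(set 'word "stressed")
; reverse operates on the copy made by string, not on word itself
(set 'backwards (reverse (string word)))
(println backwards)   ; the reversed copy
(println word)        ; the original
```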

replace is one of a group of newLISP functions that accept regular expressions for defining patterns in text. You add a number at the end of the list which specifies the type of regular expression to use: 0 means basic regular expressions, 1 means case-insensitive matching, and so on.

(set 't "a hypothetical one-dimensional subatomic particle")
(replace "h.*?l" t "" 0) ; look for "h" followed by "l", but not too greedily
;-> "a  one-dimensional subatomic particle"

If you’re happy working with Perl-compatible Regular Expressions (PCRE), you’ll be happy with replace. Full details are in the newLISP reference manual.
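To give a flavour of the options, here’s a short sketch with invented example strings. The first call uses option 1 for case-insensitive matching; the second uses option 0 with two parenthesised groups, and swaps them using the system variables $1 and $2:

```newlisp
(set 's "Higgs BOSON, higgs boson")
; option 1: match "boson" regardless of case
(println (replace "boson" (string s) "field" 1))

(set 'author "Doyle Conan")
; option 0: two captured words, put back in the opposite order
(println (replace {(\w+) (\w+)} (string author) (string $2 " " $1) 0))
```

Both calls wrap the source in string, so s and author themselves are left alone.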

Another interesting feature of replace is that the replacement doesn’t have to be just a simple string, it can be any newLISP expression. Each time the pattern is found, the replacement expression runs. If you want, you can use this to provide a replacement value that’s calculated dynamically, or you could do anything else you wanted to. For example, here’s a simple search and replace operation that keeps count of how many times a letter has been found, and replaces each occurrence in the original string with the total so far:

(set 't "a hypothetical one-dimensional subatomic particle")
(set 'counter 0)
(replace "o" t
    (begin
        (inc 'counter)
        (println {replacing "} $0 {" number } counter)
        (string counter))
    0)
replacing "o" number 1
replacing "o" number 2
replacing "o" number 3
replacing "o" number 4
;-> "a hyp1thetical 2ne-dimensi3nal subat4mic particle"

Did you notice the $0 in there? replace updates a set of system variables $0, $1, $2 up to $15 with the matched expressions, so you can access the inner workings of the regular expression matching that’s going on while the function is running. You could do other useful things too, such as build a list of matches for later processing.
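Building a list of matches might look something like this. It’s only a sketch: the names source and found are invented, and the replacement expression simply puts each match back unchanged after recording it:

```newlisp
(set 'source "one two")
(set 'found '())
(replace {[aeiou]} (string source)
    (begin
        (push $0 found -1)   ; append the current match to the end of the list
        $0)                  ; and put the same text back
    0)
(println found)
```

Pushing with an index of -1 adds to the end of the list, so the matches stay in the order they were found.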

Testing and comparing strings

There are various tests that you can run on strings. newLISP’s comparison operators work by finding and comparing the code numbers of the characters until a decision can be made:

(> {Higgs Boson} {Higgs boson}) ; nil
(> {Higgs Boson} {Higgs}) ; true
(< {dollar} {euro}) ; true
(> {newLISP} {LISP}) ; true
(= {fred} {Fred}) ; nil
(= {fred} {fred}) ; true

and of course newLISP’s flexible argument handling lets you test loads of strings at the same time:

(< "a" "c" "d" "f" "h")
;-> true

To check whether two strings share common features, you can either use starts-with and ends-with, or the more general pattern matching commands regex and find.

starts-with and ends-with are simple enough:

(starts-with "newLISP" "new")
;-> true
(ends-with "newLISP" "LISP")
;-> true

regex is more interesting. It returns nil if the string doesn’t contain the pattern, or, if it does contain the pattern, it returns a list with the matched strings and substrings and the start and length of each string.

(regex "sub.*" t)
;-> ("subatomic particle" 31 18)
(regex {(s[a-z]*)(.*)(s[a-z]*)} t 0)
;-> ("sional subatomic" 24 16 "sional" 24 6 " " 30 1 "subatomic" 31 9)

and these matches are also stored in the system variables $0, $1, $2 up to $15, which you could inspect with:

(dotimes (i 16) (println ($ i)))

Instead of regex you could use find, which returns the index of the matching substring.
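A few calls to find, as a sketch using the same example string. The last call assumes that find accepts a regular expression option in the same way that replace and regex do:

```newlisp
(set 't "a hypothetical one-dimensional subatomic particle")
(println (find "sub" t))          ; plain substring search
(println (find "quark" t))        ; not present, so no index comes back
(println (find {s[a-z]*} t 0))    ; regular expression search, option 0
```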

Strings to lists

Two functions let you convert strings to lists, ready for manipulation with newLISP’s extensive list-processing powers. The well-named explode function cracks open a string and returns a list of single characters:

(set 't "a hypothetical one-dimensional subatomic particle")
(explode t)
;-> ("a" " " "h" "y" "p" "o" "t" "h" "e" "t" "i" "c" "a" "l" " " "o"
 "n" "e" "-" "d" "i" "m" "e" "n" "s" "i" "o" "n" "a" "l" " " "s"
 "u" "b" "a" "t" "o" "m" "i" "c" " " "p" "a" "r" "t" "i" "c" "l" "e")

The explosion is easily reversed with join.
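For instance, here’s a minimal round trip, with a reversed version thrown in to show that the pieces really are an ordinary list:

```newlisp
(set 'letters (explode "newLISP"))
(println (join letters))             ; back to the original string
(println (join (reverse letters)))   ; the same characters, back to front
```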

parse is a more powerful way of breaking strings up and returning the pieces. Used on its own, it will break strings apart at the spaces between them:

(parse t)
;-> ("a" "hypothetical" "one-dimensional" "subatomic" "particle")

Or you can supply a delimiting character, and parse will break the string whenever it meets the character:

(set 'pathname {/System/Library/Fonts/Courier.dfont})
(parse pathname {/})
;-> ("" "System" "Library" "Fonts" "Courier.dfont")

By the way, we could eliminate that first empty string by filtering it out. Notice the use of a lambda function for defining a quick nameless test function – we can use either fn or lambda:

(filter (fn (s) (not (empty? s))) (parse pathname {/}))
;-> ("System" "Library" "Fonts" "Courier.dfont")
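If your version of newLISP has the clean function (an assumption on my part; it is the mirror image of filter, discarding the elements that pass the test), the same job might be done without writing a lambda:

```newlisp
(set 'pathname {/System/Library/Fonts/Courier.dfont})
; throw away every element for which empty? is true
(set 'parts (clean empty? (parse pathname {/})))
(println parts)
```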

You can also specify a delimiter string rather than a delimiter character:

(set 't {spamspamspamspamspamspamspamspam})
;-> "spamspamspamspamspamspamspamspam"
(parse t {am}) ; break on "am"
;-> ("sp" "sp" "sp" "sp" "sp" "sp" "sp" "sp" "")

Or you can specify a regular expression, remembering the options flag 0 (or whatever):

(set 't {/System/Library/Fonts/Courier.dfont})
(parse t {[/aeiou]} 0) ; strip out vowels and slashes
;-> ("" "Syst" "m" "L" "br" "ry" "F" "nts" "C" "" "r" "" "r.df" "nt")

Here’s the well-known quick and not very reliable HTML tag-stripper:

(set 'html (read-file "/Users/Sites/index.html"))
(println (parse html {<.*?>} 4))
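To try the idea without a file to hand, here’s a sketch using a short made-up HTML string. It strips the tags in two ways: by joining the pieces that parse returns, and by using replace to swap each tag for an empty string:

```newlisp
; hypothetical snippet standing in for the contents of a real file
(set 'html "<p>Hello <b>world</b></p>")
(println (join (parse html {<.*?>} 0)))
(println (replace {<.*?>} (string html) "" 0))
```

Both approaches should leave just the text between the tags. Neither copes with anything awkward, such as a “>” inside an attribute value.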

For parsing XML strings, newLISP provides the specialized function xml-parse.

Other string functions

There are a few other functions that work with strings. search looks for a string inside a file:

(set 'f (open {/private/var/log/system.log} {read}))
(search f {kernel})
(seek f (- (seek f ) 64))
(dotimes (n 3)
    (println (read-line f)))
(close f)

This example looks in the system.log for the string “kernel”. If it’s found, newLISP rewinds the file pointer by 64 characters, then prints out three lines, showing the line in context.

There are also functions for working with base64 encoding files, and for encrypting strings.
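As a rough sketch, assuming the functions in question are base64-enc, base64-dec and encrypt, a round trip through each looks like this. The message and the pad are invented for the example:

```newlisp
(set 'encoded (base64-enc "Hello World"))
(println encoded)
(println (base64-dec encoded))
; encrypt combines the text with the pad, so applying it a second time
; with the same pad should restore the original
(set 'secret (encrypt "attack at dawn" "my pad"))
(println (encrypt secret "my pad"))
```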

It’s also worth mentioning the format function, which lets you insert the values of newLISP expressions into a pre-defined template string. Use %s to represent the location of a string expression inside the template. For example, suppose we want to display a list of files like this:

[File: foo.txt]
[File: bar.txt]

A suitable template looks like this:

"[File: %s]"

We give the format function this template string, followed by the expression (f) that produces a filename:

(format "[File: %s]" f)

The code to generate a directory listing using this format and the directory function looks like this:

(dolist (f (directory))
    (println (format "[File: %s]" f)))

and generates a listing like this:

[File: .hotfiles.btree]
[File: .Spotlight-V100]
[File: .Trashes]
[File: .vol]
[File: .VolumeIcon.icns]
[File: Applications]
[File: automount]
[File: bin]
[File: Cleanup At Startup]
[File: cores]
[File: Desktop Folder]
[File: dev]
[File: Developer]
[File: etc]
[File: Library]

Lastly, we must mention eval-string, a version of newLISP’s eval function for use with strings. eval-string tries to process a string as newLISP code. If it’s valid newLISP, you’ll see the result:

(set 'sum "(+ 2 2)")
;-> "(+ 2 2)"
(eval-string sum)
;-> 4

This means that you can build newLISP code strings, using all the functions we’ve described in this chapter, and then have them evaluated by newLISP. You could write programs that write programs. But that’s another chapter.
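Here’s a tiny sketch of the idea, using string, join and map to assemble an expression from a list of numbers (the names numbers and code are just for this example) before handing it to eval-string:

```newlisp
(set 'numbers '(1 2 3 4))
; build the text of an addition expression from the list
(set 'code (string "(+ " (join (map string numbers) " ") ")"))
(println code)
(println (eval-string code))
```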

Updated for correction of minor errors and incorporation of comments.


January 25, 2006

Quickie PostScript graphics

Filed under: newLISP — newlisper @ 13:17


If you want to generate some quick graphics from a newLISP script, here’s a simple technique that involves just a small bit of PostScript code. (If you’re using Linux or Windows, you’ll need a PostScript interpreter and PDF converter somewhere on your system – I don’t know whether these are installed by default, like they are on MacOS X.)

This little script draws a bar graph in a PostScript file, given a list of height values. The open command automatically converts the PostScript file to PDF format and displays the image in Preview.

(set 'ps-prolog "%!PS-Adobe-2.0
%%Title: newLISP test
%%Creator: newLISP
0.5 setlinewidth
30 30 translate
/str {10 string} def
/Helvetica findfont 8 scalefont setfont
")
(set 'ps-epilog " showpage
")
(define (bar-chart point-list)
    " bar-chart '(y1 y2 y3) ..."
    (println ps-prolog)
    (set 'x-coord 10)  ; start x
    (set 'rect-width 15) ; width of bar
    (println (format "%d %d moveto " x-coord (first point-list)))
    (dolist (y point-list)
        ; draw bar
        (println
            (format "%d %d %d %f rectstroke " x-coord 0 rect-width y ))
        ; draw label
        (println
            (format "%d %f moveto %d str cvs show " x-coord (+ y 5)  (floor y)))
        (inc 'x-coord rect-width))
    (println "stroke")
    (println ps-epilog))
(device (open "/tmp/myfile.ps" "write"))
(bar-chart (random 0 500 30)) ; example
(close (device))
(exec "open /tmp/myfile.ps") ; convert to PDF

A useful newLISP facility here is the device command, which directs the output of the various println calls to the named device, here a file in /tmp. You can comment out the device calls while you’re developing the script, so that the output appears on screen; then, when the statements look OK, remove the comments and run the file again, printing to the file instead.

(And before you all tell me that there are much better graphing tools in newLISP’s Tcl/Tk Graphical Frontend – I just haven’t had the time to look into this. Also, I used to like hacking PostScript files, so this was fun!)

January 20, 2006

Bayesian functions in the latest newLISP

Filed under: newLISP — newlisper @ 17:02


The latest development version of newLISP contains two new Bayesian statistical functions: bayes-train analyses datasets for word frequencies, saving the results in a newLISP context, and bayes-query uses this context to generate the probability of a given piece of text belonging to one or other of the datasets. Here’s a version of the example in the manual, which estimates the probability that a given piece of text can be considered as spam – a typical use for Bayesian analysis tools.

First, we obtain two sets of data, one good, one bad: I dumped a dozen or so desirable and undesirable email messages into a pair of text files. parse converts each file into a list of symbols:

(set 'spam-data (parse (read-file "/Users/me/spam.txt") {\s+} 0))
(set 'nospam-data (parse (read-file "/Users/me/nospam.txt") {\s+} 0))

Then we use the bayes-train function to produce a context containing all the words in these lists, together with their frequency data:

(bayes-train spam-data nospam-data 'Lexicon)

Next, we can save the resulting context in a text file:

(save "lex.lsp" 'Lexicon)

so that we can load it again later:

(load "lex.lsp")

The training process takes a few seconds, so it makes sense to do it once, and then load the context when you want to analyse some text.

With the context loaded, we can use the bayes-query function to analyse a piece of text against the previously-analysed data:

(set 'q1 (bayes-query (parse "newLISP is fine open source software") Lexicon))
(set 'q2 (bayes-query (parse "Office XP is cheap at the moment") Lexicon))

Each result is a pair of numbers in a list: the first number is the probability that the phrase belongs in the first dataset (and is therefore, in this case, spam), and the second number is the probability that it belongs in the second, non-spam dataset. The two numbers add up to 1.
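If you’d like to see the shape of the result without collecting any email, here’s a toy sketch. The two word lists and the context name Toy are invented, and the lists are far too small to mean anything statistically; the point is only that the result is a pair of probabilities, in the same order as the training lists:

```newlisp
; toy training data, made up for illustration: symbols stand in for words
(bayes-train '(cheap cheap offer pills pills)
             '(cheap offer offer pills pills pills)
             'Toy)
(set 'q (bayes-query '(cheap pills) Toy))
(println q)               ; one probability per training list
(println (apply add q))   ; the probabilities should total 1
```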

(println (format "%5f" (first q1)))
(println (format "%5f" (first q2)))

with the following results:


So the phrase “newLISP is fine open source software” scores a tiny 0.000090 out of 1, and so is not considered to be similar to the spam text I used for training, whereas the phrase “Office XP is cheap at the moment” certainly scores like some of the spam email I receive – I receive dozens of messages like this each month, so the statistics have clearly produced the right results this time.

Consult the newLISP manual for all the options and formulas.

I discovered these functions just after writing my previous entry about analysing novels. If I use two of the novels for training, I can find out the probability that a piece of text was written by one or other of the authors. Plainly newLISP is an excellent choice for this sort of activity!

January 17, 2006

Sherlock Holmes and the Case of the Picture in the Attic

Filed under: newLISP — newlisper @ 10:04

A context in newLISP is, according to the manual:

a namespace that is lexically separated from other namespaces


or, more briefly:

a stateful namespace

This second definition is from John Small’s excellent 21-minute introduction to newLISP.

My present understanding of a newLISP context is that it provides a named container for symbols, and that symbols in different contexts can have the same name without clashing. So, for example, in one context I can define the symbol called meaning-of-life to have the value “42”, but, in another context, the identically-named symbol could have the value “dna-propagation”, and, in yet another, “worship-of-deity-x”.

Unless you specifically choose to create and/or switch contexts, all your newLISP work is carried out in the default context, called MAIN.
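A minimal sketch of the idea, with two invented context names. As far as I can tell, referring to a symbol with a context prefix is enough to bring the context into existence, so there’s no need to switch out of MAIN:

```newlisp
; Guide and Biology are hypothetical contexts, created just by mentioning them
(set 'Guide:meaning-of-life 42)
(set 'Biology:meaning-of-life "dna-propagation")
(println Guide:meaning-of-life)
(println Biology:meaning-of-life)
(println (context))   ; we never left the default context
```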

I decided to investigate contexts and namespaces with the help of that great detective, Sherlock Holmes. I downloaded Sir Arthur Conan Doyle’s “The Sign of Four” from Project Gutenberg, stripped out the introductory text, and ran the following newLISP script, which reads the text of the original novel and stores every word as a symbol, prefaced by an underscore (_) character (a convention that helps us to avoid confusing ordinary words and symbols):

(context 'Doyle)
(set 'file (open "/Users/me/doyle-sign4.txt" "read"))
(set 'word-count 0)
; remember and count each word
(while (read-line file)
    (set 'data (parse (lower-case (current-line)) "[^a-z]+" 0))
    (dolist (w data)
        (inc 'word-count)
        (and (!= w "") ; skip blanks
            (if (set 'result (eval (sym (append "_" w) Doyle ) ))
                    (set (sym (append "_" w) Doyle ) (+ result 1)) ; increase count
                    (set (sym (append "_" w) Doyle ) 1)))))
; create a word list
(dolist (w (symbols Doyle))
    (set 'wrd (name w))
    (if (and (starts-with wrd "_") (!= "_" wrd))
        (push (list (eval w) (slice wrd 1) ) words) ))
; save the context
(save "/Users/me/doyle-context.lsp" 'Doyle)

The first line creates – and switches to – a new context called “Doyle”, and all the new symbols are created in this context rather than in MAIN. Each line of the file is converted to lower-case and then split into words. If the word preceded by an underscore doesn’t already exist, it is created. But if it evaluates to something, the word has already been encountered, so the symbol’s associated count is updated instead.
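The counting step can be tried on a single sentence. This is a cut-down sketch of the same technique rather than the script above: the context name Sample, the function count-words and the test sentence are all invented, and the function runs from MAIN instead of from inside the new context:

```newlisp
(context 'Sample)     ; create the context ...
(context MAIN)        ; ... and switch straight back

(define (count-words passage ctx)
    (dolist (w (parse (lower-case passage) "[^a-z]+" 0))
        (if (!= w "")   ; skip blanks
            (let ((s (sym (append "_" w) ctx)))
                ; a brand-new symbol evaluates to nil, so start from 0
                (set s (+ 1 (or (eval s) 0)))))))

(count-words "The cat saw the other cat" Sample)
(println Sample:_cat " " Sample:_the " " Sample:_saw)
```

Using or to supply a starting value of 0 folds the “already seen” and “new word” cases into a single set.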

Then the words and their frequencies are stored as a list in the symbol words, without the initial underscore:

(set 'words '(
    (2 "zum")
    (1 "zigzag")
    (3 "youth")
    (2 "yourselves")
    (9 "yourself")
    (7 "yours")
    (107 "your") ...

Finally, the entire context is saved in a newLISP source file. The whole script takes 2 seconds on my machine, which is pretty quick.

Loading contexts

I now have a collection of data, wrapped up in a package called “Doyle”, that captures the words used in the novel (although it has, of course, completely lost the plot). I can quickly load this saved context in another script or newLISP session using:

(load "/Users/me/doyle-context.lsp")

and newLISP will automatically recreate all the symbols in the Doyle context, switching back to the MAIN (default) context when done. It takes about 80 milliseconds here.

I can access the values of any symbol in the Doyle context by prefacing it with the name of the context and a colon, eg “Doyle:”. For example:

Doyle:word-count
;-> 43795

I can find out the frequency of any word just by evaluating the name of the symbol, remembering the underscore we used as a prefix. If I’m in the MAIN context, I have to use the “Doyle” ‘prefix’ – of course, if I’m already in the Doyle context, I don’t need to.

;-> 75
(context Doyle)
;-> 12
(context MAIN)  ; switch back to MAIN context
;-> 5

Conan-Doyle famously describes Holmes’s drug-taking habits in the opening paragraphs…

Loading other contexts

It’s the work of a few seconds to load up other contexts with other novels. This lets us make lots of pointless but amusing comparisons between different novels. As before, we obtain the novel’s text and create a context to hold the words. I’ve chosen Oscar Wilde’s “The Picture of Dorian Gray”. All I need to do is change “Doyle” to “Wilde” in the above script and change the context names accordingly:

(context 'Wilde)
(set 'file (open "/Users/me/wilde-doriangray.txt" "read"))
(save "/Users/me/wilde-context.lsp" 'Wilde)
(load "/Users/me/wilde-context.lsp")

When both the Doyle and Wilde contexts have been loaded side by side (they’re happy to co-exist) we can start to ask questions like “How often do the two writers use the word ‘charming’?”:

(dolist (ctx '(Wilde Doyle))
    (println (context ctx (string "_charming") )))

Here, we’re using the dolist function to step through the two contexts, and the context function to assemble a reference to the symbol that you’d otherwise refer to as Doyle:_charming or Wilde:_charming if you were addressing them directly. As you might have guessed if you’ve read both authors, the word appears far more in Oscar’s sentences than in Arthur’s.

If we produce a pair of word lists, without frequencies, we can ask how many words appear in just one novel. The difference function can return a new list of all the symbols that appear in the first list but not the second:

; first, make the word lists, in their own contexts
(dolist (w (reverse (sort Doyle:words)))
    (push (last w) Doyle:wlist))
(dolist (w (reverse (sort Wilde:words)))
    (push (last w) Wilde:wlist))
; now compare the word lists
(println " words in Wilde but not in Doyle: " (length (difference Wilde:wlist Doyle:wlist)))
(println " words in Doyle but not in Wilde: " (length (difference Doyle:wlist Wilde:wlist)))
words in Wilde but not in Doyle: 4060
words in Doyle but not in Wilde: 2626

This suggests that, despite the more exotic nature of Sherlock Holmes’s quest for Indian treasure, Wilde manages to reach more corners of the English dictionary.

You can also use intersect to find list elements that appear in both lists. In fact there’s no end to the number of strange tests and queries you could run – there are probably university researchers who get paid for doing this stuff all day.
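On a small scale, with two short made-up word lists standing in for the real ones, difference and intersect behave like this:

```newlisp
; hypothetical miniature word lists
(set 'wilde-words '("charming" "painting" "river" "simply"))
(set 'doyle-words '("police" "river" "simply" "wooden"))
(println (difference wilde-words doyle-words))   ; only in the first list
(println (intersect wilde-words doyle-words))    ; in both lists
```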

(define (wfreq-diff wlist)
    (dolist (w wlist)
        (set 'wf  (context 'Wilde (string "_" w)))
        (set 'df  (context 'Doyle (string "_" w)))
        (push  (list (- wf df) w wf df) r ) r))
(println (sort (wfreq-diff (intersect Wilde:wlist Doyle:wlist))))

This produces a list of words used by both writers, sorted to show which writer uses them more frequently: “wooden”, “river”, “police”, and “business” are Conan-Doyle words; “simply”, “pity”, “perfect”, and “painting” are Wilde words.

We can do some calculations, too:

(dolist (ctx '(Wilde Doyle))
    (println  ctx " " (div (context ctx "word-count") (length (context ctx "wlist"))))) ;->
Wilde 12.67362847
Doyle 8.163094129

I’m not sure what dividing the length of the novel by the number of different words used tells us – perhaps that Wilde uses a wider selection of words than Conan-Doyle?


To be honest, I haven’t learnt much about the novels that I couldn’t have learned by reading them again – but I have started to learn about using newLISP contexts. There are many other uses for contexts, such as for prototype-based object-oriented programming, whatever that is. The newLISP documentation provides many useful examples.

By the way, there’s an interesting connection between these two novels. I’ll leave you to google it.

January 11, 2006

Script analysis

Filed under: newLISP — newlisper @ 16:28

newLISP can be a very concise language. If you’re a novice scripter, like me, you might sometimes be puzzled at the brevity of the scripts written by advanced newLISP gurus. So I thought it might be interesting to look at one of my simple newLISP scripts in detail, to see whether brevity arises from the language or from the skill and experience of the newLISP ninja. Experienced newLISPers and/or MacOS X users can tell me whether I’m getting into bad habits, if they can be bothered to read all this.

Here’s a short newLISP script that ties together three components of the MacOS X Tiger operating system. It runs on selected graphics files, and I usually start it using Big Cat (which I mentioned earlier). It uses an image processing tool called sips, which is a useful utility for processing graphics files, and displays confirmation using a notification tool called Growl. Its job is simply to scale all selected images by 50%.

 1 #!/usr/bin/newlisp
 3 (set 'file-list (rest (rest (main-args))))
 5 (define (reduce-file file-name)
 6   (letn
 7    ((image-data
 8       (exec (format "sips -g pixelHeight -g pixelWidth '%s' " file-name)))
 9     (old-pixel-width
10      (integer (nth 1 (parse (nth 2 image-data)))))
11    (new-pixel-width
12      (/ old-pixel-width 2))
13    (sips-command
14      (format "sips --resampleWidth %d '%s' " new-pixel-width file-name)))
15    (and
16      (copy-file (string file-name) (string (replace "." (string file-name) "-1.")))
17      (exec sips-command)
18      (exec (format "/usr/local/bin/growlnotify %s -m \"processing file  '%s'\"" (date) file-name)))))
20 (dolist (f file-list)
21  (reduce-file f))
23 (exit)

Line 3 sets the symbol file-list so that it contains the names of the files. The main-args function knows all the arguments that were handed to this invocation of the newLISP program, but we don’t want the first one (“/usr/bin/newlisp”) or the second one (the name of this newLISP script, whatever it might be), so we take the rest of the rest of the arguments. In theory I prefer to use longer variable names such as file-list rather than f or l, but in practice I don’t always remember to!

Line 5 sees the start of a function that will do all the hard work. It will process the file in file-name (which is going to be passed to it in line 21 – you have to define the functions first, before you call them).

Line 6 starts a block (not sure if that’s the right word?) with letn. This block continues all the way to line 18, the end of the definition of the reduce-file function. letn has two parts: in the first part you declare local variables, and in the second part you evaluate expressions that can access those local variables. I usually define variables in this more expansive style:

(set 'x 1)
(set 'y 2)
(set 'z 3)

but I’m trying to learn the more advanced techniques of letn and let:

(let
   ( (symbol value)
     (symbol value) )
   (body) )

These symbols are local rather than global, which I gather is considered good practice (but probably not important for these small scripts).

Why letn rather than let, though? The reason is that line 10 uses the value of the symbol image-data, which was defined in lines 7 and 8. If I’d used let, line 10 wouldn’t work, because the value of image-data wouldn’t have been available until the end of the symbol declaration section (line 14). letn does the right thing – it’s a ‘nested let’, allowing you to access symbol values as soon as they’re defined.
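The difference is easier to see in a tiny example. This is only a sketch: the symbols a and b are made up, and a global a is set first so that you can tell which binding each form picks up.

```newlisp
(set 'a 1)                       ; a global a, to show which binding each form sees

(println (let ((a 10)
               (b (* a 2)))      ; let: the initializer still sees the outer a (1)
           b))
;-> 2

(println (letn ((a 10)
                (b (* a 2)))     ; letn: the initializer sees the new local a (10)
           b))
;-> 20
```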

 8      (exec (format "sips -g pixelHeight -g pixelWidth '%s' " file-name)))

Line 8 runs the sips command, with the sole aim of getting the file’s current width and height. I couldn’t find a way to get and set image properties simultaneously, so I’m calling sips twice: first to get the current width, then, in line 14, to set the new, halved width. The exec command runs the sips command, and I’m using format to create the command that gets sent to the shell. Fortunately format uses the more familiar printf() style formatting conventions, rather than Common Lisp’s language-within-a-language version, so you can probably tell what’s happening. It’s a useful way to be able to add or change command options, too. Notice the single quotes that enclose the file name placeholder %s: this is because a lot of my files have spaces in their names…

The result of the command, in image-data, is a list of three strings:

("/Users/me/untitled folder/image045.jpg" "  pixelHeight: 437" "  pixelWidth: 525")

so lines 9 and 10 do a quick bit of parsing to get the width value and convert it to an integer (eg 525):

 9     (old-pixel-width
10      (integer (nth 1 (parse (nth 2 image-data)))))

This isn’t very elegant, really, but I couldn’t think of a better or easier way. When decoding these, remember that all indexing starts at zero, so parse works on the third element of image-data, and integer works on the second element of the list returned by parse.
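One possible alternative, offered as a sketch rather than a replacement for the script: give parse an explicit break string and take the last piece. It assumes a newLISP release in which int is available (newer releases use that name for what this post calls integer), and the sample line simply copies the shape of the sips output shown above.

```newlisp
(set 'line "  pixelWidth: 525")              ; sample line in the shape sips returned above

;; Split at the ": " separator, keep the last piece, convert it to a number.
(set 'width (int (last (parse line ": "))))
(println width)
;-> 525
```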

Lines 11 and 12 calculate the new width and save it in new-pixel-width, using integer division (/) rather than floating point division (div):

11   (new-pixel-width
12      (/ old-pixel-width 2))
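A quick comparison of the two operators, using the sample width from above:

```newlisp
(println (/ 525 2))     ; integer division drops the remainder
;-> 262
(println (div 525 2))   ; floating point division keeps it
;-> 262.5
```

Since sips is being asked for a width in whole pixels, the integer result is the one that suits the %d placeholder in line 14.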

Now I can build the second invocation of the sips command (lines 13 and 14) and assign it to the symbol sips-command. The reason I do this in two stages is that I’ve found it useful to be able to check what command actually got sent, and it’s easier to insert debugging or logging statements if required. In a way, this has made the script longer, however…

13   (sips-command
14      (format "sips --resampleWidth %d '%s' " new-pixel-width file-name)))

I’m using a good text editor, so balancing those parentheses is very easy. Without one, it would have been hard to remember the extra parenthesis at the end of line 14, which is needed because this is the end of the symbol definition section of letn.
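Building the command as a string first means you can look at it before anything runs. Here is a minimal sketch of that check, with a made-up width and path, and with println standing in for exec:

```newlisp
;; The width and path are sample values, not taken from a real run.
(set 'cmd (format "sips --resampleWidth %d '%s'"
                  262
                  "/Users/me/untitled folder/image045.jpg"))
(println cmd)
;-> sips --resampleWidth 262 '/Users/me/untitled folder/image045.jpg'
```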

The second, or body, part of the letn block is for the expressions that will actually do stuff with these local symbols.

15  (and
16    (copy-file (string file-name) (string (replace "." (string file-name) "-1.")))
17    (exec sips-command)

I’m using and here to evaluate a series of expressions. I don’t need to, because let accepts one or more expressions as part of its body section, but if it didn’t I could have used begin. When you’re using newLISP, you need to know when you can supply a series of expressions, and when you can’t. However, and is useful here because it evaluates each expression in turn but stops as soon as one expression fails (ie evaluates to nil or an empty list). This saves you having to test the results returned by each expression. Of course, sometimes things fail without returning nil or the empty list, but that’s another angle which I’ve forgotten to cover.
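Here is a small sketch of the difference between and and begin. The step function is a made-up helper that reports that it ran and then returns whatever value it was given, so a nil stands in for a failed command.

```newlisp
;; step is a made-up helper: it reports that it ran, then returns the value in ok
(define (step name ok)
  (println "running " name)
  ok)

(println (and (step "copy" true) (step "resize" nil) (step "notify" true)))
;; prints: running copy, running resize, then nil ("notify" never runs)

(println (begin (step "copy" true) (step "resize" nil) (step "notify" true)))
;; prints: running copy, running resize, running notify, then true
```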

18    (exec (format "/usr/local/bin/growlnotify %s -m \"processing file  '%s'\"" (date) file-name)))))

In line 18, I’m calling Growl. Growl is a free utility for MacOS X that provides a notification service – basically pop-up windows that appear unobtrusively on the screen for a while and then fade away (or go when you click them). Growl has been around for a while now, and applications are starting to support it – there’s no system service that does anything similar. I’ve got quite a lot of Growl notifications already set up: new incoming email messages, song titles being played by iTunes, and so on, and I’ve become accustomed to this way of receiving gentle reminders and transient information. Here I’m simply calling growlnotify, the shell version, because newLISP doesn’t have a built-in interface to it. The date function provides a date stamp.

Finally, the dolist function runs my reduce-file function for every file in the list of files.

20 (dolist (f file-list)
21  (reduce-file f))

Do I need exit to finish? I’m not sure whether I do or not. It might be a habit that carries over from using newLISP in a terminal.

23 (exit)

So, is conciseness a consequence of the language or a habit of the writer? I suspect that you can get satisfaction from producing an extremely concise script, but also from producing an easy to read and well-commented script. I do know that occasionally I would really like a program that analyses other people’s newLISP programs and produces an explanation as detailed as this one!

January 8, 2006

Apply and map confusion

Filed under: newLISP — newlisper @ 20:30


Beginners get confused in ways that experts don’t understand. Like anyone learning a language, I’ve met various things in newLISP that confused me for a while. Looking back, I can’t remember exactly what I found difficult about the apply and map functions, because they look a bit easier now than they did. But you’ll probably notice my confusion in what follows.

Both apply and map let you use other functions as data. They have the same basic form:

(apply f l)
(map f l)

where f is the name of a function and l is a list. The idea is that you tell newLISP to process the list using the function you specify.

The apply function uses the elements in the list as arguments to the function, and evaluates the result:

(apply reverse '("this is a string"))
;-> "gnirts a si siht"

Here, apply looks at the elements of the list, which in this case consists of a single string, and feeds these elements to the function as arguments. The string gets reversed. Notice that we don’t have to quote the function in newLISP to prevent it getting evaluated immediately (although we could do), but we do have to quote the list, because we don’t want newLISP to evaluate it before the designated function gets to it.

The map function, on the other hand, works through the list, element by element, like a sergeant major inspecting a row of soldiers, and ‘applies’ the function to each element in turn, using the element as the argument. (Perhaps I confused myself by using the word ‘apply’ there?) However, map remembers the results of each evaluation as it goes, and returns all the results in a new list.

So map looks like a control-flow word, a bit like dolist, whereas apply seems to be a way of controlling the newLISP list evaluation process from within a program.

If we adapt the previous example for map, it gives a similar result, although the result is a list rather than just a string:

(map reverse '("this is a string"))
;-> ("gnirts a si siht")

Here I’ve confused myself by using a list with only one element, which is why the result is almost identical to the apply example. The string has been extracted from the list, reversed, and then stored in another list created by map. (I think.)

Here’s a better, or at least a simpler, example:

(map reverse  '("this" "is" "a" "list" "of" "strings"))
;-> ("siht" "si" "a" "tsil" "fo" "sgnirts")

Now we can see clearly that map has applied reverse to each string element of the list in turn, and returned a list of the resulting strings.
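To put the two side by side on the same list, here is a short sketch using two other built-in functions, append and upper-case. With apply, all the strings arrive in one call; with map, each string gets a call of its own.

```newlisp
(apply append '("this" "is" "a" "list"))   ; one call: (append "this" "is" "a" "list")
;-> "thisisalist"

(map upper-case '("this" "is" "a" "list")) ; four calls, one per element
;-> ("THIS" "IS" "A" "LIST")
```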

Write one in terms of the other?

I suspect that either map or apply could be written in terms of the other. For example, this is a first attempt at a version of map defined in terms of apply:

(define (my-map f l , r) ; declare a local variable r to hold the results
    (dolist (e l)
        (push (apply f (list e)) r -1))
    r) ; return the list of results

which seems to work, at least for simple expressions:

(println (my-map explode '("this is a string")))
;-> (("t" "h" "i" "s" " " "i" "s" " " "a" " " "s" "t" "r" "i" "n" "g"))
(println (map explode '("this is a string")))
;-> (("t" "h" "i" "s" " " "i" "s" " " "a" " " "s" "t" "r" "i" "n" "g"))

This example illustrates to me why map is so useful, too. It’s an easy way to transform all the elements of a list without the hassle of working through them element by element.

For a while, I didn’t understand why this didn’t work:

(apply reverse '(1 2 3))
;-> list or string expected in function reverse : '1

when this did:

(apply reverse '((1 2 3)))
;-> (3 2 1)

The answer, I think, is that apply sort of ‘de-lists’ the elements of the list before applying the function, so you can’t apply reverse to the first element of the list (1 2 3), because you can’t reverse the number 1. In the correct example, the first element of the list ((1 2 3)) is the list (1 2 3), which can be reversed.
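One way to picture the ‘de-listing’ is to write out the call that apply ends up making. A short sketch with two other built-ins, max and length:

```newlisp
(apply max '(3 9 4))        ; the same call as (max 3 9 4)
;-> 9

(apply length '((3 9 4)))   ; the same call as (length '(3 9 4))
;-> 3
```

The outer list only ever supplies the arguments, so a function that wants a list as its argument needs that list wrapped in another one.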

More tricks

Both map and apply have more tricks up their sleeves.

map can traverse more than one list. It interleaves the elements of each list together, starting with the first element of each list, and then passes them in order as arguments to the function:

(map append '("cats " "dogs " "birds ")  '("miaow" "bark" "tweet"))
;-> ("cats miaow" "dogs bark" "birds tweet")

I like this weaving together of strands – like knitting with lists.
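The same weaving works with any function that takes as many arguments as there are lists. Two more small examples, with made-up data:

```newlisp
(map + '(1 2 3) '(10 20 30))                    ; (+ 1 10), (+ 2 20), (+ 3 30)
;-> (11 22 33)

(map list '("cats" "dogs") '("miaow" "bark"))   ; pairs up the matching elements
;-> (("cats" "miaow") ("dogs" "bark"))
```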

apply has a trick too. A third argument indicates how many of the preceding list’s arguments the function should use. So if a function takes two arguments, and you supply three or more, apply comes back and makes another attempt, using the result of the first application and another argument. It continues eating its way through the list until all the arguments are used up. To see this in action, let’s first define a function that takes two arguments and compares their lengths:

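(define (longest s1 s2)
    (println  s1 " is longest so far, is " s2 " longer?") ; feedback
    (if (>= (length s1) (length s2)) ; compare lengths
        s1    ; keep the current leader (ties go to the earlier string)
        s2))  ; otherwise the newcomer takes over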

Now we can apply this function to a list of strings, using the third argument to tell apply to use up the arguments two strings at a time:

(apply longest '("green" "purple" "violet" "yellow" "orange"
"black" "white" "pink" "red" "turquoise" "cerise" "scarlet"
"lilac" "grey" "blue" ) 2 )

This is the output:

green is longest so far, is purple longer?
purple is longest so far, is violet longer?
purple is longest so far, is yellow longer?
purple is longest so far, is orange longer?
purple is longest so far, is black longer?
purple is longest so far, is white longer?
purple is longest so far, is pink longer?
purple is longest so far, is red longer?
purple is longest so far, is turquoise longer?
turquoise is longest so far, is cerise longer?
turquoise is longest so far, is scarlet longer?
turquoise is longest so far, is lilac longer?
turquoise is longest so far, is grey longer?
turquoise is longest so far, is blue longer?

“purple” did quite well, until “turquoise” arrived.

(In my experiments running this on a short novel, though, I found that newLISP ran out of call stack space quickly, so I don’t recommend this approach just yet. It’s easier and quicker to scan large lists using dolist.)
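For comparison, here is a minimal sketch of the dolist approach. The name longest-in is made up for this example, and ties go to the earlier string, just as they do in the output above.

```newlisp
;; longest-in is a hypothetical helper name, not a built-in
(define (longest-in lst)
  (let ((best ""))
    (dolist (s lst)
      (if (> (length s) (length best))   ; only a strictly longer string takes over
          (set 'best s)))
    best))

(longest-in '("green" "purple" "violet" "turquoise" "cerise" "blue"))
;-> "turquoise"
```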


This thing about passing around the names of Lisp functions as if they were bits of data is very Lisp-y, and I’m just starting to appreciate how useful it is. You can find it all over the place in newLISP. Here’s a useful function called filter:

(filter integer? '(1 2 3 4.1 5 6.21 7 8 9.12))
;-> (1 2 3 5 7 8)

It strips out any elements that don’t satisfy the integer? predicate.
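The predicate doesn’t have to be a built-in. As a sketch, here is filter with an anonymous function written with fn, keeping only the strings longer than five characters:

```newlisp
(filter (fn (s) (> (length s) 5))
        '("green" "purple" "pink" "turquoise"))
;-> ("purple" "turquoise")
```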

January 7, 2006

The newLISP logo

Filed under: newLISP — newlisper @ 12:59


This is the newLISP logo – a dragonfly lovingly fashioned from parentheses by Brian Grayless of Sunergize.

A dragonfly is a very suitable logo for newLISP – dragonflies are ancient and extremely successful creatures that have been around for hundreds of millions of years. Lisp has been around a long time too, relatively speaking, and is still flying around today.

January 4, 2006

newLISP and the Mac Finder working together with Big Cat

Filed under: newLISP — newlisper @ 15:35


It’s useful to be able to run newLISP scripts when you’re using the MacOS Finder. The best way I’ve found to do this is a free utility called Big Cat, written by Brent Simmons, ace Mac programmer and developer of NetNewsWire. Big Cat provides you with the ability to choose and run scripts from the Finder’s contextual menu.

When you’ve installed it, put the scripts that you want to run in the ~/Library/Application Support/Big Cat Scripts/Files folder. They’ll be available in the Finder when you’ve selected one or more files.

The basic form for a Big Cat script is something like this:

;;; change files to have a+x execution permissions
(set 'file-list (rest (rest (main-args)))) ; get selected files
(dolist (i file-list)                   ; with each file in argument
    (set 'f (format "'%s'" i))      ; quote the filename to protect it
    (exec (string "chmod a+x " f)))

This tries to change the file’s execute permissions. I use this in the Finder after I’ve written a newLISP script (or a shell script) in a text editor. The first line skips the first two arguments, then uses the remaining arguments as file names for the dolist loop.
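To check what actually gets sent to the shell, you can print the command instead of running it. A small sketch with a made-up path that contains a space:

```newlisp
(set 'i "/Users/me/my script.lsp")     ; hypothetical path with a space in it
(set 'f (format "'%s'" i))             ; wrap it in single quotes
(println (string "chmod a+x " f))
;-> chmod a+x '/Users/me/my script.lsp'
```

The single quotes protect spaces, but a file name that itself contains a single quote would still confuse the shell.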

While you’re testing your Big Cat scripts, it’s useful to keep the Console application running, showing the console.log – errors and standard output are written here.

Another script I use a lot is this one:

;;; copy unix paths to clipboard
(set 'file-list (rest (rest (main-args))))
(set 'clipboard-contents nil)
(dolist (i file-list)
    (set 'f (format "'%s'" i))
    (push f clipboard-contents))
(exec (string "echo " (join clipboard-contents " " ) " | pbcopy "))

This copies the Unix paths of selected files to the system clipboard, ready for being pasted in to another application. I use this because not all applications support the Option-Drag technique for inserting the pathnames of Unix files visible in the Finder. The pbcopy command and its pbpaste counterpart are Unix commands for writing to and reading from the pasteboard, which is the same as the clipboard. It can be a useful way of making information available to GUI applications that don’t communicate with the Unix engine room.
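One detail worth knowing: push adds each new item at the front of the list by default, so the paths end up on the clipboard in reverse order of selection. If the order matters, an index of -1 appends at the end instead. A small sketch with placeholder strings in place of file names:

```newlisp
(set 'front '())
(dolist (i '("a" "b" "c")) (push i front))      ; push defaults to the front
(println (join front " "))
;-> c b a

(set 'back '())
(dolist (i '("a" "b" "c")) (push i back -1))    ; -1 appends at the end
(println (join back " "))
;-> a b c
```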
