(archive 'newLISPer)

July 31, 2008

Looking for things

Filed under: newLISP — newlisper @ 19:36

When writing newLISP, it’s often useful to find any existing code that uses a particular function or symbol name. I’m a fan of both the Unix find command and the Mac’s built-in search tool, Spotlight. The former is all-powerful, but it can be a little tricky to remember all those options. The latter is particularly good for finding text in documents – not just text files – since many developers provide a Spotlight system plug-in for searching their own document types.

Naturally, though, I also use a small newLISP utility alongside these two: its purpose is to look only in places where there are newLISP-related files. I reworked this recently based on some suggestions in the newLISP forum. Here’s the code:

#!/usr/bin/env newlisp
(context 'Look4)

(constant 'extensions '("lsp" "txt" "html"))

(set 'places
  (list (string (env "HOME") "/projects/lisp")
        (string (env "HOME") "/projects/webapps")
        (string (env "HOME") "/projects/guiapps")
        (string (env "HOME") "/projects/tech-writing")))

; true if str ends with one of the strings in L
(define (string-ends-with L str)
  (exists (fn (x) (ends-with str x)) L))

; return (pathname (match ...)) if the file contains Term, otherwise nil
(define (look-in-file pn)
  (let (file nil res '())
    (when (and (string-ends-with extensions pn)
               (set 'file (open pn "r")))
      ; case-insensitive regex search (option 1): capture the line up to
      ; Term plus at most 20 characters after it
      (while (search file (string "(.*?" Term ".{0,20})") true 1)
        (push $1 res -1))
      (close file)
      (if res (list pn res)))))

; default function: search dir recursively, skipping entries whose
; names don't start with a letter (such as . and ..)
(define (Look4:Look4 dir)
  (dolist (nde (directory dir "^[A-Za-z]"))
    (set 'item (append dir "/" nde))
    (if (directory? item)
        (Look4 item)
        (when (set 'r (look-in-file item))
          (inc 'counter (length (r 1)))   ; count matches, not files
          (push r results)))))

; (main-args 2) is the search term
(when (> (length (main-args)) 2)
  (set 'Term (main-args 2) 'counter 0)
  (map Look4 places)
  (when results
    (sort results)
    (dolist (r results)
      (println "\n" (r 0) "\n\t\t\t")
      (dolist (e (r 1)) (println "\t" e)))
    (println "Found " counter " occurrences of \"" Term "\"")))

; stop here rather than dropping into newLISP's interactive mode
(exit)


You can see that I’ve hardwired the names of the directories to search in. That’s not very flexible, but it may suit your working style.
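If you'd rather not edit the script to change them, one option is to treat any arguments after the search term as the directories to search. This is a minimal sketch; places-from-args is a hypothetical helper that isn't part of the script above, and it assumes the search term stays at (main-args 2):

```newlisp
;; places-from-args is a hypothetical helper, not part of the original script.
;; args is a list shaped like (main-args): ("newlisp" "look4" term dir1 dir2 ...).
;; Any arguments after the search term replace the default directories.
(define (places-from-args args defaults)
  (if (> (length args) 3)
      (3 args)          ; implicit rest: every element from index 3 onwards
      defaults))
```

In the script it would be called just before the search starts, with (set 'places (places-from-args (main-args) places)), so that the hardwired list is only used when no directories are given.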

On the command line, you run the script and supply a string:

$ look4 unless

and you’ll see the results in a couple of seconds:


    (unless txt (exit))


;; a Piano instrument unless the function 'gs:mi


      (unless (and (empty?  user-


color blue "([[:space:]()]|^)(trim|true|true\?|unicode|unify|unique|unless|unpack|until|upper-

Found 201 occurrences of "unless"

Since the search term is embedded in a regular expression, you can supply a pattern rather than a plain string, although any regex characters you want matched literally, such as parentheses or question marks, have to be escaped. In practice I rarely use regular expressions in this type of search: I’m more likely to be asking “Didn’t I once write a binary function?”
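Escaping by hand gets tedious, so one option is to let the script do it. This is a minimal sketch, assuming you always want the term matched literally; regex-quote is a hypothetical helper, not something newLISP or the script above provides:

```newlisp
;; regex-quote is a hypothetical helper: it puts a backslash in front of
;; every PCRE metacharacter (\ ^ $ . | ? * + ( ) [ ] { }) so the string
;; matches literally when embedded in a regular expression.
(define (regex-quote str)
  ;; regex option 0; $0 holds each matched character in turn
  (replace "[\\\\^$.|?*+()\\[\\]{}]" str (string "\\" $0) 0))
```

To use it, wrap the term when it's read: (set 'Term (regex-quote (main-args 2)) 'counter 0). The closing "Found ..." message would then show the escaped form of the term.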


July 25, 2008

Character reference

Filed under: newLISP — newlisper @ 22:43

I was looking through an old (1990!) book on Unicode the other day. I’ve always been intrigued by the amazing diversity of letter forms that we’ve created over the last few thousand years. Here are just some of the many wonderful and peculiar characters you’ll find tucked away in the Unicode glyph banks:

۞ ⱁ ᚅ Ꮈ Ϣ ܍ ⫷ ⨸ ℻

You’ll also find the I Ching, Braille, alchemy, an alphabet funded by George Bernard Shaw, neo-pagan tree language, astrology, dentists, talking leaves, and much more besides.

Most of the technical aspects of Unicode escape me (supplementary planes, normalization, high surrogates, collation?), but it’s useful to know the basics of using Unicode in newLISP, particularly now that UTF-8 is the most popular encoding on the web.

newLISP is UTF-8 friendly by default on MacOS X, and UTF-8 versions are available for other platforms too (although I’m not sure whether the default versions are UTF-8). UTF-8 is a variable-length character encoding, which allows characters to use 1, 2, 3 or 4 bytes depending on their Unicode value.
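To see the variable length in action, here's a small sketch that compares the byte count with the character count for three characters. It assumes a UTF-8 enabled build of newLISP, where length counts the bytes in a string and utf8len counts its characters:

```newlisp
;; Assumes a UTF-8 enabled newLISP: length counts bytes, utf8len counts characters.
(dolist (code '(65 233 8364))              ; "A", "é" and "€"
  (set 'ch (char code))
  (println (format "U+%04X" code) ": " ch
           ", bytes: " (length ch)
           ", characters: " (utf8len ch)))
```

It should report 1, 2 and 3 bytes respectively, but a single character each time.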

One essential newLISP function for exploring the Unicode character set is char. This takes either a number or a character, and returns the matching character or number:

(char 63498)          ; → the private-use character U+F80A

(char (char 63498))   ; → 63498

Unicode characters are usually described using hexadecimal, so it’s useful to know how to translate between hex and decimal. To convert a decimal integer to a hex string, use format:

(format "%llx" 63498)   ; → "f80a"

To convert a hex string to a decimal integer, pass a hexadecimal string starting with “0x” to int:

(int (string "0x" "f80a"))   ; → 63498
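Putting char, format and int together gives a complete round trip between a character, its decimal code and its hex string. This sketch uses U+2660 (BLACK SPADE SUIT) because, unlike U+F80A, it's visible in most fonts; it assumes a UTF-8 enabled build:

```newlisp
;; Round trip between a character, its decimal code and its hex string.
;; Assumes a UTF-8 enabled build of newLISP.
(set 'spade (char 9824))                    ; the character ♠
(println (char spade))                      ; 9824
(println (format "%x" (char spade)))        ; 2660
(println (int "0x2660"))                    ; 9824
(println (= (char (int "0x2660")) spade))   ; true
```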

When you’re writing text, it would be good if you could easily insert these characters as you type. There are useful system tools for doing this (on MacOS X, there’s the Character Palette), but for fun I’ve added the following two functions to the Markdown converter that I use to process my writing:

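; convert a string such as "U2660" to the character it names
(define (hex-str-to-unicode-char strng)
   (char (int (string "0x" (1 strng)) 0 16)))   ; (1 strng) drops the leading U

; replace every U followed by 4 or more hex digits (option 1: case-insensitive)
(define (ustring s)
  (replace "U[0-9a-f]{4,}" s (hex-str-to-unicode-char $0) 1))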

So now I can type “U” followed by four or more hexadecimal digits, and the corresponding Unicode character is inserted automatically: “U f80a” is converted to the private-use character U+F80A. (I had to insert a space after the U to prevent translation.)
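Here's a quick check of ustring on a whole string. The two helpers are repeated so the example runs on its own; it assumes a UTF-8 enabled build:

```newlisp
;; The two helpers from above, repeated so this example is self-contained.
(define (hex-str-to-unicode-char strng)
   (char (int (string "0x" (1 strng)) 0 16)))

(define (ustring s)
  (replace "U[0-9a-f]{4,}" s (hex-str-to-unicode-char $0) 1))

(println (ustring "Suits: U2660 U2661 U2662 U2663"))
```

Two consequences of the pattern are worth knowing. Because of option 1, a lower-case “u” followed by hex digits is converted too. And because {4,} is greedy, a code that runs straight into more hex letters, such as “U2660a”, is read as U+2660A rather than as ♠ followed by “a”.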

You can happily use Unicode characters anywhere in newLISP code, if your text editor or console is up to the job. And if ustring is available, you can generate them easily too:

; ustring turns "U2660" into ♠, and so on
(constant (sym (ustring "U2660")) 4   ; spades
          (sym (ustring "U2661")) 3   ; hearts
          (sym (ustring "U2662")) 2   ; diamonds
          (sym (ustring "U2663")) 1)  ; clubs


The four new symbols now appear at the end of the symbol list:

(! != $ $0 $1 $10 $11 $12 $13 $14 $15 $2 $3 $4 $5 $6 $7 $8 $9 $HOME $args $idx $main-args ...  zero? | ~ ♠ ♡ ♢ ♣)

(println "(> ♢ ♣)? " (> ♢ ♣))
(> ♢ ♣)? true

(println "(> ♡ ♠)? " (> ♡ ♠))
(> ♡ ♠)? nil

Using descriptive Unicode characters for your symbol names could introduce a whole new level of readability to your code!

(constant (global '☼)  MAIN)
(context '☺)

(define (☻ ✄ ☁ ⍾)
   (print ✄ ☁ ⍾))

(define (‽)
   (println {‽}))

(context ☼)
(set '℥ "what "  'ᴥ "the " 'ᴒ "dickens")
(☺:☻ ℥ ᴥ ᴒ)

That last function call prints “what the dickens”, and, appropriately enough, calling (☺:‽) prints “‽”, the much-needed interrobang character.

The problem now is to remember all those four-digit hexadecimal numbers that identify the Unicode characters, so I whipped up a quick Unicode browser in newLISP.

This just shows a page of Unicode characters at a time, and lets you move up and down through the ‘pages’. It has some problems when the character code exceeds FFFF – I don’t know why‽
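For anyone who wants to build something similar, here's a minimal sketch of the core idea (not the browser itself): print one ‘page’ of 256 code points as a 16 × 16 grid, each row labelled with its first code point in hex. The names unicode-row and show-page are hypothetical, and the sketch assumes a UTF-8 enabled build:

```newlisp
;; A minimal paged Unicode viewer; unicode-row and show-page are
;; hypothetical names. Assumes a UTF-8 enabled build of newLISP.
(define (unicode-row start)
  ;; the 16 characters from code point start to start + 15
  (map char (sequence start (+ start 15))))

(define (show-page start)
  ;; 16 rows of 16 characters, each row labelled with its first code point
  (dolist (row (sequence 0 15))
    (let (base (+ start (* row 16)))
      (println (format "%04X  " base) (join (unicode-row base) " ")))))

(show-page 0x2600)   ; Miscellaneous Symbols, U+2600 to U+26FF
```

Moving up or down a page is then just a matter of calling show-page again with start plus or minus 0x100.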

This post should display correctly on most modern browsers. If you see lots of boxes rather than characters, then you are using a browser or system that doesn’t handle Unicode well. This applies to the iPhone and iPod Touch as well: it appears that Mobile Safari doesn’t like Unicode as much as its desktop version. Apple – improve Unicode support please!

July 13, 2008

googled again

Filed under: newLISP — newlisper @ 19:31

Not functional in the archive version.

In the top right corner of this page you’ll see a search box. This is a custom Google search engine, and it’s currently set to search three newLISP-related sites using Google technology: this one, the main newLISP site at newlisp.org, and the newLISP on Noodles wiki. I didn’t want to add anyone else’s site without their permission or knowledge, but it’s easy to add extra sites, so just let me know if you’d like yours added.

So far I’ve found it fairly useful, and it’s meant that I haven’t had to write my own blog search tool yet. However, it’s been doing odd things with the HTML display of the page, generating thousands of JavaScript errors and doing weird stuff like (apparently) loading the page more than once. Also, it fails to find things that I think it should, so I’m not convinced it’s as good as Google engineers apparently think it is.

If anyone is at all knowledgeable about what the custom Google search engine is doing with its HTML, please help! The Google engineers can’t be bothered to help anyone use their work, it seems, so I’m on my own. Perhaps I’ll have to write my own search engine after all…!
