(archive 'newLISPer)

July 12, 2010

Balance those parentheses

Filed under: newLISP — newlisper @ 21:50

I called this site “unbalanced parentheses” not just because the phrase had a Lisp-oriented slant but also because it hinted at a lightweight, off-the-wall attitude to the subject, rather than a serious analytical perspective. I spent all of 10 seconds thinking of the name while staring at a registration form with a mind suddenly gone blank, and it probably wasn’t the perfect choice.

Looking at the logs recently, I noticed that some visitors actually appear to be looking for help balancing their unbalanced parentheses! I suspect, though, that they’re not newLISPers, and probably not even Lisp users (who presumably don’t need much help in that department anyway, and have their own opinions about editing code – see Greg’s post for an interesting discussion). It occurred to me to try and write something about the subject. And it’s high time I wrote something – anything – for this site.

The parenthesis is normally employed in written language to indicate something less than essential or subordinate – material that can be omitted without major damage to the sentence. In mathematics and programming, though, parentheses are more likely to be used for organizing selected items into groups or larger entities, and also for controlling the order of evaluation. Like legs, or trousers, they nearly always go around in pairs, so that a left, opening, parenthesis should always be accompanied by a matching right, closing, parenthesis following not far behind.

You know all this, of course.

The problems start when you start using lots of pairs of parentheses. Take this piece of simple mathematics:

(a * (b + ((c * d) / e) - f)

As every schoolboy and schoolgirl knows, to check parentheses in your algebra homework, you read from left to right and count out loud, going up one whenever you see an opening parenthesis, and down one for each closing one. Thus the counting for the above expression would sound like this:

"1 a * 2 b + 3 4 c * d 3 / e 2 - f 1"

and, since you started at 0 but ended on 1, you can conclude that your parentheses are unbalanced. Too many lefts, not enough rights, in this case.

Like me, you can probably do a simple example like this without counting, just by looking at it and mentally matching lefts and rights. But you don’t want to have to do this for long Lisp expressions, or expressions that span a number of lines. Your preferred text editor should help you match the parentheses – each editor offers a few great tools, but I’ve found that few offer everything you want.

A little newLISP script can do the job automatically:

(set 'source-code-list (explode (read-file (nth 2 (main-args)))))
(set 'nest -1)
(dolist (i source-code-list)
     ((= i "(")     (inc nest)
                    (print "\n" (dup "  " nest) i))
     ((= i ")")     (dec nest)
                    (print i "\n" (dup "  " nest)))
     ((= i "\n")    (print ""))
     (true          (print i))))

which, when given a piece of source code like this:

(define (mandelbrot)
    (for (y -2 2 0.02)
        (for (x -2 2 0.02)
            (inc counter)
            (set 'z (complex x y) 'c 126 'a z)
            (while (and
                     (< (abs (:rad (set 'z (:add (:mul z z) a)))) 2)
                     (> (dec c) 32)))
            (print (char c)))

outputs something like this:


    (y -2 2 0.02)

      (x -2 2 0.02)

      (inc counter)

      (set 'z
        (complex x y)
       'c 126 'a z)

                (set 'z
                    (:mul z z)

            (dec c)

        (char c)


It’s a curious and spacious layout not to everyone’s taste, but the parentheses line up vertically, so it’s easy to see what’s going on. You can get even closer to the simple counting idea by replacing the parentheses with snazzy Unicode symbols, producing results like this (which won’t look right if your browser isn’t Unicode-friendly):


    ➂y -2 2 0.02❸

      ➃x -2 2 0.02❹

      ➃inc counter❹

      ➃set 'z
        ➄complex x y❺ 'c 126 'a z❹


            ➆dec c❼ 32❻❺❹

        ➄char c❺❹❸


using code like this:

(set 'level 0)

(define (open-list)
  (print "\n"
    (dup "  " level)
    (char (+ (int (append "0x" (string 2780)) 0 16) level)))
  (inc level))

(define (close-list)
  (dec level)
  (print (char (+ (int (append "0x" (string 2776)) 0 16) level))))

(dolist (c source-code-list)
        ((= c "(")  (open-list))
        ((= c ")")  (close-list))
        (true       (print c))))

This technique is useful for analysing big SXML lists, too.

All this is fun, but you might be thinking that there’s a much quicker way to simply find out whether the parentheses are balanced. Why not just count them?

(define (parenthesis-count txt)
     (count '("(" ")") (explode txt)))

(if (apply = (set 'r (parenthesis-count test-code)))
             (println "good code! " r)
             (println "bad code!  " r))

which returns something like:

bad code!  (22 23)
; or
good code! (22 22)

The world, or at least your source code, might well be harmonious and balanced when there are equivalent numbers of left and right parentheses.

But… you’re too smart to be easily fooled by this glibness. You know better than I do that none of the code written thus far will operate perfectly on itself, let alone on the sort of code that crazy newLISPers can come up with. Obviously, the parentheses inside the strings are going to upset the counting. And when source code is formatted with comments and documentation markup, it’s unlikely that these simple tricks are going to give accurate results. I’ll have to analyze source code more carefully.

In my view, one of the few features that newLISP currently lacks is the ability to read its own code into its own nested list format. The powerful list referencing functions such as ref-all and set-ref are unable to operate on source code stored in strongly hierarchical lists or S-expressions, simply because there’s no obvious way to convert source to that form. I use a temporary work round, in the form of Nestor (a utility that you’ll find mentioned on these pages, although the code is unlikely to work on more recent versions on newLISP). Nestor also adds the colours to the source listings you see here, too, by converting raw code into an intermediate form that can then be converted to HTML with lots of CSS SPAN tags to colourize parenthesis-enclosed strings.

Here’s what a piece of source code looks like after Nestor’s manipulations:

((("open-paren" "(")
  ("symbol" "define")
  (("open-paren" "(")
   ("symbol" "mandelbrot")
   ("close-paren" ")"))
  (("open-paren" "(")
   ("symbol" "for")
   (("open-paren" "(")
    ("symbol" "y")
    ("integer" -2)
    ("integer" 2)
    ("float" "0.02")
    ("close-paren" ")"))
   ; ...

Now I can ask for all opening parentheses and obtain accurate references to them. I’ve put the tokenized and deformatted source in `s-list’:

(set 'op-refs (ref-all '("open-paren"  "(") s-list))
(set 'cp-refs (ref-all '("close-paren" ")") s-list))

((0 0)
 (0 2 0)
 (0 3 0)
 (0 3 2 0)
 (0 3 3 0)
 (0 3 3 2 0)
 (0 3 3 3 0)
 (0 3 3 4 0)
 (0 3 3 4 4 0)

((0 2 2)
 (0 3 2 5)
 (0 3 3 2 5)
 (0 3 3 3 3)
 (0 3 3 4 4 4)
 (0 3 3 4 11)

The parentheses can now be counted simply and with more confidence, knowing that strings and comments are not going to give false positives:

(length op-refs)
;-> 22
(length cp-refs)
;-> 22

And now it’s possible to see each expression separately. I can work through every reference to an open parenthesis, and for each one, chop the end of the address off to get a reference to the expression as a whole:

(dolist (a-ref (sort op-refs (fn (x y) (> (length x) (length y)))))
    (output-code (s-list (chop a-ref))))

(:mul z z)
(:add (:mul z z) a)
(set 'z (:add (:mul z z) a))
(:rad (set 'z (:add (:mul z z) a)))
(abs (:rad (set 'z (:add (:mul z z) a))))
; ...

This particular code sorts the expressions according to their depth (the most deeply nested first) and then displays them. It’s kind of like how newLISP evaluates expressions, I fancy. Each of these expressions could be checked for unbalanced parentheses or dubious syntax, too.

Alternatively, it’s possible to examine particular constructions occurring in code. For example, I could look at each use of `set’, to check for dodgy assignations:

(set 'setrefs (ref-all '("symbol" "set") s-list))
(dolist (setref setrefs)
   (output-code (s-list (chop setref))))

(set 'z (complex x y) 'c 126 'a z)
(set 'z (:add ( :mul z z) a))

I can also chain these types of queries together, or look for one inside another:

(set 'references-to-while
    (ref '("symbol" "while") s-list match))
(set 'references-to-set
    (ref '("symbol" "set") (s-list (chop references-to-while)) match))
    (nth (chop (append (chop references-to-while) references-to-set)) s-list))

(set 'z (:add (:mul z z ) a))

However, I think I’m now barking up the wrong tree. It’s relatively easy to find out whether your parentheses are balanced – but it can be much harder to notice when one or more of them are in the wrong position. This is, perhaps, a paradoxical result of Lisp’s powerful yet simple syntax. Here’s a typical example of what I mean:

(define (test a b)
   (let ((c a)
         (d b)
         (result 0))
     (set 'result (+ c d)))

(test 2 2)


I was hoping to see 4, but I see nil instead. The parentheses are balanced, and the syntax is correct. But one of the parentheses is in the wrong place, and I doubt whether any script or tool could easily identify which one or tell you where it should be located. (Perhaps the mistake is too stupid for that!) Can you see the mistake? And can you imagine a tool or editor that could detect or prevent it happening?


1 Comment »

  1. I spotted the error in about two seconds. I have some things to say on the subject:

    1. Balancing parentheses becomes kind of a habit after years of Lisp/Scheme programming.
    2. You wouldn’t make mistakes like this using a template editor.
    3. You wouldn’t make mistakes like this using proper indentation.

    You have only to see that the body of this definition should contain two expressions, not one.

    I believe a good editor (Emacs, PLT) might complain that
    the temporary variables created in the let expression are never used; it is as if you had written:

    (define (test a b)

    (which is very bad style but useful for debugging haha, I don’t spend much time debugging since I “ditched ” C in favor of Scheme) (I do use a Scheme-to-C compiler) (rhizome/pi)

    Comment by ronaldo — January 27, 2012 @ 17:43 | Reply

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: