(archive 'newLISPer)

March 8, 2006

Simple sums

Filed under: newLISP — newlisper @ 10:23


Someone asked the TextWrangler mailing list for suggestions on how to add up every occurrence of a range of prices in a large text file, and generate subtotals and grand totals. There were about a dozen different prices (1.99, 2.99, 3.99, 4.99, and so on) scattered through the file. This was the first suggestion, when the format of the file was still unspecified:

; open the data file for reading
(set 'file (open "/Users/me/bigdata.txt" "read"))
; one tally slot per price, all starting at zero
(set 'prices '("1.99" "2.99" "3.99" "4.99" "5.99" "6.99")
     'tally (dup 0 (length prices))
     'total 0)
; add each line's price counts to the running tally
(while (read-line file)
  (set 'tally
    (map add
      tally (count prices (parse (current-line))))))
(close file)
; print a subtotal for each price and accumulate the grand total
(map (fn (price number)
       (set 'value (mul (float price) (float number)))
       (println number " at " price ", value " value)
       (inc 'total value))
     prices tally)
(println "Total " total " for " (apply add tally) " items")

which produced the following (untidy) output for a large sample file:

262530 at 1.99, value 522434.7
131270 at 2.99, value 392497.3
131270 at 3.99, value 523767.3
87510 at 4.99, value 436674.9
0 at 5.99, value 0
0 at 6.99, value 0
Total 1875374.2 for 612580 items
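
If the untidiness bothers you, the format function can line the columns up. This is only a sketch: the field widths are my own guess at something sensible, and the three values are copied from the first line of the output above.

```newlisp
; Sample values taken from the first output line above.
(set 'price "1.99" 'number 262530 'value 522434.7)

; %8d right-aligns the count, %12.2f shows the value with two decimals.
; Counts accumulated with add are floats, so int is applied before %d.
(set 'line (format "%8d at %s, value %12.2f" (int number) price value))
(println line)
```

Swapping this format call in for the bare println inside the map should give aligned columns with every value shown to two decimal places.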

newLISP provides useful tools for this type of job. For example, the count function takes two lists and finds occurrences of each of the first list’s elements in the second list:

(count '("1.99" "2.99" "3.99") '("1.99" "2.99" "1.99" "1.99" "1.99"))
;-> (4 1 0)

and this can cope with input lines that have any number of occurrences of prices.

The map function can be used to add lists together element by element (here, adding each line’s count results to the running tally). It also produces the subtotals, by multiplying each price by its final tally.
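
Here are both uses of map in miniature. The numbers are made up for illustration, and line-counts and subtotals are just names I’ve picked for this sketch:

```newlisp
; A running tally for three prices, plus the counts from one more line.
(set 'prices '("1.99" "2.99" "3.99"))
(set 'tally '(4 1 0))
(set 'line-counts '(1 0 2))

; map applies add to the corresponding elements of both lists.
(set 'tally (map add tally line-counts))
(println tally)   ; (5 1 2)

; Multiply each price by its count to get the subtotals,
; then sum the subtotals for the grand total.
(set 'subtotals
  (map (fn (price number) (mul (float price) number)) prices tally))
(println (apply add subtotals))
```

Because map walks the lists in step, the tally list has to stay in the same order as the prices list.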

However, this first version of the script didn’t work on every file. The problem is here:

(count prices (parse (current-line)))

Using parse like this isn’t going to work on every text file. I’m not sure exactly what the problem was, but it seems to be related to string constants, since the error message was:

string token too long :

I think the reason is that parse, used without a break string, falls back on newLISP’s own internal parser. It therefore treats the input line as newLISP source code, and presumably finds some construction, such as an unbalanced quote, that breaks the rules.

The solution is to use parse more carefully:

(parse (current-line) " ") ; split at spaces
(parse (current-line) "[^A-Za-z]" 0) ; split at non-alphabetics

choosing the technique to match the format of the input file.
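
To see the difference between the three forms, here are a few short calls on made-up strings (nothing from the real data file):

```newlisp
; With no break string, parse tokenizes as if reading newLISP source.
; That happens to work for plain words:
(println (parse "hello how are you"))
; ("hello" "how" "are" "you")

; A literal break string splits exactly at that string:
(println (parse "one:two:three" ":"))
; ("one" "two" "three")

; With a regex option (0 here), the break string is a regular expression:
(println (parse "one-two--three---four" "-+" 0))
; ("one" "two" "three" "four")
```

For a file of unknown format, the second and third forms are the safer choice, because the line is never handed to the source-code tokenizer.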

In fact, the input file turned out to be XML, so the problem was easily solved!
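
I never saw the exact layout of that XML, so the following is only a sketch of one way it could go, not what was actually sent to the list. The sample lines and the count-prices helper are invented for illustration. The idea is to split each line at every run of characters that can’t be part of a price, so the tags fall away and only the numbers are left to count:

```newlisp
(set 'prices '("1.99" "2.99" "3.99"))

; Hypothetical helper: split a line at every run of characters that
; is not a digit or a dot, then count the tokens that match a price.
(define (count-prices line)
  (count prices (parse line {[^0-9.]+} 0)))

; Invented sample lines; the real file's XML format isn't shown here.
(set 'sample '("<item><price>1.99</price></item>"
               "<item><price>3.99</price><price>1.99</price></item>"))

; Same running tally as before, one slot per price.
(set 'tally (dup 0 (length prices)))
(dolist (line sample)
  (set 'tally (map add tally (count-prices line))))
(println tally)   ; (2 0 1)
```

Because count compares whole tokens, a price such as 11.99 is not mistaken for 1.99.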



  1. What version of newLISP are you using? I ran into similar problems with parse using some 8.7 releases, but it has been fixed in 8.8.

    Comment by sarken — March 11, 2006 @ 15:06 | Reply

  2. I’m using 8.8. parse has been improved anyway for 8.8 (speeded up, I think, as well), but I also think that this function is quite hard to get to grips with if you’re a beginner, so it might be me…!

    Comment by newlisper — March 11, 2006 @ 21:40 | Reply

  3. parse without the break parameter will parse as if parsing newLISP source code and complain about unbalanced quotes etc.

    Comment by donlucio — March 14, 2006 @ 13:22 | Reply
