(archive 'newLISPer)

March 2, 2010

newLISP tackles global warming

Filed under: newLISP — newlisper @ 21:50
Tags:

At Armagh Observatory, they’ve been keeping records of the weather since the late 18th century, and detailed records since 1843. The datasets, which are freely available to all, via their web site, are considered to be high quality and very useful, suffering from few of the problems that bedevil other sets of weather observations. There are hardly any problems with gaps in the records (which would have to be filled in by a technique that scientists call ‘sparse data infill’ and which I would call ‘making stuff up’). There are no nearby airport runways or industrial complexes that could upset the microclimates. And, because the information is published openly, there’s little chance that modifications or corrections can be applied without anyone noticing.

Armagh itself is a smallish town in Northern Ireland, less than 1000 miles south of the Arctic Circle although bathed in the warm currents of the Gulf stream. The name is familiar to most UK residents mainly as the place armagh-graph, during the 1970s and 1980s, it was possible to witness gun, bomb, and grenade attacks in the streets, symptoms of the long-running conflict between Catholic and Protestant extremists. This was the time of “the Troubles”, as they became known.

I decided to attempt some simple analysis of one of the sets of weather records that the Armagh Observatory has posted on its website, using newLISP as my magnifying glass and explorer’s machete.

I started at http://climate.arm.ac.uk/calibrated/airtemp/index.html, and downloaded the ‘corrected daily maximum temperature’ file.

(set 'source-file
    (get-url {http://badc.nerc.ac.uk/browse/badc/armagh/data/air_temperature/corrected_daily_max_temp/tccmax1844-2004.txt}))

I parsed this into separate lines.

(set 'raw-data (parse source-file "\n" 0))

Looking at the result, there are three different types of line to process. After the initial comment lines, there are either year indicators or space-separated lists of values (in degrees Centigrade) for every month for a particular day. For example, the line starting with “1” contains the maximum recorded temperatures for the first of January, February, March, etc. up to and including the first of December. Lines starting with 29 to 31 contain a few odd-looking entries such as “-999” that indicate that there was no such day for certain months.

("Daily maxmimum temperature at Armagh Observatory compiled and calibrated by John Butler and"
 "Ana Garcia Suarez and Alan Coughlin, Armagh Observatory, August 2003 "
 "Reference: Meteorological Data recorded at Armagh Observatory: Volume II - Daily, Monthly and"

" 1844"
"     1    1.9    6.2    6.8   16.3   19.2   18.7   17.7   16.7   24.8   15.6   10.3    6.6"
"     2   -0.2    5.1    6.4   12.0   22.7   16.7   16.7   18.8   23.4   16.1    9.2    6.0"

"    31    5.3 -999.0   13.3 -999.0   20.0 -999.0   15.3   22.3 -999.0   12.6 -999.0    6.3" ...)

To parse the raw data I used the following code:

(let ((year 0) (values '()) (monthly '()))
    (dolist (line raw-data)
      (cond
        ((and (< (length line) 6)
                        (>= (int line 0 10) 1844)
                        (<= (int line 0 10) 2004))
        ; a year start?
            (set 'year (int line 0 10)))
        ; a data row?
        ((nil? (find "[A-Za-z]" line 0))
            (set 'values (map float (parse line)))
            (when values
               (set 'day (first values))
               (push (rest values) monthly -1)
               ; after 31, start a new month
               (when (= day 31)
                   (push (cons year (transpose monthly)) yearly -1)
                   (set 'monthly nil)))))))

As usual, my code is a quick hack; ungraceful yet yielding a practical solution. It runs just once, because the resulting data are collected in the yearly list, which is then more efficiently accessed when saved in a file and reloaded as needed:

(save {/Users/me/armagh-data.lsp} 'yearly)

; later: 

(load {/Users/me/armagh-data.lsp} 'yearly)

The data list yearly holds all the data in a simple hierarchical list structure. I used the transpose function to ‘flip’ the monthly data lists as they were being processed, thus converting the unusual “first day of every month” order into what seemed a more reasonable chronological “month by month” sequence. The data is now in the following form:

(
    (1844
      (1.9 -0.2 6.7 11.1 11.7 8.9 6.1 6.9 10.3 8.2 8.9  ...
      (6.2 5.1 4.8 6.2 6.2 6.2 4.5 4.5 5.2 4.8 6.7 8.4  ...
      (6.8 6.4 8.2 6.6 3.4 5.7 7.9 11.5 10.4 9.3 8.7 6. ...
      (16.3 12 11.2 10.8 10.7 12.1 14 14.6 17.9 15.7 13 ...
      (19.2 22.7 16.8 16 19.9 17.9 17.1 16.9 15.4 14.4  ...
      (18.7 16.7 20.1 20.8 18.6 20.2 19.1 20.1 19.1 18  ...
      (17.7 16.7 19.2 19.3 16.9 16.8 19.2 19.5 18.6 18. ...
      (16.7 18.8 17.2 18.4 19.9 15.3 16.4 16.8 16.8 18. ...
      (24.8 23.4 20.9 21.6 21.4 20.9 20.4 16.9 18.2 15. ...
      (15.6 16.1 17.3 15.1 13.4 13.3 11.8 14.8 14.9 15. ...
      (10.3 9.2 9.4 8.6 9.4 8.3 8.3 9.4 10 9.4 7.3 8.9  ...
      (6.6 6 6.3 5.6 4.2 5.4 6 2.4 3.5 3.7 3.8 3.2 2.1  ...
    (1845
      (5 5.8 6.7 9.7 10.2 8.3 6.7 7.8 9.4 10.6 6.7 6.1  ...
      (2.6 6.7 9.2 5.9 7.5 4.5 1.7 4.5 7.5 7.3 5.3 9.5  ...
      (9 9 9.2 6.5 4 2.9 5.5 7.9 8.6 9.7 5.7 4.1 1.8 1. ...
      (15.4 12.9 15.9 15.4 9.6 13.4 15.1 11.2 10.1 9.6  ...
      (16 14.4 12.7 12.7 12.7 10.4 10.4 11.8 12.4 13.9  ...
      (19.9 19.7 14.6 15.8 16.3 16 16.1 16 16.6 18 24.2 ...
      (16.1 18.8 14.7 17.9 17.7 18.1 19.6 18.3 16.4 18. ...
      (17.7 17.3 17.8 18.9 19.2 18.7 18.9 19.5 16.7 18. ...
      (17.6 18.8 18.7 18.2 17.2 15.1 16.5 19.1 20.6 18. ...
      (13.6 12.9 13.4 10.4 11.3 13.4 11.8 12 12.3 12.3  ...
      (12.8 11.9 12.6 11.7 13.9 15.1 11.7 10.9 9.7 11 1 ...
      (8.2 5.7 2.6 7.9 6.6 6 7.1 9.9 8.7 10.4 6.6 5.1 6 ...
    (1846

and so on.

To access the data for a specific year, I can use ref:

(define (data-for-year y)
   (let ((year-ref (ref y yearly)))
       (rest (yearly (chop year-ref)))))

such that:

(data-for-year 1845)

returns a list of 12 lists. And to access data for a specific month, I can use this:

(define (data-for-month month-number year-data)
  ; month-number:  Jan = 1, not Jan = 0 :)
   (clean (fn (n) (< n -30)) (year-data (- month-number 1))))

So I can, for example, find data for February 1900 like this:

(data-for-month 2 (data-for-year 1900))
    ;->
(3.4 3.3 2.8 4.4 3.4 3.3 3.9 1.6 4.9 1.1 2.8 2.4 4.1 4.4 7.9 7.7 7.2 6.3 3.8 6.2 5.7 9.9 11.7 12.8 9.3 6.8 5.1 5.6)

The clean function removes those spurious -999s for February 29, February 30, and February 31.

Another simple function:

 (define (average lst)
  (div (apply add lst) (length lst)))

calculates averages, so I can ask for the average (maximum) temperature:

(average (data-for-month 2 (data-for-year 1900)))
 ;-> 5.421428571

To find years with high average maximum temperatures, I can run the following code:

(for (year 1844 2004)
     (set 'year-data (data-for-year year))
     (set 'month-average nil)
     (for (month 1 12)
          (push (average (data-for-month month year-data)) month-average))
     (push (list year (average month-average)) year-average))

(sort year-average (fn (a b) (> (last a) (last b))))

with the following results:

((1959 14.20715118)
 (1846 14.12936444)
 (1949 14.11140297)
 (1921 13.9449232)
 (1857 13.88233359)
 (1945 13.87162442)
 (2003 13.84484575)
 ...

and so on. If you google the year 1846, you’ll find references to the typhoid outbreak that followed the long hot summer, which claimed upwards of 30 000 lives. But that summer of 1959 has been described as a ‘benchmark’ summer; that special kind of warm and dry golden summer that ageing Brits like to look back on nostalgically when faced with another of the wet and miserable days that they usually see so many of in July and August.

A picture is worth a 1000 words

The plain numeric data is interesting, but I like exploring using visual techniques too.

My first attempts at visualizing the data used the HTML5 Canvas, but for this task I ended up switching back to PostScript. Both have their advantages: Canvas can do semi-transparency, but PostScript can’t; PostScript can produce PDFs which are high resolution, and easily converted to other formats. Bear with me for a moment while I define a short library of PostScript functions.

(define (ps str)
  (write-line *buffer* str))

(define (render filename)
  (let ((fname (string (env "HOME") "/Desktop/" filename)))
     (write-file fname
                (string "%!PS-Adobe-3.1"
                        "\n"
                        "%%Creator: newLISP "
                        (sys-info -2)
                        "\n"
                        *buffer*
                        "%%showpage" "\n"))
     ; on MacOS X, this runs the PostScript to PDF converter 
     ; and opens the resulting PDF file in Preview
     (exec (string "open " fname))
     ; on other platforms, the PostScript file needs to be 
     ;converted with, eg, GhostScript...
     ))

(define (ps-prolog)
  (ps (string {%} (date)))
  ; hack for big paper size
  (ps (string {
%%BeginFeature: *PageSize Default
<> setpagedevice
%%EndFeature
  }))
  ; default font
  (ps {/Helvetica-Bold findfont 12 scalefont setfont})
  (ps {1 setlinewidth}))

(define (transform-x x)
   (mul x-scale (add x x-offset)))

(define (transform-y y)
   (mul y-scale (add y y-offset)))

(define (line-to x y)
  (ps (format "%f %f lineto\n" (transform-x x) (transform-y y))))

(define (move-to x y)
  (ps (format "%f %f moveto\n" (transform-x x) (transform-y y))))

(define (begin-path)
  (ps (format {newpath })))

(define (fill-path)
  (ps (format {fill })))

(define (stroke-path)
  (ps (format {stroke })))

(define (set-fill-colour r g b)
  (ps (format {%f %f %f setrgbcolor} r g b)))

(define (set-stroke-colour r g b)
  (ps (format {%f %f %f setrgbcolor} r g b)))

(define (dot x y r)
  (begin-path)
  (ps (format "%f %f %f 0 360 arc " (transform-x x) (transform-y y) r))
  (fill-path))

(define (text x y d)
  (move-to x y)
  (ps (format {(%s) show} (string d))))

I usually just copy and paste these from script to script, adapting them when necessary to the task in hand. There’s a hack in the ps-prolog function to force a very wide paper size. This graph is going to be huge – I’m going to attempt to plot every single data point on this graph.

For colours, I define a simple colour map that goes from red to blue:

(define (generate-colour-list)
    (for (i 0 1 0.001)
        (push (list i 0 (sub 1 i)) colour-map -1)))

and a function that takes a temperature value from between -10 and 35 (they’re safely below and above the range of values in the dataset) and selects an entry in the colour map. (I shouldn’t hard-wire all these values in, should I?!)

(define (colour-for-temp t)
   (colour-map (int (mul (length colour-map) (div (add 10 t) 45)))))

Now I’m ready to initialize:

(define (init)
  (set '*buffer* {})
  (generate-colour-list)
  (ps-prolog)
  (set  'x-offset 10
        'y-offset 25
        'x-scale 4
        'y-scale 6
        'x-step 0.02
        'start-year 1844
        'end-year 2004
        'graph-width 2000
        'graph-height 400))

and – finally – I’m ready to start drawing.

(init)

; legend
(set-fill-colour 0.0 0.0 0.0)
(text 10 44 {Maximum temperatures recorded at Armagh Observatory, Degrees Centigrade})
(text 10 42 {Monthly averages shown as grey circles})
(text 10 40 {Daily maximum temperatures marked as small dots})
(text 10 38 {Records shown in red/blue})

(set-stroke-colour 0.7 0.7 0.7)

; x-axis
(begin-path)
(move-to 0 0)
(line-to graph-width 0)
(stroke-path)

; y-axis
(begin-path)
(move-to 0 -5)
(line-to 0 (- graph-height 30))
(stroke-path)

; y rules 
(for (y -5 35 5)
     (begin-path)
     (set-stroke-colour 0.9 0.9 0.9)
     (move-to 0 y)
     (line-to graph-width y)
     (stroke-path)
     (set-fill-colour 0 0 0)
     (text -7 y y))

The way I’ve set this up, I’m actually adding the graph’s ‘furniture’ by specifying vertical coordinates in degrees Celsius! It’s weird, but it kind of makes sense to say “I want this line to appear at the -5° line”.

The graph is drawn in two passes, partly because there’s no other way to get layering. In the first pass, I’m drawing grey circles to indicate the average maximum temperature for each month. In the second pass, I go back and plot each day’s value as a dot:

; first pass; draw monthly average maximums as grey dots
(set 'x-pos 0)
(for (year start-year end-year)
      (set 'year-data (data-for-year year))
      (for (month 1 12)
           (set-fill-colour 0.7 0.7 0.7)
           (set 'monthly-average (average
                (data-for-month month year-data)))
           (set 'y-pos monthly-average)
           (dot  x-pos monthly-average 3)
           (inc x-pos (mul x-step (length
                (data-for-month month year-data)))))
      (inc x-pos x-step))

; second pass; draw daily maximums
(ps {/Helvetica-Bold findfont 15 scalefont setfont})
(set 'x-pos 0 'hottest-day-so-far 25 'coldest-day-so-far 0)
(for (year start-year end-year)
    ; year markings
    (when (= 0 (% year 5))
       ; sometimes, 5 yearly ticks
       (set-stroke-colour 0.8 0.8 0.8)
       (begin-path)
       (move-to x-pos -10)
       (line-to x-pos -12)
       (stroke-path)
       ; year: black
       (set-fill-colour 0 0 0)
       (text x-pos -14 year))

    ; always, 1 year ticks
    (set-stroke-colour 0.8 0.8 0.8)
    (begin-path)
    (move-to x-pos -10)
    (line-to x-pos -11)
    (stroke-path)

    ; get data for year
    (set 'year-data (data-for-year year))
    (for (month 1 12)

         ; every day
         (dolist (daily-temp (data-for-month month year-data))
            (set 'hottest-day-so-far (max hottest-day-so-far daily-temp))
            (set 'coldest-day-so-far (min coldest-day-so-far daily-temp))
            ; lower alpha for this set
            (cond
                ((= hottest-day-so-far daily-temp)
                      (set-fill-colour 1 0 0)
                      (dot x-pos daily-temp 3))
                ((= coldest-day-so-far daily-temp)
                      (set-fill-colour 0 0 1)
                      (dot x-pos daily-temp 3))
                (true
                      (apply set-fill-colour (colour-for-temp daily-temp))
                      (dot x-pos daily-temp 0.5)
                      ))
            (inc x-pos x-step)))
    ; next year
    (inc x-pos x-step))

; finish drawing
(ps {showpage})
(render "graph.ps")

The result is a PostScript image that is easily converted to a PDF image by the Mac’s pstopdf program (or by GhostScript or Adobe Distiller). The result is too wide for publication in a typical journal, and strains at the edges of typical web pages. But it’s easy to explore the PDF image using Preview or Acrobat Reader.

The PDF is here: armagh-graph

One thing I noticed (apart from the apparent lack of much significant change during the period) is that those averages seem oddly unrepresentative of the actual daily weather. There’s no disputing the arithmetic (I hope). And yet, for example, look at the summers of 1870 or 1995: daily temperatures look hot, with many above 25°, but the monthly averages appear cooler.

Every picture tells a story

This type of graphic visualization of a dataset is designed to tell a story visually, the story being, presumably, the one told by the surrounding text. However, it’s possible for the author to make the storytelling even more effective by tweaking aspects of the presentation. For example, authors can change or move the axes, adjust the horizontal or vertical scales, introduce colours to influence perception, combine two or more datasets as if they were one, start or stop the graphing at a more ‘telling’ point in the data, or even combine two or more graphs on a single display to imply connections. For more on this fascinating subject, see the books and writings of Edward Tufte.

I don’t have a scientific story to tell, here. Rather, I’m telling a meta-story. I’ve made a number of small mistakes and inappropriate design decisions in this post (some deliberate, or at least, some I’m aware of, others are accidental). But, given the published and freely downloadable weather data, the code listed here, and – of course – the excellent free and open source newLISP language, it should be possible for anyone to retrace my steps, find my mistakes, and present a more credible or compelling view of the same dataset. I like to think that, if I was a scientist, I’d be happy for that to happen. But how many of us are either willing or able to do the same forensic work for other – and probably much more important – graphical visualizations of specific datasets? Currently the world of climate science seems to be in disarray, with allegations of data tampering, fraud, and much else besides. It appears that we should all be a lot more critical of the way information is presented to us.

Addendum

Thanks to Kazimir, this post was mentioned at Hacker News – and this site had 8000+ more page views than usual that day… :) I was pleasantly surprised that quite a few of the commenters seemed to have understood that this wasn’t literally an attempt at serious scientific analysis, despite the teasing headline. It was merely an observation or two from a non-scientific layman about the process of drawing pictures and conclusions from sets of numbers, while at the same time exploring some simple graph drawing using newLISP (rather than proprietary software packages such as Excel or Mathematica).

I’m intrigued by the heavy data compression that’s going on in much of the graphs I see on the net. A huge amount of juicy data is squeezed until dry enough to be summarized by a single wavy line. In this particular example I tried to avoid dropping any data points, but the resulting graph is, I freely admit, too hard to ‘read’. When you’re far enough away to see it, you can’t see anything worth seeing.

Kazimir also suggested that I add yearly averages. Easy enough to do, although I’m reluctant to cloud the graph with even more information:

And about this ‘average’ thing: talking about the average temperature for ‘a decade’ begs the question of why starting with a year ending with 0 is any better than starting with any other number. For example, why not compare the decade 1853-1862 with 1927-1936? And for that matter, why are we using intervals of 10 years, rather than 5 or 14?

But there’s an easy way to answer this particular conundrum:

(for (x 3 20)
   (for (y 0 (- (length yearly-averages) 1))
        (set 's (slice yearly-averages y x))
        (if (= (length s) x)
            (push (list (first s) (average (map last s))) results)))
    (println x { } (first  (sort results (fn (a b) (> (last a) (last b))))))
    (set 'results nil))

    3 (2002 13.74500614)
    4 (2001 13.60712154)
    5 (2000 13.58417734)
    6 (1999 13.56492782)
    7 (1998 13.51859569)
    8 (1997 13.53881651)
    9 (1995 13.44358625)
    10 (1995 13.47582925)
    11 (1994 13.41562109)
    12 (1993 13.33442216)
    13 (1992 13.29284527)
    14 (1846 13.32015829)
    15 (1845 13.2821639)
    16 (1989 13.29859146)
    17 (1988 13.28216522)
    18 (1987 13.24236811)
    19 (1986 13.17982611)
    20 (1985 13.13376373)

which tells us that, for this data set at least, the warmest periods between 3 and 13 years long, and between 16 and 20 years long, were all in the late 1980s and 1990s, but the warmest 14 and 15 year long periods were in the 1840s and 1850s. Change the sorting function to <, and the coldest (or, to be more precise, the least warm, on average) periods were all in the late 1890s. Tentatively, I could suggest that – at Armagh – there was a slight dip in average temperatures towards the turn of the 20th century. and that average temperatures now are equalling and surpassing the early Victorian readings by a tenth of a degree or so. But I said I was trying not to do any science here, so that’s enough of that.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: