Monday, November 23, 2009

Data horribilia: the HARRY_READ_ME.txt file

With the CRU emails having been examined, it seems that some people—mainly techies—are really starting to dig into the data files. These files contain, as far as we can tell, temperature data, modelling results and other working material, i.e. the files produced and worked on by the CRU teams, as well as considerable amounts of information on—and code from—the actual computer modelling programmes themselves.

In other words, these are the guts of CRU's actual computer models—the data, the code and the applications.

And they are, by all accounts, a total bloody mess.

++++ START INSERT ++++

So, come with me on a wonderful journey as the CRU team realise that not only have they lost great chunks of data but also that their application suites and algorithms are total crap; join your humble Devil and Asimov as we dive into the HARRY_READ_ME.txt file (thanks to The Englishman) and follow the trials and tribulations of Ian "Harry" Harris as he tries to recreate the published data because he has nothing else to go on!

Thrill as he "glosses over" anomalies; let your heart sing as he gets some results to within 0.5 degrees; rejoice as Harry points out that everything is undocumented and that, generally speaking, he hasn't got the first clue as to what's going on with the data!

Chuckle as one of CRU's own admits that much of the centre's data and applications are undocumented, bug-ridden, riddled with holes, missing, uncatalogued and, in short, utterly worthless.

And wonder as you realise that this was v2.10 and that, after this utter fiasco, CRU used the synthetic data and wonky algorithms to produce v3.0!

You'll laugh! You'll cry! You won't wonder why CRU never wanted to release the data! You will wonder why we are even contemplating restructuring the world economy and wasting trillions of dollars on the say-so of data this bad.

++++ END INSERT ++++

Via the ever-prolific Tom Nelson, Soylent Green has picked up some geek reports on this material.
Got this from reader, Glenn. I’m out of my depth trying to read the code—and apparently so were several folks at CRU. If what he, and the techies at the links, say is true, it’s no wonder they had to spin this for 10 years—it’s all absolute bullshit.

Here’s Glenn’s take with links:
The hacked e-mails were damning, but the problems they had handling their own data at CRU are a dagger to the heart of the global warming “theory.” There is a large file of comments by a programmer at CRU called HARRY_READ_ME documenting that their data processing and modeling functions were completely out of control.

They fudged so much that NOTHING that came out of CRU can have ANY believability. If the word can be gotten out on this and understood, it is the end of the global warming myth. This is much bigger than the e-mails. For techie takes on this see:

Link 1

Link 2

To base a re-making of the global economy (i.e. cap-and-trade) on disastrously and hopelessly messed up data like this would be insanity.

Now, this stuff really is beyond me, but I have looked at the links given above and, from what little I can decipher, there do seem to be some issues.

The main issue is that the techies at CRU don't seem to have been able to tell what the hell was going on with the code, let alone anything else.

I shall quote user Asimov—from the top of the page of Link 1 [above]—to give you a flavour of the confusion that seems to have been rife at CRU.
There's a very disturbing "HARRY_READ_ME.txt" file in documents that APPEARS to be somebody trying to fit existing results to data and much of it is about the code that's here. I think there's something very very wrong here...

This file is 15,000 lines of comments, much of it copy/pastes of code or output by somebody (who's harry?) trying to make sense of it all....

Here's two particularly interesting bits, one from early in the file and one from way down:
7. Removed 4-line header from a couple of .glo files and loaded them into Matlab. Reshaped to 360r x 720c and plotted; looks OK for global temp (anomalies) data. Deduce that .glo files, after the header, contain data taken row-by-row starting with the Northernmost, and presented as '8E12.4'. The grid is from -180 to +180 rather than 0 to 360.

This should allow us to deduce the meaning of the co-ordinate pairs used to describe each cell in a .grim file (we know the first number is the lon or column, the second the lat or row - but which way up are the latitudes? And where do the longitudes break?

There is another problem: the values are anomalies, wheras the 'public' .grim files are actual values. So Tim's explanations (in _READ_ME.txt) are incorrect...

8. Had a hunt and found an identically-named temperature database file which did include normals lines at the start of every station. How handy - naming two different files with exactly the same name and relying on their location to differentiate! Aaarrgghh!! Re-ran anomdtb:

Uhm... So they don't even KNOW WHAT THE ****ING DATA MEANS?!?!?!?!

What dumbass names **** that way?!

Talk about cluster****. This whole file is a HUGE ASS example of it. If they deal with data this way, there's no ****ing wonder they've lost **** along the way. This is just unbelievable.
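For anyone who doesn't speak Fortran format strings: what Harry is having to reverse-engineer there is nothing more exotic than how to read one of his own centre's files. If his deductions are right, a reader is only a few lines long. Here is a rough sketch of my own in Python (emphatically not anything from the CRU code, and resting entirely on Harry's guesses about the layout):

import numpy as np

def read_glo(path):
    """Read a .glo anomaly file as Harry deduces it to be laid out:
    a 4-line header, then a 360 x 720 half-degree grid written row by
    row starting with the northernmost row, eight values per line in
    Fortran '8E12.4' format (eight 12-character floats), longitudes
    running from -180 to +180."""
    with open(path) as f:
        lines = f.readlines()[4:]              # skip the 4-line header
    values = []
    for line in lines:
        line = line.rstrip("\n")
        for i in range(0, len(line), 12):      # carve into 12-character fields
            field = line[i:i + 12].strip()
            if field:
                values.append(float(field))
    return np.array(values).reshape(360, 720)  # row 0 = northernmost band

The point is not that this is difficult; the point is that Harry has to load the thing into Matlab and squint at a plot to work out whether that is even the right way to read it, because nobody wrote any of it down.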

And it's not just one instance of not knowing what the hell is going on either:
The deduction so far is that the DTR-derived CLD is waaay off. The DTR looks OK, well OK in the sense that it doesn;t have prominent bands! So it's either the factors and offsets from the regression, or the way they've been applied in dtr2cld.

Well, dtr2cld is not the world's most complicated program. Wheras cloudreg is, and I immediately found a mistake! Scanning forward to 1951 was done with a loop that, for completely unfathomable reasons, didn't include months! So we read 50 grids instead of 600!!! That may have had something to do with it. I also noticed, as I was correcting THAT, that I reopened the DTR and CLD data files when I should have been opening the bloody station files!! I can only assume that I was being interrupted continually when I was writing this thing. Running with those bits fixed improved matters somewhat, though now there's a problem in that one 5-degree band (10S to 5S) has no stations! This will be due to low station counts in that region, plus removal of duplicate values.

I've only actually read about 1000 lines of this, but started skipping through it to see if it was all like that when I found that second quote above somewhere way down in the file....

CLUSTER.... ****. This isn't science, it's gradeschool for people with big data sets.
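In case the "50 grids instead of 600" line lost you: scanning 1901 to 1950 one year at a time gives you 50 reads, but fifty years of monthly grids is 50 x 12 = 600 of them. Schematically, the slip looks something like this (my own illustration, with a stand-in load_grid function, not the actual cloudreg code):

def load_grid(year, month=1):
    """Stand-in for whatever reads one monthly half-degree grid."""
    return (year, month)

# The slip Harry describes: stepping through the years only...
grids_wrong = [load_grid(year) for year in range(1901, 1951)]
print(len(grids_wrong))    # 50

# ...when every month of every year was needed:
grids_right = [load_grid(year, month)
               for year in range(1901, 1951)
               for month in range(1, 13)]
print(len(grids_right))    # 600

One missing inner loop, and more than nine-tenths of the input silently never gets read.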

Now, I'm no climate modeller or even a professional coder, but it does seem to me that there is just a teeny weeny bit of confusion evidenced in the HARRY_READ_ME.txt file. I mean "teeny weeny" in the sense that whoever wrote this file obviously hadn't got a fucking clue what was going on—and not for want of trying.

But there's more—here's another taster, a few posts down from the one above, from Asimov's analysis of the HARRY_READ_ME.txt (I'm trying to give you a hint about what's going on: the paydirt's coming soon!).
Christ. It gets better.
So.. we don't have the coefficients files (just .eps plots of something). But what are all those monthly files? DON'T KNOW, UNDOCUMENTED. Wherever I look, there are data files, no info about what they are other than their names. And that's useless.. take the above example, the filenames in the _mon and _ann directories are identical, but the contents are not. And the only difference is that one directory is apparently 'monthly' and the other 'annual' - yet both contain monthly files.

Let's ignore the smoking gun in a legal sense, and think about the scientific method for just a moment....

I do believe this is more than one gun and there's some opaque mist coming from the "fun" end. I won't claim it's smoke, but holy ****, this is incredible.

I think that we are all starting to get an impression of what is going on here, right? Piles and piles of undocumented and inconsistent datasets, and the techies at CRU utterly baffled by all of it.

But what are they actually trying to do—what is this HARRY_READ_ME.txt all about...? Yep, it's over to Asimov, a few posts down again (what can I say: I like the man's style!)...
I'm just absolutely STUNNED by this ****. **** the legal stuff. RIGHT HERE is the fraud.
These are very promising. The vast majority in both cases are within 0.5 degrees of the published data. However, there are still plenty of values more than a degree out.

He's trying to fit the results of his programs and data to PREVIOUS results.

Yup, somewhere along the way, some stuff has got lost or corrupted. Badly.

This programmer—Ian "Harry" Harris—is attempting to recreate... What? The data? The applications and algorithms that ran the original data? It seems to be the latter, because Harry carries on.
TMP has a comforting 95%+ within half a degree, though one still wonders why it isn't 100% spot on..
...

DTR fares perhaps even better, over half are spot-on, though about 7.5% are outside a half.

The percentages below are the accuracy breakdown:
However, it's not such good news for precip (PRE):
...
Percentages: 13.93 25.65 11.23 49.20

21. A little experimentation goes a short way..

I tried using the 'stn' option of anomdtb.for. Not completely sure what it's supposed to do, but no matter as it didn't work:

Oh yea, don't forget. He's getting 0.5 and 1 degree differences in results... while they are predicting temperatures to a supposed accuracy of tenths...

Unless I find something MUCH WORSE than what I've already posted, I'll leave the file for you to read and stop spamming the thread with this.
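For the non-statisticians, the check Harry keeps quoting is a simple one: regenerate the gridded values, subtract the numbers CRU already published, and see what fraction of cells land within half a degree. A toy version of that comparison, in Python (my sketch, not Harry's code):

import numpy as np

def agreement(rebuilt, published, tol=0.5):
    """Fraction of grid cells where the regenerated value falls within
    tol of the published value, and the fraction more than a degree out."""
    diff = np.abs(np.asarray(rebuilt) - np.asarray(published))
    return {
        "within_half_degree": float(np.mean(diff <= tol)),
        "more_than_a_degree_out": float(np.mean(diff > 2 * tol)),
    }

Remember, this is the team's own data being compared with the team's own published output; that the two disagree at all is rather the point.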

Needless to say, worse is to come...
Ok, one last bit to finish that last one off:
..knowing how long it takes to debug this suite - the experiment endeth here. The option (like all the anomdtb options) is totally undocumented so we'll never know what we lost.

22. Right, time to stop pussyfooting around the niceties of Tim's labyrinthine software suites - let's have a go at producing CRU TS 3.0! since failing to do that will be the definitive failure of the entire project..

I eagerly await more reading to find the results of that.

Oh, same here, Asimov: same here. Shall we see some more? Why not...
You'd think that where data was coming from would be important to them... You know, the whole accuracy thing..
The IDL gridding program calculates whether or not a station contributes to a cell, using.. graphics. Yes, it plots the station sphere of influence then checks for the colour white in the output. So there is no guarantee that the station number files, which are produced *independently* by anomdtb, will reflect what actually happened!!

Well I've just spent 24 hours trying to get Great Circle Distance calculations working in Fortran, with precisely no success. I've tried the simple method (as used in Tim O's geodist.pro, and the more complex and accurate method found elsewhere (wiki and other places). Neither give me results that are anything near reality. FFS.

Worked out an algorithm from scratch. It seems to give better answers than the others, so we'll go with that.
...

The problem is, really, the huge numbers of cells potentially involved in one station, particularly at high latitudes.
...

out of malicious interest, I dumped the first station's coverage to a text file and counted up how many cells it 'influenced'. The station was at 10.6E, 61.0N.

The total number of cells covered was a staggering 476!

Keep in mind how climate models work. They split the world up into cells and treat each cell as a single object... (Complexity thing, only way to get any results at all in reasonable times, even with supercomputers.)

Seriously, this really isn't good.
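As an aside, the great-circle distance that ate twenty-four hours of Harry's life is a genuinely standard calculation: the haversine formula is in every textbook. A straightforward version looks like this (a Python sketch for illustration, not CRU's Fortran, and the second station below is invented):

import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance in km between two points
    given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# e.g. from the station at 10.6E, 61.0N to a made-up one at 5.3E, 60.4N
print(round(great_circle_km(61.0, 10.6, 60.4, 5.3)))   # roughly 300 km

That this defeated the suite for a whole day, until Harry gave up and "worked out an algorithm from scratch", tells you something about the state of the codebase. But back to Asimov, who hasn't finished: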
Bit more to add to the last, then off to bed, so I'll stop spamming. :P
Back to the gridding. I am seriously worried that our flagship gridded data product is produced by Delaunay triangulation - apparently linear as well.

As far as I can see, this renders the station counts totally meaningless.

It also means that we cannot say exactly how the gridded data is arrived at from a statistical perspective - since we're using an off-the-shelf product that isn't documented sufficiently to say that.

Why this wasn't coded up in Fortran I don't know - time pressures perhaps? Was too much effort expended on homogenisation, that there wasn't enough time to write a gridding procedure? Of course, it's too late for me to fix it too. Meh.

"too late for me to fix it"

I guess it doesn't matter that we're talking about data that's basically determining the way the WHOLE ****ING HUMAN RACE IS GOING TO LIVE for the next few CENTURIES?

TRILLIONS OF DOLLARS.

"Meh."

Like Asimov, I too must retire to bed—it's much too late, or early. But I shall continue trawling both these threads and the data—I might try to find and post the entire HARRY_READ_ME.txt file for starters—but I would just like to add one quick comment...

I have tried to keep my language moderate throughout all of these CRU articles—the subject matter is way too important for these posts to be written off as being "too sweary"—but there really is only one response to all of this.

Fucking. Hellski.

UPDATE: there's more from the HARRY_READ_ME.txt file—posted by Asimov again.
The problem is that the synthetics are incorporated at 2.5-degrees, NO IDEA why, so saying they affect particular 0.5-degree cells is harder than it should be. So we'll just gloss over that entirely ;0)

ARGH. Just went back to check on synthetic production. Apparently - I have no memory of this at all - we're not doing observed rain days! It's all synthetic from 1990 onwards. So I'm going to need conditionals in the update program to handle that. And separate gridding before 1989. And what TF happens to station counts?

OH **** THIS. It's Sunday evening, I've worked all weekend, and just when I thought it was done I'm hitting yet another problem that's based on the hopeless state of our databases. There is no uniform data integrity, it's just a catalogue of issues that continues to grow as they're found.

Let me just repeat that final line:
There is no uniform data integrity, it's just a catalogue of issues that continues to grow as they're found.
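To see why that first excerpt matters: a 2.5-degree cell sits on top of a 5 x 5 block of half-degree cells, so every synthetic number gets stamped across twenty-five cells of the supposedly half-degree product. Schematically (my sketch, not CRU's code):

import numpy as np

# A hypothetical global 2.5-degree synthetic field: 72 x 144 cells.
synth_2p5 = np.random.rand(72, 144)

# Spread onto the 0.5-degree grid: each coarse cell covers a 5 x 5 block,
# so one synthetic value ends up covering 25 half-degree cells.
synth_0p5 = np.kron(synth_2p5, np.ones((5, 5)))    # shape (360, 720)

So when Harry says that working out which half-degree cells the synthetics affect is "harder than it should be", and then glosses over it entirely, that is a lot of cells being glossed over.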

And I will just sign off with another comment from Asimov...
My god people, have you just been skipping over everything I've posted from that HARRY_READ_ME.txt file!?!?

The data itself is a HUGE unknown, even to the researchers themselves as they attempt to decode what's gone before.

Sure, the emails indicate the possibility (and certainty in some cases) of fraud. That one file PROVES HOW UNRELIABLE THE DATA ITSELF IS!!

They "lost" the original data?? I believe it now. v2.10 was run with a ****ton of code that was undocumented, made no sense and was FULL of bugs. Is v3.0 better when half the data from 1980 on is SYNTHETIC?!? Or when it used the output from the buggy 2.10 version (which is all they had) to produce NEW data?!?!

This is a ****ing joke. The emails are FAR from the most damning thing in this. I can't wait for somebody familiar with the code to start going over it and seeing how many "So we'll just gloss over that entirely ;0)" instances exist.

What the hell has been going on over at CRU...? No wonder they didn't want to release their data...

I shall try to find some time to make a more succinct posting at some point over the next few days but, believe me, the main upshot is that none of the CRU data is worth a single shiny penny.

74 comments:

ScotsToryB said...

Bliss!

STB.

sierra said...

Amazing. In many ways the file's most alarming line is the first:

READ ME for Harry's work on the CRU TS2.1/3.0 datasets, 2006-2009!

Three years' worth of utter frustration and blind, seat-of-the-pants hackery, running no less than 15,000 lines! Incredible.

Anonymous said...

Just heard Lord Lawson on the Today programme talking about this issue. Timed roughly just after the 7.30 news.

someday said...

Staggering fraud.

Ben Gardiner said...

Sent this email to my MP yesterday. Will send him an update when more of the data issues come to light. Can we all do the same - we need to be pressuring these people.

"I hope you have been following the breaking story that emails and data from the Climate Research Unit at the University of East Anglia have been leaked. If not, there is a good summary in the Telegraph*. The emails appear to show that some of the leading climate scientists have not been acting with integrity:

They seem to have been manipulating the data to get the results they want.

They can’t explain why the world hasn’t warmed this century (“it is a travesty that we can’t”)

They have discussed how to avoid (even by destroying data) having their work scrutinised through Freedom of Information requests

Admissions that they have tried to eliminate the Medieval Warm Period (presumably because if it was warmer then than it is now, then now is not exceptional)

The peer-review process may have been rigged to freeze out non-believers.

Journals seem to have been subjected to pressure not to publish articles from sceptics.


I would be interested to know whether, after reading some of the emails, you agree with me that:

There should be a public inquiry into the methods and work of the Climate Research Unit

There should be a moratorium on taxes and legislation designed to restrict CO2 until any science “proving” the link between “greenhouse gases” and dangerous climate change regains credibility (if it ever does)

All further climate science undertaken at British Universities or paid for by public funds must be done in an open and transparent way. This means that all data and methodologies must be available for scrutiny, as is normal practice in any other field of science.

AndrewSouthLondon said...

As is usual in these things, the "profs" handle the political relationships and the money, while way down the pecking order the geek/grunts do the actual work, get dirty with the data, and battle with all the anomalies and inconsistencies.

It also tells you what piss-poor data production environments our "universities" are. Every corner is cut. There's conferences in Tahiti to go to. There's all those long holidays and sabbaticals and exchange visits.

You couldn't behave like this in the real world. It's no surprise the veneer of arrogance and prestige belies what is beneath the surface - a crock.

Forget the rivalries and internecine warfare - the data files are the real issue for CRU. The product is not fit for purpose.

Great work DK, and Asimov. No wonder the BBC (£3.2bn public funding) is silent.

astateofdenmark said...

Asimov is right, the data is more important. Those who have been beavering away at Hansen, Mann and Briffa for years have long maintained that their data is rubbish.

Hopefully, in those leaked files, is the evidence.

Interesting that the read me is from 2006-09. Did they tell the IPCC that there might be, issues, with the data?

Neal Asher said...

Thanks for that Ben Gardiner. I've used some of what you posted here and created my own letter (hope you don't mind). I've sent letters to my local MP, councillors and MEPs. Anyone else who wants to do this can go here: http://www.writetothem.com

Jonathan said...

The revelations that the data is confused and the algorithms dysfunctional are not news to scientists in the field.

I used to work for the University of Oxford (in IT, not global warming). Pretty much exactly the same criticisms as set out above were leveled at this dataset in a talk given by some leading climatologists.
After the talk, the local watermelons (green outside, red inside - our universities are full of them) were thoroughly disheartened that their AGW beliefs were based on such a shaky foundation.

Look at the reasons the Met Office gave for refusing the FOI request (available in the leaked docs). One of them was commercial interest. The Met office (part of the MOD) sells weather information around the world.
Anyone who collaborates with the Met Office has to sign a non-disclosure agreement with them, not to release the code that is being discussed.
They would lose substantial income if it was revealed that their predictions were based upon thousands of lines of badly documented Fortran that date back decades.

Plato said...

Linky to Lawson interview on Today

http://news.bbc.co.uk/today/hi/today/newsid_8373000/8373594.stm

Thortung The Terrible said...

The whole idea of configuration management and change control would appear to be an alien concept to these clowns.

I manage embedded software engineering projects for safety critical systems. As the results of this research are going to be used to direct policy to the tune of billions of pounds, I would expect, in their shoes, to have to comply with similarly high standards, as used in the aviation, rail, automotive and similar applications.

I would certainly expect to see a full documentation trail from requirements through to formal test with traceability and would expect to see a documented set of procedures for how they develop code and manage change, V&V etc.

Were these idiots to be subject to a TickIT audit, as we plebs in industry are, they would probably hit a hold point (critical non-conformity) within seconds.

Given this total lack of a disciplined and professional approach, I can't see how anyone could even begin to trust the results at all, let alone use them as a basis for £billions of taxation policy.

Man in a Shed said...

I have to say none of this surprises me at all.

It's well known that academic code sucks from the software engineering point of view.

Thortung The Terrible said...

I particularly like this section from the file

"Looking in the files I see that Bulletin 58009 is 'BYRON BAY (CAPE BYRON LIGHTHOUSE)',
and 58216 is 'BYRON BAY (CAPE BYRON AWS)'. But the database stubs that have been
entered have not been intelligently named, just truncated - so I have no way of
knowing which is which! CRU NEEDS A DATA MANAGER. In this case I had to assume that
the updates were processed in .au code order, so 1-1 and 2-2. "

Dodgy Geezer said...

The implications of this are, of course, severe, but I can easily see how it would have happened.

If you are a scientist doing research, you do not cut code in a proper software engineering environment. You doodle on the back of an envelope and see if your hypothesis might run. After a while you have quite a lot of little personal data sets you have been experimenting with.

Then someone calls you and says that they are collecting data about ice formation or some such. You send them your databases - uncommented and undocumented. The data collection builds up as everyone adds a bit, and by then papers are being written claiming that the world is in grave danger.

Once the roller coaster ride of international conferences and foreign heads of government calling you to congratulate you and plan legislation starts, do you say "Hang on a second, all this data is in a bit of a mess, we should really go back to the beginning and get it professionally organised"? Of course you don't. Besides, you are now getting more and more data coming in, and so long as it all shows temperature rises, it would be just wasting time and money to smarten up your old data. You expect that new data will show increasing and dangerous rises - that is the data to look at.

Then, after 10 years, it turns out that the new data does NOT show dangerous rises. And you are forced to revisit your old data to try to coax a bit more warming out of it. But by now that data has been left in a cupboard, documents lost, generally unsupported...

It's not surprising that we are in this mess. The interesting question is how can CRU (and the world!) get out of it without losing face? Sceptics pointing out what is wrong will only make Warmists more determined than ever to cover up and defend to the last ditch...

Francis Turner said...

The one thing I get from "HARRY_READ_ME" is that when Jones said that he didn't have the original HADCRU data he was probably correct. He really doesn't have it, or if he does, he doesn't know where it is or how he munged it into the processed form.

I've stuck all the code up on my webserver as I intend to join in the code review as time permits - the critical directory is http://di2.nu/foia/cru-code/

Ed P said...

All this was predictable: if you start from a desired (political) end-point, then making the data "fit" becomes a necessary fudge along the way. It's not science, it's not ethical and it's now of zero value.

microdave said...

DK - your Link 1 "For techie takes on this see:" isn't working. I'm getting a Blogger "Page Not Found".

Maybe it's already been commented on (in which case I apologise) but the Daily Mail have been running with this today.

Also Mrs Dale has linked to you.

Mr Rob said...

Just to say keep on doing a great job of blogging here - facts and informed analysis from those qualified to do so, only a modicum of swearing, and no leaping to conclusions. Great stuff so far.

Francis Turner said...

BTW I've uploaded the readme in sections starting here - http://di2.nu/foia/HARRY_READ_ME-0.html - the last section (35) needs further splitting

bella gerens said...

microdave -

There appear to be two URLs pasted into 'Link 1' by accident. Open it in your browser and erase everything up to 'http://www.tickerforum.org/...' etc.

The correct URL is:

http://www.tickerforum.org/cgi-ticker/akcs-www?post=118625&page=13

Jeff Wood said...

Excellent comments.

When I have walked the dog, I intend to follow Ben Gardiner's advice, as extended by Neal Asher. We cannot simply stand and snigger, we must try to generate a political reaction.

If the end result is the closure of Hadley CRU, and the taming of the Met Office, so much the better.

microdave said...

bella gerens - thanks for the explanation, I tried your suggestion and it works. The link in the box is faulty, but I've found the same one a few lines down works O.K.

The following is worth downloading - http://joannenova.com.au/globalwarming/skeptics-handbook-ii/the_skeptics_handbook_II-sml.pdf

I found it on the ICECAP site linked from OldRightie's blog. Although it is dated this year, it must have been produced before the "leak", and it's interesting to see mention of Phil Jones' attempts to block publication of his data, data which is now available!

knirirr said...

If you are a scientist doing research .... After a while you have quite a lot of little personal data sets you have been experimenting with..... You send them your databases - uncommented and undocumented.

Absolutely. It is often the case that scientific data aren't well curated and lack appropriate metadata. This occurs in other fields as well. For example, molecular biologists will often have lots of important samples hanging around at the back of a -70º freezer with the labels smudged or fallen off and the only key to what is where being a hand-written notebook amongst a pile of papers on their desk. Attempting to rectify this problem is often not as exciting as doing more research and funding to employ someone to do it is therefore harder to come by. Not that this is necessarily an excuse, of course.

The Great Simpleton said...

I've written to my MP using Ben's template - thanks Ben. I did add a section pointing out that it is not just the emails but also the data that is a problem. As my MP is David Lidington, who "voted very strongly for laws to stop climate change", I don't expect much support!

Just a thought, but this Harry is fairly hacked off (pun intended), could he be the whistle blower / leaker?

Keep up the good work, DK.

Francis Turner said...

Here's my first bit of code analysis - http://www.di2.nu/200911/23a.htm - based not just on the Harry_read_me but on the files it refers to.

Here's my conclusion:
I've examined two files in some depth and found (OK so Harry found some of this)

* Inappropriate programming language usage
* Totally nuts shell tricks
* Hard coded constant files
* Incoherent file naming conventions
* Use of program library subroutines that
o appear far from ideal in how they do things when they work
o do not produce an answer consistent with other ways to calculate the same thing
o fail at undefined times
o and where, when the function fails, the program silently continues without reporting the error

AAAAAAAAAARGGGGGHHHHHHHH!!!

DavidP said...

Francis Turner 11/23 08:08pm,
That sounds just like what tends to happen when you try to combine data and tools from a variety of sources, with many of them fairly scrappy to start with. I've done and seen some of this myself, notably at research labs (non-climate). It especially happens when people get stuff with instructions on how to get the data out, but without going through the process of learning how each piece works.
Failure to document things suitably for the next person is also a characteristic failing of the grad. student and the time pressured.

Anonymous said...

“How handy - naming two different files with exactly the same name and relying on their location to differentiate! “

Just that one point in itself is enough that no-one with software reliability experience would believe a thing that came out of the work. And no doubt there is an ocean of items similar or worse.

This is no longer a scientific issue. It is purely a political issue – a power struggle to enslave us in a totalitarian system of personal carbon credits.

Anonymous said...

PS - Francis Turner's spot

"where when the function fails the the program silently continues without reporting the error"

Leaves me speechless.

Anonymous said...

I suspect that "unpleasant" stations have been "turned off" and replaced with "friendly" stations.

I'm not a native English speaker; can anyone confirm my suspicion?

From line 4208 of HARRY_READ_ME:
---
I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO and one with, usually overlapping and with the same station name and very similar coordinates. I know it could be old and new stations, but why such large overlaps if that's the case? Aarrggghhh!
---

and what follows.

sierra said...

I summarized the post here.

microdave said...

Anon 12:28 - Or ones that have been moved from (cooler) locations, or are now surrounded with buildings that weren't built when originally installed - these would be quite "friendly".

http://joannenova.com.au/globalwarming/photos/surface-stations/tucson_arizona-labelled.jpg

http://wattsupwiththat.files.wordpress.com/2009/03/rome_italy_airport_weather_station_large.jpg

http://www.surfacestations.org/odd_sites.htm

Anonymous said...

as a very experienced software engineer, my take on the HARRY_READ_ME file was that it read like it was written by a reasonably competent programmer working at the limit of his abilities who was brought in to clean up an undocumented mess created by someone else.

if we are to believe the IPCC, we need to treat climate models (including paleoclimate models) like safety-critical systems and subject them to detailed, exacting, review. every single line of code matters and its authors need to be able to show their work.

billions or trillions of dollars may be at stake

Frank said...

Let me add a bit of math insight into one comment:

"ARGH. Just went back to check on synthetic production. Apparently - I have no memory of this at all - we're not doing observed rain days! It's all synthetic from 1990 onwards. So I'm going to need conditionals in the update program to handle that."

Conditionals are often used to offset the fact that the data you're using is an expert opinion rather than fact. Bayesian updates are used to substitute guesses for data. It looks like that's what's going on here..."synthetic" means fake. And he needs to have conditionals to adjust for that.

MikeT said...

wow, just wow. Finished reading through all of Asimov's comments and was horrified. I have been a software engineer, project manager, and executive for over 30 years and this is fubar'd beyond belief.

This code and data is absolutely useless at this point for anything more than a forensic project. Each dataset has to be revalidated individually, named uniquely, metadata added (and verified) and tracked through a version control system. All of the code should be reviewed (all new code for a project this important should be provided for open review and walked through by experts). Unit tests should be written for each subsystem with the test results verified against scientifically accepted acceptance tests. And more.

At this point, speaking as an expert on software development, I can safely say that no papers based on this data and code can be trusted and should all be pulled from publication.

Simply amazing level of fraud and misconduct. I pity the poor guy who went in and tried to clean this up....

Brian Macker said...

I'm a computer engineer and part of the reason why I was so skeptical of these models is that I had experience working with a company that sent up weather balloons to collect data.

It was a joke. I was a kid in college and when I complained about the processes they were using on the data they started having me unload trucks.

Not only is the code garbage but the data going in is. They had horrible error controls on the data entry people.

Anonymous said...

I've got an idea. They should just buy a bigger computer, that will fix things.

"findings are the result of billions of calculations made by the world's biggest super-computer, installed at the Hadley Centre in Berkshire. The latest figures show the earth is heating up fast" Paul Brown of the Guardian, Nov 3, 1998

Anonymous said...

@microdave

I've also seen "stations" of that type.

It's difficult for me to read the whole of HARRY_READ_ME, but I think there is an example of another backroom deal with fake and/or "friendly" weather stations.

Hence the raw data are suspicious too.

Anonymous said...

FWIW Link 1 at TickerForum should be: http://www.tickerforum.org/cgi-ticker/akcs-www?post=118625&page=4

There is no page 13 on that post...

Joel said...

Come on guys, some person freaking out about not understanding the code? Why is that special?

Academic code is usually written just for the person using it... assumptions and estimates disappear into papers and theses instead of comments.

You're taking everything that someone wrote in a text file as gospel.

I do large scale ecological simulations, which is very similar to weather simulations (but they actually have better data since it's easier to measure weather than organisms). There are, and always will be, a lot of unknown variables.

Anonymous said...

CHECK THIS OUT:
lines 13745 - 13747:

So the temps are in degs C, vapour pressure's in hPa, wind's in m/s and cloud's fractional.

Then I thought about it. 17.5mm/day is pretty good - especially as it looks to be Eastern Sahara.

I am only a non-climate physicist,
but would this spell out to 17l/m2??
- in Eastern Sahara ??
And he appears content with that???

Anonymous said...

line 14564:

This whole project is SUCH A MESS. No wonder I needed therapy!!

The above thing with the remark on Eastern Sahara seems to have been pure gallows humour.

Actually, I have been in that same seat, but only on a much smaller and shorter project - and like an earlier commentator mentioned, this guy can really be pitied...

Michael J. McFadden said...

Ok, I'm a bit out of place here... don't know my armpit from my butt crack when it comes to computer statistical data modeling code, nor have I really looked closely at the whole global warming set of criticisms beyond feeling a general strong suspicion that the conclusions are flowing from the politics rather than the science.

But so MUCH of what is here looks like it would cut and paste perfectly over the field that I *do* play around with more directly: the whole set of statistical antismoking research. It would be SOOOO nice to see and have someone competent skeptically analyze the raw data that's been used in all the studies promoting smoking bans. In the few cases where I *have* been able to get and understand the raw data or something close to it I've generally found it to be horribly abused in order to fit the desired, and paid for, conclusions.

See:

http://www.jacobgrier.com/blog/archives/2210.html

if you have any interest in an example... read the AfterComments to the article.

To Devil and all the previous commenters: WONDERFUL stuff! Imagine if you had the whole motherlode to examine!

Michael J. McFadden
Author of "Dissecting Antismokers' Brains"

The Great Simpleton said...

JFDI

http://petitions.number10.gov.uk/UEACRU/

steve said...

I wonder if one or more of the deniers would be willing to publish their data sets and software if they haven't already.

If they have, I would like to see some bloggers go through it just to see if there is any contrast in quality.

cbullitt said...

Meant to stop by yesterday. Thanks for the link. Love the logo.

Anonymous said...

I've scanned through the HARRY_READ_ME.txt file briefly and "Harry" is constantly referring to Tim -- apparently there are two Tims:

A few lines in the file mention:

Tyndall Centre grim file created on 22.01.2004 at 13:52 by Dr. Tim Mitchell

That I think is the "main" Tim mentioned in the file. He also mentions "Tim O" -- another Tim it seems:

On to precip problems. Tim O ran some comparisons between 2.10 and 3.00, in general things are
much improved but there are a few hair raisers (asterisked for special concern):

Also Phil and "DL" are mentioned:

The six-digit code for NANCY/ESSEY is 071800. Mailed Phil and DL as this could be a big
problem - many of the Update stations have no other metadata!

I asked Tim and Phil about 1., they couldn't give a definitive opinion.

I would imagine Phil is Phil Jones.

Does anybody know who Tim O, DL and Dr. Tim Mitchell are, and whether the comments they have provided in HARRY_READ_ME.txt are important in any fashion?

Fai Mao said...

I don't know squat about computer code.

I do understand how to enter data into a statistical program.

What they are describing here would get them an "F" on any project in grad school. It might even get them thrown out of a university.

Anonymous said...

For the who's who at CRU have a look at: http://www.cru.uea.ac.uk/cru/people/#Research%20Staff

Tim O is most likely Dr. Tim Osborn, and DL is most likely Mr. David Lister.

fuckgrapefruit said...

Anonymous asked if anyone knew who Tim Mitchell was.

I don't know him personally, but apparently he's a member of the South Park Evangelical Church. click and ctrl F for 'Mitchell'

He also appears to be a 'Young Earth Creationist'. link

The Great Simpleton said...

@Steve 02:20: "I wonder if one or more of the deniers would be willing to publish their data sets and software if they haven't already. If they have, I would like to see some bloggers go through it just to see if there is any contrast in quality."


Go and look at the diagram at the top of this post (and read the whole thing if you have time). It explains the process for testing scientific theories. As the box on the left-hand side of the diagram shows, all data has to be made available for external testing.
http://wattsupwiththat.com/2009/11/24/the-people-vs-the-cru-freedom-of-information-my-okole%E2%80%A6/

Furthermore, the equipment (satellites, ground stations etc.) used to collect the data, and the collecting and analysing of that data, are mostly paid for out of public funds.

Finally, the financial and human cost of getting the climate forecasts wrong, in either direction, is staggeringly huge.

And yes, the so-called deniers do release their data, if and when they get published. As these mails show, that has been very difficult.

Anonymous said...

Joel, he can't understand the code because it's crap and the QA and documentation are non-existent.

As an academic, this may be acceptable practice. But acceptance of AGW has civilisation changing implications. Those of us in the real (non-academic) world are entitled to expect proper industrial levels of software quality as a pre-requisite for spending trillions on AGW.

[ PS - when I was an academic physicist my code was a fuck of a sight better than this ]

dearieme said...

Yes, when I was an academic my code too was far better. I am cursed with a poor memory, and coped by structuring everything logically and littering it with comments. (On the other hand, having a poor memory, I did once lose an important program!)

knirirr said...

I am cursed with a poor memory, and coped by structuring everything logically and littering it with comments.

It's the same for me.
Checking everything into a Subversion repository (which is backed up) complete with a note of what was changed is also very useful indeed.

save us said...

I am a politician. I organise people. I run their lives for the greater good. I do not understand code. I do not understand the implications of this. But I am a politician and I am aware of the need for the greater good as seen by me.

Therefore I will ignore this. I will instead assume there may be something bad about to happen any day now and we need to protect people from it. I will therefore immediately fly to a far flung and possibly sunny location to meet with my fellow politicians. While I am away my staff will, aided by public opinion as defined by my favourite media, institute a program of comprehensive spending, for which we will require more money from people.

This money will be taken in increased taxes and with it will be numerous laws and penalties imposed. I do not understand but I will act firmly to protect both the greater good and my position. I am a politician.

mojo said...

Nobody tell UEA about software versioning systems...

Anonymous said...

I'm in favor of giving Phil Jones the benefit of the doubt. Let's have Phil run this program with his actual data while being broadcast live and videotaped and see if he can replicate his temperature results without "fudging" the numbers.

Expose the Code
Busted not Robust
Bust the Anti-Trust Climate Team
Ed

Anonymous said...

The debate about the contents of HARRY_READ_ME.txt and the validity of the programming and modelling techniques is something only experts can debate.

However, the lay person only needs to know this about the file, which they can verify for themselves: the HARRY_READ_ME.txt file is a THREE YEAR journal of a CRU programmer describing everything he tried with the data and models in an attempt to reproduce existing results CRU had published. Comments in the file make it clear that “HARRY” tried FOR THREE YEARS to recreate CRU’s published results AND FAILED.

Do you see the REAL significance of this? It is absolutely fatal to the credibility of anything CRU has produced.

What we have here is a documented THREE year effort by a CRU programmer, who had access to all the data, access to all the code, access to all the people who developed the code and the models, and still HE could NOT duplicate CRU’s OWN results. If he can’t, it simply means that CRU’s results cannot be reproduced even by themselves, so there is no point in anyone else even trying -- CRU themselves have proven it's a waste of time, and in doing so have proven their own results are plain rubbish. That means any "peer reviewed" document CRU produced, along with any other papers that cited the CRU papers, is based on data the CRU themselves can't verify.

Besides, the absolutely sorry state of affairs in data handling and software management that HARRY_READ_ME.txt reveals, the utter and total mess of CRU data and software this document exposes, is WHY CRU has not released its data and model software.

Given that CRU is one of the most cited, if not the most cited, sources of climate data -- upon which trillions of dollars of economic policy are being set -- the importance of what the HARRY_READ_ME.txt file reveals becomes scary.

kevin said...

Just read the HARRY_READ_ME.txt file (shouldn't that be HARRY_README.txt?). Anyway: unit testing, continuous builds, version control, portable code... not a hint of it. And what the hell are they doing hard-coding file paths in the source code!!
eg InfoFile = '/cru/mikeh1/f709762/f90/obs/_ref/station-list-ncep.txt'
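Even something as simple as taking the path from the command line would have helped (a throwaway sketch of my own, obviously nothing to do with their actual code):

import sys

# Hypothetical: accept the station list as an argument, with a default,
# instead of baking an absolute /cru/... path into the source.
station_list = sys.argv[1] if len(sys.argv) > 1 else "station-list-ncep.txt"
print("using station list:", station_list)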


It took our luckless Harry 2 days just to get something that was working perfectly well before to work again! I've never worked with prolog/f90. Don't they have something along the lines of Maven2 to ease the pains of the build cycle?

It's great fun reading this. Makes me realise why I get paid more than some sorry hacker at UEA

Anonymous said...

Anyone remember cold fusion?

If "Peer Review" is let loose the CRU and anything attached to it will be profoundly repudiated and suspect.

With 40 billion dollars pending at Copenhagen, who wants to bet this will all be suppressed by those entities with that much at stake?

Anonymous said...

ClimateGate Data Series Part I: A break-down of large data file for manipulating global temperature trends from 2006-2009

By Computer Scientist J. Smith of (repubx.com)

Overview of the Global Warming Myth Data

Al Gore, George Soros and other Carbon Pirates have got Congress to fund and direct Billions of dollars towards “Green Energy” and “Carbon Control” projects based on bogus data claims. These data claims stem from the recently hacked Climate Research Unit (CRU) at the University of East Anglia in the UK.

The CRU was set up by the US Dept. of Energy at this university. Why not a US university? It could be because the UK, for the longest time, had no “Freedom of Information Act”, like the US does. You’ll soon see why these scientists and the Dept. of Energy wanted to keep the data a secret or unverifiable. Because the outcomes and the analysis of the data are purely political (Scientists likely paid or on payroll to do so), it has no factual or scientific basis whatsoever.

How do we know this to be true? Because recently, 3000 e-mails of the scientists conducting the research at the CRU were hacked, along with several data files. While the e-mails are quite damning, the data files are the “guts” of the CRU's actual computer models—the data, the code and the applications. It is really the blatant manipulation of the data that is the “dagger” to the heart of the Global Warming Myth.

Read the rest here: Repubx.com

B said...

To all the apologists for the poor code:

It is certainly common for academics to use sloppy undocumented code with data files lacking metadata, as I and many others have done. HOWEVER, there is really no excuse for such garbage when a project is long term and the data and codes are to be used by others in the future. Any person should be able to understand this.

Unfortunately, Fai Mao, this is far more common than you would think. I recently quit a position at a University well known for hydrology and water resources engineering because of repeatedly encountering issues similar to what poor Harry dealt with, but with respect to field data collection rather than computer code. I witnessed a complete lack of quality control, combined with oftentimes lazy, incompetent, and overworked grad students performing all the "work" and given the benefit of the doubt with a blind eye turned their way.

I don't believe it is entirely anyone's fault, just a product of slash and burn science that uses graduate students as grunts. Unfortunately many of these grad grunts could literally care less about the scientific method; many are simply performing grunt work for a piece of paper called a degree.

For the professors, as long as the publications roll out and new funding rolls in, what is there to worry about? Well aside from an occasional FOIA request, which would never be issued to the vast majority of scientists. It isn't like we are a society (western) with any real integrity, right? What do people really expect? Are we that delusional?

Ian said...

To my limited understanding, dynamical systems, upon which climate science relies, consists of putting a whole bunch of variables into a series of equations, seeing how they interact, tinkering with them to make predictions then reverse engineering the results to generalize the case and make further projections. To the layman's eye, easily confused with fudging. And to be fair to the skeptical layman, easily abused by scientists and pseudo scientists and politicians and the like.

jetx said...

ClimateGate Data Series Part 2: A break-down of large data file for manipulating global temperature trends from 2006-2009

By Computer Scientist J. Smith of (repubx.com)

In Part I of this series we learned that Ian "Harry" Harris was creating a different data set for the 2006-2009 temperature readings that excluded Alpha values, those values that pose a risk for rejecting the global warming trend. In short, eliminating problematic values.

Moving on through this data file (see part I for brief explanation of this file), Harry indicates he wants to include "non-land" temperature cells for the new standard grid model(Harry says):

"getting closer but there's always another problem to be evaded. Instead, will try using rawtogrim.f90 to convert straight to GRIM. This will include non-land cells but for comparison purposes that shouldn't be a big problem... "

The Fortran 90 program "rawtogrim.f90" will convert all the temperature readings from the other data files into a format that can be compared with non-land temperature cells. A "temperature cell" is a little square block or "cell" that appears on the model image. Instead of calling it a "sector", it is referred to as a cell that represents that area of the model map. GRIM refers to "GPS Receiver Interface Module". In short, the temp readings from a Global Positioning Satellite. The format for GRIM temperatures is different from the model temperatures Harry just generated, so in order to compare them, they must all be converted to a similar format (the GRIM format).

Read the rest of part II and see part I (if you missed) here:
Read part II

Anonymous said...

Ahhhhh ... Man, this just keeps getting better and better.

I am SO going to enjoy watching all these arrogant global warming pricks eat shit over the next few months as the scope of this scam is gradually revealed.

jetx said...

ClimateGate Data Series Part 3: A break-down of large data file for manipulating global temperature trends from 2006-2009 (also includes link to Part 1 and 2).

Excerpt *****************

In Part 1 of this series we learned that Ian "Harry" Harris was creating a different data set for the 2006-2009 temperature readings that excluded Alpha values, those values that pose a risk for rejecting the global warming trend. In short, eliminating problematic values.

In Part 2 of this series we learned that one Ian "Harry" Harris claimed Dr. Tim Osborne was using the wrong temperature values when performing comparisons with temperature anomaly values. Also, when compiling the precipitation results, Harry commented that Tim's program for doing the compilation was "Not good enough for the bloody Sun!!!" and caused several errors. Harry was able to get the precipitation results to compile with Tim's program, but only after replacing questionable values with a default filler value of "-9999". Harry also indicated that his and Tim's results differed by 5%.

Moving on through this data file (see part 1 for a brief explanation of this file): after compiling the precipitation results, Harry follows the same steps as for the temperatures by running an IDL routine to create a Standard Grid Model (see part 1 for what a grid model is). Harry had problems compiling the precipitation results, and was only able to get it working after he found it was not finding certain files (comment 18).

Tampering with Precipitation File Dates ...

See this here

Anonymous said...

"Now, I'm no climate modeller or even a professional coder"

Yeah, that's pretty clear. Otherwise you'd know that you're creaming your pants over pretty standard "oh shit" programmer comments that you'd see if you stole the source code to an ATM machine, too.

In technical fields like programming, we tend to go "oh shit" over pretty complex and nuanced topics with the understanding that anyone else reading the "oh shit" remarks will understand enough of the context around them to truly comprehend the gravity (or lack thereof) of the situation.

When laymen start picking things apart -- "LOOK! HE SAID OH SHIT!" -- without knowing what we're actually talking about, things start getting pretty fucking stupid.

The Great Simpleton said...

Yeah, that's pretty clear. Otherwise you'd know that you're creaming your pants over pretty standard "oh shit" programmer comments that you'd see if you stole the source code to an ATM machine, too.

In technical fields like programming, we tend to go "oh shit" over pretty complex and nuanced topics with the understanding that anyone else reading the "oh shit" remarks will understand enough of the context around them to truly comprehend the gravity (or lack thereof) of the situation.

I don't give a flying fuck about the quality of the programmer and code used at an ATM; the impact on my life if there's a mistake is negligible.

If they get the AGW calculations wrong we are all going to be impoverished with many millions dying.

What would you say if the drug companies were found to be this cavalier with their progamming, data sets and peer review process if something went wrong and the regulators went to investigate?

And yes, I am prepared to accept there is an innocent explanation for a lot of it, I just want to be sure that an independent audit has looked at those explanations and rerun the data.

jjauregui said...

Stop bitching, take responsibility and take action. Stop all donations to the political party(s) responsible for this fraud. Stop donations to all environmental groups which funded this Global Warming propaganda campaign with our money, especially The Environmental Defense Fund. Write your state and federal representatives demanding wall to wall investigations of government sponsored funding and coordination of this and related propaganda campaigns and demand indictments of those responsible. Write your state and federal Attorneys General demanding Al Gore and others conducting Global Warming/Climate Change racketeering and mail fraud operations be brought to justice, indicted, tried, convicted and jailed. That’s what I have done in response to this outrageous violation of the public trust. Think of the consequences if you do nothing! For one, the UK is becoming the poster child for George Orwell’s “1984” and the US government’s sponsorship of this worldwide Global Warming propaganda campaign puts it in a class with the failed Soviet Union’s relentless violation of the basic human right to truthful government generated information. Given ClimateGate’s burgeoning revelations of outrageous government misconduct and massive covert misinformation, what are the chances that this Administration’s National Health Care sales campaign is anywhere near to the truth?

jetx said...

ClimateGate Data Series Part 4: A break-down of large data file for manipulating global temperature trends from 2006-2009

Excerpt .............

In Part 3 of this series we learned that the precipitation temperature database file dates were altered, but not actually updated according to the modified dates. In short, the final version of the precipitation files compiled by Dr. Tim Osborne could not have been using the latest precipitation database (as Harry said). The synthetic Cloud precipitation values were missing from 1996-2001, having been lost by a colleague of Harry's by the name of Mark. To accommodate this, Harry found a Fortran program created by Mark to convert Sunshine temperature values (those temperatures with no Clouds) to "Pseudo Cloud" temperature values. In short, convert many of the Sunshine temperature values to more Cloud-like temperatures (which often run warmer). Not finding a good database with precipitation values (because everything was undocumented), Harry just picked one he thought would be a good candidate for the compilation of precipitation results and forming a standard grid model for those results.

Moving on through this data file (see part 1 for a brief explanation of this file), Harry was finally able to create a standard grid model file for precipitation (1994-2002), though the database holding the normal precipitation values he is using was picked on a hunch, and many of those values were converted to indicate more Cloud-like temps, instead of their original "Sunshine only" temperature values.

See part 4 here

Anonymous said...

you dingbats. here is gavin schmidt's explanation of the file in response to a question about it:

"Chris Byrne says:
23 November 2009 at 1:54 AM
Gavin,

Have you had a look through the coding comments in Harry_Read_Me.txt? There’s a lot of speculation in the blogosphere about this. I have to admit I had a chuckle.

http://www.devilskitchen.me.uk/2009/11/data-horribilis-harryreadmetxt-file.html

[Response: That file is obviously just a notebook for someone piecing together work legacy code made by other people. Messy for sure, but certainly not the 'final version' of the code. It was probably produced in moving from the CRU TS 2.1 to 3.0 version (which is a completely separate data set from the standard HadCRUT numbers by the way) and involves a lot more interpolation. See here: http://www.cru.uea.ac.uk/cru/data/hrg.htm (when their server comes back up), also Mitchell and Jones (2005). - gavin]"


from: http://www.realclimate.org/index.php/archives/2009/11/the-cru-hack-context/

Optimistic Cynic said...

"I'm in favor of giving Phil Jones the benefit of the doubt. Let's have Phil run this program with his actual data while being broadcast live and videotaped and see if he can replicate his temperature results without "fudging" the numbers. "

I used to run huge batch software runs (not so much now) and had a ton of documentation, a list of inputs and could absolutely do that with any of the major systems I wrote.

The simple fact is that if they're computer modelling, then they should have raw data, algorithms for altering data and it should all run as a top to bottom batch run.

When I see things like manual alterations to annual data, it gives me the heebie-jeebies. If the alterations come from an algorithm, then they shouldn't be in the code as alterations, but as an algorithm.

It's shockingly bad. I've worked on projects where I have memories of similar disasters, but none of them involved anything as serious as predicting massive public spending. If this were banking, health or aerospace software, their regulatory authorities would have shut them down.

Squander Two said...

Can I just pop up to say that I wrote this three-and-a-half years ago now? Quite proud of myself now, but it turns out I wasn't nearly cynical enough.

Thanks.

jetx said...

ClimateGate Data Series Part 5: A break-down of large data file for manipulating global temperature trends from 2006-2009

Excerpt: ............

In Part 4 of this series we learned there were 6003 missing precipitation temperature values out of a possible 15,942 temperature readings. The missing 6003 values were not recovered. Also, there were over 200 weather stations with a temperature reading of '0' (North Africa and the West coast of South America) for their cells. According to Harry, there was a '0' reading for each of these 200+ stations throughout the whole temperature series from 1901-1996, thus making Phil's comment, that a '0' meant the climate had not changed since the last reading, illogical. If that were the case, North Africa and the West coast of South America would never have had a temperature change since recording of the temps began!

See part 5 here

Philip Painter said...

I've read the whole thing and as a retired programmer have every sympathy for Harry.

I suspect the root problem that CRU had was that they got scientists to write the code and debug it, manage the databases, do the backups [if they did], write the software documentation [if they did] etc. They may be ace climate scientists but they probably didn't even realize they weren't doing a good software engineering job until Harry discovered the problems he did.

As a technical detail, much has been made elsewhere of the "adjustments" of +/- 0.5 degree assuming these are temperatures and 0.5C changes over a few years is a big number. Well, I think that these 0.5 degree changes are to do with location - latitude and longitude. The data [outside Europe] is divided into boxes of lat/lon [0.5 degrees I think]. These adjustments of up to 0.5 degrees then make sense and are not the big fiddle others have claimed.

In the coming inquiry I hope they strongly suggest that the software be re-developed and managed by software professionals - maybe from the UEA computer department.