Skip to content

SSL implementations compared

I reviewed several SSL implementations for coding style: OpenSSL, NSS, GnuTLS, JSSE, Botan, MatrixSSL and PolarSSL. I looked at how buffers are handled in parsers and writers. Of all of them, I think only JSSE, i.e. pure Java, can be trusted to be free of buffer overflows. It suggests that a good webserver for security-critical applications would be Tomcat, without native extensions.

In OpenSSL, the Heartbleed patch itself is a good example of what not to do:

    /* Read type and payload length first */
    if (1 + 2 + 16 > s->s3->rrec.length)
        return 0; /* silently discard */
    hbtype = *p++;
    n2s(p, payload);
    if (1 + 2 + payload + 16 > s->s3->rrec.length)
        return 0; /* silently discard per RFC 6520 sec. 4 */
    pl = p;

Bounds checks are rolled up into an obscure calculation, then the code proceeds to memcpy() straight out of the buffer.

NSS has a similar style. For example, ssl3_ComputeDHKeyHash() has:

    bufLen = 2*SSL3_RANDOM_LENGTH + 2 + dh_p.len + 2 + dh_g.len + 2 + dh_Ys.len;

In GnuTLS, a similar example of precalculation can be found in this Heartbleed-inspired patch:

 response = gnutls_malloc(1 + 2 + data_size + DEFAULT_PADDING_SIZE);

PolarSSL was also similar in style.

An embedded, open-core SSL library called MatrixSSL shows an alternative style, which proves that there are alternatives to precalculation. It has bounds checks distributed throughout the parser. Before each read, the remaining length is calculated from the cursor position, and compared with the read length:

    if (end - c < SSL2_HEADER_LEN) {
        return SSL_PARTIAL;
    }
    if (ssl->majVer != 0 || (*c & 0x80) == 0) {
        if (end - c < ssl->recordHeadLen) {
            return SSL_PARTIAL;
        }
        ssl->rec.type = *c; c++;
        ssl->rec.majVer = *c; c++;
        ssl->rec.minVer = *c; c++;
        ssl->rec.len = *c << 8; c++;
        ssl->rec.len += *c; c++;
    } else {
        ssl->rec.type = SSL_RECORD_TYPE_HANDSHAKE;
        ssl->rec.majVer = 2;
        ssl->rec.minVer = 0;
        ssl->rec.len = (*c & 0x7f) << 8; c++;
        ssl->rec.len += *c; c++;
    }

This makes auditing easier, and should be commended. However, it’s still a long way short of actually following CERT security guidelines by using a safe string library. And since this is an embedded library, it would have been nice to see a backend free of dynamic allocation, to allow verification tools like Polyspace Code Prover to provide strong guarantees.

Botan is written in C++ by a single author who claims to be aware of security issues, which sounded very promising. In C++, implementing bounds checks should be trivial. However, the code didn’t live up to my expectations. It does pass around std::vector<byte> instead of char* (often by value!), but the author makes extensive use of a memcpy() wrapper to actually read and write those vectors. For example:

std::vector<byte> Heartbeat_Message::contents() const
   {
   std::vector<byte> send_buf(3 + m_payload.size() + 16);
   send_buf[0] = m_type;
   send_buf[1] = get_byte<u16bit>(0, m_payload.size());
   send_buf[2] = get_byte<u16bit>(1, m_payload.size());
   copy_mem(&send_buf[3], &m_payload[0], m_payload.size());
   // leave padding as all zeros

   return send_buf;
   }

The functions are short, with short code paths between length determination and buffer use, which gives it similar auditability to MatrixSSL. But the potential security advantages of using C++ over C were squandered, and frequent copying and small dynamic allocations means that performance will not be comparable to the C libraries.

So it seems that in every case, authors of C and C++ SSL libraries have found unbounded memory access primitives like memcpy() to be too tempting to pass up. Thus, if we want to have an SSL library with implicit, pervasive bounds checking, apparently the only option is to use a library written in a language which forces bounds checking. The best example of this is surely JSSE, also known as javax.net.ssl. This SSL library is written in pure Java code — the implementation can be found here. As I noted in my introduction, it is used by Tomcat as long as you don’t use the native library. The native library gives you “FIPS 140-2 support for TLS/SSL”; that is to say, it links to a library that probably has undiscovered buffer overflow vulnerabilities.

Using Polyspace Code Prover to verify server software

Polyspace Code Prover is a commercial software verification tool. I exchanged many emails and four phone calls with Mathworks in order to convince them to give me a one-month free trial. Unlike my usual toolset, it is not “free as in freedom”, it is free as in “have fun with those FlexLM bugs, bwahaha”. But, I’m a pragmatist.

PCP can be used to verify C, C++ or Ada code. It is mostly aimed at embedded software development. Contrary to the fears of the Mathworks sales team, it can be used to verify server software. Over the last few days, I’ve used it to find several bugs in PHP’s exif extension. It is awesome and probably smarter than me. It didn’t take long for it to start telling me things about the code that I didn’t know.

The general workflow is:

  • Choose a smallish (~2000 line) module
  • Polyspace automatically generates stubs to represent all undefined functions, and generates a main() function which calls all exported (i.e. non-static) functions
  • Analysis generates hundreds of “orange” checks
  • You then refine the stub and main generation, and tweak the source as necessary, in order to reduce the number of orange checks
  • The remaining orange checks can be acknowledged with saved comments on the analysis.

PCP describes orange checks as “unproven code”, but that terminology downplays their severity. An orange check can be a blatant bug, for example:

char * p = malloc(1);
*p = 'x';

PCP knows that malloc() can fail and return NULL, and will flag the dereference orange, to point out that you haven’t checked for that.

The fact that PCP is mostly intended for embedded systems is fairly obvious. Stubs for standard library functions are provided, but POSIX functions such as stat() are missing and need to be written by the user. Perhaps the biggest weakness is its string handling. For example, its sprintf() stub is essentially:

int sprintf(char *s, const char *format, ...)
{
	int length;
	int i;
	ASSERT_IS_VALID_STRING(format);
	length = _polyspace_random_int();
	if (length < 0) {
		length = 0;
	}
	for (i=0; i < length; i++) {
		s[i] = _polyspace_random_char();
	}
	s[length] = 0;
	return length;
}

This is, of course, a gross simplification of this fraught function, and leaves plenty of room for uncaught segfaults.

PCP has a system for customising automatic generation of stub functions, called DRS. I didn’t have much luck with it — I found that progress was much faster when I just wrote the stub functions in C, following examples from polyspace/verifier/cxx/cinclude/__polyspace__stdstubs.c . DRS focuses on specifying valid ranges for numbers, which wasn’t a particularly useful approach for my test case.

Getting started with Polyspace Code Prover

PCP has some stupid defaults which you need to be careful with:

  • When adding include paths, “add recursively” is checked by default. You really don’t want to add /usr/include recursively. Those include paths are equivalent to -I options to gcc, except without system defaults. They need to be non-recursive and correctly ordered.
  • “Target processor type” defaults to i386 on an x86-64 system
  • Check Behavior > Overflow computation mode: should be “wrap-around”, to match the compiler.
  • Precision: These options control the run time and model complexity. I’ve found precision level 1 and verification level 0 to be sufficient for initial exploration, and a run at that level takes a mere 40 minutes for my exif.c test case. A run to verification level 4 took about 10 processor-hours and maybe 1.5 GB per core of RAM. It’s probably good to have a long-running configuration prepared so that you can quickly start it to run overnight before you leave your computer for the day.

Some bugs I found:

  • It doesn’t seem to like assignment expressions, for example in the condition of an if statement. Well, who can blame it, I don’t like them either. The affected assignments were ignored.
  • It miscounted the length of a string literal containing null bytes. I couldn’t reproduce it in an isolated test case.
  • The engine sometimes failed with “Maximum number of users for PolySpace_Bug_Finder_Engine reached”. The solution for this was rm /dev/shm/sem.mw*.
  • The FlexNet binaries in etc/glnxa64 had the wrong ELF interpreter, so attempting to run them would fail with “file not found”. You can identify the interpreter with objdump -j .interp -s lmutil and then symlink it to your actual interpreter, e.g. ln -s /lib/x86_64-linux-gnu/ld-2.15.so /lib64/ld-lsb-x86-64.so.3

It’s not for the faint hearted. But, with patience, you can test code in a way which is mostly orthogonal to human code review, unit testing and fuzz testing. Powerful static analysis tools like PCP could make a valuable contribution to security testing of server code.

Season clock

My wife Angela wanted a clock for her birthday where its one and only hand would go around once per year.

Sure, I can do that, I said. Angela chose a wall clock to modify, and chose artwork for the dial. I designed and built the circuit, and wrote the microcontroller code.

Schematic

In-circuit programming made testing and debugging easy. I built the circuit on Veroboard.

Veroboard layout

Blue lines are copper tracks, red lines are jumpers. Cyan marks the track breaks, which I etched out with a dremel engraving bit.

I had to take the hands off to get the movement box open. The hands were made of metal barely thicker than foil, and it was necessary to use quite a bit of force to get them off the shaft. So I’m quite proud of myself for managing to save the minute hand and to get it back on to the shaft intact.

It was somewhat scary pulling the movement apart and putting it back together again. It was necessary to take all the tiny nylon gears out to get to the motor connections. When I put the case back on, one of the gear axles wasn’t quite lined up correctly, and it bent through almost 90°. I bent it back into shape by hand, and it somehow still worked afterwards.

The code is interrupt-driven. With a 32.768 kHz system clock in idle mode, the current draw was only 80µA, plus about 0.1µA average for driving the motor (3mA pulses with 0.004% duty cycle). So we can expect 3 years of battery life from a pair of AA alkalines.

/**
 * A driver for a standard battery-powered analogue clock, which will tick once
 * every 8766.15 seconds, so that the minute hand will complete a revolution once
 * per year. With the hour and second hands cut off, and appropriate artwork 
 * attached to the dial, the minute hand will tell you what season it is.
 *
 * An ATtiny24/44/84 should be used, with a 32.768 kHz watch crystal attached
 * to XTAL1/XTAL2. PA0 and PA1 are used as a virtual H-bridge.
 *
 * According to Silicon Chip March 2008, a suitable pulse is 13mA for 100ms. 
 * With 3V battery power, that suggests a current-limiting resistor in the 
 * vicinity of 230 ohms.
 *
 * For my continuous sweep wall clock, the multimeter says:
 *   * coil resistance = 560 ohm
 *   * frequency at either port = 8 Hz
 *   * duty cycle = 18.7% (implies 23.4ms pulse)
 *   * VDC = 0.3V (divided by 18.7% implies full swing of 1.6V)
 * 
 * So that implies a 560 ohm resistor for a supply of 3.2V
 * Note: dial diameter 237mm.
 *
 * Timer0 is used for pulse duration timing. Timer1, in combination with the 
 * counter variable, is used as the real-time clock.
 *
 * LFUSE must be set to 0xe6 to enable the external crystal.
 */
#define F_CPU 32768
#include <util/delay.h>
#include <avr/io.h>
#include <avr/sleep.h>
#include <avr/power.h>
#include <avr/interrupt.h>
#include <avr/wdt.h>
 
static long counter = 0;
static enum {
	PULSE_POSITIVE,
	PULSE_NEGATIVE
} nextPulse;
 
// 0.023 seconds with prescale 64
#define TICK_DURATION (short)(0.023 * F_CPU / 64)
 
#ifdef TEST_MODE
#define PERIOD_FACTOR_1 (unsigned int)(F_CPU / 16)
#define PERIOD_FACTOR_2 1
#elif CLOCK_WAS_1HZ
// Tick with period 50026*5742/32768 seconds
// 50026 * 5742 / 32768 happens to be very close to the number of hours in
// a year. So our minute hand will cover one revolution in one year, with a 
// residue here of only 8ms (although of course the quartz crystal is not 
// that stable, or even calibrated correctly)
#define PERIOD_FACTOR_1 (unsigned int)(50026.0 * F_CPU / 32768)
#define PERIOD_FACTOR_2 5742
#else /* 16 Hz */
// This is for continuous sweep movements that normally require 16 ticks per second
// Tick with period 15845*1133/32768 seconds.
// For testing, a square wave will appear on PA3 with frequency 1.034017 Hz.
#define PERIOD_FACTOR_1 (unsigned int)(15845.0 * F_CPU / 32768)
#define PERIOD_FACTOR_2 1133
#endif
 
static inline void tick() {
	// Start the clock
	power_timer0_enable();
	// Set up Timer0A to interrupt after the tick duration
	// Reset the counter
	TCNT0 = 0;
	// Enable output compare interrupt
	TIMSK0 |= _BV(OCIE0A);
	// Start the pulse
	if (nextPulse == PULSE_POSITIVE) {
		nextPulse = PULSE_NEGATIVE;
		PORTA |= _BV(PA0);
	} else {
		nextPulse = PULSE_POSITIVE;
		PORTA |= _BV(PA1);
	}
}
 
ISR(TIM0_COMPA_vect) {
	// End the pulse
	PORTA &= ~_BV(PA0);
	PORTA &= ~_BV(PA1);
 
	// Disable the interrupt and the timer module
	TIMSK0 &= _BV(OCIE0A);
	TCNT0 = 0;
	power_timer0_disable();
}
 
ISR(TIM1_COMPA_vect) {
	// Toggle the test signal
	PORTA ^= _BV(PA3);
	if (++counter >= PERIOD_FACTOR_2) {
		// Cycle the stepper motor
		counter = 0;
		tick();
	}
}
 
int main() {
	wdt_disable();
	power_adc_disable();
	power_usi_disable();
 
	// Enable pullups for unconnected pins
	PORTA = ~_BV(PA0) & ~_BV(PA1) & ~_BV(PA3);
	PORTB = ~_BV(PB0) & ~_BV(PB1);
	// Set direction for outputs PA0, PA1 and PA3
	DDRA = _BV(DDA0) | _BV(DDA1) | _BV(DDA3);
 
	// Set up Timer0 with prescale=64, but disable the clock initially
	TCCR0B = _BV(CS01) | _BV(CS00);
	OCR0A = TICK_DURATION;
	power_timer0_disable();
 
	// Set Clear Timer on Compare mode
	TCCR1B |= _BV(WGM12);
	// Set clock source CLKIO no prescaler
	TCCR1B |= _BV(CS10);
	// Set the period
	OCR1A = PERIOD_FACTOR_1 - 1;
	// Enable interrupts
	sei();
	TIMSK1 |= _BV(OCIE1A);
	// Interrupt loop with idle mode between interrupts
	set_sleep_mode(SLEEP_MODE_IDLE);
	while (1) {
		sleep_mode();
	}
}

Thecus NAS

I just bought a NAS, specifically a Thecus N2200EVO, to do LAN music streaming and some other miscellaneous storage tasks at home. I was just on the verge of disappointment at a few minor niggles, when I discovered the SSH interface.

It lets you log in as root. It’s Linux on ARM. All the partitions are writable, and /etc appears to be mounted on a persistent storage device. The web interface is written in PHP. Within a few minutes, I had hacked out the annoying Flash login page, now it is replaced by an HTML fallback (at least until the next reboot).

The startup process is implemented in shell script, with explanatory comments by people called Astro, Michelle and Angus. It uses SQLite for configuration storage, in addition to plain files.

I understand from a quick bit of web surfing that it is trivial to write your own “module”, which can contain modifications to the PHP web interface, arbitrary cross-compiled binaries, etc. Modules can be installed via the admin interface and are stored on the attached hard drives.

It’s hard to imagine a consumer device offering an easier path to customisation for a LAMP developer like me.