Wednesday, September 18, 2013

Bit Extensions in Java

Java is my favorite language. For almost everything, it provides just enough abstraction to make everything warm and fuzzy. Occasionally though, the low-level control you forfeit can cause some issues.

Numeric manipulation is a huge source of pain in Java. Like a lot of languages, Java signs its numeric primitives: byteshortint, and longfloat and double are also signed, but I won’t discuss them here. Unlike a lot of languages, Java provides absolutely no way of using an unsigned numeric primitive. If you’re going to use a numeric primitive, you’re going to sacrifice a bit to the sign.

As a result, in casting a numeric primitive to a larger (i.e. more bits) numeric primitive, Java employs a technique called sign-extension. The idea is pretty simple: take some 8-bit byte b:


In a 2s complement signing scheme, b represents -1. If we want to promote b to a 16-bit short, just sticking 8 zeroes on the front wouldn’t work, since 0000000011111111 represents 255, not -1. In order to increase the number of bits while preserving the value of the original byte, we have to sign-extend it. Essentially, you “smear” the most significant bit leftward, until the desired number of bits is achieved. In our case, the leading 1 would be smeared to the left 8 bits and b would be promoted to 1111111111111111. Fortunately, this does represent -1.

But what happens when you don’t care about the value? This method of extending bits is great when the value needs to be preserved across the cast, but can cause issues when you want to manipulate the numbers bit by bit. As a motivating example, let’s consider two bytes:


We want a 16-bit short such that the upper 8 bits are taken from highByte, and the lower 8 bits are taken from lowByte. Using the bit-twiddling OR operator, here’s what we try:


We expect to see 1010101011111111, but we check the output, and see that it’s 1111111111111111! The problem is that lowByte was sign extended implicitly to  1111111111111111 during the OR operation (for now, ignore that bit operations in Java actually cast the arguments to int; it doesn't matter at the moment). As you can see, zero extension is what we need. With zero extension, we always smear 0s, regardless of the leading bit of the number that's being promoted. 

Unfortunately Java doesn’t really give you a way to do this, just like it didn’t give you a way to use unsigned numeric primitives. Fortunately, it’s pretty easy to do yourself. Here’s how to zero extend a byte to a short:


Here, block is promoted to an int (using sign extension), since Java bit operations take int arguments. ANDing with 0x000000FF causes all but the bottom 8 bits of block to be set to 0, which means that any 1s that were smeared are set to 0, as if block was zero extended in the first place. Then, we cast back to short, which just lops off the top 16 bits of the result of the bit-operation, which Java returned as an int. This successfully zero extends block. Here's our second attempt using the new zero extending method:


We check the output, and it’s the 1010101011111111 we were looking for.

My entire ZeroExtender class can be found here. This is pretty basic stuff, but zero extension is extremely important in bit-twiddling. If you don’t keep Java’s signing scheme in mind while dealing with numbers, you’ll undoubtably run into code-breaking bugs.

If you were unfamiliar with signed numbers and sign-extension, hopefully this will help you avoid problems in the future. Thanks for stopping by, and stay tuned for creating a more variable number class, where we actually put zero extension to use.

No comments:

Post a Comment