Friday, December 13, 2013

Tetrocity

The source code for this project can be accessed here. This post is largely a collection of loosely related thoughts I have about the development process. Go play the game first! Do it!

Not too long ago I was trudging through one of my online classes when the instructor mentioned something interesting. He brought up the idea of an expansion of Tetris: instead of game pieces (Tetriminoes) with 4 blocks… why not 5!? I was more intrigued that I had never thought of this before, considering the dozens of hours I wasted becoming mediocre at Tetris With Friends. I had made small command-line games before, but nothing really substantial and, you know, actually fun.

So, I set off. Like every OOP project anyone has ever embarked on, I was set on making it as clean, organized, encapsulated, modular, blah blah blah as possible. Surprisingly, I actually managed to succeed, for the most part. I spent a huge amount of time with Microsoft Word, planning my objects and their roles, game features and mechanics, before writing a single line of code.

It was a pretty huge revelation in my development as a programmer to see just how much this helped. It’s easy to get so excited about your ideas that you jump straight in to the code and make up everything as you go along. Almost every time I ever did that, though, it would turn messy pretty fast. Didn’t see that problem coming? Just add a method to this class… I’ll mark it private so nobody will see how badly it doesn’t belong there. That kind of thing adds up fast. I would at this point like to give a shout-out to any recruiter that may be reading this blog post. Please ignore this paragraph.

This necessity is one of the biggest disadvantages and advantages of OOP in general. In functional programming, it’s much easier to implement the functions you need as you go along. Keeping them organized is still important, but not nearly to the degree of OOP. Developing a system in a language like Java requires a lot more forethought. If you put in that initial effort, though, the payoff is massive.

Anyway, after a few weeks of playing around with the idea on and off, getting it sorted out in my head and on paper, I was ready to embark. First things first: more administration. I began by creating every single class that I had anticipated needing, and writing full documentation for each one. This is really just a continuation of the idea from before, but a hell of a lot more boring. I felt it was a lot like eating a salad. Immensely painful and soul rendering, but you walk away feeling better about yourself.

One of the biggest challenges I faced was designing the game in such a way that it would be easy to implement new features down the line. Despite the planning, I really didn’t know what would set my version of Tetris apart from the hordes of Tetris clones already out there.

I began by developing a small set of “fundamental truths” about my game’s mechanics. Really basic stuff that you would expect from any Tetris adaptation: pieces always fall south, the game is over when the board fills, etc. Everything else was completely variable. Tetrimino size? Needs to be potentially infinite. The number of Tetriminoes falling at once? Needs to be potentially infinite. Board size? Needs to be potentially infinite. You see where I’m going with this.

This is where I see a lot of code fail. It’s absolutely possible to write Tetris in a few hundred lines of code, with full GUI support and everything. However, this approach is too static in nature. The code fulfills a very specific purpose: nothing more, nothing less. Avoiding this is a critical aspect of good code. Good code is a brick in a building. You shouldn’t have to demolish the entire thing to add a new floor.

***

Enough preaching! I won’t explain the details of the code or anything. That would be tedious and you can see all of the code on the GitHub repo (it's pretty well documented. Start with the model package if you're interested in learning the game's structure). Instead, I’ll talk about some important things I learned, and then present some interesting problems I ran into, as well as how I solved them.

My first vision for Tetrocity was that every game would be highly customizable. I would begin by asking the player for information such as the range of Tetrimino lengths (i.e. the number of blocks that make up the piece), and the game would begin with those parameters. This was going to be a real selling point. “Play with huge pieces! Infinite possibilities!” I was going to make my game capable of producing infinite-length Tetrimino pieces anyway, so may as well give the option, right?

The first harsh realization I had was there was a huge difference between novelty and genuine entertainment. This is what it looks like when you ask for length 20 Tetriminoes. 


It’s definitely interesting to look at, but you certainly wouldn’t want to play a game like that. It would be impossible. In fact, even length 7 pieces are way too hard to use. Here’s my friend’s best attempt at playing with them (he could barely clear a single line).


If these lengths are impossible to play with, why would anybody want to? After the novelty of seeing them for the first time, there really isn’t much appeal. 

The more I thought about it, the more I began to realize that giving the player all of that control was not a good idea. How would they know what a realistic range of lengths are? Who wants to spend all of that time figuring things out before being able to enjoy the game? On top of that, my own experience with videogames lead me to realize that players would rather overcome a challenge set out for them than overcome one they set out for themselves. By taking control of the course of the game, I could much more easily control things like the difficulty curve, and high scores would have actual meaning. Tetrocity went from fully customizable to “Press start to play”.

I ended up limiting the range of Tetriminoes ever produced in a game to a minimum of 3 and a maximum of 6. It was somewhat disappointing that I wouldn’t be taking full advantage of my infinite length Tetrimino-producing object, but it was a necessary sacrifice of ego.

***

Tetriminoes are randomly constructed in a pretty straightforward way. You begin with an empty grid, place a single block in the center, and then randomly choose grid coordinates that share an edge with a block until you run out of blocks to place. This seems straightforward enough, but ended up posing a lot of problems. First off, this method does not produce every shape with equal probability.

To see this, let’s consider the original set of length-4 Tetriminoes being randomly constructed with this method. Is a line piece just a likely to be constructed as a square piece? Well, there’s only one way to construct a square piece. It looks the exact same regardless of how it’s rotated. A line piece, however, can be constructed two ways: vertically, or horizontally. These two orientations are produced via an entirely different set of coordinates, but any player would consider them equivalent. The more rotational symmetry a shape has, the less likely it is to be produced.

The first question we ask is: is this really a problem? Is the effect noticeable? As it turns out, not only is it noticeable, but it’s also somewhat desirable. In a very hand-wavy sense, the more rotational symmetry a piece has, the less likely we’ll be able to find a place for it, since you can’t rotate it to fit various openings. This is a trivial concern for length-4 pieces, but would be a huge issue for length-5 pieces. I would argue that the game would be way too hard to play with length-5 pieces were they all produced with equal probability.

In true hacker fashion, we’ve turned a bug into a feature! Except for one issue: line pieces have relatively low rotational symmetry. When playing the game, I found that I would go ridiculously long periods of time without ever seeing a line piece. As any player of Tetris knows, they’re critical. Since they have such special meaning, it was necessary to implement some way to increase the probability they would appear.

The way Tetris decides what pieces to give you is actually pretty cool. It begins with a set of all 7 length-4 pieces, and shuffles them randomly. It then feeds you pieces from that set. As such, we expect a line piece to appear once out of every 7.

To mimic this behavior in Tetrocity, I had to implement a probabilistic model, since piece shapes are not stored in software. I made the Tetrimino-production class return random shapes with a certain probability, and straight-line pieces otherwise. This way, we can control the expected number of straight-line pieces fairly easily.

***

An interesting bug I ran into with my Tetrimino-production class TetriminoFactory had to do with Java’s Random class. We’re all guilty of mentally blanking the “pseudo” in “pseudorandom” from time to time, but after the several hours of grueling debugging that resulted from that mindset, I probably won’t make that mistake again.

The bug presented itself as pretty easy to fix. I noticed it when testing TetriminoFacotry’s random shape generation. Regardless of what range of lengths I would give it, it would always give me length-4 pieces. Okay, so the method probably isn’t receiving the correct range information, right? Nope, that was fine. I managed to narrow it down to single call to my Random object’s nextInt() method. Absolutely everything up to that point was working as intended, and everything after it was not. The nextInt() method would return the same number every time. Definitely not random, although still technically pseudorandom I guess.

I isolated the relevant code and began testing, and could not replicate the problem. It seemed like magic. Every relevant parameter was recreated, yet the bug only occurred when the code existed in the Tetrimino-production class. I even tracked down the source code and learned the underlying algorithm in an attempt to figure out why it was occurring. No luck.

Eventually desperation set in and I began to just randomly tweak parameters in some vain hope of something happening. Surprisingly enough, it worked! Apparently, the bug only occurs when the requested range is a power of 2. That was not something I replicated in my isolated code. In my original test code, the range of Tetrimino lengths was 3 to 5. 5 – 3 = 2 = 21.

Why is there such an arbitrary shortcoming in Java’s Random class? I have no idea!
To this day I have no clue why this happens. Regardless of the seed you give it, Java’s random object will return the same value if the range is a power of 2 (it actually begins exhibiting random behavior again after the 15th or so call). Weird.

***

Another interesting challenge was making the program user friendly. Anybody who makes software in industry is well aware of the importance of this, but Tetrocity was the first time I've ever really had to take that into account. 

Tetrocity isn't a particularly complex or deep game. It isn't World of Warcraft, where things like complicated UIs are a justified necessity. Tetrocity is a relatively simple, 2D game. Probably even a mini-game by today's standards. As such, it was very important to convey the new features of the game in as simple a fashion as possible. 

This was pretty easy for the most part, thanks to the success of Tetris. Most people seemed to be aware of the rules and such, so I had to do very little work in explaining that aspect of it. Controls were also borrowed from Tetris With Friends, which makes it intuitive for any Facebook-savvy players. 

The biggest challenge was introducing the ability system. It was certainly new to most players, yet it needed to remain somewhat mysterious in order to motivate people to continue playing. Apart from a single line on the "Controls" welcoming screen, there wasn't even a good way to introduce the concept of abilities without shoving it in the user's face.

Pictures proved to be the best way to get around this. I found a set of free vector icons available here, and sifted through them to find the ones that best described an ability I wanted to implement. By sticking huge icons right off the side of the main grid, I wanted the player to be able to quickly glance at the sidebar, and understand what the ability did. 

I'm not sure I entirely succeeded in that regard. I've found that new players generally don't even notice when a new ability is unlocked, let alone what it does. The main game is simply too intense; your eyes are completely locked on the grid. After a couple of tries people begin to notice the abilities, but I wanted people to know it was there immediately. 

This was partly what motivated me to add sound effects, the majority of which I found here. That way, I could add a little sound that would play every time an ability became available, so that the player wouldn't have to take their eyes off the screen. Again, this was less successful than I had hoped for the reason. Players are just too focused on the game to notice the sound. 

This seemed to work fairly well, but at the end of the day I think a "How to Play" screen is really the best solution. I haven't implemented it yet, nor am I sure if I will, but it's definitely something to keep in mind when your game is too fast paced for the player to really "figure things out". 

*** 

Another important factor in making the game 'easy to use' was the fluidity of the controls. This was actually pretty interesting, because it was the first time that the behavior of my methods became fairly "hazy". I began by developing a very strict definition of what it means to move and rotate a piece. If it was found that these transformations could not occur, they wouldn't. Period. 

From a "contract" point of view everything was fine, but what I found is that the controls became extremely rigid. For example, if a line piece were flush with the right edge of the board and you tried to rotate it, nothing would happen. Logically, that rotation would cause the line piece to protrude off of the board. In fact, the line piece didn't even have to be flush. There could be several spaces in between the line piece and the edge of the board and the rotation would still fail. 

It simply didn't feel right. Again, the game is intense. It simply isn't realistic to expect the player to be that meticulous about every single rotation. When a rotation like that fails, it's incredibly frustrating and confusing. This is was motivated me to begin coding my controls to be "error tolerant". In this example, I made it so that when a failed rotation is detected, the game will automatically try to shift the piece over to the left, and check again. This way, when we encounter that same rotation failure from before, the line piece will be "pushed" left, and rotation will succeed.

Of course, you have to be incredibly careful about just how error tolerant the controls are. How far should you try shifting if left before giving up? Does it have to be the edge of the board; what about another piece? What if it gets shifted into another piece? Once you work out all of the kinks, it feels much better.

***


These are some of the things I learned while making Tetrocity. At the end of the day, I’m extremely proud of how it turned out. It’s the most substantial amount of code I’ve written personally from scratch. The game runs smoothly and is fun as hell. I should go back to studying for finals now, but I’ll probably edit this post as more things come to mind. In the mean time feel free to give the game a go. My high score is about 300000; see if you can beat it!

Monday, September 30, 2013

How to Install Hadoop

EDIT: As of 10/11/2013 brew is not installing Hadoop correctly. Just ignore the brew parts and install Hadoop manually. 

This will be boring. I planned on making a post today about how to use Hadoop with all kinds of helpful tips and jokes and then spent the entire afternoon trying to get the stupid thing installed. There's a few tutorials floating around already, but they’re all out of date and somewhat incomplete. After tinkering around and taking bits from each one I finally got it working, so here it is. How to freakin’ install Hadoop.

I’ll be using a single-node, pseudo-distributed setup. This is the best configuration if you’re looking to learn how to use Hadoop. My machine is a MacBook Pro running OSX 10.8.5 with Java 1.6. Cluster users: may god have mercy on your soul. At the time of writing this, the latest stable version is 1.2.1, but I’ll just say <version>.

First things first: if you don’t use homebrew by now, this is the time to get it. Everything in this tutorial is applicable to a manual installation, but homebrew is great and you should use it. Go get it, I'll wait. You may run into some problems with permissions because of homebrew, but these are easily solvable with a quick Google search and will only have to be fixed once. Install Hadoop with the following terminal command:


If you don’t want to use homebrew, get the latest stable build here by navigating to the stable/ directory and downloading the hadoop-<version>.tar.gz file. Then, unpack it to any directory you like.

Before getting started with Hadoop itself, we need to make sure ssh is enabled, and that self-login (i.e. ssh-ing into your own machine) can be done without a password prompt. First, navigate to System Preferences -> Sharing and make sure Remote Login is checked. I also suggest changing “Allow access for:” to “Only these users:”, and then adding only yourself to the list. This is optional, but adds some security.

Next, we need to set up an ssh-key so we can authenticate without entering a password. Just type the following into your terminal:


Make sure that worked by entering:


It shouldn’t ask you for a password. 

Now we can start configuring Hadoop. There’s actually quite a bit of tweaking you can do, but we’ll keep it to the bare basics for now. First, navigate to your conf/hadoop-env.sh file and open it in a text editor (TextEdit works fine). If you installed with homebrew, the file is located at /usr/local/Cellar/hadoop/<version>/libexec/conf/hadoop-env.sh. It’s easiest to navigate to and open the file with the following commands:


We need to do two things with this file. First, we need to specify the location of your Java root directory for the JAVA_HOME variable. Finding the root directory is tricky, but after a bit of snooping around you should get it. I personally ended up setting this line to:


Make sure you get rid of the leading '#'. Second, add the following line to the bottom of the file to a fix a known bug:


We will now modify three more files in the same directory. Their content segments should be set as follows:

conf/core-site.xml:


conf/hdfs-site.xml:


conf/mapred-site.xml:


Let’s take a moment to reflect on the fact that the last localhost port is in fact, over 9000.

It’s smooth sailing from here. From the command-line (even you non-homebrew users), navigate to the directory that contains conf and bin to format your distributed-filesystem by typing:


There should be no errors in the output. Now, let’s test it to make sure it works. Start the hadoop daemons by typing:


Copy some input files to the distributed-filesystem:


Then execute an example run:


You should get output similar to the following:


You can examine the output by typing:


Which should output something like:


Finally, stop all running daemon processes by typing:



There you have it. If everything went without a hitch, Hadoop is now installed and configured on your machine. Stay tuned for next time when we look at MapReduce, what it should be used for, what it shouldn’t be used for, and how to use it.

UPDATE: I'm still working on the main Hadoop post, but I'm having a lot of issues getting everything ready. It seems like there isn't much, if any, support for the Eclipse plugin for Hadoop 1.2.1. I've spent several hours trying to get it set up, but haven't had any luck. It's technically possible to do MapReduce programming without IDE support, but much harder. I'll keep working on it when I have the time but I may have to just come back to it later. 

Monday, September 23, 2013

How to Reverse a String

The source code for this project can be accessed here.

Reversing a string is something that will almost definitely never have any kind of real life application. It’s one of those problems you’ll only ever encounter in an introductory programming course or an interview. Despite all of that, there are actually some pretty important lessons to be learned. So lets do it.

We’re going for speed here. The goal is to find the absolute fastest way to reverse a string. Like most projects on //TODO, we’re going to be doing it in Java. In the past the logic was pretty easily transferrable across languages, but in this case using Java and its String class will actually turn out to be a critical factor in our investigation.

The naïve implementation is always a good start, so let’s give it a shot:


That should be pretty straightforward, but we don’t really expect it to be fast. Here’s what my performance-measuring method looks like:


Here, we generate a string by randomly selecting characters from the lower-case alphabet 10000 times. Then, we record the time it takes to perform the naïve string-reverse by storing the system time before and after the call, and then printing the difference. Technically, it cannot be guaranteed that only the algorithm ran in between the two time checks, as various other processes are running concurrently. However, the variance in time is within a small enough margin of error to forgot about that. Adding cases to that method is pretty straightforward, so that’s the last you’ll explicitly see of that code.

We compile, run, and get this:


For reference, that’s about 42 milliseconds. Not bad, but the scaling is awful. Changing the length of the test string reveals that the algorithm appears to be of ~O(n2) complexity, where n is the length of the input string. While the details aren't incredibly important, it comes from the fact that each iteration actually results in the creation of a new result string, where the characters of the old string are copied over the new string before the new character is appended. Super inefficient. 

How can we improve this? Let’s try to get rid of that memory-allocation overhead by building the result as an array of characters. That way we’re just setting values in a fixed segment in memory, which should be pretty fast. Here’s our second attempt:


We update our test method, run, and check the output:


Damn. That’s a lot faster. That simple change improved the run time by two orders of magnitude. Not only that, but further testing confirms that the algorithm is of O(n) complexity, as we would expect. 

Now what? We could try to apply a fancy sorting algorithm like Mergesort, but these algorithms assume the ordering of the array to be sorted is unknown. In this case, we know it’s completely reversed, so sorting algorithms would actually take longer than a direct approach that exploits that fact. You can see an actual implementation of a Mergesort-like algorithm for reversing a string in the source code, but the output confirms that it’s a bad idea:


To get an idea of how much improvement we have to go, let’s run our code against that of the Java gods and see how we stand. The Java StringBuilder class provides a reverse() method, so we’ll run our best algorithm against that. We update the testing code and run it:


Nice! We’re actually pretty close to beating Java’s go-to string reverse algorithm with almost no effort. Let’s take a look at the source code to see how Java does it:


Stay calm. That looks like a lot of gibberish, so here’s the scoop:  Java encodes characters using UTF-16. This simply means that Java uses two bytes to map values between 0x0000 and 0xFFFF to characters. The problem is that once Unicode took over, Java needed a way to convert their UTF-16 encoding to Unicode, which provides a larger mapping range of 0x000000 to 0x10FFFF. They do this by using a pair of code-points called surrogate pairs (don’t worry about the details). The reversal algorithm used in Java swaps the string byte-by-byte, not character-by-character. Therefore, a lot of additional handling has to be provided for those surrogate pairs.

So what brilliant, insightful, esoteric reason does Java have for doing it this way? I have no idea. In fact, we can actually beat Java with two more, simple optimizations. First off, we didn’t successfully minimize the role of the String object in our improved algorithm: we’re still calling String#charAt on the input. The time overhead created by repeatedly calling that method is greater than if we simply converted the input string to a char array and dealt with it instead.

The second improvement is much more subtle. By pulling characters from both ends of the input string simultaneously, we can exploit the caching of values once we begin to pull characters near the middle of the string. Basically, pulling characters from one side of the string will eventually cause characters from the other side to be placed in the cache as well, which means less overall cache misses occur. Note that the magnitude of this improvement is highly dependent on the cache configuration of the machine running the algorithm.  

We implement these two optimizations into a single, final algorithm:


The result speaks for itself:


Congratulations, you’ve beaten Java at its own game. To fair, this difference is incredibly minute and would be completely negligable during practical use. I’m competitive though, so I count it as a victory.

This is by no means as fast as it gets. There is an obvious opportunity for parallel-processing optimization, and probably a few additional tweaks you could make to squeeze a few thousand nanoseconds out of it. Note that the Java StringBuffer class exists entirely to bypass the issues we saw earlier with Java's String class, but I think that hides the mechanics of what's going on. It uses the exact same techniques we saw here, but in practice using it is highly recommended. 

String reversal is an interesting case where the naïve implementation is actually much closer to optimal than you would expect. If you’re anything like me, the discovery that Java’s methods are occasionally imperfect prompted some serious self-reflection. Try not to let your faith in the Java gods waver too much, though. They’re a vengeful sort.