Saturday 6 March 2010

Efficient Algorithms and Abstract Data Types

These are the things I've learned the most about today. I've read through roughly a quarter of my new book so far and I'm liking what it's teaching me. It started with algorithms.

Algorithms are basically just set methods to achieve some sort of goal. A majorly important aspect of them is that they're broken down into pieces, and set up in such a way that a machine (computer, mechanical, or whatever) can perform them step-by-step. You increase an algorithm's efficiency by reducing the number of "characteristic operations" done to complete the task.

Consider, as a metaphor, rabbits. You start off with two rabbits. Each set of two rabbits produces two offspring, and so on and so on. (Note: this is, indeed, a vague shroud covering the exponential equations used as a model in the book, and would be plagiarism if I didn't acknowledge the fact). Presuming we're not considering bunny lifespans in our scenario, the number of rabbits around at generation 1 is 2, at 2 it's 4, at 3 it's 8, 4 it's 16, and so on where n = 2^n rabbits. If you're in generation 1, it only takes n = 2 to calculate the answer. At generation 2, n = 2*2 for anything to determine the proper value, which is one characteristic operation (multiplication is done one time).  At three it's n = 2*2*2 (two characteristic operations), and so on to infinity.

In other words, it takes (n-1) characteristic calculations to determine the number of bunnies we've got running around our backyard.

Instead, you can consider the fact that for n = 1, it's simply 1*2 bunnies. For n = 2, it's 2*2 bunnies. For n = 3, it's (2*2)*2. This looks sloppy. Instead take n to be 100. For the first case, it takes 99 multiplications to get the proper answer. For the second, however, n = (2^50) * (2^50), which can also be broken down to simplify. When programmed properly (the value of 2^25 gets stored in a variable, then multiplied by itself to get the value of 2^50 which is stored into that variable, for example), this then only takes 6 iterations to calculate the number of bunnies at generation 100. And yes, it's possible to arrange this to apply to any value of n. And for any base (not just 2), for that matter.

It's really an amazing improvement, and in terms of code (the examples are provided in the book, I won't copy them here) you're looking at 1, maybe 2 extra lines to achieve that speed increase. Eventually I'll look over the equations I used for the MATLAB ISA Calculator assignment and try to figure out a quicker algorithm for that work. I'm sure something exists.

So, what are Abstract Data Types then? These are simply user-defined (where user- means programmer-) classes that serve to provide a structure and appropriate manipulation tools for the data stored within them. Technically pretty much every type of data in Java (or most any programming language) is an ADT, though most are defined by the language. Generally an ADT fulfills a specific role known as a contract, which states explicitly what the inputs to the class will be, what kinds of manipulation and output one can expect to get from the class, and what data must be stored within it. It's then the programmer's role to decide how to accomplish those goals most efficiently, without methods that are uncalled for (unless they serve to ease the processes that are required). I've been aware of the basic concept for a long time now, but the specifics of planning the type and implementing it as code is becoming more and more clear as time progresses.

I expect a lot of progress this weekend, as well as documentation of that progress.Wish me luck!

No comments: