Monday, July 25, 2011

On getting it right the first time

As I program more and more I'm beginning to realize the following:

There is no such thing as getting it right the first time. In fact, this may even be a bad way of thinking about coding altogether. You can only approach the "correct" solution asymptotically, through iterations. The concept of "right" may not even exist as such in the early stages of a project. Trying to "nail it" from the start with up-front design is a misunderstanding of the process. The best you can do is a good first prototype.

Some coders I know go by the principle of "code correctly now, so you have to change less later". At first glance it makes sense, but I would take it with a serious grain of salt. In fact, I would say it could be damaging to software development to think this way, since a philosophy like that leads to being afraid of getting it wrong, and then you run the risk of settling into a rigid up-front design.

Instead I find it best to expect and embrace change. To me, the first iteration is all about getting a result as quickly as possible, on top of whatever underlying data structure is at hand. The "proper" data structure becomes clearer as the results come in. Rather than being afraid of refactoring, I take it as an integral part of the process, and I expect to do it. It is a process where the goal is to get it right in the end, rather than at the beginning; it's all about exploration and constant change. Contrary to popular belief, though, this change is not evenly distributed over time. Eventually we need to converge on the final product, so the amount of change needs to shrink as time goes on. If refactoring is taken as an integral part of the process, the data structures you have at any given moment reflect a better and better understanding of the system, and the vocabulary of the code converges on the vocabulary of the problem space.

Tuesday, July 12, 2011

Disentangling boolean logic: The problem

One of the most difficult things I've had to do in my career as a refactorer is trying to clean up entangled boolean logic. The program has many states represented by booleans, and depending on their values, or a combination of their values, various pieces of logic execute. All is fine and dandy; unfortunately not all of that logic executes correctly, in other words we have a bug. Being new to a system, I often find myself fixing one such bug only to introduce another, and then, when I think I have fixed them both, two others appear, and so on. Now it's time to bring in the big guns and get a proper grip on the problem. Where are all these booleans coming from? Who is using them and how? Do some of these variables produce a butterfly effect, affecting seemingly unrelated pieces of code or even the air conditioning in your building? So the first approach is to try to get an overview of the use cases. What at least some organized programmers do is create a matrix of possible inputs to expected outputs (or expected function calls). This can be quite a challenge, since if your input is represented by n booleans, your possible set of input states is 2^n, and that's just one of the dimensions.
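
To make the problem concrete, here is a contrived sketch (the class, flags and method names are all invented for illustration), showing how a handful of innocent-looking flags multiplies into an input space nobody ever wrote down:

    // Contrived illustration: four flags give 2^4 = 16 possible input states,
    // and the conditions below only handle the combinations someone remembered.
    class DocumentView {
        boolean isDirty;
        boolean isReadOnly;
        boolean isPrinting;
        boolean hasSelection;

        void onSave() {
            if (isDirty && !isReadOnly && !isPrinting) {
                writeToDisk();
            } else if (isDirty && isReadOnly) {
                promptForNewLocation();
            }
            // ...every new flag doubles the number of states to reason about.
        }

        void writeToDisk() { /* save the document */ }
        void promptForNewLocation() { /* ask the user where to save */ }
    }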

Luckily, the real use cases can usually be encoded much more concisely (see my other post on this subject), by reducing the set of booleans to a single enum. Suppose that after an arduous journey through boolean land, we have come up with a matrix of inputs to outputs for test cases, with a total number of cells somewhere between 15 and 50. Are we there yet? Unfortunately, we have only started. The matrix that we have created represents the ideal state. Now all that's left is to figure out how what we have corresponds to what we need. And as Murphy's Law would have it, we don't have a 1:1 correspondence between the variables in the current program and the entries in the matrix.
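
To illustrate the enum idea (the names here are invented, not taken from any particular program): three booleans allow eight combinations, but if only four of them ever occur, a single enum can name exactly those four, and the test matrix gets one row per enum value instead of one row per flag combination.

    // Before (hypothetical): boolean isConnecting, isConnected, hasError;
    // Eight combinations are representable, only four of them ever occur.

    // After: one enum that can only represent the states that actually happen.
    enum ConnectionState {
        DISCONNECTED,
        CONNECTING,
        CONNECTED,
        FAILED
    }

    class ConnectionView {
        ConnectionState state = ConnectionState.DISCONNECTED;

        void render() {
            // Dispatch on the single state instead of testing flag combinations.
            switch (state) {
                case CONNECTING: showSpinner(); break;
                case CONNECTED:  showContent(); break;
                case FAILED:     showError();   break;
                default:         showPlaceholder();
            }
        }

        void showSpinner()     { /* ... */ }
        void showContent()     { /* ... */ }
        void showError()       { /* ... */ }
        void showPlaceholder() { /* ... */ }
    }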

So, as a pragmatic reader, you may ask: what does this post actually advocate? Well, ahem... It advocates a heavy-handed refactoring process that attempts to arrive at an already complex solution space (the ideal reduced matrix of inputs to outputs, which may not even be that small) from an even more complex one (in practice, a less ideal matrix of much larger dimensions). I have to say I do not have a nice and clean approach to this, but I do have a few tips & tricks that have helped me in the past to reduce the complexity of boolean logic, which I'll describe in the next post.

Resign patterns

http://fuzz-box.blogspot.com/2011/05/resign-patterns.html

Monday, July 11, 2011

Don Syme quote on parallel code

"If you just take non-deteministic execution of imperative multi-threaded code today, it is just such a nightmare to write that kind of code, that we HAVE to do something to make that kind of problem tractable"
See http://www.youtube.com/watch?v=uyW4WZgwxJE (at 33:40) for reference.
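
A minimal sketch of the kind of non-determinism the quote is talking about (my own example, not from the talk): two threads incrementing a shared counter with no synchronization.

    // My own illustration, not from the talk: two threads bump an unsynchronized
    // counter. The printed total is almost never 2000000 and varies between runs.
    public class RaceDemo {
        static int counter = 0;

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                for (int i = 0; i < 1000000; i++) {
                    counter++; // read-modify-write, not atomic
                }
            };
            Thread a = new Thread(work);
            Thread b = new Thread(work);
            a.start();
            b.start();
            a.join();
            b.join();
            System.out.println(counter);
        }
    }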

Monday, July 4, 2011

Just a thought on what it means to write good code.

The structure of the program needs to be such that it is easy to accommodate future requests.

The vocabulary of the code needs to reflect the vocabulary of the use cases. When somebody says, "please implement feature X", there should be a way to "localize" that feature in the code. The closer the vocabulary of the use case is to the vocabulary of the code, the smaller the portion of the code that needs to change in order to accommodate the new use case.
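
A toy sketch of what I mean by localizing a feature (all names invented): if the use case reads "apply a student discount at checkout", then "student", "discount" and "checkout" should appear more or less directly in the code, so the change has an obvious home.

    // Invented example: the use case "apply a student discount at checkout"
    // maps directly onto names in the code, so the change is easy to localize.
    interface Order    { double subtotal(); }
    interface Customer { boolean isStudent(); }

    class Checkout {
        double total(Order order, Customer customer) {
            double total = order.subtotal();
            if (customer.isStudent()) {
                total -= studentDiscount(total);
            }
            return total;
        }

        double studentDiscount(double amount) {
            return amount * 0.10; // made-up discount policy
        }
    }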