Sunday, April 17, 2011

K.I.S.S. part II

This post is about keeping it simple on the solution side of things:

Now it's easy for me to say "don't make shit complicated". I mean, some shit is complicated by nature, and the only thing left to do is get up to speed on the details until it becomes clear. In addition, due to the exploratory nature of software development, shit often only becomes simple after you've done it the complicated way. Our understanding of the problem grows as time goes on, and with that improved understanding the code starts looking clearer, and typically becomes more compact.

I think it was Mel Gibson who once said about acting: "it takes a lot of work to make it look effortless". As a refactoraholic, I can testify that "it takes a lot of work to make code look easy". A person looking at simply written code might be tempted to say "oh yeah, I could have thought of that myself". Just as a good textbook that explains something well can get you to the eureka moment quicker, compact, loosely coupled code with good comments and sensible variable names will get you to logical errors and bugs quicker, and make adding new features faster.

After refactoring I often stumble upon "obvious" logical errors in the code. It can look like a dumb mistake in the logic, but it only became obvious after 2-3 rounds of refactoring. Previously the mistake may have been buried so deep that it would have been much more difficult to "see".

K.I.S.S.

As I look back at my experiences handling difficult code written by other people, some patterns emerge. The most persistent pattern, and the one behind most of the software maintenance issues I've encountered, is people making shit too fucking complicated! And by that I mean people not following the principle "Keep it simple, stupid" (a.k.a. K.I.S.S.).

There is one particular way of making shit complicated that I am attacking here, and that is taking care of unnecessary use cases. In my opinion, the code that is most difficult to maintain is code that has a lot of what-ifs in it, a lot of just-in-case handling. This coding strategy is the root of all evil, and together with premature optimization it forms the two black pillars of death. So here I'm going to try to make a distinction between:
 (1) unnecessary code to solve a necessary problem, and
 (2) code to solve an unnecessary problem (the code in this case is also, by definition unnecessary).

The first case is about proposing a solution that's unnecessarily difficult, the second case is about proposing a solution that's not necessary at all. The first case is about overcomplicating the solution space, while the second is about overcomplicating the problem space.
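To make the distinction concrete, here's a tiny hypothetical C++ sketch (the functions, the strategy class, and the sentinel requirement are all invented for illustration):

```cpp
#include <cassert>

// (1) Unnecessary code for a necessary problem: we really do need an
// average, but this version drags in "just in case" machinery that
// nobody ever uses.
struct AveragingStrategy {
    virtual double combine(double a, double b) const { return (a + b) / 2.0; }
    virtual ~AveragingStrategy() {}
};

double averageOverengineered(double a, double b) {
    AveragingStrategy defaultStrategy;  // no caller ever supplies another one
    return defaultStrategy.combine(a, b);
}

// The same necessary problem, solved simply:
double average(double a, double b) { return (a + b) / 2.0; }

// (2) Code to solve an unnecessary problem: the callers guarantee the
// array is non-empty, yet this handles an empty case that can't occur,
// inventing a sentinel value (and a new bug source) in the process.
double firstOrSentinel(const double* xs, int n) {
    if (n == 0) return -9999.0;  // handling an imaginary use case
    return xs[0];
}
```

In case (1) the strategy class is dead weight around working logic; in case (2) the whole branch exists only for a situation nobody asked to support.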


What if the hero in our game is hanging down from a helicopter, and a bullet enters his left eye? Should we simulate the eye movement on the right eye as the bullet gets closer? What if one of his eyes gets streamed out in the process? What if there's an eye-patch on the left eye? Should we simulate it with cloth physics?
OK OK, stop right there. How about let's consider if this combination of circumstances needs to be handled at all?!

The life of a coder is difficult enough as it is; we shouldn't have to solve imaginary or potential problems before solving real ones. One of the most obvious signs that code is solving an imaginary problem is a large amount of boiler-plate code: lots of methods and files that seem to be doing very little, as if the original author of the code had something grander in mind and put in some code just in case. What makes matters worse is that the code solving a non-existing problem lives right alongside the code that solves an actual problem. The reader of the code in question then has to understand not only how the code solves the actual problem, but also what, other than the actual problem, the code is trying to solve.
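A hedged illustration of what that boilerplate often looks like (all names made up): an interface with a single implementation and a "manager" that only forwards, standing in for what could be one function:

```cpp
#include <string>
#include <cassert>

// Three layers of speculative generality -- an interface, its lone
// implementation, and a forwarding "manager" -- where one function would do.
class IGreeter {
public:
    virtual std::string greet(const std::string& name) const = 0;
    virtual ~IGreeter() {}
};

class DefaultGreeter : public IGreeter {
public:
    std::string greet(const std::string& name) const override {
        return "Hello, " + name;
    }
};

class GreeterManager {
public:
    explicit GreeterManager(const IGreeter& g) : greeter(g) {}
    std::string greet(const std::string& name) const { return greeter.greet(name); }
private:
    const IGreeter& greeter;
};

// All of the above, minus the imaginary problem:
std::string greet(const std::string& name) { return "Hello, " + name; }
```

Until a second greeter actually exists, every one of those extra layers is something the next reader has to decode for no benefit.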

Continued in this post: http://refactoraholic.blogspot.com/2011/04/kiss-part-ii.html

Tuesday, March 22, 2011

How to refactor huge chunks of unfamiliar code

I've had the pleasure (or the misfortune) of refactoring some large chunks of code, with the following characteristics:
  • It has at least one very large function (300-900 lines)
  • It has no clear owner, as at least 5 people have been authors on different parts of the code, and 2-3 of them already left the company
  • This code has gained the reputation that nobody knows what it's really doing
  • There are plans to do some cleanup, but nobody knows when or who is gonna do it, and nobody dares to take the responsibility. This project is always on people's radar, but never quite higher on the priority list than everything else they gotta do
  • Besides, respectable programmers got better things to do than to refactor messy obscure code, right? Time to call in the Refactorinator. MWAHAHAHAHA
Here's the process, reverse engineered from how I typically do it:
  • Identify the huge functions
  • Identify major logic blocks within those functions
    • Typically these will be outer while or for loops (inner loops might just be too hard to tackle right away, if they have dependencies on variables computed in the outer loops)
    • Factor the identified chunks of code out into their own function
      • Visual Studio has "Extract Method", which does an OK job of automating this process
      • Typically there will be too many parameters in such a function, consider if you can 
        • Make some of the parameters members of the class
        • Recompute the same values locally inside the new function (some of these computations will be cheap, like an indexed array lookup, and it pays off to reduce the size of the signature of the new function).
    • Make it clear what the input and the output of each function is. 
      • What values are being consumed, what values are being modified?
      • const and static functions are preferable: a const member function can't modify the object, and a static function can't touch instance state, which limits side effects
      • If it's not possible to make the function const or static, try to see if you can make some of the parameters const. That is usually possible, especially as the newly created function is smaller in scope and will modify less of the data
    • Factor out a few functions this way if possible
      • Are there any patterns emerging?
      • Are there certain groups of parameters that are getting passed around a lot?
        • If so, form a struct of these common parameters, which should further simplify the function signatures for the new functions
      • Are there common operations on these groups of parameters?
        • In this case, we might start adding functions to the struct
      • Is this group of parameters sufficiently isolated that its members no longer need to be accessed individually from other data structures?
        • Then we have the birth of a new class.
  • Resolve overlapping or duplicate data
    • This is the most annoying part of the process. It very often happens that the data structures floating around in the super-large function / class are similar (but not quite the same) or contain duplicate elements. 
    • For example there can be two structures passed around, which are partially different, and partially they represent the same data, and then that data is copied back and forth to keep in sync.
    • Try and see if you can disentangle this knot by using the members of only one of these overlapping structs / classes. That way you avoid 3 bad things:
      • needing to keep them in sync (simplifying logic), 
      • copying them back and forth (saving CPU cycles and reducing code size)
      • storing duplicate data (saving memory)
    • This is often far from trivial, and might require some domain knowledge, or actually sitting down and understanding the code
      • Hopefully after the massive code-shoveling, your understanding is much improved, or it could be time to check with one of the owners
      • Be prepared however, that the (partial/ former) owner is also confused, doesn't fully understand the system, or doesn't recognize the code after you've messed with it
      • Regardless, there could be some useful information that pops up as a result of this communication
  • I will probably elaborate on this last step in another post (this one was probably a handful already). I don't think I have the process of resolving overlapping data nailed just yet, but I believe with more experience, I will arrive at a more solid solution.
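To illustrate a couple of the steps above (extracting a chunk out of a big loop, then grouping the parameters that kept traveling together into a struct), here's a small hypothetical sketch of what one round of this process might produce; the scoring logic itself is invented:

```cpp
#include <vector>
#include <cassert>

// Parameters that kept getting passed around together, now one struct.
struct ScoreParams {
    double weight;
    double bias;
    double cap;
};

// Extracted inner logic. static gives it internal linkage, and the const
// reference makes it obvious the inputs are read-only -- no side effects.
static double scoreOne(double value, const ScoreParams& p) {
    double s = value * p.weight + p.bias;
    return s > p.cap ? p.cap : s;
}

// The outer loop that used to live inline in the 900-line function.
static std::vector<double> scoreAll(const std::vector<double>& values,
                                    const ScoreParams& p) {
    std::vector<double> out;
    out.reserve(values.size());
    for (double v : values)
        out.push_back(scoreOne(v, p));  // the factored-out chunk
    return out;
}
```

Before the struct existed, scoreOne would have taken weight, bias, and cap as three loose parameters; once they travel as a unit, the signatures shrink and common operations on them have an obvious home.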

Thursday, March 3, 2011

F# for beginners?

I think not:

"The mapfirst function takes a function as the first argument and applies it to the first element of a tuple that's passed as the second argument"

Try explaining that to someone new to programming...

Tuesday, February 15, 2011

first experiences with F#

I've looked at F# before and I liked the idea, but recently I started to actually write in it, and here's my experience with F# so far:

Debugging is quite different. While in C# / C++ I would put a breakpoint in the faulty code, in F# I typically use the F# Interactive window to call one function at a time. This is convenient because you can build up your program from the bottom up, one subroutine at a time.

Indentation is used for scope. While it helps make code a bit shorter, I have found this to be a total pain in the ass. The indentation-sensitive parser reports all sorts of unhelpful error messages. One space in, and you're doing a nested declaration; one space back, and you're declaring a member outside class scope. This has so far caused me major grief. I hope I can get used to it eventually. Perhaps there is a tool out there that helps clarify which scope F# THINKS your function is in.

Programs are shorter because of:

  • Frequent use of inline functions and nice syntax sugar for closures.
  • Indentation instead of braces for scope
  • Operations on lists and tuples are built-in parts of the language, ensuring very compact code when dealing with lists. Iteration, accumulation, and selection are very natural in F#.

Published with Blogger-droid v1.6.6

Saturday, February 12, 2011

Refactoring and skiing

I had the good fortune to do both refactoring and skiing this week, and noticed some similarities.

When on a tough slope, a beginner skier has to be extra careful to ensure that at no time is his speed out of control. As he does that, he pays a price in energy. If every turn he makes results in a complete stop, his legs have to do the extra work. A good skier is both faster AND uses less energy, as the turns he makes don't lose as much momentum.

Refactoring can also be done slowly & carefully, and the price you pay for safety is speed. Refactoring in smaller steps is safer, but in order to ensure safety, you have to put up all sorts of scaffolding to make the intermediate steps work.
This scaffolding (intermediate data structures and functions) is what's needed to make the new code and the old code work together, and will later need to be discarded as the old code is eliminated.

An expert refactorer would be able to skip the intermediate steps and make the final version quicker. But working that way is of course dangerous business, and could cause more crashes along the way (pun intended).
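As a hypothetical sketch of that scaffolding idea: an old and a new data structure coexist during the transition, bridged by a temporary conversion function that gets deleted once every caller has migrated (all names invented):

```cpp
#include <cassert>

struct OldState { int hp; int maxHp; };      // legacy layout, still produced
struct NewState { double healthFraction; };  // target layout

// Temporary bridge -- the "scaffolding" that keeps both worlds running,
// to be torn down along with OldState when the migration is complete.
static NewState toNewState(const OldState& o) {
    NewState n;
    n.healthFraction = o.maxHp ? static_cast<double>(o.hp) / o.maxHp : 0.0;
    return n;
}

// New code, written against the final form only.
static bool isCritical(const NewState& s) {
    return s.healthFraction < 0.25;
}
```

The intermediate step costs you the conversion function and the duplicated representation, but it lets the refactoring land safely in small pieces instead of one giant risky swap.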