The confessions of a refactoraholic: March 2011

I've had the pleasure (or the misfortune) of refactoring some large chunks of code, with the following characteristics:

It has at least one very large function (300-900 lines)
It has no clear owner, as at least 5 people have been authors on different parts of the code, and 2-3 of them already left the company
This code has gained a reputation that nobody knows what it's really doing
There are plans to do some cleanup, but nobody knows when or who is gonna do it, and nobody dares to take the responsibility. This project is always on people's radar, but never quite higher on the priority list than everything else they gotta do
Besides, respectable programmers got better things to do than to refactor messy obscure code, right? Time to call in the Refactorinator. MWAHAHAHAHA

Here's the process, reverse-engineered from how I typically do it:

The chunks worth extracting will typically be outer while or for loops (inner loops might be too hard to tackle right away if they depend on variables computed in the outer loops)
Factor the identified chunks of code out into their own function

Visual Studio has "Extract Method", which does an OK job of automating this process
Typically such a function will end up with too many parameters. Consider whether you can:

Make some of the parameters members of the class
Recompute the same values locally inside the new function (some of these computations will be cheap, like an indexed array lookup, and it pays off to shrink the signature of the new function).
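To make the extraction step concrete, here is a minimal sketch, not taken from any real code base. The types `Order` and `Customer` and the functions `discountedAmount` and `totalRevenue` are made up for illustration. The body of an outer loop is pulled out into its own function, and instead of passing the discount rate as yet another parameter, the new function looks it up again locally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types, invented for this illustration only.
struct Order {
    int customerIndex;  // index into the customers array
    double amount;
};

struct Customer {
    double discountRate;  // e.g. 0.1 means 10% off
};

// Extracted from the body of the outer loop. The discount rate is not a
// parameter: it is recomputed locally with a cheap indexed lookup, which
// keeps the signature small.
double discountedAmount(const Order& order,
                        const std::vector<Customer>& customers) {
    const Customer& customer =
        customers[static_cast<std::size_t>(order.customerIndex)];
    return order.amount * (1.0 - customer.discountRate);
}

// What remains of the formerly large function: the loop and a single call.
double totalRevenue(const std::vector<Order>& orders,
                    const std::vector<Customer>& customers) {
    double total = 0.0;
    for (const Order& order : orders) {
        total += discountedAmount(order, customers);
    }
    return total;
}
```

Both parameters of the extracted function are const references, which already says a lot about what it consumes and what it leaves alone.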

What values are being consumed, what values are being modified?
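const and static member functions are preferable: a const function cannot modify the object's members, and a static function has no access to instance state at all, so hidden side effects are much easier to rule out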
If it's not possible to make the function const or static, try to see if you can make some of the parameters const. That is usually possible, especially as the newly created function is smaller in scope and will modify less of the data
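A small sketch of the three options, using a made-up `Histogram` class (the class and its member names are assumptions for illustration): a static helper that only sees its arguments, a const member function that reads but never writes, and a non-const member function that at least takes its input as a const parameter.

```cpp
#include <vector>

// Hypothetical class, invented for this illustration only.
class Histogram {
public:
    explicit Histogram(const std::vector<int>& counts) : counts_(counts) {}

    // static: no access to the members of any instance, so the result
    // depends only on the argument.
    static int clampToZero(int value) { return value < 0 ? 0 : value; }

    // const: reads the members but cannot modify them.
    int total() const {
        int sum = 0;
        for (int count : counts_) {
            sum += clampToZero(count);
        }
        return sum;
    }

    // Has to modify the object, so it cannot be const or static,
    // but the parameter can still be const.
    void add(const std::vector<int>& extra) {
        counts_.insert(counts_.end(), extra.begin(), extra.end());
    }

private:
    std::vector<int> counts_;
};
```

Calling `total()` on a `const Histogram` compiles, while calling `add()` on it does not, so the compiler enforces the consumed-versus-modified distinction for you.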

If several of the new functions share the same parameters, form a struct of these common parameters, which should further simplify the signatures of the new functions

Is this group of parameters sufficiently isolated that its values no longer need to be accessed individually through other data structures?
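As a rough sketch of the parameter-struct idea: suppose three extracted functions all took the same `width`, `height` and `scale` arguments. The struct `RenderSettings` and the function names below are hypothetical, chosen only to show the shape of the change.

```cpp
// Hypothetical parameter struct, invented for this illustration only.
// It groups values that several extracted functions used to take one by one.
struct RenderSettings {
    int width;
    int height;
    double scale;
};

// Each function now takes one const reference instead of three values.
int scaledWidth(const RenderSettings& settings) {
    return static_cast<int>(settings.width * settings.scale);
}

int scaledHeight(const RenderSettings& settings) {
    return static_cast<int>(settings.height * settings.scale);
}

int scaledArea(const RenderSettings& settings) {
    return scaledWidth(settings) * scaledHeight(settings);
}
```

If nothing else needs to reach `width`, `height` or `scale` individually anymore, the struct can become the only home for those values.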

This is the most annoying part of the process. It very often happens that the data structures floating around in the super-large function / class are similar (but not quite the same) or contain duplicate elements.
For example, two structures may be passed around that differ in part but also represent some of the same data, and that shared data is then copied back and forth to keep the two in sync.
Try and see if you can disentangle this knot by using the member of only one of these overlapping structs / classes. That way you can avoid 3 bad things:

This is often far from trivial, and might require some domain knowledge, or actually sitting down and understanding the code
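One possible shape of the result, as a sketch under made-up assumptions: imagine `Account` and `Report` both used to store their own copy of the owner's name, with code copying it back and forth. Here only `Account` keeps the member, and `Report` refers to the account instead. All names are hypothetical.

```cpp
#include <string>

// Hypothetical structs, invented for this illustration only.
// Account is now the single owner of the overlapping data.
struct Account {
    std::string ownerName;
    double balance;
};

// Report used to carry its own ownerName copy that had to be kept in sync.
// It now holds a non-owning pointer to the account and reads the name there.
struct Report {
    const Account* account;  // must outlive the report
    int pageCount;
};

std::string reportTitle(const Report& report) {
    return "Statement for " + report.account->ownerName;
}
```

A change to `ownerName` on the account is visible through the report right away, with no copying step. The price is a lifetime dependency: the account has to outlive every report that points at it, which is exactly the kind of thing that takes domain knowledge to verify.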

Hopefully after the massive code-shoveling, your understanding is much improved, or it could be time to check with one of the owners
Be prepared, however, that the (partial/former) owner is also confused, doesn't fully understand the system, or doesn't recognize the code after you've messed with it
Regardless, there could be some useful information that pops up as a result of this communication

I will probably elaborate on this last step in another post (this one was probably a handful already). I don't think I have the process of resolving overlapping data nailed just yet, but I believe with more experience, I will arrive at a more solid solution.
