I've had the pleasure (or the misfortune) of refactoring some large chunks of code, with the following characteristics:
- It has at least one very large function (300-900 lines)
- It has no clear owner, as at least 5 people have been authors on different parts of the code, and 2-3 of them already left the company
- This code has gained the reputation, that nobody knows what it's really doing
- There are plans to do some cleanup, but nobody knows when or who is gonna do it, and nobody dares to take the responsibility. This project is always on people's radar, but never quite higher on the priority list than everything else they gotta do
- Besides, respectable programmers got better things to do than to refactor messy obscure code, right? Time to call in the Refactorinator. MWAHAHAHAHA
Here's the process backwards engineered from how I typically do it
- Identify the huge functions
- Identify major logic blocks within those functions
- Typically these will be outer while or for loops (inner loops might just be too hard to tackle right away, if they have dependencies on variables computed in the outer loops)
- Factor the identified chunks of code out into their own function
- Visual Studio has "Extract Method", which does an OK job of automating this process
- Typically there will be too many parameters in such a function, consider if you can
- Make some of the parameters members of the class
- Recompute the same values locally inside the new function (some of these computations will be cheap, like an indexed array lookup, and it pays off to reduce the size of signature for the new function).
- Make it clear what the input and the output of each function is.
- What values are being consumed, what values are being modified?
- const and static functions are preferrable, as they don't have side effects
- If it's not possible to make the function const or static, try to see if you can make some of the parameters const. That is usually possible, especially as the newly created function is smaller in scope and will modify less of the data
- Factor out a few functions this way if possible
- Are there any patterns emerging?
- Are there certain groups of parameters that are getting passed around a lot?
- If so, form a struct of these common parameters, which should further simplify the function signatures for the new functions
- Are there common operations on these groups of parameters?
- In this case, we might start adding functions to the struct
- Is this group of paramters sufficiently isolated that they are no longer needed for individual access from other datastructures?
- Then we have the birth of a new class.
- Resolve overlapping or duplicate data
- This is the most annoying part of the process. It very often happens that the datastructures floating around in the super-large function / class are similar (but not quite the same) or contain duplicate elements.
- For example there can be two structures passed around, which are partially different, and partially they represent the same data, and then that data is copied back and forth to keep in sync.
- Try and see if you can disentangle this knot by using the member of only one of these overlapping structs / classes. That way you can avoid 3 bad things:
- needing to keep them in sync (simplifying logic),
- copying them back and forth (saving CPU cycles and reducing code size)
- stroring of duplicate data (saving memory)
- This is often far from trivial, and might require some domain knowledge, or actually sitting down and understanding the code
- Hopefully after the massive code-shoveling, your understanding is much improved, or it could be time to check with one of the owners
- Be prepared however, that the (partial/ former) owner is also confused, doesn't fully understand the system, or doesn't recognize the code after you've messed with it
- Regardless, there could be some useful information that pops up as a result of this communication
- I will probably elaborate on this last step in another post (this one was probably a handful already). I don't think I have the process of resolving overlapping data nailed just yet, but I believe with more experience, I will arrive at a more solid solution.