The confessions of a refactoraholic: October 2010

Sunday, October 31, 2010

Code quality and communication

I find it interesting how much communication patterns and personalities influence code quality.

If a programmer thinks he's too cool for school, sooner or later he'll end up making changes to other people's code without asking, often with undesirable side effects
If a programmer is afraid to ask about how to do this or that, lest he's perceived as someone who is weak, he ends up creating duplicate code, reinventing the wheel, when a lot of people reinvent the wheel in different places, maintenance of large code bases can become problematic. This can also be a major waste of time, when instead of spending minutes to hours asking about an existing way to do X or Y, which may already exist, someone might spend a day or two (or god forbid even more), doing Z.
If everyone is afraid to take responsibility for a subsystem, that subsystem deteriorates. Each programmer sees the mess but doesn't want to / doesn't know how to clean it up, and ends up creating more of a mess. It's important that a software subsystem has an owner / a goto guy, so that there is an authority on how to do things right, and what wheels not to reinvent.

Unfortunately communication has an overhead.

It takes time to get off your ass to go to a colleagues desk.
An expert / owner of a subsystem may be too busy to help
Even though the subsystem expert may have a solution, it may require some wrap-up work on his side to make it fit your requirements. In the end it's a trade off between (for example) 2 hours on your side + 2 hours of his, versus 2-3 days if you do it alone.

Overprotective system owners have a detremental effect on code quality. If you are afraid to touch the expert's code, or ask them for help, you will end up doing the same thing twice.

Helpful and friendly system owners, on the other hand, can very much help reduce the overall maintenance cost, by making sure other people use their system properly, don't reinvent the wheel. They will also try to accomodate their subsystem to new use cases from others. It takes a little up front effort, but is worth it at the end.

Saturday, October 30, 2010

A manager for heterogeneous data (a refactoring puzzle)

Currently I'm in the process creating a software subsystem, which is responsible for collecting data from multiple places, and then notifying some listener-type entities when that data has been received.

It looks something like this.

Pusher classes => __raw data__ => DataManager => __processed data__ => listener classes

So far so good. The left half of this graph seems to be easy to structure, as the DataManager simply provides a few public functions to push different kinds of data, and those are the calls made from the pusher classes. Even though those functions have different signatures for different types of data, there's only one place of access, namely the manager itself.

The right side of the graph is a bit more problematic. That's because every listener listens to different types of data. Thus the function signatures of the listeners have to reflect the signatures of the corresponding DataManager functions, creating a dependency, and a maintenance nightmare.

So my code currently looks like this.

ZDataManager
{
   //heterogeneous listener registration / unregistration
   void RegisterFruitListener(ZFruitListener*);
   void RegisterVegetableListener(ZVegetableListener*);
   void RegisterBerryListener(ZBerryListener*);
   //...plus corresponding unregistration functions

   //heterogeneous receiver functions for receiving different types of data
   void ReceiveNewFruit(ETexture, EColor, ETaste);
   void ReceiveNewVegetable(string sVeggieName, float fVeggieWeight);
   void ReceiveNewBerry(float fYumminessFactor, float fVitaminsPerGram);

   //multiple arrays of heterogeneous listeners
   list<ZFruitListener*> m_FruitListeners;
   list<ZVegetableListener*> m_VegetableListeners;
   list<ZBerryListener*> m_BerryListeners;
}

void ZDataManager::ReceiveNewFruit(ETexture eText, EColor eCol, ETaste eTaste)
{
   list<ZFruitListener*>::enumerator en = m_FruitListeners.GetEnumerator();
   while (en.MoveNext())
   {
   en.Current()->NewFruitReceived(eText, eCol, eTaste);
   }
}

//...here we have almost identical definitions for ReceiveNewVegetable and ReceiveNewBerry, but with different function signatures!!

Then we have the listener classes, doing the following:

ZFruitListener::Init() {
  GetDataManager()->RegisterFruitListener(this);
}

ZFruitListener::~ZFruitListener() {
  GetDataManager().UnregisterFruitListener(this);
}

ZFruitListener::NewFruitReceived(ETexture, EColor, ETaste)
{
   //process new fruit
}

//same idea for ZVegetableListener / ZFruitListener Init, destructor, and receiving functions.

As you can see here, I'm writing a lot of duplicate code, maintaining similar lists, registering classes in a similar way, and multicasting to different functions, but in a similar way. The main stumbling block here is the fact that the function signatures are different, making a more smooth solution elusive.

Tuesday, October 26, 2010

Automation of refactoring

I'm really missing automation in my refactoring. A lot of what I do on a day to day basis, is robotic code moving that has no impact on functionality (or it should not if you don't make human mistakes). A nice tool for doing basic operations, such as moving functions from one class to another, changing a parameter list, extracting a function from a code chunk, etc. would do so much good!

Many refactoring operations aren't difficult, but even for a refactoraholic like me they can be

Robotic
Repetitive
A pain in the ass

The number of primitive actions can be quite large (up to 10-12)

Bug-prone,

It's easy to make a mistake in the long list of primitive actions, and then your code has unnecessary bugs, which were introduced out of good intention

I use Visual Studio 2008, and luckily I can see that Visual Studio 2010 has a lot of the things I've been looking for. A lot of the refactoring operations have in fact been automated, and I'll probably write about that in my future posts.

Sunday, October 24, 2010

Refactoring pitfalls

Here are some things I found that go wrong during refactoring, or reasons why refactoring is not as easy as it could be:

If you have a large enough project, iteration times tend to be long

Compilation times are long
Startup / loading times are long
Running tests takes a long time

Refactoring in small steps is difficult, since you either

End up spending a long time retesting each iteration, or
Try to squeeze in several rounds of refactoring in one sweep, which decreases turnaround time, but increases risk and testing time.

The number of test cases is too large and not all tests are (even in principle) automatable.

Corner cases are not always obvious and the effect your refactoring will have is not always clear

The combination of long iteration times and large number of test cases can make refactoring code very costly (and believe me I've paid the price). You end up doing it in large sweeps, while adding new features at the same time, after which everything is broken and you need to do a whole suite of tests needs to be designed (or redesigned). I've had refactorings that took between 1 week to 1 month, and I regretted starting them in the first place. I was happy about the end result, and probably in the long run it still paid off, but the time and the energy it took to get the system back on track was enormous, and I'm still looking for better ways. Suggestions are welcome.

Saturday, October 23, 2010

800+ lines of code in 1 function

Yes, it's true, and I just saw it with my own eyes on a real project.

This is why I do refactoring.

Some nice quotes from "Refactoring"

Some nice quotes I found in the book (will update as I'm reading).

Any fool can write code that a computer can understand. Good programmers write code that humans can understand. (p.15)

When I look at unfamiliar code, I have to try to understand what it does. I look at a couple of lines and say to myself, oh yes, that's what this bit of code is doing. With refactoring I don't stop at the mental note. I actually change the code to better reflect my understanding, and then I test that understanding by rerunning the code to see if it still works. (p. 54)

I strongly believe that a good design is essential for rapid software development... Without a good design, you can progress quickly for aa while, but soon the poor design starts to slow you down. You spend time finding and fixing bugs instead of adding new function. Changes take longer as you try to understand the system and find the duplicate code. New features need more coding as you patch over a patch that patches a patch on the original code base (p.57)

"Refactoring: Improving the Design of Existing Code" by Martin Fowler

So I started reading THE refactoring book. Admittedly I should have picked it up a long time ago, but now that I'm writing this blog, I have no excuse.

I'm on Chapter 1, and so far the book has met my expectations. It has confirmed the value of most of the techniques I've acquired through experience, and the principles behind them.

Here are some of the highlights in the very illustrative example that Fowler uses in Chapter 1.

Move code into a separate function

And make sure that the signature has the correct types

Rename variables to what makes sense
Functions that no longer reference the class that they're in, are good to move to another class
Function signatures can typically be reduced when we moved functions to the classes that are more relevant to them
After the function has been moved, we can either forward all calls to it from the original class, or replace all references to the original class's function to the new class's function
Getting rid of temporary variables

This is something I don't do as much because typically temp variables are easier to debug than function calls.
Fowler uses the example: result += "\t" + ... + each.getCharge() + "\n"
I would very rarely want to place a direct function call inside a string concatenation like this for 3 reasons

It's easier to debug the value if it's on a separate line
The return value of the function might be needed again
The string concatenation code becomes shorter easier to read, if it's not infested with multiple function calls (and god forbid those functions also have side effects, but that's another story altogether)

Replacing switch / case statements with polymorphism

I have done this before, but I find the overhead to be prohibitive most of the time
This may be worth while if the switch / case statement has large chunks of code of 10+ lines for each major case, but if it's a 2 liner for each case, then both readability and performance might be better off with the switch / case, and the overhead of going to the polymorphic solution is not worth your while.

Fowler also makes good use of UML class and sequence diagrams to illustrate the effect of each step of his refactoring. So far so good.

Tuesday, October 19, 2010

Refactoring pet peeves: Part II (Internal enums)

I dedicate this special rant to internal enums as this one I see a lot in practice. Here's a typical example:

ZActor

{

public:

//huge chunk of declarations

enum EActorState

{

ALERT,

BORED,

HAPPY,

}

//more declarations here

enum EActorType

{

MAN

WOMAN

SNAKE

}

//more declarations

}

ZDirector

{

public:

void DirectManWomanScene()

{

ZActor::EActorType eFirstActor = m_aActors[0];

ZActor::EActorState eFirstActorState = m_aActorState[0];

switch (eFirstActor])

{

case ZActor::MAN:

if (eFirstActorState == ZActor::BORED)

//do man stuff

case ZActor::WOMAN

if (eFirstActorState == ZActor::ALERT)

//prevent man from doing man stuff

}

private:

ZActor::EActorType m_aActors[3];

ZActor::EActorState m_aActorStates[3];

}

And so it goes on. You will notice the annoying overuse of ZActor:: qualifier, and the cluttering of ZActor declaration with the enum, which should serve as a warning signal to anyone making or reading this type of code.
The problem with internal enums is that typically they are so small, that people don't go thru the hassle to create a separate file for them or take them outside the class. In practice, however, enums tend to be small enough and conceptually clean that they're convenient to pass around and reuse between different classes.
In my experience, soon after an enum is created, it's often needed in another class. Internal enums are also a bad idea esthetically, because they clutter up the class declaration.
I've seen this again and again, where an internal enum constantly gets referenced from other classes and files, and create an absolutely unncessary dependency between the user class and the class that contains the enum.
If your enum has a tendency to change a lot (as some enums do), then you'll find yourself recompiling unnecessary files every time you change the enum. If the enum is kept in a separate file, only the files that have the dependency on the enum will need to be recompiled.
As a rule of thumb, I always move enums outside the class, and better yet into their own file. It is not necessary to have a separate file for each enum, but it's nice to have a separate file for a collection of enums related to the same concepts.
Of course when the enums are outside a class, we need to make sure there are no conflicts, and it's important to use an approrpriate naming convention. Otherwise we get a lot of eUndefined, eUnlisted, eUnknown, eItemCount, eInvalid, eDefault from all over the project colliding.

Refactoring pet peeves

Here's some stuff that irritates me in code

Class implementation (.cpp file), which is more than 10-15 pages

Have you created a monster class which does everything for everybody? Does it talk to the customer and is the customer at the same time? Does it clean the house and make a mess at the same time? Does it put the cabbage the goat and the wolf in the same boat, and hope they don't eat each other if you leave them alone for a second (see the wolf-goat-cabbage problem). Or does it hope that this fragile boat will actually survive the weight of all 3 animals and the peasant at the same time?

Class declarations, which take more than 2-3 pages.

Same as above

Functions that take up more than 1-2 full screens

Very large functions become hard to read often involve unnecessarily complex interactions among locally declared variables. The longer your function goes on, the more variables you can declare within its scope, and the less clear your dependencies can become.

Function signatures that contains more than 4-5 parameters

OK, what is this function actually doing? Cmon let's get real, you don't really have a clue. At least consider creating a struct and passing that in instead.
If 4-5 of a class's member functions accept the same parameter, consider making it a data member of the class

Internal enums

This is a special rant in and of it own

Internal classes

An internal class is is not your friend for the same reasons as an internal enum, except worse, as an internal class has a tendency to grow inside the belly of the outer class, making it even more likely than usual that the outer class will become a monster class.

Wednesday, October 13, 2010

Stages of work

Here's an approach that I found works for me, when undertaking a new feature, or rather one of its iterations

Prototype and figure out how to do the main parts

Investigate what goes where, what datastructs are useful
Concentrate on figuring out the parts which are technically hardest, things that require domain knowledge, interaction with unfamiliar code, etc.
See some results and get inspired for further actions
Don’t get stuck on how to do it properly, just get the result!
At the end of this stage, at least one set of inputs will yield the "correct" prototype results

Design the iteration

Settle on the proper way to do tackle the task
Setup the framework for taking care of the use cases that come up
Implement the code skeleton - classes, function declarations, calls to (possibly empty) functions
At the end of this stage, the original test case will still work, and the basic flow of control for the program will be established

Do it properly for the main case

Fill in the meat on the skeleton set up in the previous stage. This is sort of the "main part", that may take the longest, but with a proper skeleton code and the main risk areas handled, this could be much easier.
At the end of this stage, and most types of legal input will result in correct output

Wrap-up

Go thru risky areas, add error checking, and get ready to release to the users (or testers)
At the end of this stage, the iteration can be considered complete, legal input will result in correct output, and illegal input will be handled gracefully

The confessions of a refactoraholic