The confessions of a refactoraholic: September 2010

Monday, September 27, 2010

C vs C++, The Ultimate Showdown

I've been having ongoing debates with my friend _waker_, about C vs. C++, and if they had a fight, who would kick whose ass?

I will make a disclaimer that I am very much a fan of C++ and object oriented programming, and my experience in C is limited to a few experimental programs in high school, after which I said "no thanks". Nothing wrong with C, but C++ is just so much cooler, and would kick C's ass any time of day if they met on the street.

However, I have a lot of respect for waker and other C lovers out there, and a lot of the points they make are very valid, so let's duke it out right here, and pretend C has just met C++ on a street corner, and they both rolled up their sleeves for The Ultimate Showdown. I will attempt to summarize the arguments of the other side to the best of my ability.

Arguments for C (courtesy waker)	Counter-arguments for C++
C code is easier to read, and maintain	I guess this depends on the context, but I've certainly seen some nasty C code, especially involving unnecessary pointer arithmetic, unnecessary linked lists, problems due to most members being "public", too much communication via global variables, functions with too many arguments, etc.
C is closer to the lower level machine architecture and C code tends to be more optimal	True, but optimization is often not the primary factor in software development. In this day and age, performance is an important factor only in some areas, while readability and maintainability have a much greater impact on major software projects. You have to balance the cost of making the project with the speed you get out of it.

Arguments for C++	Counter-arguments for C
There are a lot more large scale projects written in C++ than in C, and for good reason. Huge software bases are a lot easier to write and maintain in an object oriented language. There are fewer lines of code to write, and a lot more tools available to support modular design	There are in fact very large code bases maintained in C, especially in the open source community, such as the Linux kernel. Besides, the choice of language in a company is often driven by considerations other than performance.
C++ provides a lot of seemingly useful tools, such as virtual functions, templates, inheritance, data hiding	Most of the language features end up being misused, and the typical traps a C++ project will get itself into are far worse than those we can create with C. When the only things you have to worry about in C are functions and structs, life becomes a lot easier in some ways. There are a lot fewer knots your program can tie itself into in C, and when it does, they're easier to unravel.

Perhaps the most convincing and intuitively appealing analogy for the inferiority of object-oriented design is the following:

"C is like working with dough or clay and building a structure that's in constant flux, while C++ attempts to build the same building out of bricks of different sizes".

Back in the world of software, this means that C++ attempts to impose structure on parts of the program prematurely, because the parts in fact depend on the whole. We use classes and class hierarchies as building blocks, and then we find ourselves continuously tearing those building blocks down, as we find them unsuitable for the task at hand. C++ and other object oriented languages are thus based on the wrong assumption that we can build large flexible applications out of stable building blocks. What we end up discovering again and again is that the building blocks we originally constructed are not quite good enough for the task, and we need to go back and rebuild the foundation (refactor). In C, the building blocks (functions and structs) are much smaller, and thus they have far fewer problems being fitted to the task of maintaining large software projects, which are constantly in flux.

This, I believe, is a very powerful point, and it does underline the main issue with object oriented languages and the maintenance of large projects written in them. This is perhaps even the reason I started writing this blog: seeing the pitfalls of object oriented design, and the way object oriented language constructs are misused. However, I do believe that the tools provided by C++ are very powerful, and can do a lot more good if used right. As for the analogy of building the building out of clay, perhaps the art of refactoring is precisely in how to change the bricks together with the building: the bricks need to be continuously revisited as the shape of the building becomes clearer.

Sunday, September 26, 2010

How to create a new class

When refactoring on a small scale (this typically means creating 1 new class), we have the following 3 main actors:

U: The user class (or classes)
N: The new class, which will contain some of the original class's functionality, and relieve it from bloat
O: The original class (out of which we want to extract some functionality and put it into N).

So here are the steps we want to take in order to get the process under way systematically:

Identify the original class O.
Decide which of O's functionality will be moved into N

Often there will also need to be a reference from N to O, as we still need the information from O to execute some of the logic. This dependency can be removed at a later point, but in early stages of refactoring, it's almost always useful to have.
For simplicity consider making N a friend of O. This may help get things working initially. This may be good to remove later, when the two classes have become sufficiently decoupled.

Decide what data members N will have, and what its constructor signature is (perhaps it gets some info from O)

Create the new function declarations in N that correspond to O's functionality (possibly with different signatures). So O.foo() and O.bar() become N.fu() and N.baar()
Move over the definition of each function from O to N
Change the definition with respect to N, to fit the new function signatures, and the new data members of N
Find the places in U where the moved code was called.
Replace the calls to members of O with calls to members of N.
Remove the functionality from O
If you resorted to the "friends" solution, consider if N still needs to be a friend of O.
If you included O from N, consider if this dependency is still needed

I realize that this list is quite fuzzy, with some of the steps being far from mechanical. There's a bit of an art to the process of creating a new class, and I'm hoping to pin the process down further in my future posts.
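As a rough illustration of the steps above, here is a minimal sketch of what the end result might look like. The names ZOrder (playing the role of O), ZOrderPricer (N), and checkout (U) are made up for this example, and the "friend" shortcut is used so that N can read O's data through a reference passed into its constructor.

```cpp
#include <cassert>
#include <vector>

class ZOrderPricer; // N, forward declared so that O can befriend it

// O: the original class. It used to compute the subtotal and total itself.
class ZOrder {
public:
    void addItem(double price) { mPrices.push_back(price); }

private:
    friend class ZOrderPricer; // temporary shortcut, revisit once decoupled
    std::vector<double> mPrices;
};

// N: the new class. It keeps a reference back to O, handed in via the constructor.
class ZOrderPricer {
public:
    ZOrderPricer(const ZOrder& order, double taxRate)
        : mOrder(order), mTaxRate(taxRate) {}

    // Moved over from O; the body now reads O's data through the reference.
    double subtotal() const {
        double sum = 0.0;
        for (double price : mOrder.mPrices) sum += price;
        return sum;
    }

    // Moved over from O with a new signature: the tax rate is now a data member of N.
    double total() const { return subtotal() * (1.0 + mTaxRate); }

private:
    const ZOrder& mOrder;
    double mTaxRate;
};

// U: the user code, which now calls members of N instead of members of O.
double checkout(const ZOrder& order) {
    ZOrderPricer pricer(order, 0.25);
    return pricer.total();
}

// Small usage check for the sketch above.
void exampleCheckout() {
    ZOrder order;
    order.addItem(10.0);
    order.addItem(30.0);
    assert(checkout(order) == 50.0); // 40 * 1.25
}
```

Once ZOrder grows a proper public accessor for its prices, the friend declaration can be dropped without touching the user code.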

Saturday, September 25, 2010

Signs you need to create a new class

One of your classes is getting too large, has too many members doing too many things
The function signatures of one of your classes are getting too large (3-4 parameters is about the limit, and typically you want to have 0-2 parameters).
There are subparts that need to be updated over time and yet they're handled from the main class (consider another class with an update loop)
You are trying to maintain subparts in parallel (e.g. 2 lists that should be in sync with each other)
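To make the last sign concrete, here is a small sketch of two parallel lists being replaced by a new class. The names ZRosterBefore, ZPlayer, and ZRoster are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Before: two lists that must be kept in sync by hand.
struct ZRosterBefore {
    std::vector<std::string> names;
    std::vector<int> scores; // scores[i] belongs to names[i], by convention only
};

// After: each entry is a single object, so the two lists cannot drift apart.
struct ZPlayer {
    std::string name;
    int score;
};

class ZRoster {
public:
    void add(const std::string& name, int score) {
        mPlayers.push_back(ZPlayer{name, score});
    }

    // Removing an entry removes the name and the score together.
    void removeAt(std::size_t index) {
        mPlayers.erase(mPlayers.begin() + static_cast<std::ptrdiff_t>(index));
    }

    std::size_t size() const { return mPlayers.size(); }
    const ZPlayer& at(std::size_t index) const { return mPlayers.at(index); }

private:
    std::vector<ZPlayer> mPlayers;
};

// Small usage check for the sketch above.
void exampleRoster() {
    ZRoster roster;
    roster.add("ann", 3);
    roster.add("bob", 5);
    roster.removeAt(0);
    assert(roster.size() == 1);
    assert(roster.at(0).name == "bob");
    assert(roster.at(0).score == 5);
}
```

In the "before" version, forgetting to erase from one of the two vectors silently pairs names with the wrong scores; in the "after" version that mistake is not expressible.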

Tuesday, September 21, 2010

Moving an existing class into a separate file

Large files are just as evil as large classes or large functions. And for all the same reasons. Inevitably a large file can become a black hole, just like a large class or function can be. More and more classes get piled into one file, until you have to scroll pages and pages to find the class you want. I realize with the right tools, what file a class or a function is in can be less important because you can just "go to definition", but when you attempt to make sense of more than just one class, unnecessary junk surrounding it can create more confusion.

In addition, mega-files have the tendency to confuse dependencies. For example if you pile ZApple, ZOrange, ZKiwi, and ZPassionFruit into the same file "fruit.h", every file that only uses oranges, needs to include fruit.h. This increases compile times. Plus the larger files tend to have more people working on them at the same time, creating merge conflicts.

The one-class-per-file solution tends to be the better one, unless the classes are very small.

As with splitting up a large function, there is a simple check list to follow (I'm assuming C++ here)

Find the class that we want to factor out (let's say it's called ZOrange, and the files it's defined in are fruit.h and fruit.cpp)
Create the files that will contain the newly factored out class (let's call them orange.h and orange.cpp)
Move the contents of ZOrange from fruit.h to orange.h
Move the function definitions of ZOrange from fruit.cpp to orange.cpp
Create the appropriate #ifndef / #define / #endif include guard in orange.h, to make sure its contents only get processed once per compiled source file
Add orange.h and orange.cpp to the project
Add orange.h and orange.cpp to source control
If needed, include orange.h from fruit.h (or better yet, forward declare ZOrange and include orange.h from fruit.cpp)
Add references to orange.h from all files that need it
Remove references to fruit.h in all places where we only need access to ZOrange (this you can only do if you have domain knowledge, so leave that step alone if you know nothing about the code, or want to save time).
Build the solution on all relevant platforms to make sure nothing is broken
Enjoy the fruits of your labor (and the reduced size of the fruit.h file)
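Here is a sketch of how the files might look after the split, shown as one listing with comment markers where each file would begin. The members of ZOrange and the ZFruitBowl class are invented for the example; only the file names and ZOrange come from the check list. The point of interest is that fruit.h gets by with a forward declaration, and only fruit.cpp includes orange.h.

```cpp
#include <cassert> // only needed for the usage check at the bottom

// ---- orange.h ----
#ifndef ORANGE_H
#define ORANGE_H

class ZOrange {
public:
    explicit ZOrange(int segments) : mSegments(segments) {}
    int segments() const;

private:
    int mSegments;
};

#endif // ORANGE_H

// ---- orange.cpp ---- (would start with #include "orange.h")
int ZOrange::segments() const { return mSegments; }

// ---- fruit.h ----
#ifndef FRUIT_H
#define FRUIT_H

class ZOrange; // forward declaration, so fruit.h does not need orange.h

class ZFruitBowl {
public:
    // A pointer to an incomplete type is fine in a header.
    void put(const ZOrange* orange) { mOrange = orange; }
    int orangeSegments() const; // defined in fruit.cpp, where ZOrange is complete

private:
    const ZOrange* mOrange = nullptr;
};

#endif // FRUIT_H

// ---- fruit.cpp ---- (would start with #include "fruit.h" and #include "orange.h")
int ZFruitBowl::orangeSegments() const {
    return mOrange ? mOrange->segments() : 0;
}

// Small usage check for the sketch above.
void exampleFruit() {
    ZOrange orange(10);
    ZFruitBowl bowl;
    assert(bowl.orangeSegments() == 0);
    bowl.put(&orange);
    assert(bowl.orangeSegments() == 10);
}
```

With this layout, a file that only deals in oranges includes orange.h alone, and a change to ZOrange no longer forces a rebuild of everything that includes fruit.h.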

Wednesday, September 15, 2010

Moving files

Moving files is an operation many programmers hesitate to do. It complicates matters when integrating code branches, the source control program often doesn't handle renaming / moving very well, and even though it's not very "difficult", it is often more trouble than it's worth. In fact the process of moving / renaming files is very mechanical and boring, and the one thing that would help matters is a clear check list of what not to forget. So here's one I made.

Copy files from source to destination on disk
Add new files to the solution / project
Delete old files from the solution / project
Remove old files from source control
Add new files to source control
Replace existing references - files that weren't moved should refer to the right path on disk / in the solution
Delete old files from disk (if the source control doesn't already do that for you)
Submit to source control
Make sure you didn't break the build
Sleep with a clean conscience the next night

It may be that your source control software is smart enough that you can skip some of these steps, but overall this is what I have to do with Perforce and Visual Studio.
