Sunday, November 28, 2010

Refactoring and performance

"The interesting thing about performance is that if you analyze most programs, you find that they waste most of their time in a small fraction of the code. If you optimize all the code equally, you end up with 90% of the optimizations wasted, because you are optimizing code that isn't run much. The time spent making the program fast, the time lost because of lack of clarity is all wasted time ... [When] the code is clearer, you have a better understanding of your options and what kind of tuning will work" (Refactoring, p.70)

I think Martin Fowler has a very good point here, which I have known for a while, but couldn't quite articulate as well. Writing performance-optimized code comes at a cost, that cost is typically clarity, maintenance and ultimately time. Another quote comes to mind:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified” - Donald Knuth

Very often the most unexpected things may end up being performance bottlenecks. Someone may spend days, weeks, or even months, trying to squeeze out every inch of performance out of a piece of code, and in the end it may not matter at all, as you discover it's something else (the part of code that doesn't even look like it needs optimizing) that creates 90% of the performance loss.

What's worse, the prematurely optimized code can end up being re-written altogether, because the user requirements have changed, and then the hours spent painstakingly squeezing out miliseconds of CPU time are wasted 100%!

Wednesday, November 24, 2010

Flavors of optimization

I was thinking the other day about optimization. What does it mean to optimize code? Well it depends. What are you optimizing with respect to? It seems like everyone got their favorite type of optimization.

You can:
  • Optimize for the long-term speed of feature development and bug fixes
  • Optimize for performance (this is what people think of as optimization by default). 
  • Optimize for minimum time spent doing any work, which means
    • Offloading onto others
    • Making small features look more important
    • Making useless features look useful
    • Shoving off bugs for later (hopefully forever) or to others
  • Minimize the amount of wasted work
    • I don't want to do something now, which has to be redone later
    • I don't want to make a temporary fix now, because there might be a proper fix later
  • Maximize for maximum features per day
    • Short term speed of development above all
    • No time for design, refactoring, or performance optimizations 
    • Focus on appearance of progress, and the total volume of code produced, regardless of correctness, extensibility, and readability
    • Bug fixing is considered a completely separate process, which essentially gets in the way of progress, testers are annoying
    • Documentation is considered a nuisance
    • Other programmers who point out bugs are also annoying
  • Maximize for prettiness over functionality
What's funny is that in a large project, every one of these optimizations will be present, resulting in a mixture of crappy code, pretty code, buggy code, easily maintainable code, and performance-optimized code, all mish-mashed with parts trying to interact with each other. 

Thursday, November 18, 2010

A shiny new class

Having a new class in the right place is like having a new employee with the right skill set. As soon as you create the class, you begin to see things that were in chaos falling into place, things that didn't have an owner, having an owner, and things that you thought were difficult, now becoming easy. Hurray to new classes that make sense!

Friday, November 12, 2010

A manager for heterogeneous data (the solution)

OK, so here's the solution I ended up with (forgive the improper syntax, you should still be able to get the picture):

class ZDataManager
{
    public:
       void RegisterListener(IListener* listener)
       { m_aListeners.push_back(listener); }
       void UnregisterListener(IListener* listener) 
       {m_aListeners.remove(listener);}
       void ProcessEvent(const SEventArgs& args) {
          foreach (list in m_aListeners) 
              if (list->GetArgType() == args.GetArgType())
                   list->ProcessEvent(args);
       }
    private: 
       std::vector<IListener*> m_aListeners;
}


class IListener
{
   virtual EArgType GetArgType() const = 0;
   virtual ProcessEvent(const SEventArgs&) = 0;
}


struct SEventArgs
{
   SEventArgs(const EArgType& eType) {m_eType = eType;}
   EArgType GetArgType() const {return m_eType;}
 private:
   EArgType m_eType;
}


struct SEventArgsFruit: public SEventArgs
{
    SEventArgsFruit(ETexture eText, EColor eCol, ETaste eTaste): SEventArgs(FRUIT_ARGS)
    { m_eText = eText; m_eCol = eCol; m_eTaste = eTaste};
    ETexture m_eText;
    EColor m_eColor;
    ETaste m_eTaste;
}


class ZFruitListener: public IListener
{
    ZFruitListener() {GetDataManager()->RegisterListener(this);}
    ~ZFruitListener() {GetDataManager()->UnregisterListener(this);}
    virtual EArgType GetArgType() const {return FRUIT_ARGS;}
    virtual ProcessEvent(const SEventArgs& args) {
       SEventArgsFruit* fruitArgs = static_cast<SEventArgsFruit*>(&args);
       //process new fruit
    }
}


class ZVegetableListener : public IListener
{
     virtual EArgType GetArgType() const {return VEGGIE_ARGS;}
     virtual ProcessEvent(const SEventArgs& args) {
        SEventArgsVeggie* veggieArgs = static_cast<SEventArgsVeggie*>(&args);
        //process new vegetable
     }
}



I'm quite satisfied with this solution, it relies on a static cast, but overall is still elegant, and does the job. The key to the solution is the generic SEventArgs struct and all of the wrappers that inherit from it, allowing us to package different types of data with an arbitrary number of parameters into something that can be processed generically thru the manager. It might seem awkward at first to be passing in a wrapper to the manager each time, or to create that wrapper for each case. But remember, we may have started with a solution that had a separate list for each arg type, and separate registration functions. Now, only in the ProcessEvent function do we begin to treat the args differently. 
One assumption I make (true for my set of use cases) that multiple listeners could listen to the same type of data, and one listener cannot listen to 2 different types of data. We could also have a base class ZListenerBase (inheriting from IListener), which simplifies the registration (this registration / unregistration happens exactly in the same way in the constructor / destructor of the listener classes). 

Saturday, November 6, 2010

Replace method with method object

This technique I found very useful, and needed when untangling complex functions, with local variable dependencies. I believe that when a function becomes too complex, or too large, it is often a hint that there should be another class here, taking care of business in a more systematic way. At first it seems like a big deal to untangle the complex funcion, but as you start doing it, you usually realize how much cleaner the logic becomes when it's all separated into the other class. 
  1. The original function becomes less cluttered, as it offloads the work to the new class
  2. The new class becomes cleaner, for the same reason, but often even more than expected, because the functionality offloaded to the new class can help clean out other functions in the original class
  3. Once the logic has been dumped into the new class, it may be clearer how to simplify it further, and reduce the coupling between the original and the new class
This is not a trivial process, so I'll refer to Martin Fowler's book for the mechanics (p.136):
  • Create a new class, name it after the method
  • Give the new class a final field for the object that hosted the original mehtod (the source object) and a field for each temporary variable and each parameter in the method
  • Give the new class a constructor that takes the source object and each parameter
  • Give the new class a method named "compute"
  • Copy the body of the original method into compute. Use the source object field for any invocations of methods on the original object [my italics]
  • Compile
  • Replace the old method with one that creates the new object and calls compute
  • Now comes the fun part. Because all the local variables are now fields, you can freely decompose the method without having to pass any parameters
Now notice the italicized text. This can be an important glitch, which I'm sure Fowler is aware of but didn't want to get into for the sake of simplicity. Often you'll find that in your attempt to disentagle a function in this fashion, you end up creating 2 classes which are entangled with each other. The new class will (at first) need to make calls to functions of the original class. Of course it would be a shame to leave it there. Often if your new class ends up calling functions of the original class is a signal that perhaps some more functions should be offloaded to the new class, or some new parameters need to be passed in to the functions of the new class. There is an art to this, and at this point I don't really have a checkpoint list of how that situation should be handled, as it comes on a case by case basis. Sorry, no examples here, as last time I did this type of refactoring was a few months ago.