Monday, December 6, 2010

Optimization vs. readability

It is not uncommon that I encounter situations, where I know the code I'm writing or refactoring is clearly not performance-optimal, and yet more readable.
Here's an example of code that I've encountered recently (excuse the bad formatting, I'm experimenting here):

struct State{  
float m_Prop1; 
  int m_Prop2;  
  std::string m_Prop3;
// ...
}


    class Manager {      State oldState, newState; std::vector m_Elements; }
    //... 
foreach elem in m_Elements {   
if (oldState.m_Prop1 != newState.m_Prop1) 
elem.SignalProp1Change(); 
else if (oldState.m_Prop2!=newState.m_Prop2) 
elem.SignalProp2Change();    
// ... 
}  
    There are 2 ways to go from here. 
The first is to take the comparison outside the loop, as a result of which we only need to do the comparisons once. The second is to move both the comparison and the signaling into the elements, which would put all the important logic where it is actually needed. Here's the comparison.



    Performance-optimal versionOO version
    Code in the Manager headerstruct State {
      m_Prop1, m_Prop2, m_Prop3;
    }
    class Manager {
      State oldState, newState;
      std::vector m_Elements;
    }
    struct State {
       m_Prop1, m_Prop2, m_Prop3;
    }
    class Manager {
      State oldState, newState;
      std::vector m_Elements;
    }
    or
    class ZState {
    friend class Element;
    private: m_Prop1, m_Prop2, m_Prop3;
    }
    class Manager {
      ZState oldState, newState;
      std::vector m_Elements;
    }
    Code in the Managerif (oldState.m_Prop1!=newState.m_Prop1)
        foreach elem in m_Elements
            elem.SignalProp1Change();
    if (oldState.m_Prop2!=newState.m_Prop2)
        foreach elem in m_Elements
             elem.SignalProp2Change();
    foreach elem in m_Elements
       elem.SignalChanges(oldState, newState);

    Code in the element source file//nothing much happening herevoid Element::SignalChanges(const State& oldState, const State& newState) {
       if (oldState.m_Prop1 !=newState.m_Prop1)
          SignalProp1Change();
       if (oldState.m_Prop2 !=newState.m_Prop2)
           SignalProp2Change();
    }
    Code in the element header fileclass Element {
      public:
        void SignalProp1Change();
        void SignalProp2Change();
    }
    class Element {
     public:
        void SignalChanges(const State& oldState, const State& newState);
      private:
        void SignalProp1Change();
        void SignalProp2Change();
    }
    or
    class IElement {
     virtual void SignalChanges(const State& oldState, const State& newState) = 0;
    }
    class Element : public IElement {
    // the rest is the same as above
    }  
The first solution is more performance optimal, since if a property hasn't changed, we never call any of the functions, thus skipping the traversal of the list altogether. In the second solution is more modular, and all logic is moved into the elements. This way we hide all that logic from the manager, we can also hide the contents of the state from the manager. This latter approach is more robust, leaving the manager class completely oblivious to logic of the elements. Also a new programmer looking at the code base needs to inspect less code in order to figure out what to do. 

So what to do, what to do? I typically prefer the second option, as you may have already guessed. The benefits of modularization are great for maintenance, and the performance gain may only be important if this code is executed very frequently. But I think this example does illustrate an important point. Data hiding can prevent us from getting code that is fully performance-optimized. The more we separate code into clean independent modules, the less we can take advantage of their interdependence. 

3 comments:

  1. My vote is certainly on the performance-optimal version - and this is even without taking performance into account.

    Notice how much more code goes into the OO version, and how the logic of when to trigger changes are located in the manager (or controller) class - exactly where it belongs.

    As for easier maintenance, i would argue the exact opposite: the PO version is much easier to maintain than the OO version - precisely because the logic is located in one place - the controller. Imagine that you have another class managed by the controller, let's call it Widget.

    In the OO style you will now have to do the same thing as in the Element class. In the PE version this code will be in the controller, right next to the code that does the same thing for the Elements. IMHO, much easier to find and maintain.

    In short, data hiding is not only (sometimes) preventing us from getting performant code, but it is also hiding the logic, by coupling it with the data. In other words, OO sucks :-D

    ReplyDelete
  2. @waker: I can see this is totally not your type of blog :)

    @Corvus: Since your vote is on the first version without taking performance into account, let's take performance out of the equation and talk about other considerations exclusively.

    I can see your point about having all logic in one place. It CAN be beneficial, and I guess it does depend on the situation. I think what you're talking about is the case of
    if (oldState.m_Prop1!=newState.m_Prop1)
    foreach elem in m_Elements
    elem.SignalProp1Change();
    foreach widget in m_Widget
    widget.SignalProp1Change();

    Fair enough, and if that's all we had, I would vote for the first option too. But there is a tendency for such manager classes to be bloated. They try to take care of your kids and be a car dealership at the same time, and sometimes accidentally try to sell your kids at the car dealership. When a manager becomes a heavyweight do-it-all class (which is precisely what happened in the real-world case that inspired this example), all of the sudden the prospect of moving logic into the elements becomes more attractive. The additional minus of the PO solution is that, the State, the Element, and the Widget have to expose their internals to the Manager, thus potentially exposing them to other more malevolent classes (let's call them the car jackers, and the child molesters).

    What I also like about the second solution is that while the total amount of code may be greater, the total amount of code in the Manager is drastically reduced. And if you're looking at a function of 800+ lines of code, this can help a lot.

    ReplyDelete