So I recently discovered Resharper, and it's a refactoraholic's dream come true. Most of the tedious routine tasks are now automated, just the way they should have always been. Why oh why God did you make me wait so long for this miracle? Why did I have to waste countless hours doing manually what was always meant to be done automatically?
Not sure if your field is used from anywhere else? Resharper already knows and suggests before you may even wonder. Is your naming scheme consistent? Resharper will tell you all about it. Apparently you can even apply a solution-wide code analysis, which I haven't yet tried, but it sounds promising.
It may seem a bit intrusive, popping up with suggestions all over the place, but it works for me, so far most of the suggestions have been very sensible, and it definitely beats the shit out of Visual Assist. Oh and it provides a Unit Testing framework too, while it's at it. Not too shabby.
Monday, December 12, 2011
Thursday, December 1, 2011
Good management vs. bad management
In my relatively short career, I have seen a LOT of bad managers, and relatively few good ones (one of the readers of this blog included in that). The subject has always interested me immensely, since a manager, often to a much greater extent than any of the people that work for him, determines the success or failure of a project. So what are some of the skills that make a good manager?
Area 1: People Skillz.
A good manager is good with people, at least to the extent that the job requires it. You don't have to be a social butterfly, but you gotta be able to talk to people about what's important. The other important factor here is alpha-ness. It's about being able to ask things of people in a way that would motivate them to take action. It's also about being able to stand up to criticism from higher-ups and insisting on what's right. You need to muscle your way thru opinionated people and convince others of the merits of your case, whether it be on technical grounds or sales numbers, or just plain common sense. You need to know how to make people listen.
There is also the flip side to that. I have seen too many managers who are too alpha. That means they're too dismissive of their subordinates, they don't listen enough, and they trust their gut instinct too much at the cost of proper decision making. The uber-alpha manager always thinks he/she has things under control, which can be a bad thing (see Area 5). Being too alpha can also mean demanding the impossible ("We have to finish this 5-month long project in 1 month and we have to do it perfectly"). Demanding the impossible and not listening enough will also mean losing the respect of the subordinates, who will not take you seriously, or will despise you for wasting their time.
Area 2: Responsibility a.k.a. giving a shit.
This is a big one. It sounds obvious, but I've seen too many managers lacking passion for the project they're working on. If a manager only cares about having the burndown chart go to zero, or meeting the deadline without getting into the details of what that would really mean, he's a lazy fuck and needs to get some sense knocked into him (which would require someone above to detect the laziness and take action). The manager who doesn't give a shit, will never take responsibility for the project going wrong. The deadline slipping is always the fault of somebody else, "I was watching the burndown chart, and it looked like we were gonna make it, and then Bob put in this extra task at the last minute, so we failed because of him." This type of statement is usually followed up by some hand-waving: "But don't worry we're gonna make it anyway". The irresponsible managers also tend to socialize with each other at the cost of the team, to give each other a pat on the back when everything goes to hell. A responsible manager admits failure when it happens, and, what's more important, is going to try to do better in the next iteration. It's about the self-improvement attitude. Unfortunately, it seems like some managers are in it for the Manager at the XXX Corp title. Once they got the title, their job is done. I know I'm being silly, but sometimes it feels like that's how things work.
Area 3: Time Management
This is the classical thing they teach in school and is covered extensively in books. How to manage your time and the time of your team. How to do project planning: estimates, burndown charts, how to handle the complexity of a project. Unlike Area 2, this is not a personality trait, but a skill, and most managers tend to have some idea what to do here. The problem comes later when (surprise!) the estimates don't reflect reality and something needs to be done. The manager inevitably needs to get into the details, as time management skills in and of themselves will only get you so far. This skill can improve with (management) experience.
Area 4: Technical Skillz and having a clue.
You need to be able to know what the hell is going on. An art manager needs to know something about art, a programming manager needs to know something about programming. You need to know what estimates make sense for a particular employee or department. You need to know when people are throwing dust in your face (or when they're throwing dust in their own face). This bullshit-detector cannot be purchased at Walmart, and comes only with (technical) experience.
Area 5: Knowing Where the Project Is and where it needs to be
It's about really having a clue whether the project is 6 months from completion, or if it's 5 years away (as well as knowing what "completion" really means). Sounds easy, but this is the area where most of the bad managers I've seen failed completely. At least in the few places where I've worked, managers seem to get into the state of denial with unprecedented ease. Often an indicator of a manager who is bad at Area 5 is the following situation: when they declare a deadline, everybody on the team immediately knows they're not going to make it. The manager himself will continue insisting that that's where the deadline is and the scope won't change, until it's too late. You can blow dust in people's face, but you should always know that you're doing it and not kid yourself about making a deadline. The deadline is important, but just as important is the scope. What is it that is going to be delivered? What kinds of adjustments would need to be made to still make the deadline even though not everything is done? What kinds of corners can be cut, and what kinds of corners are too expensive to cut? Does the deadline need to be postponed? If so, will it be just one more week, or are you just kidding yourself again? You can SAY it's one week to sell it to the customer, but you need to know you're lying for your own sake.
Area 6: Knowing HOW to take the project where it needs to be
This is about knowing how to take a project from A or wherever it is to Z, the final deliverable. It will take all of the above skills together to achieve that, and no wonder good managers are hard to find. It's a tough job, but somebody has to do it.
Tuesday, October 4, 2011
Refactoring and code ownership
So I've been given a pretty big chunk of Perl code to work with. It's about 30 pages of pure hell, all in one file, and considered unmaintainable by some. Of course the word unmaintainable is not in my vocabulary (and apparently not in Blogger's dictionary either), and somebody had to do the job, so the Refactorinator comes to the rescue.
At the surface level, refactoring Perl code is not that much different than refactoring C++.
- Identify groups of variables that belong together, and group them into chunks / structs, that can be passed around between files and functions without creating too much clutter.
- Break up the monster functions into smaller functions. Passing data between functions should be easier because of step 1
- Break up the monster file into multiple files, and hopefully never look at some of these files again. Hopefully at this point, we have arrived at some self-contained functions (step 2) with relatively short function signatures (step 1).
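Since the steps above are not Perl-specific, here is a minimal sketch of steps 1 and 2 in C++. The names (ReportConfig, FormatHeader, SumValues, BuildReport) are made up for illustration and are not from the code I was given; step 3 would simply move each function into its own file.

```cpp
#include <string>
#include <vector>

// Step 1: variables that always travel together become one struct.
// (ReportConfig and its fields are hypothetical, for illustration only.)
struct ReportConfig {
    std::string title;
    bool showTotals;
};

// Step 2: small functions that take the struct instead of a long
// parameter list or a pile of globals.
std::string FormatHeader(const ReportConfig& cfg) {
    return "== " + cfg.title + " ==";
}

int SumValues(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// What used to be one monster function is now a short sequence of calls.
std::vector<std::string> BuildReport(const ReportConfig& cfg,
                                     const std::vector<int>& values) {
    std::vector<std::string> lines;
    lines.push_back(FormatHeader(cfg));
    for (int v : values) {
        lines.push_back(std::to_string(v));
    }
    if (cfg.showTotals) {
        lines.push_back("total: " + std::to_string(SumValues(values)));
    }
    return lines;
}
```

The function signatures stay short because the struct carries the related data as one unit, which is exactly what makes the later split into multiple files painless.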
OK, I realize there's nothing Perl-specific here, I guess I'll have to make another post about that. But perhaps the lesson I'm learning here is refactoring is pretty universal, as long as the language supports multiple files, struct-like constructs, and basic functions.
Am I regurgitating basics here, being Captain Obvious? Maybe so, but then if the principles and techniques I'm writing about are so basic, then why the hell do I so consistently come across code with exactly the same mistakes time and time again? The short answer is "up front cost". That's a cost a lot of people are hesitant to pay, especially if there are ownership issues:
- Code has no clear owner - why should I clean up somebody else's code?
- Code has an unfriendly owner - your cleanup is very much unappreciated, because the owner a) thinks refactoring is a waste of time or b) has a big refactoring plan in mind himself, which he'll probably never get around to, but in the meantime you shouldn't ruin his grand plan
- Code has another owner - even if he's friendly, why should you do his work?
- Code has an owner that doesn't believe in refactoring
- Code has an owner that believes in refactoring in principle but never gets around to it in practice
Tuesday, September 20, 2011
Tips for debugging
A reiteration of some well-known truth, but it's nice to have a refresher anyway:
Watch it on Academic Earth
Tuesday, September 13, 2011
Short term memory
A few weeks ago I picked up this mobile app, that tests your short-term memory. It's basically the classical "find pairs of matching cards" game that kids always play, just with a lot of variations, and difficulty levels. On the Android store, it's called "Matchup", and I highly recommend it.
Here's some observations I made about this game:
- It doesn't get that much easier with time. While with many "fundamental knowledge / trivia" games you might end up knowing all the answers at the end, and be quicker, with Matchup, you don't really have much of an advantage, whether you're new at it or have been a seasoned player.
- Your skill goes up after a while, and then down again, as you get fatigued and start getting confused
- It really helps to pronounce the names of objects either out loud, or at least internally. Your short-term memory is better when you connect it to auditory cues, instead of visual ones.
- Unfortunately that means that any auditory distractions (and most external distractions are auditory), will have a tendency to throw you off
- It helps to repeat, and when matching pairs are found, to re-pronounce the list of remaining cards to yourself. It helps to know the names of the objects, or have an internal self-made name for them. I typically play the flag version of the game, and I found that if I don't know the flag names, improvising them on the fly takes up more energy and is more error-prone.
- The other thing that can throw you off is if you think you need one word to represent the object, but in fact it is not sufficient to distinguish it from other objects. For example if you flip a card with a car in it, the first word that comes to mind is "car", until a car of another color comes up, then it's important to re-pronounce the list with "yellow car" and "blue car" in it.
How does this post belong on the refactoring blog? Well of course short-term memory is very useful when it comes to programming, especially in debugging mode, because you need to keep a lot of details in mind at the same time in order to make sense of the complex state of the program. So play memory games folks, and be a happy programmer!
Tuesday, September 6, 2011
Monday, July 25, 2011
On getting it right the first time
As I program more and more I'm beginning to realize the following:
There is no such thing as getting it right the first time. In fact, this may even be a bad way of thinking about coding altogether. You can only approach the "correct" solution asymptotically with iterations. The concept of "right" may not even exist as such in the early stages of a project. Trying to "nail it" from the start by up front design is a misunderstanding of the process. The best you can do is a good first prototype.
Some coders I know go by the principle of "code correctly now, so you have to change less later". At first glance it makes sense, but I would take it with a serious grain of salt. In fact, I would say it could be damaging to software development to think this way, since a philosophy like that leads to being afraid of getting it wrong, and then you run the risk of settling yourself in with a rigid up front design.
Instead I find it best to expect and embrace change. To me, the first iteration is all about getting the result as quickly as possible, on top of any underlying datastructure. The "proper" datastructure will become clearer as the results come in. Rather than being afraid of refactoring, I take it as an integral part of the process, and I expect to do it. It is a process, where the goal is to get it right in the end, rather than at the beginning, it's all about exploration and constant change. Contrary to popular belief, though, this change is not equally distributed with time. Eventually we need to converge to the final product, so the amount of change needs to be less and less as time goes on. If refactoring is taken as an integral part of the process, the datastructures you have at any given moment, reflect a better and better understanding of the system, and the vocabulary of the code converges on the vocabulary of the problem space.
Tuesday, July 12, 2011
Disentangling boolean logic: The problem
One of the most difficult things I've had to do in my career as a refactorer is trying to clean up entangled boolean logic. The program has many states represented by booleans, and depending on their values or a combination of their values, various pieces of logic execute. All is fine and dandy, except that not all that logic executes correctly; in other words, we have a bug. Being new to a system, I often find myself fixing one such bug, only to introduce another, and then when I think I've fixed them both, 2 others appear, and so on. Now it's time to bring in the big guns and get a proper grip on the problem. Where are all these booleans coming from? Who is using them and how? Do some of these variables produce a butterfly effect, affecting seemingly unrelated pieces of code or even the air conditioning in your building? So the first approach is to try to get an overview of the use cases. What at least some organized programmers do is create a matrix of possible inputs to expected outputs (or expected function calls). This can be quite a challenge, since if your input is represented by n booleans, your possible set of input states is 2^n, and that's just one of the dimensions.
Luckily, the real use cases can usually be encoded much more concisely (see my other post on this subject), by collapsing a set of booleans into one enum. Suppose after an arduous journey through boolean land, we have come up with a matrix of inputs to outputs for test cases, which has a total number of cells between 15 and 50. Are we there yet? Unfortunately, we have only started. The matrix that we have created represents the ideal state. Now all that's left is to figure out how what we have corresponds to what we need. And as Murphy's Law would have it, we don't have a 1:1 correspondence between the variables in the current program and the entries in the matrix.
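To make the booleans-to-enum idea concrete, here is a rough sketch. The connection example, the type names and the priority order used for contradictory flag combinations are all my own assumptions for illustration, not taken from any real code base.

```cpp
// Before: three booleans allow 2^3 = 8 combinations, several of which
// make no sense (e.g. connected and failed both true).
// (Hypothetical example; names are made up for illustration.)
struct ConnectionFlags {
    bool connecting;
    bool connected;
    bool failed;
};

// After: one enum names only the states that can actually occur.
enum class ConnectionState { Idle, Connecting, Connected, Failed };

// Translate the legacy flags once, at the boundary. The priority order
// below (failed beats connected beats connecting) is an assumption; the
// real order has to come out of the use-case matrix.
ConnectionState FromFlags(const ConnectionFlags& f) {
    if (f.failed) return ConnectionState::Failed;
    if (f.connected) return ConnectionState::Connected;
    if (f.connecting) return ConnectionState::Connecting;
    return ConnectionState::Idle;
}

// Downstream logic now tests one value instead of a boolean expression.
bool CanSend(ConnectionState s) {
    return s == ConnectionState::Connected;
}
```

The test matrix shrinks from 8 rows to 4, and every contradictory combination is resolved in exactly one place.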
So, as a pragmatic reader, you may ask, what does this post actually advocate? Well, ahem... It advocates a heavy-handed refactoring process that attempts to arrive at a quite complex solution space (the ideal reduced matrix of inputs to outputs, which may not even be that small) from an even more complex space (in practice a less ideal matrix of much larger dimensions). I have to say I do not have a nice and clean approach to this, but I do have a few tips & tricks that have helped me in the past to reduce the complexity of boolean logic, which I'll describe in the next post.
Monday, July 11, 2011
Don Syme quote on parallel code
"If you just take non-deterministic execution of imperative multi-threaded code today, it is just such a nightmare to write that kind of code, that we HAVE to do something to make that kind of problem tractable"
See (http://www.youtube.com/watch?v=uyW4WZgwxJE, 33:40 for reference)
Monday, July 4, 2011
Just a thought on what it means to write good code.
The structure of the program needs to be such that it is easy to accommodate future requests.
The vocabulary of the code needs to reflect the vocabulary of the use cases. When somebody says, please implement feature X, there should be a way to "localize" that feature in the code. The closer the vocabulary of the use case to the code, the smaller the portion of the code that needs to change in order to accommodate the new use case.
Sunday, July 3, 2011
Wednesday, May 25, 2011
C++ woes
and the tedium of needless detail never ends ... it almost feels like a prison sentence. Do I have to write 50 for loops with an early exit every fucking day?
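For what it's worth, some of those loops can be handed off to the standard library. The sketch below assumes a C++11 compiler (for the lambda and std::any_of), and the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The daily grind: loop, test, early exit.
bool ContainsNegativeLoop(const std::vector<int>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) {
            return true;  // early exit on first match
        }
    }
    return false;
}

// The same thing with std::any_of, which also stops at the first match.
bool ContainsNegative(const std::vector<int>& values) {
    return std::any_of(values.begin(), values.end(),
                       [](int v) { return v < 0; });
}
```

It doesn't make the tedium go away entirely, but at least the intent ("is there any negative value?") is stated once instead of being spelled out loop by loop.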
Sunday, May 22, 2011
Automated code analysis and refactoring
Here's another paper I found very practical:
This paper talks about automated identification of "code smells" and a semi-automated approach for fixing the smells with refactoring.
Unfortunately the tools mentioned in the paper are for Java only, but it's a start. Hopefully I can find something similar for C++, and start applying it in my everyday development at work.
It's actually funny to see the low tolerance the authors have for certain practices:
"Many consider a cyclomatic complexity number of 10 or more to indicate an overly complex method" (p.4)
"A general rule of thumb I try to adhere to is to keep my methods to 20 lines of code or fewer. Of course, there may be exceptions to this rule, but if I've got methods over 20 lines, I want to know about" (p.8-9)
Code toxicity
So finally I found a piece of work that is in line with what I was thinking in terms of code entropy. What I like about it is that it's actually quite comprehensive, as it takes into account several factors, such as function length, cyclomatic complexity, nested-if depth, etc.
Well without further ado, here's the link:
Doesn't look like I'm gonna be taking the world by surprise by making a brilliant academic thesis on the matter.
Stages of development for code written from scratch
I'm going to try to reverse engineer the process I go through during software development. At the end of the day, this is some sort of optimization with respect to "maintainability" and "clarity". I haven't quite nailed down the formal definitions of these, but I'm going to try at a later stage.
So here's the process I go through if I start working on code from scratch:
- Rapid prototyping. Use structs or classes with all public data members and functions. At this point we don't know what the structure of the problem is, so there's no point in "locking down" classes or hiding members. When the structure of the problem is not clear, "boundaries" between classes are arbitrary. Which data and what functions belong together will become clear later.
- Exploration. Getting some results the quickest way, so as to get some iterations, user feedback, etc.
- Paradigm adjustment. The problem space is clearer. Classes actually correspond to concepts in the real world, although the data they operate on might be elsewhere. Even though the problem space is clear, the vocabulary used in the prototype no longer fits.
- Consolidation. It is now clear what problem we're trying to solve and how we're trying to solve it. As concepts are clearer so are the class boundaries. Most operations should now be on members of the same class. Most data members should be private, as they're no longer needed from the outside. Implementation details are now hidden and client code only needs to know a small fraction of it, the "interface", the API, via which the user can interact.
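As a rough illustration of the two ends of this process, here is the same concept at the prototyping stage and after consolidation. The account example and all names in it are hypothetical, picked only because the invariant is easy to see.

```cpp
#include <vector>

// Rapid prototyping: everything public, nothing enforced. Any code
// anywhere can touch the data, which is fine while the structure of
// the problem is still unknown. (Hypothetical example.)
struct AccountPrototype {
    std::vector<long> transactions;
    long balanceCents;
};

// Consolidation: the concept is understood, so the data goes private
// and the invariant (the balance never goes negative) lives in one place.
class Account {
public:
    Account() : balanceCents_(0) {}

    void Deposit(long cents) {
        transactions_.push_back(cents);
        balanceCents_ += cents;
    }

    // Returns false and changes nothing if the funds are insufficient.
    bool Withdraw(long cents) {
        if (cents > balanceCents_) {
            return false;
        }
        transactions_.push_back(-cents);
        balanceCents_ -= cents;
        return true;
    }

    long BalanceCents() const { return balanceCents_; }

private:
    std::vector<long> transactions_;
    long balanceCents_;
};
```

Client code that used to poke at balanceCents directly now only sees Deposit, Withdraw and BalanceCents, which is the small "interface" the consolidation step is after.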
I believe in the end this process is about reducing code entropy, which I'm trying to define. I will ultimately need to:
- Define "code entropy" more formally
- Refine the process described in this post
- Show that the process described in this post leads to code with lower entropy
Relevant previous posts:
Sunday, May 15, 2011
Why manual linked lists are evil
Recently I've been reminded of a pattern in code I particularly dislike: manual traversal of linked lists. I couldn't quite put my finger on why I have an aversion to this practice, but I'm gonna try to pinpoint it right here in this post, and solve this moral dilemma once and for all.
The pattern looks something like this (here I'm not talking about design patterns, just a pattern of usage).
//declaration code
struct SPointsList
{
    SPoint m_CurPoint;
    SPointsList* m_Next;
};
void foo(SPoint& pt);
//...
//client code
SPointsList points;
//.. some annoying initialization code goes here, creating a non-empty valid list
SPointsList* pt = &points; // start at the head node
while (pt!=NULL)
{
    foo(pt->m_CurPoint);
    pt = pt->m_Next;
}
So what don't you like about this code, Mr. Refactoraholic? It's efficient, it's compact, it's not using templates, it's doing just what it's supposed to do and no more. Doesn't that fit into your favorite K.I.S.S. principle? Aren't you just being a tight-ass abstractionist who is just waiting for an excuse to use some templates and iterators that you read about in your favorite design pattern book?
Well, I'm glad you asked. Here's some of the things I don't like about this code:
- It reinvents the wheel. Implementing a linked list from scratch is an exercise that I learned how to do in high school, and yet in practice try to avoid as much as possible for the reasons below.
- It does manual pointer manipulation, an overused feature of C / C++ that most other languages have wisely chosen to get away from.
- It is prone to off-by-one errors, since you have to be careful not to hit uninitialized memory or to get yourself into an infinite loop.
- Client code also has to be careful to construct a valid data structure (the last element needs to point to a NULL, so that elsewhere in client code we could check for it). I didn't include an example of this, simply to keep my blood pressure low and my stomach from getting too upset.
- All in all, the main negative thing about using manual linked lists, is that this approach exposes implementation details to the client, and client code everywhere has to adjust to that. If the representation changes, client code has to be re-written, and that's a real waste, as this code could have been written in a more flexible way from the start.
One way of writing this code would be:
typedef std::list<SPoint> SPointsList;
//client code:
SPointsList points;
//some initialization code goes here, using list::insert, or list::push_back
SPointsList::iterator itr = points.begin();
for (; itr != points.end(); ++itr) {
    foo(*itr);
}
We can argue about how pretty this code is or that it uses more characters than the original pointer version. But in the end, this approach hides all the nasty details and pointer manipulation, and changing the representation does not affect the client code at all.
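To back up the claim that the representation can change without touching the client, here is a small sketch. SPoint is never defined in this post, so the two-int version below is an assumption, and SumX is a made-up stand-in for the client code.

```cpp
#include <list>
#include <vector>

// SPoint is not defined in the post; this layout is an assumption.
struct SPoint {
    int x;
    int y;
};

// Client code written against the iterator interface only. It compiles
// unchanged whether Container is std::list<SPoint>, std::vector<SPoint>,
// or anything else that provides begin(), end() and const_iterator.
template <typename Container>
int SumX(const Container& points) {
    int total = 0;
    for (typename Container::const_iterator itr = points.begin();
         itr != points.end(); ++itr) {
        total += itr->x;
    }
    return total;
}
```

With the typedef from above, switching SPointsList from std::list to std::vector is a one-line change, and loops written in this style keep working as they are.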
There's also a more neutral approach that avoids using the STL, where we basically only implement the functions that are needed, but still use the iterator-based approach to hide the details of the pointer math. Here's a very sloppy version of that:
struct SPointsList
{
    SPoint m_CurPoint;
    SPointsList* m_Next;
    void AddNewPoint(const SPoint&, SPointsList* pos);
    void RemovePoint(SPointsList* pos);
    SPointsList* GetNext() {return m_Next;}
    bool IsLast(SPointsList* pt) {return pt==NULL;}
};
Perhaps only implementing the functions you need can reduce code bloat and allow you to special-case optimize access to your data structure, but you have to do more work to reinvent the wheel, and error-proof your internal pointer math for adding / removing elements.
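A slightly less sloppy take on the same idea might look like the sketch below: a hand-rolled list that keeps its nodes private and exposes only a minimal iterator. This is not the struct above cleaned up, it's a separate illustration; PointList, PushFront and the SPoint layout are all assumptions, and it needs a C++11 compiler.

```cpp
// SPoint layout is an assumption; the post never defines it.
struct SPoint {
    int x;
    int y;
};

class PointList {
    // The node type and its pointers never leak out of the class.
    struct Node {
        SPoint point;
        Node* next;
    };

public:
    // Minimal forward iterator: just enough for a for loop.
    class Iterator {
    public:
        explicit Iterator(const Node* node) : node_(node) {}
        const SPoint& operator*() const { return node_->point; }
        Iterator& operator++() { node_ = node_->next; return *this; }
        bool operator!=(const Iterator& other) const { return node_ != other.node_; }
    private:
        const Node* node_;
    };

    PointList() : head_(nullptr) {}
    ~PointList() {
        while (head_ != nullptr) {
            Node* next = head_->next;
            delete head_;
            head_ = next;
        }
    }
    // Copying is disabled to keep the sketch short.
    PointList(const PointList&) = delete;
    PointList& operator=(const PointList&) = delete;

    // Only the operation we actually need; the list owns the NULL terminator.
    void PushFront(const SPoint& p) { head_ = new Node{p, head_}; }

    Iterator begin() const { return Iterator(head_); }
    Iterator end() const { return Iterator(nullptr); }

private:
    Node* head_;
};
```

Client code loops from begin() to end() exactly as it would with std::list, and never gets a chance to forget the terminating NULL, because it never sees one.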
[It seems like blogger is not the most code-snippet friendly of environments, so I'm gonna look into switching to wordpress or something else]
Sunday, April 17, 2011
K.I.S.S. part II
This post is about keeping it simple on the solution side of things:
I think it was Mel Gibson who once said about acting "it takes a lot of work to make it look effortless". As a refactoraholic, I can testify to the fact that "it takes a lot of work to make code look easy". A person looking at simply written code might be tempted to say "oh yeah, I could have thought of it myself". Just like a good textbook that does a good job of explaining something, might get you quicker to the eureka moment, so will compact, loosely coupled code, with good comments and variable names that make sense get you to find logical errors and bugs, as well as make adding new features faster.
After refactoring I often stumble on some "obvious" logical errors in the code. It can look like a dumb mistake in the logic, but it only became obvious after 2-3 rounds of refactoring. Previously this mistake may have been buried so deep, it would have been much more difficult to "see".
Now it's easy for me to say "don't make shit complicated". I mean some shit is complicated by nature, and the only thing that's left to do is get up to speed on the details until it becomes clear. In addition, due to the exploratory nature of software development, often shit only becomes simple after you've done it the complicated way. Our understanding of the problem may grow as time goes on, and based on the improved understanding, the code may start looking clearer, and typically more compact.
After refactoring I often stumble on some "obvious" logical errors in the code. It can look like a dumb mistake in the logic, but it only became obvious after 2-3 rounds of refactoring. Previously this mistake may have been buried so deep that it would have been much more difficult to "see".
K.I.S.S.
As I look at the experiences I've had handling difficult code written by other people, some patterns seem to emerge. One of the most persistent patterns, and the one that seems to cause most of the software maintenance issues I've encountered, is people making shit too fucking complicated! And by that I mean people not following the principle "Keep it simple, stupid" (aka K.I.S.S.).
There is one way of making shit complicated that I am attacking here, and that is taking care of unnecessary use cases. In my opinion, the code that is most difficult to maintain is code that has a lot of what-ifs in it, a lot of just-in-case handling. This coding strategy is the root of all evil, and together with premature optimization forms the two black pillars of death. So here I'm going to try to make a distinction between:
(1) unnecessary code to solve a necessary problem, and
(2) code to solve an unnecessary problem (the code in this case is also, by definition, unnecessary).
The first case is about proposing a solution that's unnecessarily difficult, the second case is about proposing a solution that's not necessary at all. The first case is about overcomplicating the solution space, while the second is about overcomplicating the problem space.
What if the hero in our game is hanging down from a helicopter, and a bullet enters his left eye? Should we simulate the eye movement on the right eye as the bullet gets closer? What if one of his eyes gets streamed out in the process? What if there's an eye-patch on the left eye? Should we simulate it with cloth physics?
OK OK, stop right there. How about let's consider if this combination of circumstances needs to be handled at all?!
The life of a coder is difficult enough as it is; we shouldn't have to solve imaginary or potential problems before solving real ones. One of the most obvious signs that code is solving an imaginary problem is large amounts of boilerplate code: a lot of methods and files that seem to be doing very little, as if the original author of the code had something grander in mind, and put in some code just in case. What makes matters worse is that the code for solving a non-existent problem lives right alongside the code that solves an actual problem. The reader then has to understand not only how this code solves the actual problem, but also what else, other than the actual problem, it is trying to solve.
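To make case (2) a bit more concrete, here is a small C++ sketch. Everything in it (the Hero struct, both function names, every parameter) is made up for illustration and not taken from any real codebase. The first function drags along what-ifs that nothing in the game asks for; the second one solves only the problem that actually exists.

```cpp
#include <algorithm>

// Hypothetical types and names, invented purely for illustration.
struct Hero {
    int health = 100;
};

// "Just in case" version: every extra parameter is a question the reader has
// to answer ("who passes true here?") even though no caller ever uses it.
inline void ApplyDamageJustInCase(Hero& hero, int damage,
                                  bool hasEyePatch = false,
                                  bool isHangingFromHelicopter = false,
                                  float futureArmorScale = 1.0f) {
    if (hasEyePatch) {
        // Placeholder for cloth physics that nobody has asked for yet.
    }
    if (isHangingFromHelicopter) {
        // Placeholder for a special case that never occurs in the game.
    }
    const int scaled = static_cast<int>(damage * futureArmorScale);
    hero.health = std::max(0, hero.health - scaled);
}

// Simple version: solves only the problem that exists today.
inline void ApplyDamage(Hero& hero, int damage) {
    hero.health = std::max(0, hero.health - damage);
}
```

For every call that is actually made, both versions do the same thing. The difference is how much the next reader has to wade through before being sure of that.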
Continued in this post: http://refactoraholic.blogspot.com/2011/04/kiss-part-ii.html
Tuesday, March 22, 2011
How to refactor huge chunks of unfamiliar code
I've had the pleasure (or the misfortune) of refactoring some large chunks of code, with the following characteristics:
- It has at least one very large function (300-900 lines)
- It has no clear owner, as at least 5 people have been authors on different parts of the code, and 2-3 of them already left the company
- This code has gained a reputation for nobody knowing what it's really doing
- There are plans to do some cleanup, but nobody knows when or who is gonna do it, and nobody dares to take the responsibility. This project is always on people's radar, but never quite higher on the priority list than everything else they gotta do
- Besides, respectable programmers got better things to do than to refactor messy obscure code, right? Time to call in the Refactorinator. MWAHAHAHAHA
Here's the process, reverse-engineered from how I typically do it:
- Identify the huge functions
- Identify major logic blocks within those functions
  - Typically these will be outer while or for loops (inner loops might just be too hard to tackle right away, if they have dependencies on variables computed in the outer loops)
- Factor the identified chunks of code out into their own function
  - Visual Studio has "Extract Method", which does an OK job of automating this process
  - Typically there will be too many parameters in such a function; consider if you can
    - Make some of the parameters members of the class
    - Recompute the same values locally inside the new function (some of these computations will be cheap, like an indexed array lookup, and it pays off to reduce the size of the signature for the new function).
- Make it clear what the input and the output of each function is.
  - What values are being consumed, what values are being modified?
  - const and static functions are preferable, as they limit side effects (a const member function promises not to modify the object, and a static one has no object to modify)
  - If it's not possible to make the function const or static, try to see if you can make some of the parameters const. That is usually possible, especially as the newly created function is smaller in scope and will modify less of the data
- Factor out a few functions this way if possible
  - Are there any patterns emerging?
  - Are there certain groups of parameters that are getting passed around a lot?
    - If so, form a struct of these common parameters, which should further simplify the function signatures for the new functions
  - Are there common operations on these groups of parameters?
    - In this case, we might start adding functions to the struct
  - Is this group of parameters sufficiently isolated that other data structures no longer need to access its members individually?
    - Then we have the birth of a new class.
- Resolve overlapping or duplicate data
  - This is the most annoying part of the process. It very often happens that the data structures floating around in the super-large function / class are similar (but not quite the same) or contain duplicate elements.
  - For example, there can be two structures passed around that are partially different and partially represent the same data, and then that data is copied back and forth to keep it in sync.
  - Try and see if you can disentangle this knot by using the member of only one of these overlapping structs / classes. That way you can avoid 3 bad things:
    - needing to keep them in sync (simplifying logic),
    - copying them back and forth (saving CPU cycles and reducing code size)
    - storing duplicate data (saving memory)
  - This is often far from trivial, and might require some domain knowledge, or actually sitting down and understanding the code
    - Hopefully after the massive code-shoveling, your understanding is much improved, or it could be time to check with one of the owners
    - Be prepared, however, that the (partial / former) owner is also confused, doesn't fully understand the system, or doesn't recognize the code after you've messed with it
    - Regardless, there could be some useful information that pops up as a result of this communication
- I will probably elaborate on this last step in another post (this one was probably a handful already). I don't think I have the process of resolving overlapping data nailed just yet, but I believe that with more experience I will arrive at a more solid solution.
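As a rough illustration of the first few steps, here is roughly what the result can look like in C++. This is only a sketch: the damage calculation and all the names (DamageSettings, Enemy, ComputeDamage, ApplyDamageToAll) are hypothetical, standing in for whatever the huge function really does.

```cpp
#include <vector>

// Hypothetical example: all names below are invented for illustration.

// "Form a struct of these common parameters": values that used to be passed
// around as three separate arguments now travel together.
struct DamageSettings {
    float baseDamage;
    float armorScale;
    float maxDamage;
};

struct Enemy {
    float armor;
    float health;
};

// Extracted from the body of the original outer loop. Everything it reads is
// const and it returns its result instead of writing into shared state,
// so the inputs and the output are obvious from the signature.
float ComputeDamage(const DamageSettings& settings, const Enemy& enemy) {
    const float raw = settings.baseDamage - enemy.armor * settings.armorScale;
    if (raw < 0.0f) return 0.0f;
    return raw < settings.maxDamage ? raw : settings.maxDamage;
}

// What is left of the formerly huge function: the loop itself plus one call.
// The only thing it modifies is the enemies' health.
void ApplyDamageToAll(const DamageSettings& settings, std::vector<Enemy>& enemies) {
    for (Enemy& enemy : enemies) {
        enemy.health -= ComputeDamage(settings, enemy);
    }
}
```

If more helpers like ComputeDamage turn up that mostly work on DamageSettings, that is the point where they can move into the struct as const member functions, and the struct starts growing into a class.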
Thursday, March 3, 2011
F# for beginners?
I think not:
"The mapfirst function takes a function as the first argument and applies it to the first element of a tuple that's passed as the second argument"
Try explaining that to someone new to programming...
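For what it's worth, here is what that sentence boils down to. This is a C++ sketch (assuming C++14 or later) rather than F#, to match the other code on this blog; MapFirst is a hypothetical stand-in for the mapfirst function the quote describes, not the F# original.

```cpp
#include <utility>

// Hypothetical C++ counterpart of the quoted description, not the F# original:
// takes a function first and a pair second, applies the function to the first
// element of the pair and leaves the second element untouched.
template <typename F, typename A, typename B>
auto MapFirst(F f, const std::pair<A, B>& p) {
    return std::make_pair(f(p.first), p.second);
}
```

Calling MapFirst with a function that adds one and the pair (1, 'a') gives back the pair (2, 'a').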
Saturday, February 26, 2011
Tuesday, February 15, 2011
first experiences with F#
I've looked at F# before and I liked the idea, but recently I started to actually write in it, and here's my experience with F# so far:
Debugging is quite different. While in C# / C++ I would put a breakpoint in the faulty code, here in F# I typically use the F# interactive window to call one function at a time. This is convenient because you can build up your program from the bottom up, one subroutine at a time.
Indentation is used for scope. While it helps make code a bit shorter, I have found this to be a total pain in the ass. The indentation-sensitive parser reports all sorts of unhelpful error messages. One space in, and you're doing a nested declaration; one space back, and you're declaring a member outside class scope. This has so far caused me major grief. I hope I can get used to this eventually. Perhaps there is a tool out there which helps clarify which scope F# THINKS your function is in.
Programs are shorter because of:
- Frequent use of inline functions and nice syntax sugar for closures.
- Indentation instead of braces for scope
- Operations on lists and tuples are built into the language, which makes for very compact code when dealing with lists. Iteration, accumulation, and selection are very natural in F#.
Saturday, February 12, 2011
Refactoring and skiing
I had the good fortune to do both refactoring and skiing this week, and noticed some similarities.
When on a tough slope, a beginner skier has to be extra careful to ensure that at no time is his speed out of control. As he does that, he pays a price in energy. If every turn he makes results in a complete stop, his legs have to do the extra work. A good skier is both faster AND uses less energy, as the turns he makes don't lose as much momentum.
Refactoring can also be done slowly & carefully, and the price you pay for safety is speed. Refactoring in smaller steps is safer, but in order to ensure safety, you have to put up all sorts of scaffolding to make the intermediate steps work.
This scaffolding (intermediate datastructures and functions) is what's needed to make the new code and the old code work together, and will later need to be discarded as old code is eliminated.
An expert refactorer would be able to skip the intermediate steps and make the final version quicker. But working that way is of course dangerous business, and could cause more crashes along the way (pun intended).
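Here is a small C++ sketch of what such scaffolding can look like. The types and functions (OldPosition, Vec2, ToVec2, ToOldPosition, Move, MoveOld) are hypothetical, made up only to show the shape of it.

```cpp
// Hypothetical example: the names are invented for illustration.

// Old representation, still used by code that has not been refactored yet.
struct OldPosition {
    float x;
    float y;
};

// New representation that the refactored code uses.
struct Vec2 {
    float v[2];
};

// Scaffolding: these conversions exist only so that old and new code can
// run side by side. They get deleted once OldPosition is gone.
inline Vec2 ToVec2(const OldPosition& p) { return Vec2{{p.x, p.y}}; }
inline OldPosition ToOldPosition(const Vec2& p) { return OldPosition{p.v[0], p.v[1]}; }

// Already refactored: works on the new type.
inline Vec2 Move(const Vec2& p, float dx, float dy) {
    return Vec2{{p.v[0] + dx, p.v[1] + dy}};
}

// Not refactored yet: still speaks OldPosition, so it goes through the scaffolding.
inline OldPosition MoveOld(const OldPosition& p, float dx, float dy) {
    return ToOldPosition(Move(ToVec2(p), dx, dy));
}
```

The careful route keeps MoveOld and the two conversions alive until every caller has switched to Vec2. The expert route deletes OldPosition in one go and fixes whatever breaks.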
Tuesday, January 11, 2011
OOP and prototyping
This post should delight my OOP-hating friends. But hey, I'm not here to defend OOP at all costs, but rather to talk about my experiences with it, and how to make the best of it.
One thing that I found OOP is good at is imposing "design up front" approaches. OOP is all about putting structure on your code. Unfortunately in early stages / prototyping, you will not necessarily know what kind of structure your code should have just yet. Unless you have a ton of experience or a crystal ball (preferably the latter), you will likely take wrong turns in the design. Granted, those wrong turns are temporary, and can be corrected by future refactoring, but refactoring is not free. It costs time and effort. In the meantime, there is an additional cost and effort to working with OOP code that is structured inconveniently. If you got the structure wrong, and your data members belong to the wrong class, you need to create accessors and forwarding calls in order to access data that's nested. It's not uncommon to see something like:
GetDataManager()->GetChineseGrammar()->ExtractCharacter1FromContext(GetContextManager()->GetContext(x))
in bad OOP code, because it's trying to access data that has been hidden in the wrong class.
Due to lack of foresight / experience, or even simply due to the nature of user requirements, I found myself temporarily locking myself down to bad partial designs. This then resulted in a struggle between growing functionality on top of a temporarily bad design versus improving the design now in hopes of quicker development later. As those familiar with my blog would easily guess, I opted for the "improve the design now" option. The problem is, however, "improve the design" meant moving from one structural lock-down to another. Development went in stages of progressively better design, each stage dictated by the results of the previous one. In the early iterations this was not particularly convenient, since requirements were "soft" and so was my understanding of where this project should go.
I always did find that this process was more and more satisfying as the project reached its final iterations, because the structural lockdown (i.e. what classes have access to what data members), needed to change less and less. OOP's biggest benefit may just be at the end of the development cycle, when all the pieces are in place and most use cases have been fit into the framework. Bug fixing on a well-written OO program is easy, because the final design reflects the use cases and reasoning about the use cases should be similar to reasoning about the program itself.
Anyways, back to the prototyping stage. I found that one of the best ways to keep the structure "soft" and malleable is to start by using structs (or classes with all public members) and global functions. Then slowly, as use cases become clear, those functions move into the structs, initially as public functions. More data members become private, and some of the functions too, and eventually we end up with our standard OOP, with data hiding and all that good stuff.
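A minimal C++ sketch of the two ends of that progression, assuming a made-up inventory example (Inventory, AddItem, and InventoryClass are hypothetical names, not from any real project):

```cpp
#include <string>
#include <vector>

// Hypothetical example: names are invented for illustration.

// Stage 1 (prototype): plain struct, everything public, logic in free functions.
// Nothing is locked down, so moving data or logic around is cheap.
struct Inventory {
    std::vector<std::string> items;
    int capacity = 10;
};

inline bool AddItem(Inventory& inv, const std::string& item) {
    if (static_cast<int>(inv.items.size()) >= inv.capacity) return false;
    inv.items.push_back(item);
    return true;
}

// Stage 2 (use cases are clear): the same logic has moved into the type and
// the data has become private.
class InventoryClass {
public:
    explicit InventoryClass(int capacity) : capacity_(capacity) {}

    bool AddItem(const std::string& item) {
        if (static_cast<int>(items_.size()) >= capacity_) return false;
        items_.push_back(item);
        return true;
    }

    int Count() const { return static_cast<int>(items_.size()); }

private:
    std::vector<std::string> items_;
    int capacity_;
};
```

In practice the two would not coexist; the struct gradually turns into the class as it becomes clear which code really needs to touch items and capacity.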