You’re never good enough to ignore the basics

I thought about not publishing this post for fear of it tarnishing my otherwise flawless professionalism (/chucklesSoftly). What follows is a story about how I ended up taking far too long to solve a problem.

Please note: the code I share here is not meant to be production quality. It was code that I wrote as part of my daily warmup. There are things to scrutinize, optimizations to be made, and all sorts of things I could do better. The purpose of sharing this code is to elucidate a common pitfall.

Leetcode

A friend recently introduced me to Leetcode. It’s a great way to stay sharp for interviews. Despite my general distaste for the traditional interview process, only a fool thinks that they are above the game. Solving problems is fun, and sometimes messing with the project you’re working on comprises a shocking lack of problem solving.

I love having a wealth of problems with loads of test cases just a few clicks away. I hate clicks though: the mouse is the worst thing to happen to computers since the whole carriage-return-or-line-feed thing.

What joy did I have upon finding leetcode-cli, a wonderful command line tool that enables me to write, test, and submit leetcode solutions from the comfort of my own command line.

I started tracking my leetcode solutions in a GitHub repository. After all, if a dev writes code and doesn’t commit and push it, does it even happen? Honestly, probably not…

I’m not coding everyday these days, what with all the tutoring I have been doing and all the non-dev work involved in freelancing. Being able to see all the green boxes on my GitHub stats helps keeps me honest. But I digress! Let’s talk about the problem at hand.

String to Integer

The string to integer problem is pretty straight forward. Remove leading whitespace, only allow a + or - character before the number. Got it. Let’s get cookin’.

I chose javascript for the implementation on this one because I had an interview soon thereafter in that language and wanted to avoid the overhead of context switching.

I coded up a solution fairly quickly that followed this basic structure:

  • create a few variables first:
    • min: the minimum value of a 32-bit integer (-2147483648)
    • max: the maximum value of a 32-bit integer (2147483647)
    • isNegative: a boolean to track if the final return value is negative
      • defaults to false, since a number with no sign is assumed to be positive
    • nums: an array (initially empty) that will store each digit
  • iterate through every character in the input string (for loop)
    • store the ith element in a variable char
    • if char is whitespace, continue
    • otherwise, if char is +, continue
    • otherwise, if char is - set isNegative to true
    • otherwise, if char matches /\D/ (any non digit character), return 0
    • switch over the character and push the corresponding digit onto nums
      • you could use parseInt, but that feels like cheating on this problem
  • (outside of the loop now)
  • check for overflow!
    • if the length of the array is greater than 10 then return 0
      • the min/max value for a 32-bit integer is 10 digits long
      • the problem specifies to return 0 in this case
    • if the array is exactly 10 digits long then
      • set limit to min if isNegative or max otherwise
      • loop through the array of numbers
        • if the ith number in the array is greater than the ith digit in limit then return 0
        • if the ith number in the array is less than the ith digit in limit then break
  • (now we know that all is well and we can build up the number to return)
  • make a new value ret and set it to 0
  • loop through each element in num
    • ret += nums[i] * (Math.pow(10, nums.length - i - 1));
  • make ret negative if isNegative
  • return ret

Simple! Right? Well, yeah, but it also doesn’t work.

The problem with this solution (which I wish I had kept the initial code for) is that it does an awful job of validating the input, leaving many edge and corner cases unresolved.

A programmers hardest job is validation

What followed was nothing short of a disaster. I would run the code, get a test case failure, say “ah, yes, but of course!” and make corresponding changes. Rinse and repeat. After a length of time that I’m not proud of, I found myself frustrated and my code a hot mess. It still didn’t even work.

  • “What if there are leading 0’s?”
  • “What if the input has a space between the sign and the digit(s)?”
  • “What if we have multiple signs?”

Each time I encountered a test case that asked that question, I would add in another boolean variable to manage state. This was like playing whack-a-mole: one test case taken care of meant many more failed. Soon enough, I was looking at a steaming pile of spaghetti where my code once was. With roughly 1050 passing test cases of 1079 total, I scrapped this entire for loop in favor of input validation that was readable.

FSMs FTW!

I quickly drew up a very simple finite state machine on a engineering pad on my desk (which I can show here thanks to this tool).

0 1 2 /[+-]/ /\d/ /\D/ /\s/ /\d/

If only my code looked as clean as this FSM! I commented out the spaghetti that had taken up residence in the body of the for loop responsible for validating input, replacing it with three simple lines of code:

state = stateTransitions[state](str[i]).toString();
if (state === FAIL) return 0;
if (state === FINAL) break;

You can’t really remove complexity though: you can only move it around. I moved the complexity from this for loop, peppered with booleans trying their best to track state, to the following structure:

// define states
let FAIL = 'fail_state';
let STRIPWS = 'stripws_state';
let GETNUM = 'getnum_state';
let FINAL = 'final_state';

// set initial state
let state = STRIPWS;

// define state transitions
let stateTransitions = {
  'stripws_state': function(char) {
    if (/\s/.test(char)) return STRIPWS;
    if (/[+-]/.test(char)) {
      isNegative = /-/.test(char);
      return GETNUM;
    }
    if (/\d/.test(char)) {
      getNum(char)
      return GETNUM;
    }
    return FAIL;
  },
  'getnum_state': function(char) {
    if (/\d/.test(char)) {
      getNum(char);
      return GETNUM;
    }
    return FINAL;
  },
  'final_state': function(char) { return FINAL; }
};

// using parseInt feels like cheating...
let getNum = function(char){
  switch(char){
    case '0': nums.push(0); break;
    case '1': nums.push(1); break;
    case '2': nums.push(2); break;
    case '3': nums.push(3); break;
    case '4': nums.push(4); break;
    case '5': nums.push(5); break;
    case '6': nums.push(6); break;
    case '7': nums.push(7); break;
    case '8': nums.push(8); break;
    case '9': nums.push(9); break;
  }
}

This is so much cleaner, and about a million times more readable than what I had before. You can find the code in full on my github.

Lessons learned

The take-away here is a reminder of an age old lesson that can be put many different ways. “Measure twice, cut once” is what carpenters would say. “Plan before you act” is the general axiom.

My own hubris stopped me from taking the precautionary measures that must always be taken before starting to solve any problem. “I’ve been writing code for years!” I said to myself, “this is a simple problem, I’ll figure it out as I go.”

We often don’t realize just how many edge cases there can be at first glance (hence the age old joke about validation), and taking 30-60 seconds to think about the problem before hand can save you 30-60 minutes of programming frustration.