Recurse.se - Yada yada on Software Development

21Jan/12

Fun with Functions

Now we are going to write some Scheme code in order to show off some features of functional programming. (As stated earlier, I'm actually using the Scheme dialect Racket, but that shouldn't matter too much for the discussion at hand.) We're going to write some code, refactor it into more reusable parts, and hopefully become enlightened when we discover the power of passing a function as a parameter to another function.

Let’s write some code, already!

Let’s assume you are given an assignment to create a number of reports for a small retail chain, regarding the use of their Gift Cards. As a programming exercise you decide to do them in Scheme (since on-the-job training is something you have to do every now and then to keep sharp, right?)

Your Lambda Card, Sir!

The base data you have is a list of gift cards: gift card ID, issuing store, date of issue, and amount. If the card has been used there's also the using store and date of usage. You decide that the first thing to do is to write a Scheme procedure that gives you the sum of all gift cards issued at a particular store, and then move on to more complex queries.

I'm going to skip some parts, such as how you would read your base data from a file or database. (As a side note, I'm using the words 'procedure' and 'function' to mean the same thing: a "subroutine" of a larger program.)

The resulting code turns out something like this:

 1 ; get the sum of all gift cards issued by my store (id "SE1")
 2 (define (sum-issued-here giftcards)
 3   ; define a helper function to recursively iterate over the list,
 4   ; building up the result in an accumulator
 5   (define (sum-issued-here-iterator remaining-giftcards accumulated-sum)   
 6     ; if there are no more gift cards, return the accumulated sum
 7     (if (empty? remaining-giftcards)
 8         accumulated-sum
 9         ; otherwise, process the current gift card
10         (let* ([current-gc (first remaining-giftcards)]
11                [still-remaining-gcs (rest remaining-giftcards)]
12                ; if this was my card, update the sum, otherwise keep it the same
13                [new-sum (if (string=? "SE1" (issuing-store current-gc))
14                             (+ accumulated-sum (amount current-gc))
15                             accumulated-sum)])
16           ;  and recurse on the remaining items
17           (sum-issued-here-iterator still-remaining-gcs new-sum))))
18   
19   ; call the helper function with initial values
20   (sum-issued-here-iterator giftcards 0))

I'll try to explain what this code does; it'll help if you understand most of it before moving on. At line 2 we define a function called sum-issued-here that takes a list of gift cards as input. Lines 5-17 then define a helper function called sum-issued-here-iterator (which is not visible outside the definition of sum-issued-here) that does the actual work, and the helper function is then called at line 20 to produce the result of the procedure. In Scheme, there's no return keyword (or function). Instead, the return value of a function is the value of the last expression evaluated. The same is true of most other Scheme forms: they are expressions that evaluate to a value.

The workings of the helper function

The helper function is built on a very common pattern in functional programming: recursive decomposition. Here the problem is divided into two parts: first perform some work on the first element of the list (lines 13-15), then recurse (line 17) to solve a smaller sub-problem (the work on the remaining items), until you reach the base case, the empty list (line 7), which ends the recursion. In this case the helper is constructing the answer using an accumulator that is updated in each recursion, and then returned in the base case (line 8).

At lines 10-13, we bind three identifiers to their values using a variant of let, called let* (more on the asterisk after the list):

  • current-gc is bound to the first element of the remaining-giftcards list
  • still-remaining-gcs is bound to a list with the subsequent elements of the remaining-giftcards list (all but the first)
  • new-sum is updated, or not, depending on which store issued the current gift card. For simplicity, you've decided to hard-code the ID of your nearest store, "SE1".
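The asterisk in let* is significant: unlike plain let, the bindings are made in order, so a later binding can refer to an earlier one (above, new-sum refers to current-gc). A minimal illustration of my own, not taken from the report code:

(let* ([x 2]
       [y (* x x)]) ; y can refer to x here; with plain let it could not
  y)                ; the whole expression evaluates to 4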

The Spiral of Recursion

The functions amount and issuing-store are property accessor functions for the gift card structure; they return the amount and the issuing store, respectively, for the provided gift card (their definitions are trivial and not shown). string=? and empty? are simple predicate functions (returning true or false), testing for string equality and for the empty list, respectively.
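(The gift card structure itself is never shown in these posts. If you want to follow along in Racket, a sketch along the following lines would do. The field names are my assumption, and since struct prefixes its generated accessors with the structure name, we alias them to the bare names used above:)

; a hypothetical gift card structure, for following along only
(struct giftcard (id issuing-store issue-date amount using-store usage-date))
; alias the generated accessors to the names used in the posts
(define issuing-store giftcard-issuing-store)
(define issue-date giftcard-issue-date)
(define amount giftcard-amount)
(define using-store giftcard-using-store)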

Note that we’re not actually updating any “variables”. Everything is immutable, and no values are harmed during the execution of these functions. It may seem like we update the accumulator at least, but in effect, a new accumulator is created at each step, based on the value of the previous accumulator.

You may suspect that this function could cause a stack overflow if given a really long list, but that won't happen. We're not actually using up any stack space for our recursion, regardless of how many times the function calls itself. This is because the helper function is tail-recursive: the recursive call is the very last thing it evaluates, and Scheme is required to optimize such calls, making them as efficient as a standard iterative loop! So it is really pure iteration, coded as a recursive algorithm.
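To see the difference, compare these two ways of summing a list of numbers (a toy example of mine, not from the reports). In the first, the + can only run after the recursive call returns, so each call must keep its stack frame alive; in the second, the recursive call is the last thing evaluated, so no frame needs to be kept:

; NOT tail-recursive: the + is performed after the recursive call returns
(define (sum-list lst)
  (if (empty? lst)
      0
      (+ (first lst) (sum-list (rest lst)))))

; tail-recursive: the recursive call is the very last thing evaluated
(define (sum-list-iter lst acc)
  (if (empty? lst)
      acc
      (sum-list-iter (rest lst) (+ acc (first lst)))))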

Your second report, and a Déjà Vu

You then continue implementing a completely different report function, one that should give you the earliest issue-date for all gift cards used in a particular store. When you look at the resulting implementation, you cannot help but notice that it is more or less identical to the previous one:

 1 ; get the earliest issue-date for all gift cards used in my store (id "SE1")
 2 (define (oldest-used-here giftcards)
 3   ; define a helper function to recursively iterate over the list,
 4   ; building up the result in an accumulator
 5   (define (oldest-used-here-iterator remaining-giftcards minimum-date)   
 6     ; if there are no more gift cards, return the minimum date found
 7     (if (empty? remaining-giftcards)
 8         minimum-date
 9         ; otherwise, process the current gift card
10         (let* ([current-gc (first remaining-giftcards)]
11                [still-remaining-gcs (rest remaining-giftcards)]
12                ; if this was used here, update the minimum-date, otherwise keep it the same
13                [new-minimum-date (if (string=? "SE1" (using-store current-gc))
14                                      (earliest-date minimum-date (issue-date current-gc))
15                                      minimum-date)])
16           ;  and recurse on the remaining items
17           (oldest-used-here-iterator still-remaining-gcs new-minimum-date))))
18   
19   ; call the helper function with initial values
20   (oldest-used-here-iterator giftcards today))

Really, the only differences are that it calls different functions in a few places (and starts the accumulator at today, presumably the current date). These new functions should be fairly self-explanatory: using-store and issue-date are two additional property accessor functions for the gift card structure, and earliest-date is a "minimum" function for dates, returning the earlier of two dates.
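(earliest-date isn't shown either, and its definition depends entirely on how dates are represented, which the posts never specify. Assuming, purely for illustration, dates stored as plain numbers such as seconds since some epoch, it could be as simple as:)

; hypothetical: with dates as plain numbers, "earliest" is just "smallest"
(define (earliest-date d1 d2)
  (if (< d1 d2) d1 d2))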

Refactoring functionally

At this point, you are thinking that this probably could be refactored in some way, but it’s not immediately clear what kind of rewrite would let us share parts of the implementation for our gift card report functions. As a first rewrite, you decide that you could probably implement a separate function for the part of the first report that finds gift cards for a particular store, and returns a list of such gift cards.

This already-filtered list could then be used as the input for a rewritten, simpler version of the sum-issued-here function you wrote earlier, one that doesn't have to test each gift card for inclusion, and it could be used for several other report functions that work on lists of cards issued "here". But it won't be useful for the oldest-used-here function, as that one needs a list of gift cards used (not issued) in a particular store (bummer!).

Nevertheless you write something like this:

 1 ; get a list of gift cards issued in my store
 2 (define (issued-here giftcards)
 3   ; define a helper function to recursively iterate over the list,
 4   ; building up the result in an accumulator
 5   (define (issued-here-iterator remaining-giftcards result)
 6     ; if there are no more gift cards, return the accumulated result
 7     (if (empty? remaining-giftcards)
 8         result 
 9         ; otherwise, process the current gift card
10         (let* ([current-gc (first remaining-giftcards)]
11                [still-remaining-gcs (rest remaining-giftcards)]
12                ; if this was my card, include it in the result, otherwise not
13                [new-result  (if (string=? "SE1" (issuing-store current-gc))
14                                 (cons current-gc result)
15                                 result)])
16           ;  and recurse on the remaining items
17           (issued-here-iterator still-remaining-gcs new-result))))
18   
19   ; call the helper function with initial values
20   (issued-here-iterator giftcards empty))

Now, where have I seen this before? Hey, wait a minute! This silly silly function keeps repeating itself with only minor variations! This time the accumulator is a list, and we're using a function called cons, which is the list constructor. It creates a new list by placing the current-gc in front of the (initially empty) result list. The biggest change is perhaps that we no longer extract a property of the gift card, and instead work on the gift card itself.
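If you haven't met cons before, a quick illustration:

(cons 1 (list 2 3)) ; builds the three-element list (1 2 3)
(cons "a" empty)    ; builds the one-element list ("a")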

Your very own Lambda-shaped lightning bolt

And then, finally, enlightenment!

What happens next is that you, out of the blue, are struck by a lambda-shaped lightning bolt of pure functional enlightenment! It makes you feel great! Parentheses suddenly seem like the best thing since curly braces, and prefix arithmetic now seems… less awkward.

You then suddenly realize, why oh why didn’t you see this earlier, that you could parameterize this damned function with another function! It’s just so obvious now! Then you could control what it actually does inside the loop, and have it dance to your tune! It no longer needs to be specifically for finding particular gift-cards, it can do anything, everything, mo ha ha ha!

After coming to your senses, you promptly rewrite it like so:

 1 ; the 'where' function could probably prove useful, as its filtering
 2 ; behavior is controlled by the user-supplied predicate? function!
 3 (define (where predicate? list-of-items)
 4   ; our well-known helper
 5   (define (where-iterator remaining-items results)
 6     (if (empty? remaining-items)
 7         (reverse results) ; restore original ordering
 8         (let* ([current-item (first remaining-items)]
 9                [still-remaining (rest remaining-items)]
10                ; We're now letting the predicate? function decide
11                ; if the current-item should be included or not!
12                [new-results (if (predicate? current-item)
13                                 (cons current-item results)
14                                 results)])
15           (where-iterator still-remaining new-results))))
16 
17   ; call the helper function with initial values
18   (where-iterator list-of-items empty))

Now, this function is quite general and useful! It takes a list and builds a new list containing all items for which the function passed in returns true. The ability to use functions this way, the same way you can use, say, a number (passing it as a parameter to another function, storing it in a variable, returning it as a result from a function), is perhaps the defining characteristic of functional programming.
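And since where accepts any predicate, you don't even have to name one; an anonymous lambda works just as well. A made-up query of mine, reusing the amount accessor from before:

; all gift cards worth more than 500, regardless of store
(where (lambda (gc) (> (amount gc) 500)) giftcards)

(As it happens, Racket already ships with this exact function under the name filter, but deriving it ourselves is the whole point of the exercise.)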

Note that as the list constructor function cons puts new items at the front of the list, we need to reverse the result list as the last step, in order to keep the result in the same order as the original list, which seems like a good thing to do. It didn't really matter in our earlier implementations.

Putting ‘where’ to work

You then write two simple one-liners: a predicate function issued-here? that checks whether the provided gift card was issued at the "SE1" store, and another called used-here? that checks whether it was used at the "SE1" store.

(define (issued-here? gc) (string=? "SE1" (issuing-store gc)))
(define (used-here? gc) (string=? "SE1" (using-store gc)))

You can then use them together with your new where-function like this:

(sum-amount (where issued-here? giftcards))
(sum-amount (where used-here? giftcards))

to sum the gift cards issued, or used, at the "SE1" shop! Wonders upon functional wonders! This is the kind of code we'd like to write! Will you ever need to write a recursive helper function again, you wonder? In the next post, we'll continue refactoring our gift card report functions, and show the definition of the sum-amount function used above.

12Jan/12

A Functional Language

In this post I show the basics of the language Scheme, so that I can use Scheme in later posts to prove a few points about functional programming as a useful and powerful paradigm. Scheme is chosen for its simple syntax and minimalistic design.

Choice of language

I decided to start my journey towards understanding functional programming using F#, a new functional programming language from Microsoft. This was because it seemed like the perfect choice for someone like me who knew C# and the .NET Framework, as you can use .NET Framework classes from within F#, as well as easily call your F# code from C#, and vice versa. But I quickly began worrying that I wasn't doing it right, as F# is really a multi-paradigm language in which you can code in an imperative style, and I felt I was writing Frankenstein-style code (Frankencoding, if you will).

I therefore looked at other functional languages that would not let me drift back into the imperative world so easily, with the intention of coming back to F# later. The weird-looking syntax and terminology of Haskell really threw me off, and Erlang was also quite confusing. Lisp looked more promising, apart from a bit of initial parenthesis angst, but it was when I learned more about Scheme, a beautifully minimalistic Lisp dialect, that I knew I'd found what I had been looking for!

Scheme schematics

I started toying with Scheme using PLT Scheme. (That particular dialect of Scheme has since been renamed Racket, but it's still "Scheme" for our intents and purposes.)

Key points about the Scheme language are:

  • Truly minimalistic design, meaning very few keywords and syntactic forms
  • Strongly but dynamically typed
  • Everything is either a value, or a list of values. The list is the primary data structure.
  • Functions are values too, and can be stored and passed around
  • As with all Lisp dialects, the code itself is list structured, as all statements are list-formed expressions
  • All statements are also expressions returning a value
  • All values are immutable (unchangeable), even lists
  • A powerful macro system for rewriting code at compile time

First let’s look at the syntax, and then we can elaborate on some of the other key points. The explanations of some things will have to wait until later posts, in order to keep this post to the point, but feel free to ask in the comments.

The Syntax of Scheme

The following summarizes most of what you need to know about Scheme syntax:

Identifier: ‹id›
Examples: convert, bigger?, append-name, pos

Definition: ( define ‹id› ‹expr› )
Binds the identifier ‹id› to the value of the provided expression.
Examples:
(define num 42)
(define powers (list 1 2 4 8 16))

Conditional: ( if ‹expr1› ‹expr2› ‹expr3› )
If expr1 is true, evaluates expr2, otherwise evaluates expr3.
Example:
(if (empty? lst)
    (min num lower-bound)
    (max num upper-bound))

Local bindings: ( let ([‹id› ‹expr›] ...) ‹body-expr› )
Binds one or more identifiers to the values of the provided expressions. The identifiers are visible only inside the body-expr.
Example:
(let ([a (random 10)]
      [b (random 10)])
  (if (> a b)
      "A was bigger than B"
      "B was bigger than A"))

Function call: ( ‹id› ‹expr›* )
When ‹id› is bound to a function, invokes that function with the values of the remaining expressions as parameters.
Examples:
(write-log "num is ~a now" num)
(sum 2 8 5 6)
(multiply 2 (sum num 1))
(exit)

Function definition: ( define ( ‹id› ‹id›* ) ‹body-expr› )
Binds the first id to a function which takes parameters named by the remaining ids, with the body-expr as the implementation.
Example:
(define (max a b)
  (if (> a b) a b))

Anonymous function: ( lambda ( ‹id›* ) ‹body-expr› )
Creates an unnamed function which takes parameters named by the ids, with the body-expr as the implementation.
Example:
(lambda (a b) (if (> a b) a b))

And, yes, I'm skimming over some parts, but this will get you a long way. All of these are also expressions in their own right - with the exception of the define form, which returns no useful value - which means you can do something like this silly example:

(define min/max (let ([num 3])
                  (if (negative? num) 
                      (lambda (a b) (if (> a b) a b)) 
                      (lambda (a b) (if (> a b) b a)))))

where min/max is defined as either a function returning the minimum of two values, or another function returning the maximum of two values, depending on the sign of num (and num is not visible outside the let). Note that whitespace and indentation are not significant in Scheme.
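To make the example concrete: since num is 3, which is not negative, the second lambda is chosen, and min/max behaves as a minimum function:

(min/max 3 8) ; evaluates to 3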

No operators, but there are always functions

You are probably used to writing expressions such as 1 + 2 * 3 * 4 + 5 / 6, where "operators" like + and * are "infix" (placed in the middle), between two sub-expressions.

In Scheme there are no infix operators, as these operators are instead ordinary functions, and thus written in prefix form (placed at the beginning) just like any other function call in Scheme. The above expression then becomes:

(add 1 (multiply 2 3 4) (divide 5 6))

Or, rather, since the add, multiply and divide functions are actually named +, * and / (which are all perfectly valid Scheme identifier names), it actually becomes:

(+ 1 (* 2 3 4) (/ 5 6))

Yes, this feels awkward when you are used to the traditional way of writing expressions, but there are compelling reasons. This rule makes the syntax far more regular, if nothing else. And you may freely store and pass around these functions, just like any other value or function. In C#, even though you can pass around functions nowadays, you still cannot pass around the operators as if they were functions.
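For instance, since + is just a value bound to a function, you can bind it to another name, or hand it to a function like apply (which calls a function with a list of arguments):

(define add +)
(add 1 2 3)              ; evaluates to 6
(apply + (list 1 2 3 4)) ; evaluates to 10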

Note that prefix arithmetic like the above is a property of Lisp-like languages like Scheme, and has absolutely nothing to do with functional programming per se. So don’t write off FP just because prefix arithmetic at first glance seems like the dumbest thing since 7-bit ASCII!

Comparison operators are also functions

Using the same logic, logical and comparison operators like and, or, <, >, and = are also functions, so you would write:

(or (and (> a 0) (< b 0)) (= a b 10))

instead of, say, (a > 0 && b < 0) || (a == 10 && b == 10) in C# or Java. (Note that and/or are really "special forms", as they short-circuit, and return the last value that needed to be evaluated in order to determine the result.)
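A few examples of that short-circuiting behavior:

(and 1 2 3)        ; evaluates to 3, the last value needed
(or #f 7)          ; evaluates to 7
(and #f (explode)) ; evaluates to #f; the (hypothetical) explode is never called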

Now I think you know enough Scheme to get your feet wet! Let’s fool around with it in the next post.

9Jan/12

A Paradigm Shift

I started out, like most programmers do, writing imperative code. In recent years, however, I have begun to understand the power of functional programming. By understanding the thinking from a different programming paradigm you can become a better developer, even if you don’t switch to a different language or framework. You can use functional programming techniques in C# or Java, and properly applied, they are sure to improve the quality of your code! This series of posts tries to explain why anyone would want to take a closer look at functional programming.

In the Beginning was the Code

The Emperor of the Imperative Empire

I first learned programming in BASIC on the Apple IIc in the mid eighties. Incidentally, it was Microsoft that supplied BASIC for a majority of the popular home computers of the seventies and eighties, including both the Apple II and the C64, and it was selling licenses for Microsoft BASIC that made Bill Gates his first gazillions.

After many years of doing BASIC, I then learned a couple of other languages, including C and some Pascal. I also tried to learn C++ on my own, but didn't get it until university, when I was taught object-oriented design and development. I also learned a bit of Java. Once employed, I learnt Visual Basic and later C#.

… and the Code was Imperative

Superficially these languages are rather different from each other, but the common denominator for all these languages is that they are all imperative languages! “Imperative” comes from the Latin word for “to command” and this means that programs are constructed by setting up an initial state (a collection of variables with known values) and then executing a sequence of statements that manipulate the program state.   

A simple example would be calculating the sum of an integer array, which looks basically the same in any of these languages.

int sum = 0;
for(int i = 0; i < values.Length; i++)
{
    sum = sum + values[i];
}

Here, the program state is contained in the two variables sum (keeping track of the running sum) and i (keeping track of the current iteration index). Both are initialized to zero, and then updated in each iteration of the loop. The three different kinds of statements are clearly imperative (they order the computer: "Do that", "Change this"), and all modify state:

  • declaration, allocating (and initializing) a variable
  • iteration, the for statement explicitly manages the iteration, allocating, initializing and updating the iteration index as well as checking the termination condition before each iteration
  • assignment, which changes the assigned variable
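For contrast, the same computation in a functional style doesn't mention state at all. In the Scheme dialect used elsewhere on this blog (a sketch of mine; foldl is Racket's built-in left fold), the sum is a single expression:

(foldl + 0 (list 1 2 3 4)) ; combine the elements with +, starting from 0; evaluates to 10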

Computer Hardware is Imperative

Most programming languages are imperative, because this is an abstraction that closely models the way the computer hardware works, where executing (imperative) machine code instructions changes the state of the machine (such as the CPU registers or memory contents).

Object-oriented languages like C++, C# and Java improve on this model by encouraging the encapsulation and protection of most state inside objects instead of using a global scope, but they are nevertheless imperative in nature, as they all emphasize changes in state. Indeed, the point of most instance methods is to modify (or expose) the object's internal state.

Introducing Functional Programming

The all-powerful Lambda

However, a few years ago I decided it was time to look at an entirely different class of languages, namely functional languages. Now, I'm not talking about functional as in whether something works or not. The term "functional" is used because of the emphasis on the application of "mathematical functions", instead of changes in state. I'll explain some of the important differences between imperative and functional programming in a later post. The rest of this post is aimed at helping you understand why it could be interesting to know more about functional programming.

Higher productivity and quality

I had read articles like Why Functional Programming Matters, Beating the Averages and Lisp as an Alternative to Java, which all claim you can get higher productivity and quality using functional programming. Statements such as "[functional programming] is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days", from How To Become A Hacker, were also intriguing.

All this made me interested in functional programming (or "FP" for short), but did not really help me learn how or where I could apply FP techniques without learning a whole new language. It was also still unclear to me what FP was all about, exactly. I just didn't get it.

My curiosity was further prompted by discussions on FP with Adam Petersen, whom I had the pleasure of working with for a few months when I was doing a consulting job using C++. I was looking at the usage of the C++ STL and boost libraries in the existing code base, and even though I knew C++ well (or so I thought) I still could not fully understand some of the impossibly terse and elegant code that Adam put out, much less write code like it, as it seemed to require a different way of thinking. It was not that it did something I couldn't have written, but I would have written the same thing quite differently, using at least three times the number of lines of code, in a manner that felt clumsy in comparison. Adam, of course, used functional programming techniques in C++.

No Assembly Required

I was at this point certainly interested in "getting" functional programming, as it now seemed clear it was worthwhile to learn, and useful even when not actually programming in an FP language. However, learning FP would not prove very easy! Learning the syntax of a new language is easy; learning a different way of thinking is much harder. I basically had to rewire the programming patterns etched into my brain by 25 years of imperative thinking, and learn new problem-solving techniques based on a fundamentally different way of thinking. My first attempts at using functional languages had a heavy imperative accent.

However, don't start thinking that this FP stuff is much too hard, or that you will have to spend years and years until you finally get it. Yes, doing a cold-turkey switch to an FP language without a mentor would probably be abysmal for your productivity, but the good news is that it is entirely possible to benefit from FP in smaller chunks than what I tried to bite off, and that you can get clear benefits from using FP within the comfort of your primary programming language (especially C#).

In the next post I'll try to elaborate on what it is, exactly, that makes functional programming so powerful, compared to imperative programming.

11Dec/11

Implementing for Robustness

Focus: Robustness
Description: Set robustness goals, and consider what should happen, and what should not happen in the face of likely (and less likely) errors.

In the last post, we were able to increase robustness by modifying the design of an order import application, separating the process of downloading orders from the process of importing them into the warehouse system (see previous posts for more details on the example we're working on). The redesign required some understanding of the existing design's effect on the robustness of the system, and some experience to decide what to do about it.

It may be argued that we traded some simplicity for robustness when we split one component into two, as we need some glue to integrate the two components. However, each of the two new components is now conceptually simpler and smaller than the combined one they replace, and it should therefore be simpler to implement each part in a robust manner, as error handling for the download process no longer needs to be intermingled with error handling for the import process. Also, the glue needed in this case is more or less just a shared folder. Modularization and componentization of your software are therefore often a good thing when it comes to robustness.

Now let's consider the implementation of each component. In this post we take a look at the downloading process, and leave the importing process for a later post. Please note that I'm going to have to show simplified implementations, or they will become way too long to reason about in a single post. But I hope they will get the message across anyway.

We’ll start by setting some robustness goals for our processes.

  1. Errors should not cause the processes to exit or stop processing.
  2. A single order file should not be able to block further processing.
  3. Orders should never be permanently lost.
  4. All correct orders should be imported exactly once (not zero, nor twice).

These goals will help us take appropriate action when handling errors.

It will be rather impossible to guarantee we'll meet all goals, especially the first one, regardless of which errors occur. In the face of some errors, such as the complete loss of a necessary resource like disk space (or power), or an access-denied failure that needs manual administrator intervention to correct, we may have no other option but to stop processing. However, we should hopefully be able to resume processing when the error condition is cleared, without needing a restart.

Also, if we suffer a hard drive crash after downloading orders, but before importing them, some orders will probably be lost. There's always a trade-off between the cost of failure and the cost of preventing failure.

The Order Downloading Process

How can we make the downloading robust? We need to consider the problems that could occur. Some likely errors are:

  • Network issues that could temporarily prevent access to the FTP server, affect downloads in progress, or make the file server unreachable.
  • Permission issues, such as not being able to delete a downloaded file from the FTP, or even log in to the FTP.
  • File system issues, such as disk full, file already exists, or access denied errors.
  • File format issues (empty files, unrelated files placed in the download folder).

(And yes, there are lots more potential issues, but listing more would take focus away from what I want to discuss.)

We will not try to interpret the files when downloading them, as that task is better left to the import process, which clearly must understand the contents of the order anyway.

  1:     void DownloadOrders()
  2:     {
  3:         do {
  4:             try {
  5:                 FTPServer webshopFTP = FTPServer.Open(webshopFTPaddress, user, password);
  6:                 foreach (RemoteFile remoteFile in webshopFTP.Files("Order*.xml"))
  7:                 {
  8:                     DownloadFile(remoteFile);
  9:                 }
 10:             } catch(Exception ex) {
 11:                 Log(ex);
 12:             }
 13:             Pause(retryDelay); 
 14:         } while(!doomsday);
 15:     }
 16:     
 17:     void DownloadFile(RemoteFile remoteFile)
 18:     {
 19:         try {
 20:             string tempFile = DownloadToTemporaryLocation(remoteFile);
 21:             MoveToOrderImportFolder(tempFile);
 22:             // if it cannot be deleted from the FTP, it will be downloaded again later
 23:             remoteFile.Delete();
 24:         } catch(Exception ex) {
 25:             Log(ex, remoteFile);
 26:         }
 27:     }

In order to meet goal 1, we'll simply add an outer catch-all exception handler (line 10). We also add a short pause to the outer loop (line 13), so as not to overload any services if there's a recurring error.

In order to meet goal 2, we will use a file mask (Order*.xml, say) to include only relevant files. Even if we have a folder on the FTP server dedicated to the order files, an irrelevant file might accidentally be placed there and this could potentially interfere with processing - say, a 12GB file called 'webshopdb.bak', or thousands of image files, would likely cause issues and, if nothing else, consume time and download bandwidth unnecessarily. We'll also add an inner catch-all exception handler (line 24) to catch errors while processing individual files.

In order to meet goal 3, we will only delete files from the FTP once we are certain we’ve successfully stored a local copy (we’ll reach line 23 only if line 20 didn’t throw an error).

As the code is written now, if we cannot delete a file from the FTP once downloaded, we will download many copies of this order and place them in the import folder (and keep doing so until the file can be removed from the FTP). This means that the import process must be able to handle duplicate orders, or we'll violate goal 4! If we instead tried deleting before calling MoveToOrderImportFolder, undeletable files would delay the orders instead of creating duplicates, so it's a trade-off.

We’ll take a closer look at the order import process in the next post.

Filed under: Design
21Nov/11

Designing for Robustness

Focus: Robustness
Description: Robustness is not just about exception handling! System architecture and design plays an important role!

In the last post, we introduced an example of an application that clearly needed to be robust. Now we need to figure out how to make design decisions that help us achieve that goal.

A naive implementation of the order import application from the last post - fetching orders from a web shop via FTP, and importing them into the local warehouse system - looks something like the following pseudocode:

    void ImportOrders() 
    {
        do { 
            FTPServer webshopFTP = FTPServer.Open(webshopFTPaddress, user, password);
            foreach (RemoteFile remoteFile in webshopFTP.Files)
            {
                XmlDocument orderDoc = XmlDocument.Load(remoteFile.DataStream);
                ImportOrder(orderDoc);
                remoteFile.Delete();
            }
        } while(!doomsday); 
    }

Some may think that making the processing robust simply amounts to adding a try-catch inside the inner loop, in order to catch per-file errors, and another try-catch inside the outer loop. While I agree that this would indeed make the implementation considerably more robust, it's a bit like thinking that adding sprinkles to your vanilla ice cream makes it a five-star dessert! It's better, but there's certainly room for improvement!

We need to consider robustness before rushing to implementation. We need to consider the various error scenarios. For each scenario we need to consider our options for improving robustness in the face of errors, and find an appropriate strategy to handle the error, such as retrying, falling back to an alternative strategy, ignoring the error and continuing, or throwing an exception and aborting processing.

Let’s start with one of the most likely sources of errors, the network connection to the FTP Server.

Unnecessary dependencies

So, your application might not be able to reach the FTP server at all times. No amount of try-catch will help you reach it when the network is down. What would improve robustness, then? Well, not needing to reach it at all times!

One important thing to realize is that the naive implementation needs both the FTP Server and the Warehouse system up and running at the same time, in order to import anything. It may seem like they obviously must both be up, because we are in a sense moving orders from the FTP to the warehouse system. But this is not really true, and this misunderstanding has a negative effect on reliability.

Imagine what would happen if for some reason the network connection to the FTP server was unexpectedly lost for a couple of hours. Maybe the FTP server is located within driving distance, and someone in desperation fills a USB flash drive with hundreds of orders from the web shop and drives to the warehouse, handing over the drive to you. Unfortunately, this would be of no immediate help, as the naive application would not be able to process these orders from the removable drive; it only knows how to fetch orders from an FTP server! Sure, if you're lucky you might be able to do a rather stressful ad-hoc setup of a local FTP server and re-configure your application to use that, but there are other problems with this design.

This version of the application also always downloads and imports one order at a time. Let's assume that downloading is rather quick, taking no more than a tenth of a second per order, but that importing into the warehouse system is rather slow, let's say around 30 seconds per order. If you were told well in advance that the warehouse's external connection would be down between 10 and 12 for maintenance, your application would still not be able to do any processing between 10 and 12. Even if there were hundreds of orders waiting at the FTP server by 9:50, your application would only manage to fetch and process about 20 orders by 10:00, and then processing would stop when the application was unable to fetch another order.

Independent processes

Clearly, we can improve both of these scenarios by re-designing the application into two independent processes, separating downloading from importing. If we do that, the download process would be able to easily download a couple of hundred orders in less than a minute, and you are likely to have an empty order queue at the FTP when the network goes down. Then the importing process could churn away on two orders per minute during the two hours of network maintenance.

Even if we were not that lucky, and hundreds of lunch orders were placed at the web shop around 11 (temporarily out of reach of your application), the "manual move" USB flash drive trick would then be simple to perform if needed: just a matter of dumping a batch of order files into the application's processing folder.

Either way, this design is way more robust than the naive implementation, regardless of how much error handling is put into the latter! Now, we are likely not to even notice shorter periods of downtime.

At an infrastructure level, you may also be able to improve robustness by installing network devices that can fall back to a secondary internet connection if the primary connection fails. Note that this is no replacement for separating downloading from importing, however, as there are still scenarios where the separation is beneficial, even if the network were 100% reliable.

Filed under: Design
13Nov/11

A Robust System

Focus: Robustness
Description: Make sure the system is able to handle and recover from failures

Robustness is a very important property of a system, especially for unattended and/or long-running operations. You should be able to estimate for each module, component, or function you are building, what the need for robustness is, and the things that are most likely to go wrong. You then have to make sure those things do not compromise your system’s overall functionality in a manner more severe than the error actually calls for.

There’s no reason your system shouldn’t be able to gracefully handle at least all of the common errors, most of the anticipated errors, and even a lot of the unanticipated errors that can, and will, occur at run-time. A small, commonly occurring error should not be able to cause a lot of trouble!

An Example

Let's say you are employed by the FruutXpress company, which is in the business of selling fresh fruit with express delivery to customers in your city via their web shop. The customer orders need to be shipped pronto, and there's a warehouse full of fruit and 30 workers to pick the incoming customer orders into boxes and onto the delivery trucks.

Your task is implementing a program that imports the orders from the new FruutXpress web shop into the company's warehouse system. You are told that the incoming orders are available as XML files, one order per file, on an FTP server for your import program to read. You will use the warehouse system's rather simple API, which lets you query and import data.

On the surface a rather simple task, right? I mean, how hard can it be, just reading a couple of lines of some text file containing how many of which fruits have been ordered, a delivery address, and not much more, and then importing this into the warehouse system (which then can print out picking lists, shipping labels, and whatever else is needed for the warehouse personnel to do their work).

However, when thinking about it some more, you realize it's a rather important function that is absolutely critical to the FruutXpress company. If your program stopped working, the customers would still place orders at the web shop, but the orders would never reach the warehouse system, and the delivery trucks and workers would sit idle, because they wouldn't get any new orders to pick. Or rather, they would be running around screaming about probably having to work overtime to fulfill all pending deliveries, and you'd have warehouse management on the phone within minutes, complaining!

Clearly you must design and build a really robust import program! It’s important that it does its job well, and only rather catastrophic errors, such as the network being down, or a server crash, should be able to force it to abort processing. It should be able to keep processing orders, even in the face of many different error conditions.

More about this in the next post.

Filed under: Design
5Nov/11

Non-functional requirements

Focus: Functionality
Description: Make sure system meets even the unstated requirements!

If you are dealing with a customer with little experience of specifying software requirements, they are likely to include only functional requirements in the specification, and to leave out most or all of the non-functional requirements. I'm not talking about things that don't work; I'm referring to requirements that do not directly specify functionality the system should provide, but instead specify how the system should be, describing a quality of the system. These requirements, even though often unstated, are often crucial to the eventual success of the system.

Most of our other system properties are non-functional in nature, and thus I'm arguing that capturing these non-functional requirements is the key to actually providing the customer with a high-quality system, making it functional by design.

If we’re building a web-shop, there’s likely a requirement to send each order to the warehouse for picking. This is an entirely functional requirement. There could be multiple unstated non-functional requirements hidden in this requirement.

  • It does not say what delays would be acceptable (from the customer placing the order to it being available for picking in the warehouse) - a non-functional requirement related to performance.
  • It also might not say anything about the importance of not crashing or stalling upon trying to import an incomplete order (say one without an OK delivery address, or for obsolete items) - a non-functional requirement related to robustness.

This web-shop system can comply completely with the functional requirements, and still infuriate the customer because it doesn't fulfill the (unstated) non-functional requirements. Consider if it hangs when receiving a single incomplete order, blocking further order processing and needing manual intervention to restart, or if orders from the web shop are imported into the warehouse system at a rate far slower than the customers are placing them during peak hours, causing unwanted delays in picking.

Obviously, it would be best if all requirements were stated, instead of unstated, but in my experience customers seldom include non-functional requirements. Of course, that doesn't stop them from being unhappy when the system does not meet the unstated requirements ("but it's obvious this shouldn't take this long!").

Don’t underestimate the power of the stuffing

Possibly the customer hasn't even fully considered all these details when you are starting to build the system. Since not all details are known in advance, you (or your team) will have to fill the gaps with some "stuffing" when you get down to that level of detail. Very often this stuffing corresponds a great deal with the unstated non-functional requirements.

The quality of the stuffing you and your team fill the system with will be directly proportional to your understanding of the problem (and problem domain), and also to your development experience. Yes, you can ask questions, but if you ask too many, you'll start getting contradictory answers, or annoy your customer, who hoped you would be able to take sensible action on your own.

Domain knowledge and experience will tell you when the customer says one thing but really means another. It will help you decide to make configurable those things that are likely to change (even if the customer didn’t explicitly tell you so). It will help you make the right decisions, and guide your development towards a system that is not merely implementing each requirement, but is functional by design, and where our other system properties have been considered, and added in an amount appropriate for the system you are building.

The importance of experience and domain knowledge should not be underestimated! A team of developers lacking either experience or domain knowledge is likely to get several things wrong at first. A team lacking both is more or less guaranteed to get most things wrong, and is unlikely to ever get a moderately complex system to a point where it's mostly working. This is because the inexperienced team hasn't yet learned to consider the unstated non-functional requirements, but hopes to achieve full functionality by more or less independently working its way through implementing the requirements to the letter, one by one.

This reminds me of an assignment in a programming class at university, where we were implementing a text editor and had a requirement to warn the user of unsaved changes if he tried to exit. Some students then simply let the editor exit with a message like "Warning, you had unsaved changes" printed on the console.

Filed under: Design
2Nov/11

A Functional System

Focus: Functionality
Description: Make sure the system meets the stated requirements!

Surprisingly, when discussing design, it's easy to forget about perhaps the most important design criterion, namely that the system should be able to provide the required functionality! I actually didn't include this in my first draft of important design criteria.

Perhaps it is thought that the necessary functionality is expressed solely in the requirements specification, and then realized by the implementation, and that the design is in some way orthogonal to this. However, a sloppy system design can ruin even the most well-specified requirements. If nothing else, it might set some unfortunate decisions in stone (well, code), and make other requirements hard or impossible to meet.

Examples

That quick and dirty hack you did a couple of months ago – you know, the one where you remove obviously invalid customer email addresses on the incoming orders from the new web shop (because those caused issues at your end, and the consultants had far more pressing issues than improving email validation rules).

You know the hack doesn't catch all possible errors, but it clearly improved the situation, and it has been running OK since day one. Even if that hack has none of the other desirable properties we are discussing here, it is functional and provides value, or you would've removed it from production.

And that new web shop: if it isn't functional - if it does not even provide the basic shopping functionality that your company wanted it to have - then it's in effect worthless! This holds even if it has a whole host of other properties (including some less desirable ones, such as costing too much).

Obviously, a more complex system can provide value, even though it does not do everything exactly the way the customer wanted. It may have bugs, or a huge misunderstanding that makes one module clearly less useful than intended, but it can still be functional as a whole.

How to achieve functionality by design

The most important tools for achieving functionality are:

  • A good requirements specification. (Hah! as if that’ll ever happen!)
  • Good communication with the stakeholders (the customer, the users etc.)
  • A good understanding of the problem domain. (Domain knowledge.)
  • General development experience. (The more, the merrier!)
  • Several releases of the system. (Since you will never get it right the first time!)

It is important to note that a requirements specification, or even communication with the customer, will never by itself give the whole picture. At least, I've never been on such a project, and I'm pretty sure I never will be. This is because the customer won't be able to get every tiny detail down on paper, and won't even be able to tell you everything during meetings.

The more complex the system, the less likely you are to get the whole picture up front before starting development. In all likelihood, no one (including the customer) will even understand the full system completely at that point, so understandably they cannot give you all the details.

It is also important to realize that it is very common for many quite important requirements not to be stated explicitly in the requirements specification! There is a whole category of requirements that the customer often leaves implicit and unstated, but that we should nevertheless understand if we want to build a great system. More on this in the next post!

Filed under: Design
23Oct/11

A Question of Scale

In the previous post we decided that there are a number of desirable properties that we'd like a software system to have. I'll quickly repeat the ones I mentioned: a system should be Functional, Testable, Robust, Monitorable, Deployable, Scalable, Adaptable, Efficient and Elegant.

(There's probably ample opportunity to create a catchy and memorable abbreviation out of these words, but let's not.)

These properties can be achieved at different scales, or levels. By that I mean that they are affected by architectural (large scale) and design (medium scale) decisions as well as implementation (small scale) decisions.

So, in this series of articles, I am using the word "design" in a very broad sense, not something limited to the designer role. These design principles are about reasoning about how to create software: not just hacking away at the problem at hand, but having a plan at a higher level, describing how the system should be built to get a professional result by design (i.e. on purpose, not by accident). This plan should then affect everyone on the development team.

I could have called this series "friendly advice on how to increase the level of professionalism in your software development effort", but chose to use the word "design" with the same general meaning.

Role playing

First I'd like to elaborate a little on the differences and interconnections between the roles of architect, designer and implementer, as I will not place much focus on these individual roles later in this series.

I consider these roles to be more or less a matter of which scale one operates on. The architect considers issues at the largest scale, how the whole will operate as a function of the major components, while the implementers (craftsmen) consider issues at the smallest scale, performing the practical construction of the needed parts according to the designs.

The designer and architect roles are really only needed because sufficiently large systems have far too many "moving parts" for any one person to understand the entire system in any detail, and it becomes more efficient to have some people focus on the general higher-level goals, and delegate the lower-level details of how to reach specific goals to others focusing on their part of the system.

How are these things connected?

The smaller the system, the less need for designer and architect skills on the team. If you are building a dog house, you don't need to be an architect. However, as soon as the system complexity increases and it becomes composed of several cooperating services or applications, those skills become important. It's hard to rescue a bad architecture with good design, or a terrible design with good implementation. (On the other hand, it's simple to wreck even the best designs with a lousy implementation; anyone can do that!)

Analogies are often made between software construction and building construction. I guess that's because building construction is easy to visualize, and that it shares roles and questions of scale with software construction.

The Sydney Opera House under construction

So, let's for a moment consider the Sydney Opera House. Surely an example of grand architecture! But is it also an example of grand design, and grand implementation?

I have no idea, but my point here is that at smaller scales, it could be a real mess! It does not follow that anything is superbly implemented at lower levels of detail, even though we can agree it is grand in the larger scales.

For example, the electrical wiring could be installed in an amateurish and unsafe manner, and placed where there is no easy access for maintenance workers.

The white ceramic tiles (manufactured in Sweden, by the way) could have been unprofessionally set on the roof in a way that lets water in underneath, letting rust and mold start to damage the building.

So the architect and designer(s) could have done a good job, while the craftsmen were amateurs, causing a whole lot of problems that may last for the entire lifetime of the building.

And even if the architecture makes it a fantastic landmark, it could be that it is not optimally functional as an opera house. The Wikipedia entry indeed suggests that there are problems with the acoustics in the two major halls. But apparently the customer wanted both a landmark and an opera house, and so compromises were made.

This leads us to my next post in this series, where I'll get to the first, and perhaps most important, property of any system: being functional!

Filed under: Design
22Oct/11

Principles of Software Design

Software should be well designed, right? What, then, is well designed software? I'd say it's much easier to spot badly designed software than to postulate what constitutes well designed software. Nevertheless, I'm going to give it a try. I'll also try to provide illustrative examples and justifications for my arguments.

However, I think we all can agree that the answer to what constitutes good design depends heavily on the context and purpose the software is developed for.

Are we tasked with building a service without a user interface, a web or desktop application, a game or an operating system, a smart-phone app, or an embedded system? Will it be stand-alone or part of a larger system? Will it be single user or multiuser, a commercial product or an in-house solution, will it run unattended, is it life-critical? Will it handle 100 KB or 100 TB of data per day? What is the cost of failures? How much data loss is acceptable? How much downtime is acceptable? What are the performance goals?

Design decisions that would be considered "good" (or "good enough") for one kind of system, might be detrimental to another.

Target software

So let's define what target software I'm considering in this discussion of "good design".

I'm most familiar with designing or reviewing software at what we can call the "medium to large" scale, where the systems are comprised of several interacting components, such as desktop or web applications, web and Windows services, and relational databases, loosely connected together to form a cohesive whole.

My experience is that most systems have very few, if any, constraints that are really, really hard and must never be broken. In practice this means that nothing will be life-critical, and the cost of failure is not overwhelming. A small amount of downtime may also be acceptable to the users. We are probably not dealing with data sets of hundreds of terabytes, and are not handling extremely sensitive or secret information.

The principles I want to discuss may not be equally valid for very large-scale systems such as MMORPGs, or for the very small hobbyist scale, such as the Sudoku solver you write for fun, but they will in my opinion still be applicable to a rather wide range of enterprise or business software, from single-user desktop applications, to web sites with tens of thousands of users, to business-critical information processing services running as Windows services or internet-facing web services.

The principles

Now let's get closer to the actual design discussions, shall we?

Experience has taught me a few things about software design, and I think that there are a number of desirable properties that we'd like our systems to have. The following lists some of the most important ones:

  • Functional - able to provide the required functionality.
  • Testable - so we can ensure the system does what we want it to do.
  • Robust - able to handle and recover from failures.
  • Monitorable - the status of the system can be observed from the outside while it is running.
  • Deployable - easy to move from the development environment to the production environment.
  • Scalable - able to gracefully handle a growing amount of data and users.
  • Adaptable - able to handle changes in the requirements or run-time environment.
  • Efficient - able to make the most out of the available computing resources.
  • Elegant - attains perfection not when there's no more to add, but when there's no more to remove.

My plan is to expand on the characteristics of these individual design properties in a series of future posts. I could probably expand on this list, but these are the really important ones, and we have to start somewhere, don't we? If I come up with more desirable properties, I'll have to come back to them later, or I'll never get started. This is a blog, not a book, after all.

Feedback from users or by observation is invaluable for the evaluation of a design (be it your own, or any design you know and understand well). Negative feedback on the design is a powerful motivator to design something better next time. Positive feedback helps reinforce the good decisions made, and we are likely to try to re-use good ideas the next time around.

Filed under: Design