Recurse.se Yada yada on Software Development

19 Sep 2017

Docker Multistage Builds

In one of my current projects, we used Create React App to set up a React project.

Our build pipeline is built around Docker containers: the build runs in a container on our Buildkite agent, and the output is a Docker image. This keeps the agent nice and clean, as all build tools and dependencies are kept inside the containers, except for a small number of glue scripts. The quite large size of Docker images does, however, mean you have to clean up images from older builds, or you'll quickly run out of disk space on your build agent.

Due to cascading dependencies, npm install pulls in more than a thousand npm packages, several hundred megabytes in size, and we don't actually need the packages, npm, or node in the finished image. Deleting data from a Docker image during build merely hides the files in Docker's layered file system, and squashing the image layers defeats the build cache feature.

Before mid-2017, solving this problem would likely have meant running several builds, bouncing some artifacts to the host in between, but a much simpler solution (introduced in Docker 17.05) is now available with Docker Multistage Builds.

Our current (single) Dockerfile now looks like this:

# the first build stage (named 'cibuild') is based on node:6
FROM node:6 AS cibuild
# Tell npm (react-scripts) we are running in a CI environment,
# which runs all tests and fails the build on linting failures
ENV CI=true
# Please remove a couple of thousand lines of info-log messages
# from npm output
ENV NPM_CONFIG_LOGLEVEL error
# Enable docker build cache for 'npm install' by only
# including files required for the install
COPY package.json ./

# and run the install
RUN npm install
# then add public resources and application source
COPY public public
COPY src src
# run tests and create an optimized production build
RUN npm test && npm run build
# then add a second build stage based on nginx
# for serving static content
FROM nginx:1.13
# take the output from the cibuild stage and place it at
# nginx's default location for static content
COPY --from=cibuild /build /usr/share/nginx/html
# and expose nginx default port
EXPOSE 80

We don't even need a custom nginx configuration file, as we place the files at the default location.

The resulting image is a nice and simple nginx image (about 128 MB) that serves our static content, and only the build output is fetched from the large node image (about 1.4 GB). A container based on the nginx image is then placed behind a proxy that sends requests for static content (the app itself) to this nginx container, and the React app's requests to a backend service running in a different container.
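
For completeness, building and trying out the image locally might look like this (the tag name is just an example):

docker build -t myapp-frontend .
docker run --rm -p 8080:80 myapp-frontend
# the app is now served on http://localhost:8080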

Filed under: Docker
13 Sep 2017

Get sublime help with git commit messages

  1. Use Sublime Text - my preferred text editor
  2. Configure git to use Sublime Text to edit commit messages by:
    • git config --global --replace-all core.editor "subl -w" (on macOS)
    • git config --global --replace-all core.editor "'C:/Program Files/Sublime Text 3/subl.exe' -w" (on Windows)
  3. Install the Sublime Text Package Git Commit Message Syntax which will add file type detection and syntax highlighting for files named COMMIT_EDITMSG, which is what git uses.
  4. Open the settings for that syntax:
    • Open ~/Library/Application Support/Sublime Text 3/Packages/User/commit-message.sublime-settings (on macOS), or
    • Either use git to open a commit message, or set the current syntax to Commit Message, and go to Preferences > Settings - Syntax Specific (which opens up the settings file specific to the current syntax)
  5. Add rulers for the recommended commit message lengths by adding
    "rulers": [72, 50] between the curly brackets, and save the file. The complete file is shown below.

Now, when writing a commit message, these rulers will help you visually keep the first line (the commit title) around 50 characters long, and the lines of the commit body within 72 characters.

A bonus feature is that the command Edit > Wrap > Wrap Paragraph at Ruler will now reformat the current paragraph to wrap at the first ruler (72), which makes it really easy to keep longer commit messages nicely formatted without manual fiddling!

Filed under: git
5 Jul 2017

Accessing the Aurelia viewModel from the browser

If you want to access the Aurelia viewModel from the browser console, note that there's a difference between the element hosting the Aurelia app itself, and an element hosting an Aurelia component. If the app is hosted on the body element (<body aurelia-app="main">) you can use:

document.querySelector('body').aurelia.container.viewModel or
document.querySelector('body').aurelia.root.viewModel

and in the component case, you can find the viewModel via

document.querySelector('my-custom-element').au.controller.viewModel
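
For example, to inspect or tweak the app's state from the devtools console (message is just a hypothetical property on your view-model):

// grab the root view-model, then read or change its properties
var vm = document.querySelector('body').aurelia.root.viewModel;
console.log(vm.message);
vm.message = 'changed from the console';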

Filed under: Aurelia, Frontend
4 Jul 2017

Empty if null – Preventing null infections

In OO-style development, the value null causes endless issues. In FP-style development, null is a much smaller problem, because using null isn't idiomatic. In FP, instead of using null to indicate absence, lists tend to be empty, and optional values tend to use Option (a.k.a. Maybe) types.

In one of our C# codebases, which is predominantly OO, using FP-style list processing often runs into the null problem.

Code like:

var currencies = contract.sections.Select(s => s.currency).Distinct();

throws an exception if 'sections' is null, and constantly guaranteeing that every property we run list operations on is non-null is a pain and breaks the chain of list processing.

We could have used the null conditional operator (?.) and written:

var currencies = contract.sections?.Select(s => s.currency).Distinct();

but this just propagates the null value, spreading the null-infection downstream! Now currencies is either a list of currencies, or null.

Enter one of the most useful extension methods I've come up with, EmptyIfNull:

public static class EnumerableExtensions
{
  public static IEnumerable<T> EmptyIfNull<T>(this IEnumerable<T> items)
  {
    return items ?? Enumerable.Empty<T>();
  }
}

Now we can write:

var currencies = contract.sections.EmptyIfNull().Select(s => s.currency).Distinct();

which won't throw an exception even if 'sections' is null, allowing the code to flow more naturally, and we are also guaranteed that currencies is always a sequence (possibly empty) but never null.
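
As a minimal, self-contained sketch of that guarantee (the Demo class is made up for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

// assumes the EnumerableExtensions class defined above is in scope
public static class Demo
{
    public static void Main()
    {
        IEnumerable<string> currencies = null;

        // no exception, and the result is an empty sequence rather than null
        Console.WriteLine(currencies.EmptyIfNull().Distinct().Count()); // prints 0
    }
}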

29 Jun 2017

Copy commits from one git repo to another

If you have done some work in one (or more) repos that you want to transfer into a different target repo, you can of course simply copy your work to the new repo. But what if you also would like to copy the commit history?

This can be done by using git format-patch plus git am.

We'll assume that we have two different repos: labrepo and goodrepo.

somewhere $ cd ~/repos/labrepo
labrepo $ git format-patch --root
0001-Initial-TestDataProvider.patch
0002-One-assert-per-test.patch
0003-Allow-adding-schedule-for-multiple-days.patch
...
labrepo $ cd ../goodrepo
goodrepo $ mkdir lab
goodrepo $ git am --directory=lab --whitespace=nowarn ../labrepo/*.patch
Applying: Initial TestDataProvider
Applying: One assert per test
Applying: Allow adding schedule for multiple days
...

This series of commands transfers all your commits from labrepo into the subfolder "lab" in goodrepo, and you'll be able to move several other repos into goodrepo this way, without conflicts. Remember that the new commits are just copies, and will have different commit ids than the corresponding labrepo commits.

git format-patch can of course extract just a subset of the history. If, for example, work continues in the labrepo, you can create patches for just those new commits, and run a new git am on those patches to import them.

somewhere $ cd ~/repos/labrepo
labrepo $ git format-patch -v2 commitid-of-last-transferred-commit
v2-0001-Never-schedule-Sundays.patch
v2-0002-Increase-test-coverage.patch
...
labrepo $ cd ../goodrepo
goodrepo $ git am --directory=lab --whitespace=nowarn ../labrepo/v2-*.patch
Applying: Never schedule Sundays
Applying: Increase test coverage
...

The -v2 argument to git format-patch gives the resulting files a prefix, so you know which patch files belong together (or just remove old .patch files before exporting new ones).

Filed under: git
15 Apr 2017

Fork and Switch – Pointing a git workspace to a fork

So, I'm in a couple of projects where all developers have push access to a GitHub repository, but we'd like all developers to move over to a pull-request based workflow to encourage code reviews. This often results in people just doing work in a branch, pushing the branch to the common repo, and then doing a PR from there.

This works fine, but creates a lot of branches in the main repo. We've also had minor incidents where people forgot to create a branch and pushed unreviewed commits directly to the master branch. I usually recommend that people do their work in a fork - a copy of the codebase under their own GitHub account.

Now, if you clone your fork into a new folder on your local machine when you already have a working setup for the original repo in another folder, you'll need to change a lot of things: paths in your IDEs and editors, probably re-download half an internet's worth of npm packages, and so on. It's much simpler, IMHO, to just update the "remotes" of your current workspace folder to look like they would if you had cloned your fork.

Here's how. Let's assume the original repo is https://github.com/megacorp/enterprisey and that you already have a working setup in the folder C:\dev\megacorp\enterprisey:

  • Create your fork by browsing to the https://github.com/megacorp/enterprisey repo on GitHub, and pressing the "fork" button
  • Then select your GitHub user as the destination of the fork. This will create a fork named https://github.com/your_username/enterprisey
  • Then, in your original working folder (C:\dev\megacorp\enterprisey) run the following git commands:
git remote set-url origin https://github.com/your_username/enterprisey.git
git remote add upstream https://github.com/megacorp/enterprisey.git

That's it! The only change in workflow after that is that you need to remember to say git pull upstream (or git fetch upstream) to get the latest stuff from the common repo (the megacorp one, that is). There are ways to configure git to default pulls to upstream instead of origin, but I don't think that's an improvement.
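
For example, a typical day-to-day loop might then look like this (the branch name is, of course, just an example):

git fetch upstream
git checkout master
git merge upstream/master             # bring your local master up to date
git checkout -b feature/better-logs   # do your work on a branch...
git push origin feature/better-logs   # ...which ends up in your fork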

The whole point of this is that git push now sends your work to your fork instead of to the common repo, which keeps the number of branches in the common repo down, as well as minimizing the risk of pushing unwanted commits to the common master.

Then, when you want to do a PR from your fork to the common master, it'll be just as simple as before, as GitHub already knows about your fork, and will prompt you if it notices a new branch in it.

Filed under: git
28 Jun 2013

Multimethods and default arguments

I had a situation the other day on my Clojure project where I wanted to refactor a parsing-related function (ironically named "simplify") that was becoming too complex. It looked like this:

(defn simplify 
    ([tokens]
        (simplify tokens 1)) ; provide a default weight of 1
    ([tokens weight]
        (case (first tokens)
            :choice
                ; several lines of :choice-specific code

            :list 
                ; several lines of :list-specific code

            ; and several more lines of implementations for other token types
            )))

Default values for arguments

Notice that "simplify" uses a Clojure idiom for providing default argments, providing several versions of the function with different arity, where the lower arity versions calls the higher arity versions with some default values for the omitted parameters. An example:

(defn add-some
  ([num]        (add-some num 10))      ; 1-argument version
  ([num some]   (+ num some)))          ; 2-argument version

Here the 1-argument version of "add-some" simply calls the 2-argument version with a default value for "some" (here "some" defaults to 10). We could even have a 0-argument version if we thought that made sense.
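
In use:

(add-some 5)    ; => 15, using the default of 10
(add-some 5 1)  ; => 6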

Enter multimethods

I was planning to break "simplify" apart using a Clojure multimethod, defining one version of simplify (a "handler") for :choice, another for :list, and so on. I found that the default value idiom above didn't map onto multimethods as cleanly as I expected.

First, I wasn't sure if multimethods even supported several arities (it turned out they do), and I didn't find anything relevant online about it, so I banged out the following to start with:

; define simplify as a multimethod
(defmulti simplify 
    ; the dispatch function just selects the first token in the list 
    ; as the dispatch value
    (fn [tokens weight] (first tokens)))

; define a version of simplify that is called for the dispatch value :choice 
(defmethod simplify :choice 
    [tokens weight]
    ; some impl for handling :choice tokens
    )

; define a version of simplify that is called for the dispatch value :list 
(defmethod simplify :list 
    [tokens weight]
    ; some impl for handling :list tokens
    )

This looked like an improvement. But inside these functions I had several cases where I just wanted to simplify a list of tokens and the weight was irrelevant, so a default weight of 1 would work fine. The old version just did

(map simplify tokens)

which now didn't work (as simplify is strictly a two-argument function). I could of course re-write all such instances into something like:

(map #(simplify % 1) tokens)

but #()-lambdas make the code less readable in my opinion. I then briefly tried to provide a one-argument version of simplify as a normal function using defn, hoping Clojure had a way to keep the two apart, but you cannot have a multimethod and a normal function with the same name - rather obvious in hindsight - so that attempt didn't work out. I then renamed it "simplify-1", which didn't collide, but wasn't particularly beautiful.

(defn simplify-1 [token] (simplify token 1))

But at least then I could write:

(map simplify-1 tokens)

which I then just wrapped up as simplify-all, taking a list of tokens (sketched below), which was a slight improvement as the code became a bit shorter.
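
Such a wrapper is just another one-liner:

(defn simplify-all [tokens] (map simplify-1 tokens))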

Multi-arity multimethods

Later I read up on the arity of multimethods in my copy of "Clojure Programming" (as neither clojure.org nor ClojureDocs mentioned anything about multiple-arity support in multimethods). It turns out multimethods support the same arities as their dispatch function. I now thought the problem was solved, and came up with this variation:

(defmulti simplify 
    ; all versions of the dispatch function just selects 
    ; the first token in the list as the dispatch value
    (fn 
        ([tokens] (first tokens))
        ([tokens weight] (first tokens))))

But that means that if we call the 1-argument version of simplify with a first token of, say, :list, we will end up calling the 1-argument version of the :list handler, which we then have to implement:

(defmethod simplify :list
    ; call the 2-argument version (via the dispatch function)
    ; with a default weight
    ([tokens] (simplify tokens 1))
    ([tokens weight]
        ; some impl for handling :list tokens
    ))

We must also implement it for all the other token types, such as :choice:

(defmethod simplify :choice
    ([tokens] (simplify tokens 1))  ; call the 2-argument version
    ([tokens weight]
        ; some impl for handling :choice tokens
    ))

This approach quickly seemed much worse than having a simplify-1 function due to all the duplication.

The breakthrough

Then I realized that there was a much simpler solution! My mistake was to think the dispatch function should always do the "right thing" (TM) and dispatch to the token-specific version of simplify directly. My new insight was that the 1-argument dispatch function should statically dispatch to a single function that just called simplify again, providing the default value, which then would call the 2-argument version of the dispatch function to select the proper 2-argument handler!

Something like this:

(defmulti simplify 
    (fn 
        ; statically dispatch 1-argument calls to the :add-default handler 
        ([tokens] :add-default) 
        ([tokens weight] (first tokens))))

(defmethod simplify :add-default
    ; call the proper 2-argument handler through the dispatch function
    [tokens] (simplify tokens 1))

Then there's no need to provide other 1-argument versions of simplify, as no handler besides :add-default will ever be called with just one argument!

(defmethod simplify :list
    [tokens weight]
        ; some impl for handling :list tokens
    )

So, just to make it very clear, here's the step-by-step resolution of this process:

  • a call to the multimethod (simplify [:list 1 2 3]) ...
  • results in a call to simplify's 1-argument dispatch fn, which recommends the :add-default handler.
  • the :add-default handler will be called with [:list 1 2 3],
  • and will add our much-needed default value, and call the 2-argument multimethod (simplify [:list 1 2 3] 1),
  • resulting in a call to the 2-argument dispatch fn, which recommends the :list handler,
  • the :list handler is then finally called with two arguments, [:list 1 2 3] and our meticulously provided default weight of 1.

This might seem like huge overhead compared to what could have been a single function call, but I think the improved clarity of the code is totally worth it, especially since simplify is only called a relatively small number of times after parsing, when breaking down the initial parse tree into a more appropriate data structure. Thus simplify isn't worth micro-optimizing to save a few function calls.

Defaults using variable arity or map destructuring

When writing this post, I came up with two more variations, using Clojure's support for variable-arity functions, and destructuring.

(defmulti simplify 
    ; allow, but ignore additional arguments
    (fn [tokens & _] (first tokens)))

(defmethod simplify :choice
    ; allow additional arguments, and unpack the first additional arg
    [tokens & [maybe_weight]]
        ; if a second arg was provided, it's in maybe_weight, else nil,
        ; in which case 'or' selects 1
        (let [weight (or maybe_weight 1)]
            ; some impl
        ))

(defmethod simplify :list
    ; allow additional arguments, and unpack the first additional arg
    [tokens & [maybe_weight]] 
        ; but we need to repeat the same code for providing defaults 
        ; in every defmethod
        (let [weight (or maybe_weight 1)]
            ; some impl
        ))

This would work, but has some obvious drawbacks.

  • A need to repeat the code for destructuring, and providing defaults
  • You can call the multimethod with more than two arguments, but the additional arguments will just be silently ignored.

It would also be possible to use Clojure's map destructuring support to add a keyword argument for weight, with a default value of 1, something like:

; the param w takes its value from the keyword arg :weight, or 1 if not provided
(defn alt-simplify [tokens & {w :weight :or {w 1}}] 
    ; do something with w
    )

; called as
(alt-simplify some-tokens :weight 2)

I don't think this is a good use of keyword arguments, and the destructuring would still need to be repeated for all versions of the function. So even if the version using map destructuring is a tad shorter than the one using variable arity, I'm not very fond of either of them. At this point, I prefer the version using :add-default.

5 Mar 2012

Implementing for Robustness (cont’d)

Focus: Robustness
Description: Set robustness goals, and consider what should happen, and what should not happen in the face of likely (and less likely) errors.

In the previous post we set up robustness goals, and implemented the first half of the order import functionality, performing downloads of orders via FTP. In this post we are going to complete the example by implementing the second half, which imports the orders into the warehouse system.

What we need to do

An outline of the functionality needed for the import process is as follows:

  • The download process leaves the order files it has downloaded in a folder where the import process takes over ownership of the files.
  • We are going to import the orders oldest-first.
  • If there’s an error processing an order, we’ll signal an error, move the file to the error folder and continue with the next order.
  • We are going to archive each order file in an archive folder after successful import.

Avoiding starvation and false prioritization

Why do we care about the order in which we import, and what does that have to do with robustness, you might ask. Aren't we simply going to import them all, so why care about ordering? The answer is that since importing takes a good 30 seconds per order, we may create a so-called "starvation" situation if newer order files get processed before older ones, such that some orders are delayed "indefinitely". When orders are few and processing time is abundant, any strategy will do. But when there are enough orders that importing them all takes a significant amount of time, we will have problems without an ordering strategy.

For example, if we’re importing order files in “unspecified” order (which in case of .NET Directory.GetFiles on NTFS would be alphabetical order) and the naming scheme for the order files happens to be such that, say, the name starts with a two digit area code. We will in this case inadvertently cause a false prioritization for lower area codes over higher area codes.

To understand what ill effects this might have, let's say there are 120 new order files for lower area codes, and one older file for a higher area code. At 30 seconds each, the new files will delay the order from the higher area code by around an hour, even though it was "first in line" before the other orders arrived! Since we've made promises to the customers regarding delivery times, the longer it takes for an order to be imported, the less time warehouse workers will have to pick the ordered goods, and we might not be able to fulfill the delivery in time.

Note that this also applies to the downloading process, as it wouldn't do much good to implement oldest-first import if downloading also falsely prioritizes some orders. However, as we're assuming that the downloading process is quite fast (orders of magnitude faster than importing), the order it uses is of little concern to us.

Processing errors

The kind of errors we can expect from order processing can roughly be placed in one of three categories:

  • Errors from the file system (reading, archiving)
  • Errors due to order contents (format issues, content validation issues)
  • Errors from the warehouse system API we’re using (validation issues)

Sample code

The presented code is kept as short as possible - possibly even too short - but I find too much code takes focus from what I actually want to discuss, which really isn't the exact implementation, but the need to achieve robustness by design, not by accident.

public static void ImportOrders()
{
    do {
        string orderFile = null;
        try {
            while ((orderFile = GetNextOrderFile()) != null)
            {
                ImportAndArchiveOrder(orderFile);
            }
        } catch (Exception ex) {
            Log(ex, orderFile);
        }
        // only pause if there is an error, or there are no orders to import
        Pause(RetryDelay);
    } while (!doomsday);
}

private static void ImportAndArchiveOrder(string orderFile)
{
    bool success = false;
    try {
        success = ImportOrder(orderFile);
    } catch (Exception ex) {
        Log(ex, orderFile);
    }
    ArchiveOrder(orderFile, success ? ArchiveFolder : ErrorFolder);
}

Care has to be taken in the implementation of GetNextOrderFile() not to commit to a huge number of files, as this can cause starvation, since it also translates into a huge amount of processing time before the next decision on which files to import. It should also not re-read the directory to find the oldest file on each call, as we would then (in the case of a single very old, unprocessable, unmovable file) try to import the same order over and over again.
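
Here's a minimal sketch of one way to balance those two concerns (not the actual implementation; IncomingFolder and the batch size of 10 are made-up placeholders, and it requires System.Collections.Generic, System.IO and System.Linq). It commits to a small batch of the oldest files at a time, so ordering decisions are re-evaluated regularly, while a stuck file is retried at most once per batch.

private static readonly Queue<string> pendingOrderFiles = new Queue<string>();

private static string GetNextOrderFile()
{
    if (pendingOrderFiles.Count == 0)
    {
        // re-scan only when the batch is exhausted, and commit to at most
        // the 10 oldest files, so newly arrived (older) files are picked
        // up on the next re-scan
        foreach (var file in Directory.GetFiles(IncomingFolder)
                                      .OrderBy(f => File.GetCreationTimeUtc(f))
                                      .Take(10))
        {
            pendingOrderFiles.Enqueue(file);
        }
    }
    // null signals "no orders right now" to the import loop
    return pendingOrderFiles.Count > 0 ? pendingOrderFiles.Dequeue() : null;
}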

But why pause?

I include a pause here and there in the sample code. Why is that a good idea? Well, if you've seen event logs filled with tens of messages a second, all stating the same unexpected error ("Access Denied", say), you know that if some error bubbles up to the top level, it can be a good thing to allow processing to take a short break, if nothing else to avoid overloading the logs, and possibly other systems.

If your code normally processes two orders a minute, an unforeseen error situation can force it to do one lap around the processing loop much, much quicker than you anticipated, and the code may then start to hammer some other service with requests, because you expected the normal processing to provide a natural pause between such calls.

Just a few days ago, I saw a service almost bring a server to its knees because of an "Access Denied" error, which made the service unable to remove a file, so it tried processing the file over and over again. The processing needed for a single file took a significant amount of resources, and doing it over and over – without the slightest pause – really stressed the server. Had there been a top-level pause of even 10 seconds, the service would not have overloaded the server due to this error, and processing would have been only very slightly delayed, as it really only pauses when it has no actual processing to do.

Improvements

An improvement to oldest-first would be to import in time-to-requested-delivery order, but this would be more complicated to get right, as orders wouldn't arrive in anything near the order we want to import them in, and we would also have to read the order files to get at this information. Oldest-first is good enough at this point, and certainly more robust than importing in "whatever" order. Note that the implementation above doesn't care about the actual ordering method used, as that is abstracted away in the GetNextOrderFile method.

There are likely lots of other improvements to the code, as it's really just pseudocode.

Filed under: Design
27 Jan 2012

Reducing Your Troubles

In this post we’ll continue the refactoring of our now-a-bit-more-elegant Scheme code for doing reports in our hypothetical Gift Card scenario. We will discover another fundamental building block in the functional programming world, a function that takes a list, and reduces it to a single value, using an user provided reducer function.

Together with the list filtering function developed in an earlier post in this series, and the list conversion function from the previous post, we'll see how our more generic functions can help rewrite a couple of twenty-line problem-specific functions into one-liners, showing off the composability of functions, and hopefully increasing your understanding of functional programming at the same time.

Aiming for perfection

Continuing our series of rewrites and generalizations, you notice that if you implement a date-minimizer called min-date you could also re-write oldest-used-here into a beautiful one-liner:

(define (oldest-used-here giftcards)
  (min-date (select issue-date (where used-here? giftcards))))

(min-date could internally use an earliest-date function, mentioned in the previous post, that would select the earliest date from its two parameters, but I’m not showing an implementation here)

However there’s something troubling you. Your refactoring-sense is tingling. While you’ve made good progress, you’re not quite there yet. You cannot quite put your finger on it, but there’s something you are missing. You are sure of it!

After a few hmm’s and ooh’s, and thatsnotit’s, you decide it’s the strange similarities yet differences between sum and min-date that’s bothering you. Both functions take a list of something (integers, and dates, respectively) and reduces this down to a single value (the sum, and the minimum date, respectively). Using this observation, could it then be possible to write a generic list-reducer-function somehow!?

After contemplating this insight for a while, you produce the following function in a burst of inspiration:

; the 'reduce' function turns a list into a single value using
; a user-provided reducer function, and an initial seed value
(define (reduce reducer current-value items)
  ; if there are no more items, return the current value
  (if (empty? items)
      current-value
      ; otherwise reduce the current value and the current item
      ; into a single value using the provided reducer function
      (let* ([current-item (first items)]
             [remaining-items (rest items)]
             [reduced-value (reducer current-value current-item)])
        ; and recurse on the remaining items
        (reduce reducer reduced-value remaining-items))))

(Notice that the reduce function didn't need to define an iteration helper like we've used so many times now, as reduce already has all the parameters needed to simply recurse on itself.) Using the reduce function, you are able to re-write sum and min-date as the following one-liners:

(define (sum numbers) (reduce + 0 numbers))
(define (min-date dates) (reduce earliest-date (current-date) dates))

I assume we have a current-date function, which is called without parameters to get today's date as a starting value for finding the minimum date. Obviously, if we were trying to find the minimum of a list containing future dates, "today" wouldn't be a good starting value, would it? But in our given scenario we know we won't have to deal with gift cards from the future (or will we, Mr. McFly?)

The value of uniformity

Also notice how in this example the sum function is able to pass around the addition function (+) in a way that, for example, C# cannot with its addition operator (+). C# makes a language-level distinction between functions, expressions and operators, whereas these are interchangeable concepts in Lisp and Scheme, as well as in other functional programming languages, since it has proven useful to be able to pass them around like this!

The Empty List

This version of reduce can handle being called with an empty list (or "the" empty list, as it's normally called, as there's really only one such list in existence in Scheme), in which case the user-provided value of current-value is returned. You could omit the current-value parameter, and let the reducer instead start off by reducing the first and second elements of the list (instead of the initial value and the first element), but then it wouldn't know what to return if we called it with the empty list. It would, however, be a tad simpler to use.

If we wanted, we could implement this variation as, say, reduce1, calling reduce with the first item of the list as the initial value, but it will need at least one item to reduce:

(define (reduce1 reducer items) (reduce reducer (first items) (rest items)))
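
A quick sanity check of both variants:

(reduce + 0 (list 1 2 3))  ; => 6
(reduce1 + (list 1 2 3))   ; => 6, seeded with the first element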

Eh, Reduce Me?

Perhaps it’s not clear how reduce does its job. In the case of the sum function, the accumulator (current-value) starts off at 0 and is then updated at each iteration with the sum of itself and the current number from the list, using + as our reducer. (And when I say “update” I don’t mean modified – the accumulator is immutable – but it’s getting a new value when passed into the next iteration!) For a three-element list (1 2 3), and an initial value of 0, the reduction basically becomes:

(+ (+ (+ 0 1) 2) 3)

In the case of the min-date function, we start off at "today", and at each iteration find the minimum of the accumulator (current-value) and the current date from the list (using earliest-date as our reducer). If the current date from the list is later than the earliest date found so far, the accumulator isn't updated. If a date earlier than any we've previously seen is found, the accumulator is updated to this new value, and at the end of the iteration through the list, we have found the minimum date! The reduction then becomes:

(earliest-date (earliest-date (earliest-date init-val first-elem) second-elem) third-elem)

And so on…

Where are we now?

Over the last three posts, you've devised and implemented "where", "select" and "reduce", and hopefully learned how to pass around functions as parameters to specialize the workings of a generic algorithm. Congratulations! These are powerful tools, and fundamental building blocks in the world of functional programming. Use them for good, not for evil! Understanding them is a milestone towards getting an FP mindset! You are likely to use select and where fifty times more often than you'll use reduce, however.

You should also know that in most, if not all, functional languages these functions, and several more, are already part of the standard library, though perhaps under different names. where is often called "filter", select is often called "map", and reduce is often called "fold" (or "foldl", where the l stands for the fact that it reduces in left-to-right order, which sometimes matters). It's also likely that these library functions are more advanced than our implementations, for example including error handling, as well as versions that can handle additional input lists.

I’ve chosen other names here than the standard Scheme ones, for several reasons:

  • increased clarity - at least in my opinion; ask someone what they think a function called "map" does, or whether they believe a function that "filters" includes or excludes the filtered items.
  • similarity to SQL clauses - many “non-functional” Microsoft-class programmers already know SELECT and WHERE from writing SQL-syntax.
  • similarity to their C#/.NET LINQ names (where they're called Select, Where and Aggregate)

You should now be able to take over the world! Or at least, you won’t need to manually iterate through lists anymore, as these three functions should be able to cover most of your iteration needs for the next decade or two.

25 Jan 2012

Select Your Weapon

In this post we’ll continue the refactoring of our less-than-elegant Scheme code for doing reports in our hypothetical Gift Card scenario. We will discover another fundamental building block in the functional programming world, a function that takes a list, and converts each element into a new form using an user provided conversion function.

Together with the list filtering function developed in the previous post in this series, and a simple list-of-numbers summer, we'll see how our more generic functions can help rewrite a couple of twenty-line problem-specific functions into one-liners, showing off the composability of functions, and hopefully increasing your understanding of functional programming at the same time.

Refactoring continued

In our previous post, we saw that in Scheme (or any functional programming language) using a function as a parameter to another function can be very useful, and helps separate generic behavior from the specifics needed for a particular situation. We implemented a generic list filtering function we called where, which lets the user provide the filtering rules by passing in a function that decides which elements to include, and which to exclude.

Now that you have the useful where function, you can redefine the sum-issued-here function (also introduced in the previous post) as a slightly more generic amount-summing function, plus the where function to find the gift cards to sum. The amount-summer can be defined like this:

 1 ; sum the amounts of all gift cards in the list
 2 (define (sum-amount giftcards)
 3   ; define a helper function to recursively iterate over the list,
 4   ; building up the result in an accumulator
 5   (define (sum-amount-iterator remaining-giftcards accumulated-sum)   
 6     ; if there are no more gift cards, return the accumulated sum
 7     (if (empty? remaining-giftcards)
 8         accumulated-sum
 9         ; otherwise, process the current gift card
10         (let* ([current-gc (first remaining-giftcards)]
11                [still-remaining-gcs (rest remaining-giftcards)]
12                ; get the amount from the gift card, and update the sum
13                [new-sum (+ accumulated-sum (amount current-gc))])
14           ;  and recurse on the remaining items
15           (sum-amount-iterator still-remaining-gcs new-sum))))
16   
17   ; call the helper function with initial values
18   (sum-amount-iterator giftcards 0))

It’s very much like several of the other functions you’ve written (and please see the previous post for a thorough explanation of the recursive iteration helper function). The difference from the original implementation is the fact that this function sums all provided gift cards, as you can now use a separate function to decide which gift cards to sum, which have greatly improved the composability of our functions. By composability I mean that we can compose (combine) them in several different ways to get different results.

We can now rewrite sum-issued-here simply as:

(define (sum-issued-here giftcards)
  (sum-amount (where issued-here? giftcards)))

using the one-line function issued-here? we defined in the previous post to find us the relevant cards to sum. Now sum-issued-here cannot become much simpler, but the oldest-used-here function (defined in the previous post; it basically finds the oldest gift card, the one with the earliest issue-date, used in a particular shop) is still in need of a rewrite. While we can use the where function to do part of the job, we cannot re-use much else, and we'd rather not write yet another similar function.

Further ideas for refactorings

You have a gut feeling that there's more generic functionality waiting to be extracted that would let you rewrite many of your previous functions. Thinking hard about this, considering the definitions of the sum-amount and oldest-used-here functions, combined with your new-found insight that it's possible to parameterize a function with a different function (something we just put to excellent use in our where function), you come up with a few different options for further refactorings:

  • Write a more generic summer, where the property to be summed is controlled by a user-supplied function, for example: (sum-by amount giftcards)
  • Write a generic list converter, where the conversion is controlled by a user-supplied function, so that you could convert a list of gift cards into a list of issue-dates, or amounts.
  • Write a generic summer of a list of numbers, that could then be used to sum those amounts
  • Along the same lines, you’d need a date-minimizer, to find the earliest date among a list of issue-dates

After a short session of guru meditation, you decide a generic list converter function would be a most beautiful and useful companion to your generic list filtering function, as well as being more composable than sum-by, and thus you start with that one. You decide to name it the select function.

Select - A generic list converter function

Your implementation of the select function turns out much like many other functions in this series, based on the same recursive decomposition of the problem: first work on the current item, then recurse on the rest of the items.

; the 'select' function converts each item in a list into a new form.
; the conversion is done by the user-supplied convert function!
(define (select convert list-of-items)
  ; our well-known helper
  (define (select-iterator remaining-items results)
    (if (empty? remaining-items)
        (reverse results) ; restore original ordering
        (let* ([current-item (first remaining-items)]
               [still-remaining (rest remaining-items)]
               ; let the passed-in function convert each item into
               ; a new form and prepend it to the accumulated results
               [new-results (cons (convert current-item) results)])
          ; and recurse on the remaining items
          (select-iterator still-remaining new-results))))

  ; call the helper function with initial values
  (select-iterator list-of-items empty))

This is an immensely useful function. It may appear just like many others, but you'll use it time and time again when using functional programming techniques. I'd say it's one of the most fundamental building blocks of functional code. You'll use it to convert lists of item IDs to item objects using a database lookup function, or to convert items to XML that you'll write to a file, and many other things.
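
As a tiny made-up example:

; convert a list of numbers into a list of their squares
(select (lambda (n) (* n n)) (list 1 2 3)) ; => (1 4 9)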

With this masterpiece done, do you pat yourself on the back? Do you take a well deserved nap? No, there's no stopping you now, and you proceed relentlessly even though your eyes burn from lack of sleep, by re-writing sum-amount (defined at the top of this post) into a generic summer of a list of numbers.

After briefly considering the name 'reginald', you rename it sum. It really only needs a trivial change on line 13, replacing the call to the amount function with the current list element itself (and perhaps a few identifiers could use more generic names, as the sum function is no longer gift-card specific). You decide to spare the readers the sight of the actual implementation.

The revised implementation of sum-issued-here is more interesting:

(define (sum-issued-here giftcards)
  (sum (select amount (where issued-here? giftcards))))

This is quite beautiful! Using a few generic and very composable functions, your initial 20-line function has become a one-liner (OK, it's a two-liner, but the line break is for readability, OK?) with the aid of a few helpers such as amount and issued-here? – also one-liners.

Step by step

The way this version of sum-issued-here works is that if it is called with a list of gift cards (we'll call them gc1, gc2 and gc3), it will return the sum of the amounts of those issued in a particular shop nearby (as per our definition in the previous post).

  1. The list of gift cards is passed to the where function, which returns a new list including only those gift cards for which the issued-here? function returns true. Let's say that was true for gc1 and gc3, but not gc2.
  2. The list with gc1 and gc3 is then passed to the select function, which calls the amount function on each gift card to extract the amounts into a new list that it returns. This list of amounts (well, plain numbers really) now contains, say, 250 and 175.
  3. This list of numbers is passed on to the sum function, which sums it up into the number 425, which then becomes the return value of this particular call to sum-issued-here.

In the next post, we’ll try to finalize this refactoring session after discovering yet another useful function, so that we can move on to solve a larger and more challenging problem.