Showing posts with label coupling. Show all posts

Adapt - why success always starts with failure

is an excellent book by Tim Harford (isbn 978-0-349-12151-2). As usual I'm going to quote from a few pages:
Cross the river by feeling for stones [Deng Xiaoping]
Accepting trial and error means accepting error.
Darwin, a meticulous observer...
The art of success is to fail productively.
Complexity is a problem only in tightly coupled systems.
Make sure you know when you've failed, or you will never learn.
What Palchinsky realised was that most real-world problems are more complex than we think. They have a human dimension, a local dimension, and are likely to change as circumstances change. His method for dealing with this could be summarised as three 'Palchinsky principles':
  • seek out new ideas and try new things
  • when trying something new, do it on a scale where failure is survivable
  • seek out feedback and learn from your mistakes as you go along
If we are to take the 'variation' part of 'variation and selection' seriously, uniformly high standards are not only impossible but undesirable.
When John Nagl served in Baghdad in 2003, he found that while his young inexperienced soldiers had the authority to kill, he - a major with a doctorate and a decade of experience - didn't have the authority to print his own propaganda pamphlets to counteract the clever PR campaign that the local insurgents were running.
Speciation - the divergence of one species into two separate populations - rarely happens without some form of physical separation.
Tight coupling means the unintended consequences proliferate so quickly that it is impossible to adapt to the failure or to try something different.
The first thing Timpson does when it buys another business is to rip out the electronic point-of-sale machines (there are always EPOS machines) and replace them with old fashioned cash registers. 'EPOS lets people at head office run the business', explains John Timpson. 'I don't want them to run the business.'

John Timpson describes one instance where he couldn't buy half-price happy hour drinks at a hotel bar, because midway through giving his order, the hour ended and the bar's computerised sales system refused to allow the half-price offer to be applied.

Timpson's company training manual describes the twenty easiest ways to defraud the company, making it clear that the company understands the risks it is running and trusts its employees anyway - and many people respond to being trusted by becoming more trustworthy.
A central point of the corporation, as a legal structure, is that it is supposed to be a safe space in which to fail. Limited liability companies were developed to encourage people to experiment, to innovate, to adapt - safe in the knowledge that if their venture collapsed, it would merely be the abstract legal entity that was ruined, not them personally.
Fail better. [Samuel Beckett]

Isolating legacy C code from external dependencies

Code naturally resists being isolated if it isn't designed to be isolatable. Isolating legacy code from external dependencies can be awkward. In C and C++ the transitive nature of #includes is the most obvious and direct reflection of the high coupling such code exhibits. However, there is a technique you can use to isolate a source file by cutting all its #includes. It relies on a little-known third form of #include. From the C standard:

6.10.2 Source file inclusion
...
A preprocessing directive of the form:
  #include pp-tokens 
(that does not match one of the two previous forms) is permitted. The preprocessing tokens after include in the directive are processed just as in normal text. ... The directive resulting after all replacements shall match one of the two previous forms.


An example. Suppose you have a legacy C source file that you want to write some unit tests for:
/*  legacy.c  */
#include "wibble.h"
#include <stdio.h>
...
int legacy(int a, int b)
{
    FILE * stream = fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, 
                         "%d:%d:%d", a, b, a * b);
    fwrite(buffer, 1, sizeof buffer, stream);
    fclose(stream);
    return result;
}
Your first step is to create a file called nothing.h as follows:
/* nothing! */
nothing.h is a file containing nothing and is an example of the Null Object Pattern. Then you refactor legacy.c to this:
/* legacy.c */
#if defined(UNIT_TEST)
#  define LOCAL(header) "nothing.h"
#  define SYSTEM(header) "nothing.h"
#else
#  define LOCAL(header) #header
#  define SYSTEM(header) <header>
#endif

#include LOCAL(wibble.h)  /* <--- */
#include SYSTEM(stdio.h)  /* <--- */
...
int legacy(int a, int b)
{
    FILE * stream = fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, 
                         "%d:%d:%d", a, b, a*b);
    fwrite(buffer, 1, sizeof buffer, stream);
    fclose(stream);
    return result;
}
Now structure your unit tests for legacy.c as follows:
First you write null implementations of the external dependencies you want to fake (more Null Object Pattern):
/* legacy.test.c: Part 1 */

static FILE * fopen(const char * restrict filename, 
                    const char * restrict mode)
{
    return 0;
}

static size_t fwrite(const void * restrict ptr,   
                     size_t size, 
                     size_t nelem, 
                     FILE * restrict stream)
{
    return 0;
}

static int fclose(FILE * stream)
{
    return 0;
}
Then #include the source file. Note carefully that you're #including legacy.c here, not legacy.h, and that you're #defining UNIT_TEST so that legacy.c will have no #includes of its own:
/* legacy.test.c: Part 2 */
#define UNIT_TEST
#include "legacy.c" 
Then write your tests:
/* legacy.test.c: Part 3 */
#include <assert.h>

void first_unit_test_for_legacy(void)
{
    /* writes "2:9:18" which is 6 chars */
    assert(legacy(2, 9) == 6);
}

int main(void)
{
    first_unit_test_for_legacy();
    return 0;
}
When you compile legacy.test.c you will find your first problem - it does not compile! You have cut away all the #includes, which cuts away not only the function declarations but also the type definitions, such as FILE, which is used in the code under test as well as in the real and the null fopen, fwrite, and fclose functions. What you need to do now is introduce a seam for just the functions:
/* stdio.seam.h */
#ifndef STDIO_SEAM_INCLUDED
#define STDIO_SEAM_INCLUDED

#include <stdio.h>

struct stdio_t
{
    FILE * (*fopen)(const char * restrict filename, 
                    const char * restrict mode);
    size_t (*fwrite)(const void * restrict ptr, 
                     size_t size,  
                     size_t nelem, 
                     FILE * restrict stream);
    int (*fclose)(FILE * stream);
};

extern const struct stdio_t stdio;

#endif    
Now you Lean On The Compiler and refactor legacy.c to use stdio.seam.h:
/* legacy.c */   
#if defined(UNIT_TEST)
#  define LOCAL(header) "nothing.h"
#  define SYSTEM(header) "nothing.h"
#else
#  define LOCAL(header) #header
#  define SYSTEM(header) <header>
#endif

#include LOCAL(wibble.h) 
#include LOCAL(stdio.seam.h)  /* <--- */
...
int legacy(int a, int b)
{
    FILE * stream = stdio.fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, 
                         "%d:%d:%d", a, b, a*b);
    stdio.fwrite(buffer, 1, sizeof buffer, stream);
    stdio.fclose(stream);
    return result;
}    
Now you can structure your null functions as follows:
/* legacy.test.c: Part 1 */
#include "stdio.seam.h"

static FILE * null_fopen(const char * restrict filename, 
                         const char * restrict mode)
{
    return 0;
}

static size_t null_fwrite(const void * restrict ptr, 
                          size_t size, 
                          size_t nelem, 
                          FILE * restrict stream)
{
    return 0;
}

static int null_fclose(FILE * stream)
{
    return 0;
}

const struct stdio_t stdio =
{
    .fopen  = null_fopen,
    .fwrite = null_fwrite,
    .fclose = null_fclose,
};    
And voilà, you have a unit test. Now you have your knife in the seam, you can push it in a bit further. For example, you can do a little spying:
/* legacy.test.c: Part 1 */
#include "stdio.seam.h"
#include <assert.h>
#include <string.h>

static FILE * null_fopen(const char * restrict filename, 
                         const char * restrict mode)
{
    return 0;
}
    
static size_t spy_fwrite(const void * restrict ptr, 
                         size_t size, 
                         size_t nelem, 
                         FILE * restrict stream)
{
    assert(strcmp("2:9:18", ptr) == 0);
    return 0;
}

static int null_fclose(FILE * stream)
{
    return 0;
}

const struct stdio_t stdio =
{
    .fopen  = null_fopen,
    .fwrite =  spy_fwrite,
    .fclose = null_fclose,
};
This approach is pretty brutal, but it might just allow you to create an initial seam which you can then gradually prise open. If nothing else it allows you to create characterisation tests to familiarise yourself with legacy code.

You'll also need to create a trivial implementation of stdio.seam.h that the real code uses:
/* stdio.seam.c */
#include "stdio.seam.h"
#include <stdio.h>

const struct stdio_t stdio =
{
    .fopen  = fopen,
    .fwrite = fwrite,
    .fclose = fclose,
};
The -include compiler option (GCC and Clang both support it) might also prove useful.

-include file
    Process file as if #include "file" appeared as the first line of the primary source file.


Using this you can create the following file:
/* include.seam.h */
#ifndef INCLUDE_SEAM
#define INCLUDE_SEAM

#if defined(UNIT_TEST)
#  define LOCAL(header) "nothing.h"
#  define SYSTEM(header) "nothing.h"
#else
#  define LOCAL(header) #header
#  define SYSTEM(header) <header>
#endif

#endif
and then compile with the -include include.seam.h option.
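The two builds might then look something like this (the compiler name and output file names are assumptions; note that UNIT_TEST now has to come from the command line, because -include processes include.seam.h before any #define in the test file itself):

```shell
# Unit-test build: -DUNIT_TEST makes include.seam.h cut every
# LOCAL(...)/SYSTEM(...) include down to nothing.h.
cc -DUNIT_TEST -include include.seam.h -o legacy_test legacy.test.c

# Production build: without UNIT_TEST the macros expand back into
# ordinary #include "..." and #include <...> forms.
cc -include include.seam.h -c legacy.c stdio.seam.c
```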

This allows your legacy.c file to look like this:
#include LOCAL(wibble.h) 
#include LOCAL(stdio.seam.h)
...
int legacy(int a, int b)
{
    FILE * stream = stdio.fopen("some_file.txt", "w");
    char buffer[256];
    int result = sprintf(buffer, "%d:%d:%d", a, b, a*b);
    stdio.fwrite(buffer, 1, sizeof buffer, stream);
    stdio.fclose(stream);
    return result;
}    


Coupling, overcrowding, refactoring, and death

I read The Curious Incident of the Dog in the Night Time by Mark Haddon last week. I loved it. At one point the main character, Christopher, talks about this equation:

Pg+1 = α Pg (1 - Pg)

This equation was described in the 1970s by Robert May, George Oster, and Jim Yorke. You can read about it here. The gist is that it models a population over time, the population at generation g+1 being determined by the population at generation g. If there is no overcrowding then each member of the population at generation g, denoted Pg, produces α offspring, all of whom survive. So the population at generation g+1, denoted Pg+1, equals α Pg. The additional term, (1 - Pg), represents feedback from overcrowding. Some interesting things happen depending on the value of α:
  • α < 1: The population goes extinct.
  • 1 < α < 3 : The population rises to a value and then stays there.
  • 3 < α < 3.57 : The population alternates between boom and bust.
  • 3.57 < α : The population appears to fluctuate randomly.
In biological systems each generation has a natural lifespan. The cycle of death naturally helps to reduce overcrowding. But even with the inevitable death of each generation, the system's behaviour is delicately poised. Once α gets into the 3 - 3.57 range, growth starts to outpace death, leading to a rising population; this causes overcrowding, which reduces growth, leading to a falling population. That reduces overcrowding, and growth rises again. The population repeats in a stable up/down cycle - but only because of the inevitable death, and only while the rate of death is sufficiently high to maintain the cycle. Once α gets beyond 3.57 the rate of death can no longer regulate the increased rate of growth and the system destabilises.

You can think about the process of writing software with this equation.

You can think of over-crowding as being analogous to over-coupling. We feel that a codebase is hard to work with, difficult to live in, if it resists our attempts to work with it. When it resists it is the coupling that is resisting.

You can also think of death as being analogous to refactoring. Just as death reduces overcrowding, so refactoring reduces coupling.

Refactoring is a hugely important dynamic in software development. Without refactoring a codebase can grow without check. Growing without check is bad. It leads to overcrowding. Overcrowding hinders growth.