
Logsearch for Cloud Foundry presentation

Logsearch is the open source project I lead as part of my day job at City Index Ltd.  Based on the Elasticsearch ELK stack and packaged as a BOSH release, it builds you a log processing cluster tailored to making sense of your IT environment and the apps that run on it.

Last week I gave a well received talk at the London PaaS Users Group, showing how Logsearch can be used to analyse the logs of a Cloud Foundry cluster.

Below is a screencast of that Logsearch for Cloud Foundry presentation (youtube)

How to debug the CF/PAAS deploy process

The new breed of PAAS systems is converging on a common deployment model.

  1. A CLI tool uploads your code / executables to the PAAS.
  2. The PAAS launches a clean “LXC based” staging container VM and invokes a buildpack on your code / executable
    1. The buildpack’s bin/detect checks whether it knows how to work with your code type (eg, it looks for a Gemfile, or a .java class).  If not, it fails.
    2. The buildpack’s bin/compile builds your code, combining it with any runtime dependencies (eg, a specific JVM or Mono runtime) and any related libraries (eg, what is specified in your .nuget package or Gemfile).  This process results in an “app” folder which contains all the binaries your app requires to run.
    3. The buildpack’s bin/release specifies the “startup command” and any ENV vars that should be set.
  3. All of this output is then zipped up into your app.tgz and added to the PAAS blobstore.  That done, the staging container is deleted.
  4. The PAAS then fires up as many runtime “LXC based” container VMs as you have specified, unzips your app.tgz into your $HOME folder, loads the ENV vars and runs the start command specified in step 2.3
  5. The PAAS monitors your start command – if it exits it will automatically rerun step 4 to give you a fresh runtime container.
  6. The PAAS also monitors the host machines running the containers – if any of those fail it will restart all the affected runtime containers on a new host machine.  This step happens more frequently than you think (typically nightly), because it’s also the way that the PAAS keeps the host machines’ operating systems updated.
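
The staging and runtime steps above can be modelled in a few lines of JavaScript. This is only an illustrative sketch, not code from any PAAS: the buildpack object, the `stage` and `run` functions, and the file names are all hypothetical stand-ins for the bin/detect, bin/compile and bin/release scripts and for the containers they run in.

```javascript
// Hypothetical model of a buildpack: three functions standing in for
// the bin/detect, bin/compile and bin/release scripts.
const rubyBuildpack = {
  name: "ruby",
  // detect: can this buildpack handle the uploaded code?
  detect: (files) => files.includes("Gemfile"),
  // compile: add the runtime and libraries to produce the "app" folder
  compile: (files) => [...files, "vendor/ruby", "vendor/bundle"],
  // release: report the startup command and ENV vars
  release: () => ({
    startCommand: "bundle exec rackup",
    env: { RACK_ENV: "production" },
  }),
};

// Steps 2 and 3: stage the app with the first buildpack that detects it
// and return the packaged result (the equivalent of app.tgz).
function stage(appFiles, buildpacks) {
  const buildpack = buildpacks.find((bp) => bp.detect(appFiles));
  if (!buildpack) {
    throw new Error("No buildpack detected this app");
  }
  const appFolder = buildpack.compile(appFiles);
  const { startCommand, env } = buildpack.release();
  return { buildpack: buildpack.name, appFolder, startCommand, env };
}

// Step 4: start the requested number of runtime containers,
// each with its own copy of the app folder and ENV vars.
function run(droplet, instances) {
  return Array.from({ length: instances }, (_, index) => ({
    index,
    files: [...droplet.appFolder],
    env: { ...droplet.env },
    command: droplet.startCommand,
  }));
}

const droplet = stage(["Gemfile", "config.ru"], [rubyBuildpack]);
const containers = run(droplet, 2);
console.log(containers.length, containers[0].command);
```

Because every container is built from the same staged output, steps 5 and 6 amount to calling `run` again; nothing needs to be re-staged.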

Each PAAS (Heroku, Cloud Foundry, flynn.io) has custom components that orchestrate everything, but there is a healthy open source community creating buildpacks for the languages and runtimes near and dear to their hearts; and these typically work (with minor modifications) on any of the PAASes.

Yours truly has now written 4 buildpacks for Cloud Foundry:

  1. https://github.com/mrdavidlaing/stackato-buildpack-wordpress <- WordPress running on Facebook’s HipHop PHP runtime
  2. https://github.com/cloudfoundry-community/nginx-buildpack <- Static HTML sites running on Nginx
  3. https://github.com/cloudfoundry-community/.net-buildpack <- .NET apps running on Mono
  4. https://github.com/mrdavidlaing/java-buildpack-with-Procfile-container <- Extension of the CF Java buildpack

One of the major pain points in the process is debugging the staging and runtime containers because:

  1. You can’t SSH into them to poke around and explore
  2. In the event of a catastrophic failure the container (and its logs) gets deleted before you can extract any of the files.

So, my holiday project was to try and build something to make debugging the deployment process easier.

The result is https://github.com/cloudfoundry-community/container-info-buildpack – a buildpack that exposes information about the staging and runtime containers via a web-app.  See the README.md for details on how to use it.

This little experiment has been received with enthusiasm by the CF dev community; so I think I’ve identified a common pain point.

In its development I learnt about three useful tools:

  1. pstree -a <- shows the running processes as a tree, including their command line arguments.
  2. forego <- a speedy and memory efficient Go implementation of foreman
  3. openresty <- a collection of nginx modules that turn nginx into a simple (and very efficient) app server, with the ability to script logic using Lua.

I’m currently experimenting with being able to wrap this “info” buildpack around another buildpack, so you can

  1. Gather additional debugging info when deploying an app – say a mono app based on https://github.com/cloudfoundry-community/.net-buildpack
  2. Re-run the staging and runtime processes without having to redeploy your app.



WordPress Development Workflow using Stackato, HipHop, and Jenkins

A project of mine to host WordPress using HipHop-PHP on the Stackato/CloudFoundry PAAS was recently profiled by Stackato.

Developing like this enables:

  1. Consistent environments for development and production deployment
  2. A test site for every Pull Request
  3. Automation of the setup of a WordPress development environment
  4. And as a nice bonus serving WordPress via the HipHop-PHP compiler gives a 5x performance improvement :)

If you’re interested in finding out more, please join the mailing list

Customise your .gitattributes to become a Git Ninja

One of the things I love about Git is that your .gitignore file travels with the repo, so ignore rules remain consistent no matter which machine you are working on.

In the same vein, adding a .gitattributes to your repo allows you to ensure consistent git settings across machines.  This enables the following subtle, but very useful features.

  1. Force line ending normalization inside your repo to LF
    Adding * text=auto causes Git to autodetect text files and normalise their line endings to LF when they are checked into your repository. This means that simple diff tools (I’m looking at you GitHub), which consider every line to have changed when someone’s editor changes the line endings, won’t get confused.
    Importantly, this doesn’t affect the line endings in your working copy. By default Git will convert these to your platform’s default when checking code out of your repo. You can override this using the core.eol setting.
  2. Language specific diffs
    When Git shows you diff information it gives you some context as to where in the code the diff lives. Using the *.cs diff=csharp setting tells Git to be a little smarter about tailoring this for a specific language. Notice how in the example below Git is telling us the method name where the change occurred for the .cs file, compared to the default first non comment line in the file.
  3. Normalize tabs vs spaces
    The filter= attribute instructs Git to run files through an external command when checking them in to / out of the repo. One use of this functionality would be to normalise tabs to spaces (or vice versa).
  4. Encrypting sensitive information
    It is convenient to store config files in your git repo, but for public repos you don’t really want to expose things like your production db credentials. Using Git filters you could pass these config files through an encryption/decryption step when checking in/out of the repository. On machines that have the encryption keys your config files will be placed in plaintext in your working copy; everywhere else they will remain encrypted.
  5. Useful defaults
    If you use the GitHub Git clients, they add useful default settings. Further, the github/gitignore and Danimoth/gitattributes projects contain some useful defaults.
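
Pulling the settings above together, a .gitattributes file might look something like the sketch below. The first two rules are the ones discussed above; the filter driver names (`tabspace`, `secrets`) and the file patterns they are attached to are hypothetical, chosen only for illustration.

```
# 1. Normalise line endings of detected text files to LF inside the repo
* text=auto

# 2. Use Git's built-in C# patterns when working out diff context
*.cs diff=csharp

# 3. Run matching files through a filter driver (hypothetical name "tabspace")
*.py filter=tabspace

# 4. Encrypt / decrypt a sensitive config file (hypothetical name "secrets")
config/production.yml filter=secrets
```

The filter= lines only name a driver. Each machine still needs matching filter.<name>.clean and filter.<name>.smudge commands in its git config for the driver to do anything.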

The more I use Git, the more I realise what a powerful tool it is. And I haven’t even touched on how you can use Git hooks for advanced Git Ninja moves…

HOWTO – configure Netbeans PHP debugging for a remote server, over an SSH tunnel

Having tripped myself up on multiple occasions setting this up, I’m recording these config steps here for future-me.

Scenario:  You have a PHP site running on a remote [Ubuntu 12.04] server, and want to connect your local IDE [Netbeans] to the Xdebug running on that server over an SSH tunnel.

  1. apt-get install php5-xdebug
  2. vi /etc/php5/apache2/conf.d/xdebug.ini
  3. restart apache2
  4. Create remote->local SSH tunnel ssh -R 9000: [email protected]
  5. Launch Netbeans debugger

The key is that your Netbeans IDE acts as the server in this scenario, listening for incoming connections to port 9000 from the remote server’s Xdebug.  Thus the tunnel must be from the remote port to your local port, not the other way around.
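
As a rough sketch of what the config step could contain, assuming Xdebug 2.x setting names and port 9000 on both ends of the tunnel, the xdebug.ini might hold something like the following. The values are assumptions, not the settings used on the original server.

```ini
; Assumed contents of /etc/php5/apache2/conf.d/xdebug.ini (Xdebug 2.x names)
; Allow Xdebug to open a debugging connection back to an IDE
xdebug.remote_enable=1
; Connect to the server's own end of the SSH tunnel, not to your workstation directly
xdebug.remote_host=127.0.0.1
; Must match the remote port forwarded by ssh -R
xdebug.remote_port=9000
```

With settings like these, a tunnel command of the form `ssh -R 9000:localhost:9000 user@remote.example.com` (user and host are placeholders) forwards the server's port 9000 back to the port Netbeans listens on locally.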

Some helpful debugging techniques

Start ssh with -vv for debugging output

netstat -an | grep 9000

should show something like:

tcp 0 0* LISTEN
tcp6 0 0 ::1:9000 :::* LISTEN

AMEE in Excel

The AMEEConnect API gives access to a vast amount of climate related data. It also exposes standardised methodologies to perform calculations based on that data.

As part of the London Green Hackathon I created the AMEE-in-Excel addin to tightly integrate this data and calculations into Excel.

So, if Excel is your preferred way to work with climate data, then this should be in your toolkit.

All code is open source and hosted at . Pull-requests are welcome!

Hurrah! AMEE in Excel won the behaviour change prize:

We believe over 80% of the sustainability field currently use spreadsheets. As a process, this is broken, not scalable and inaccurate. AMEE in Excel Integrates spreadsheets with web-services, to create a behaviour change that could address this issue and bring more credibility to the market.

So, if you want to collaborate on some Award Winning Software :), send in those pull requests


Hurrah! My first Googlewhack, discovered by complete accident.

Functional programming in Javascript and F#

During June 2011 I presented a session at the SPA2011 conference in London, UK.

My session was a hands-on introduction to functional programming techniques with code samples in Javascript and F#. The focus of the session was to get people thinking about first class functions, and the techniques they enable to simplify and increase the readability of code when solving certain classes of problems.

The code samples can be found at:

An online/executable version of the Javascript code is at http://functional-javascript.davidlaing.com.

Judging by the feedback I received, the session went very well. People seemed to like the hands-on format of the session; and just being left alone for a while to learn something at their own pace.

Implementing the strategy pattern without an explosion of classes – part 3 of ??

I feel uncomfortable when I see large switch statements. I appreciate how they break the Open Closed Principle. I have enough experience to know that they seem to attract extra conditions & additional logic during maintenance, and quickly become bug hotspots.

A refactoring I use frequently to deal with this is Replace Conditional with Polymorphism, but for simple switches it’s always seemed like a rather large hammer.

Take the following simple example that performs slightly different processing logic based on the credit card type:
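
The original listing is not reproduced here, so what follows is a minimal stand-in written in JavaScript. The card types, fee rules and the `processPayment` name are all invented for illustration; amounts are in pence to keep the arithmetic in whole numbers.

```javascript
// Hypothetical example: one function, with the per-card logic inlined in a switch.
function processPayment(cardType, amountInPence) {
  switch (cardType) {
    case "visa":
      // 2% fee
      return { amount: amountInPence, fee: (amountInPence * 2) / 100 };
    case "mastercard":
      // 3% fee
      return { amount: amountInPence, fee: (amountInPence * 3) / 100 };
    case "amex":
      // 4% fee plus a flat 100p surcharge
      return { amount: amountInPence, fee: (amountInPence * 4) / 100 + 100 };
    default:
      throw new Error(`Unsupported card type: ${cardType}`);
  }
}

console.log(processPayment("visa", 10000));
```

Every new card type, and every new rule for an existing one, means another edit inside this one function.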

It’s highly likely that the number of credit card types will increase, and that the complexity of processing logic for each will also increase over time. The traditional application of the Replace Conditional with Polymorphism refactoring gives the following:
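
Again as a hypothetical stand-in for the missing listing, a class-per-card-type version in JavaScript might look like this. The class names and fee rules are invented for illustration.

```javascript
// Hypothetical example: one small class per card type, each with the same
// process(amountInPence) method acting as the shared interface.
class VisaProcessor {
  process(amountInPence) {
    return { amount: amountInPence, fee: (amountInPence * 2) / 100 };
  }
}

class MastercardProcessor {
  process(amountInPence) {
    return { amount: amountInPence, fee: (amountInPence * 3) / 100 };
  }
}

class AmexProcessor {
  process(amountInPence) {
    return { amount: amountInPence, fee: (amountInPence * 4) / 100 + 100 };
  }
}

// The switch is replaced by a lookup followed by a polymorphic call.
const processors = new Map([
  ["visa", new VisaProcessor()],
  ["mastercard", new MastercardProcessor()],
  ["amex", new AmexProcessor()],
]);

function processPayment(cardType, amountInPence) {
  const processor = processors.get(cardType);
  if (!processor) {
    throw new Error(`Unsupported card type: ${cardType}`);
  }
  return processor.process(amountInPence);
}

console.log(processPayment("amex", 10000));
```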

This explosion of classes containing almost zero logic has always bothered me as quite a lot of boilerplate overhead for a relatively small reduction in complexity.

Consider however, the functional approach to the same refactoring:
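
A sketch of what the functional version could look like in JavaScript, with the same invented card types and fee rules: each strategy is a plain function, and the function signature plays the role of the interface.

```javascript
// Hypothetical example: each strategy is a function taking an amount in pence
// and returning { amount, fee }.
const visa = (amountInPence) => ({
  amount: amountInPence,
  fee: (amountInPence * 2) / 100,
});

const mastercard = (amountInPence) => ({
  amount: amountInPence,
  fee: (amountInPence * 3) / 100,
});

const amex = (amountInPence) => ({
  amount: amountInPence,
  fee: (amountInPence * 4) / 100 + 100,
});

// Selecting a strategy is a lookup; no classes are required.
const strategies = new Map([
  ["visa", visa],
  ["mastercard", mastercard],
  ["amex", amex],
]);

function processPayment(cardType, amountInPence) {
  const strategy = strategies.get(cardType);
  if (!strategy) {
    throw new Error(`Unsupported card type: ${cardType}`);
  }
  return strategy(amountInPence);
}

console.log(processPayment("mastercard", 10000));
```

Each strategy function can be tested on its own, and adding a card type means adding one function and one map entry.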

Here we have obtained the same simplification of the switch statement, but avoided the explosion of simple classes. Whilst strictly speaking we are still violating the Open Closed Principle, we do have a collection of simple methods that are easy to comprehend and test. It’s worth noting that when our logic becomes very complex, converting to the OO Strategy pattern becomes a more compelling option. Consider the case when we include a collection of validation logic for each credit card:

In this case the whole file starts to feel too complex to me, and having the logic partitioned into separate strategy classes / files seems more maintainable.

To conclude then, the fact that languages treat functions as first class constructs gives us the flexibility to use them in a “polymorphic” way, where our “interface” is the function signature.

And for some problems, like refactoring a simple switch statement, I feel this gives us a more elegant solution.