Tag: Docker

A Docker environment for Windows (October 2015 edition)

This blog post provides an outline method for installing a nice environment for developing in Python using Docker on a Windows 10 machine. Hopefully I have provided sufficient of the error messages I encountered that both myself and others will find this post when in distress!

The past three years I’ve been working for ScraperWiki as a data scientist, this has meant a degree of coding in Python and interacting with my colleagues, and some customers, who use Linux (principally Ubuntu) or OS X. I have continued to use a Windows laptop. You can see my review of it here.

Until recently my setup was based on a core installation of Python and a whole bunch of handy libraries using Python(x,y). I also installed Git for Windows which gave me a shell prompt, and the command-line git commands along with some fraction of the bash environment. I also installed msysgit which provided further Linux style enhancements to my shell. I configured my shell so I could get ssh access to ScraperWiki servers in the cloud. For reasons I can’t recall I also installed ansicon.exe which gives the Windows Command prompt some of the colour highlighting of a modern shell prompt.

With this setup I could do most of what I needed from Windows, and if I had to I could fire up my Ubuntu VM and work in there. Typically I did this when I had some tricky libraries to install, or I wanted to be sure I could deploy onto ScraperWiki’s servers in the cloud. I never really got virtual environments working nicely on Windows – virtualenvwrapper, which makes such things, nicer is challenging to configure on Windows.

Students of this sort of thing will appreciate that the configuration described above is reached with a degree of trial and error, and a lot of Googling of error messages.

Times have changed and this setup was getting a bit long in the tooth, the environment around me was also changing – we started using Docker. I couldn’t get code using the Python requests library to run because of problems with SSL. Also, all the cool kids were talking about Python 3 and how new projects should all be in Python 3. I couldn’t work out how to add Python 3 to my Python(x,y) installation, and furthermore I was currently tied to 32-bit rather than 64-bit Python. ScraperWiki had recently done some work on making an easily deployable Python application and identified that the Anaconda Python distribution by ContinuumIO was the way to go.

Installing Python 3 and 2 using Anaconda

This worked very smoothly, there is an installer here. I had Python 3.4.3 (64-bit) working in the twinkling of an eye, and from my bash prompt I could now run that Python code which was previously broken due to OpenSSL problems. However, all was not rosy since it turns out my latest project was accidently Python 3 compatible, whilst my older projects were not. I therefore needed Python 2 as well. In principle, with Anaconda this is as simple as doing:

conda create -n python2 python=2.7 anaconda

and then

source activate python2

This puts you into a Python 2 virtual environment which will run your old code. However, it doesn’t work from the Git Bash prompt, you need to use a Windows Command prompt, as discussed here. But at least I now have the latest whizzy Python 3 installation and I can also run Python 2, when required. It’s worth noting that installing new libraries on Python under Windows has become rather easier with newer versions of pip, I believe due to the introduction of pip wheel. In the past installing some libraries was a pain because of a need to compile binary components.

Using Docker on Windows with Docker Toolbox and the Git SDK

The next task was to get support for Docker, the container system. You can find out more about Docker in my blog post here. Essentially it is a method for running an application in an isolation unit which is defined by a simple Dockerfile, largely removing problems of dependencies and versions. Docker is intrinsically a Linux technology, it relies on several deeply embedded components of the operating system and so does not run on Windows. However, you can boot up a very lightweight Linux-based VM and run Docker images on that from either Windows or OS X. Until recently this was done using boot2docker. The new way is to use the Docker Toolbox. I held off installing this until it became Windows 10 compatible since as a neophile I have obviously upgraded to Windows 10 at earliest opportunity. Docker Toolbox installs VirtualBox to run a VM to host Docker and Git for Windows to provide a bash shell prompt, as well as the Docker commandline tools.

I found installing Docker Toolbox relatively smooth although I had a problem with it finding ssh key files with an error message “open <filepath\ca.pem : The system cannot find the file specified” which was fixed by regenerating the key files:

docker-machine regenerate-certs default

But this alone does not give me the right workflow since ScraperWiki make heavy use of Make to build and run containers and Git for Windows does not come configured with Make. You can see this in action for the Simple API we made for the NewsReader Project. I used the Git for Windows SDK to provide Make and other build tools. This is designed for use by Git for Windows developers, it’s based on msys2 which I also tried to install but which errored on a couple of steps. The Git SDK is more verbose in its installation appeared to install cleanly.

Once we have Git for Windows SDK we need to use its git-bash to launch Docker Quickstart Terminal (rather than the version provided by the Toolbox), this means changing the command executed by the Docker Quickstart Terminal shortcut from:

"C:\Program Files\Git\git-bash.exe" "C:\Program Files\Docker Toolbox\start.sh"

to

C:\git-sdk-64\git-bash.exe "C:\Program Files\Docker Toolbox\start.sh"

Update 2016-03-21: I modified start.sh to give the docker-machine binary an absolute path, this means I can launch a plain Git Bash shell and run the start.sh script later, if required. This change requires further modification to make sure paths were properly escaped. You can see my version of start.sh here: https://gist.github.com/IanHopkinson/85453a90212eb6627f29

Simply trying to run the Git SDK version of the make tool does not seem to work, you get an error like “unable to make temporary trusted Dockerfile”.

We’re into the final straight now!

My final problem was that when I tried to make my previously working application it failed with an error message:

IOError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/home/newsreader-demo']

The problem seems to be the way in which msys2 handles paths in Windows it needs to have two preceding //, rather than one, as described here. So all I need to do is change this line in my Makefile

@docker run -p 8000:8000 --read-only --rm --volume /tmp -e NEWSREADER_PUBLIC_API_KEY ianhopkinson/newsreader_demo

To this:

@docker run -p 8000:8000 --read-only --rm --volume //tmp -e NEWSREADER_PUBLIC_API_KEY ianhopkinson/newsreader_demo

Can you see what I did there?

Update 2015-10-14 – interactive shells into docker

If you try to get an interactive shell on a container then you get an error like:

cannot enable tty mode on non tty input

To avoid this you can use winpty:

winpty docker exec -i -t [CONTAINER_NAME] bash

There’s some discussion of this on the Docker Toolbox issue tracker

Update 2015-10-22 – Which Python are you using?

It turns out I was accidentally using the Python shipped with Git for Windows SDK, rather than the Anaconda version I had so carefully installed. I fixed this by adding this to my .profile file:

export PATH=/c/anaconda3/:$PATH

I didn’t spot it earlier because I checked Python version by running ipython rather than python.

Concluding thoughts

I wrote this partly in frustration at the amount of time I spent getting this all fixed up, and the fact that I couldn’t stop until I had fixed it. The scheme above worked for me but I suspect it is quicker and easier to do on a laptop with no history.

There’s no doubt that the situation is better than I found it 3 years ago but it is still a painful process involving much trial and error. Docker brings great benefits for developers, and once it is working makes sharing your work across multiple users very straightforward.

Book review: Docker Up & Running by Karl Matthias and Sean P. Kane

This review was first published at ScraperWiki.

This last week I have been reading dockerDocker Up & Running by Karl Matthias and Sean P. Kane, a newly published book on Docker – a container technology which is designed to simplify the process of application testing and deployment.

Docker is a very new product, first announced in March 2013, although it is based on older technologies. It has seen rapid uptake by a number of major web-based companies who have open-sourced their tooling for using Docker. We have been using Docker at ScraperWiki for some time, and our most recent projects use it in production. It addresses a common problem for which we have tried a number of technologies in search of a solution.

For a long time I have thought of Docker as providing some sort of cut down virtual machine, from this book I realise this is the wrong mindset – it is better to think of it as a “process wrapper”. The “Advanced Topics” chapter of this book explains how this is achieved technically. This makes Docker a much lighter weight, faster proposition than a virtual machine.

Docker is delivered as a single binary containing both client and server components. The client gives you the power to build Docker images and query the server which hosts the running Docker images. The client part of this system will run on Windows, Mac and Linux systems. The server will only run on Linux due to the specific Linux features that Docker utilises in doing its stuff. Mac and Windows users can use boot2docker to run a Docker server, boot2docker uses a minimal Linux virtual machine to run the server which removes some of the performance advantages of Docker but allows you to develop anywhere.

The problem Docker and containerisation are attempting to address is that of capturing the dependencies of an application and delivering them in a convenient package. It allows developers to produce an artefact, the Docker Image, which can be handed over to an operations team for deployment without to and froing to get all the dependencies and system requirements fixed.

Docker can also address the problem of a development team onboarding a new member who needs to get the application up and running on their own system in order to develop it. Previously such problems were addressed with a flotilla of technologies with varying strengths and weaknesses, things like Chef, Puppet, Salt, Juju, virtual machines. Working at ScraperWiki I saw each of these technologies causing some sort of pain. Docker may or may not take all this pain away but it certainly looks promising.

The Docker image is compiled from instructions in a Dockerfile which has directives to pull down a base operating system image from a registry, add files, run commands and set configuration. The “image” language is probably where my false impression of Docker as virtualisation comes from. Once we have made the Docker image there are commands to deploy and run it on a server, inspect any logging and do debugging of a running container.

Docker is not a “total” solution, it has nothing to say about triggering builds, or bringing up hardware or managing clusters of servers. At ScraperWiki we’ve been developing our own systems to do this which is clearly the approach that many others are taking.

Docker Up & Running is pretty good at laying out what it is you should do with Docker, rather than what you can do with Docker. For example the book makes clear that Docker is best suited to hosting applications which have no state. You can copy files into a Docker container to store data but then you’d need to work out how to preserve those files between instances. Docker containers are expected to be volatile – here today gone tomorrow or even here now, gone in a minute. The expectation is that you should preserve state outside of a container using environment variables, Amazon’s S3 service or a externally hosted database etc – depending on the size of the data. The material in the “Advanced Topics” chapter highlights the possible Docker runtime options (and then advises you not to use them unless you have very specific use cases). There are a couple of whole chapters on Docker in production systems.

If my intention was to use Docker “live and in anger” then I probably wouldn’t learn how to do so from this book since the the landscape is changing so fast. I might use it to identify what it is that I should do with Docker, rather than what I can do with Docker. For the application side of ScraperWiki’s business the use of Docker is obvious, for the data science side it is not so clear. For our data science work we make heavy use of Python’s virtualenv system which captures most of our dependencies without being opinionated about data (state).

The book has information in it up until at least the beginning of 2015. It is well worth reading as an introduction and overview of Docker.