class: title, cover, middle, center

<img class="logo-full" src="img/fathom-logo-full.svg" width="20%">

---
class: middle, center

.big7x[Which<br>
.deep-sky.big12x[three]<br>
Docker<br>
images?]

???

What 3 Docker images would you want with you if you were shipwrecked on a desert island?

Good evening. I'm Andrew Collier. I'm Lead Data Scientist for Fathom Data, a remote-only Data Science consulting company based in South Africa.

Over the last few years the frequency with which I use Docker has escalated from once every couple of months to daily. It's probably my most useful tool. Having said that, I'm by no means an expert. Perhaps an enthusiastic amateur.

---
class: middle, center

<img src="img/warning.png" height="250px">

.big6x.warning[WARNING!]

.big6x[No advanced content.]

???

I'll begin with a caveat: this is not an advanced talk about Docker. It's a high-level introduction to what's possible with Docker, specifically aimed at people who use Python.

---
class: section, center

.big6x[What is Docker?]

---
class: middle

.big5x[It's _like_ a .deep-sky[Virtual Machine].]

.footnote[* For example, you can use it to run Linux processes on a Windows machine.]

???

One might say that it's like a Virtual Machine and makes it possible for you to run, for example, Linux processes on a Windows machine. But it's a lot more than that.

---
class: middle

.big5x[But also .deep-sky[totally different].]

.footnote[* Nonsense! You're explaining one technology by referring to another (equally obscure) technology.]

???

However, I realise that I'm trying to explain one technology by referring to another technology that might be equally obscure. Perhaps a diagram will help.

---
class: inverse
background-image: url("data:image/png;base64,#img/docker-versus-vm.svg")
background-repeat: no-repeat
background-size: 100% 100%

???

The setup we are probably most familiar with is the one depicted on the left.
It's what you have on your laptop or desktop machine: an operating system with a collection of system libraries that can run one or more applications. The challenge with this is that it can be really tricky to handle multiple versions of the same application. Suppose, for example, that you needed to run both Python 3.2 and Python 3.10. Virtual environments won't help you with that: they isolate packages, not the Python interpreter itself.

At the other extreme are virtual machines. In this case you can host one or more complete guest operating systems. This can be very useful, but it's also a rather heavyweight solution because each virtual machine replicates the entire operating system, system libraries, etc.

There are two types of hypervisors. A _bare metal_ hypervisor sits directly on the hardware, below the operating system. These give high performance. A _hosted_ hypervisor sits on top of the operating system. It's easier to configure but not as performant.

There are a lot of things in an operating system that we just don't need.

Docker is a compromise between these two extremes. It allows you to have multiple guest processes (or "containers") running on your machine. These processes share the host's kernel and carry only the minimal runtime environment they need, so they are lightweight.

<!-- ====================================================================== -->
<!-- ====================================================================== -->
<!-- ===                                                                === -->
<!-- ===                             PYTHON                             === -->
<!-- ===                                                                === -->
<!-- ====================================================================== -->
<!-- ====================================================================== -->

---
class: section, center

.big12x.top[1]

.big5x[Food & Water: .text-black[Python]]

???

Many of these problems, like juggling multiple Python versions, can be resolved simply by using the Python Docker image.

---
background-image: url("data:image/png;base64,#img/cookie-cutter-docker.webp")
background-repeat: no-repeat
background-size: 100% 100%

???

It's time to introduce some terminology.
Two terms are pervasive in Docker: _image_ and _container_. You can think of an image as a template (or, by analogy, a cookie cutter). A container is then what you get by applying the template (or, using the same analogy, the cookie that you cut out). You can have multiple images (perhaps for Python, NGINX and R), and each can be used to create one or more containers.

---
class: middle

.big3x[A .deep-sky[container] comes from an .deep-sky[image].]

???

So, a container comes from an image.

---
class: middle

.big3x[Where does an .deep-sky[image] come from?]

???

But where does an image come from?

---
class: inverse, middle

<img src="img/image-official-python.png" width="100%" >

<br><br>

.big2x[https://hub.docker.com/_/python]

More than 1 billion downloads. 🤩

Around 8 million downloads per week.

???

There are a variety of image sources, but the most common is Docker Hub. It hosts both "official" and "user" images.

You can find the details of the official Python image at this URL. It's fairly popular: as you can see, it's been downloaded more than 1 billion times.

---
class: middle

<img src="img/pull-run.svg" width="100%" >

???

There are two steps to using Docker. First you need to pull an image. Then you run the image to create a container.

---
class: inverse, middle, center

<!-- Image made with "screenshot - large" profile. -->

<img class="shadow" src="img/python-pull.png" width="950px">

???

The first step towards creating a Python container is to download (or "pull") the image using the `docker pull` command. That doesn't actually run the image. It just downloads it onto your host.

---
class: inverse

<!-- Image made with "screenshot - large" profile. -->

.center[<img class="shadow" src="img/python-run.png" width="950px">]

Containers are non-interactive by default.

.footnote[* Er, that's rather underwhelming. So what?]

???

To create a container you use the `docker run` command. The result might seem underwhelming. And it is.
But there's a reason for that: containers are non-interactive by default.

---
class: inverse

<!-- Image made with "screenshot - large" profile. -->

.center[<img class="shadow" src="img/python-run-interactive.png" width="950px">]

What's `:latest`?

???

If you want to interact with the container then you need to provide the `-i` (interactive) and `-t` (terminal) flags.

You might be wondering what the "latest" is all about.

---
class: middle

.big3x[It's a .deep-sky[tag].]

???

Images come with tags. These are labels for different versions of an image.

---
class: inverse, middle

<img src="img/image-tag-example.png" width="100%" >

???

These tags can be viewed on Docker Hub. Here, for example, are the details of the 3.9.19-slim-bullseye version of the Python image.

---
class: inverse

What's funky with this code? 🤔

```python
import ConfigParser

config = ConfigParser.ConfigParser()

config.add_section('Greetings')
config.set('Greetings', 'message', 'Hello, World!')

print config.get('Greetings', 'message')
```

---
class: inverse

What's funky with this code? 🤔

```python
*import ConfigParser

config = ConfigParser.ConfigParser()

config.add_section('Greetings')
config.set('Greetings', 'message', 'Hello, World!')

*print config.get('Greetings', 'message')
```

--

It's Python 2. How do I know?

- The `ConfigParser` module has been renamed `configparser`.
- `print` is now a function, so it requires parentheses.

--

What can I do with it?

- Migrate the code to Python 3.
- Run a Python 2 interpreter (no longer widely available). 👈

???

This code contains a few anachronistic features which are only valid in Python 2. Suppose that you wanted to run the code without making modifications.

---
class: middle

.big3x[Use the .deep-sky[right tag]!]

---
class: inverse, center

<!-- Image made with "screenshot - large" profile. -->

<img class="shadow" src="img/python-run-interactive-old.png" width="950px">

???

These tags mean that we can pull and run different versions of an image.
So, for example, here I pull and run Python 2.7.18.

---
background-image: url("data:image/png;base64,#img/container-stormy.webp")
background-repeat: no-repeat
background-size: 100% 100%

???

But there's a problem.

---
class: middle

.big3x[A container is a .deep-sky[sealed environment]!]

.footnote[* You could say that everything is "contained". Nothing comes in. Nothing goes out.]

???

Not only are containers non-interactive by default, each container is a sealed environment. Nothing comes in or goes out: the container can't see the files on your host, and you can't reach its network ports.

---
class: middle

.big3x[
But you can 👊 .deep-sky[punch] 👊 holes in it.

`-v`: mount a (disk) volume<br>
`-p`: publish a (network) port
]

???

However, there are some run-time flags that can be used to share information with a running container.

---
class: inverse, center

<img class="shadow" src="img/hello-world-broken.png" width="950px">

--

<img class="shadow" src="img/hello-world-fixed.png" width="950px" style="margin-top: -20px;">

???

If you try to run that code with the latest Python image then you get an error. Why? It's Python 2 code and you're trying to run it on Python 3.

If, however, you use a Python 2 image then it works fine.

---
background-image: url("data:image/png;base64,#img/jungle.webp")
class: section

.big6x[Tag Jungle]

???

I mentioned tags a few moments ago. These can be a source of confusion, so it's worth taking a closer look at them.
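---
class: inverse

A tag is the part of an image reference after the colon (for example, `3.9.19-slim-bullseye` in `python:3.9.19-slim-bullseye`); if you leave it off, Docker assumes `latest`. Here is a minimal sketch under stated assumptions: `split_image_reference()` and `split_python_tag()` are hypothetical helpers (not part of Docker or any library), digest references (`@sha256:...`) are ignored, and the `VERSION[-VARIANT]` split simply follows the pattern of the official Python tags in the table that follows.

```python
def split_image_reference(reference: str) -> tuple[str, str]:
    """Split "repository[:tag]" into (repository, tag).

    Hypothetical helper. A colon only introduces a tag if it comes after the
    last "/", so a registry port such as "localhost:5000" is left alone.
    Digest references ("@sha256:...") are not handled.
    """
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    # No tag given: Docker falls back to "latest".
    return reference, "latest"


def split_python_tag(tag: str) -> tuple[str, str]:
    """Split a tag like "3.9.19-slim-bullseye" into ("3.9.19", "slim-bullseye").

    Hypothetical helper. Assumes the VERSION[-VARIANT] pattern used by the
    official Python tags; the variant is "" for the default (Debian) image.
    """
    version, _, variant = tag.partition("-")
    return version, variant


print(split_image_reference("python"))
# → ('python', 'latest')
print(split_image_reference("python:3.9.19-slim-bullseye"))
# → ('python', '3.9.19-slim-bullseye')
print(split_python_tag("3.9.19-alpine3.19"))
# → ('3.9.19', 'alpine3.19')
```

A colon before the last `/` belongs to a registry host and port (as in `localhost:5000/python`), which is why the sketch only looks for a tag after the final slash.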
---
class: inverse, middle

<table>
  <tr>
    <th>Tag</th>
    <th>Operating System</th>
    <th>Size (MB)</th>
  </tr>
  <tr>
    <td>3.9.19</td>
    <td>Debian</td>
    <td class="text-align-right">358.91</td>
  </tr>
  <tr>
    <td>3.9.19-bookworm</td>
    <td>Debian 12</td>
    <td class="text-align-right">358.91</td>
  </tr>
  <tr>
    <td>3.9.19-bullseye</td>
    <td>Debian 11</td>
    <td class="text-align-right">336.75</td>
  </tr>
  <tr>
    <td>3.9.19-slim</td>
    <td>Debian</td>
    <td class="text-align-right">46.66</td>
  </tr>
  <tr>
    <td>3.9.19-slim-bookworm</td>
    <td>Debian 12</td>
    <td class="text-align-right">46.66</td>
  </tr>
  <tr>
    <td>3.9.19-slim-bullseye</td>
    <td>Debian 11</td>
    <td class="text-align-right">45.61</td>
  </tr>
  <tr>
    <td>3.9.19-alpine</td>
    <td>Alpine</td>
    <td class="text-align-right">17.70</td>
  </tr>
  <tr>
    <td>3.9.19-alpine3.19</td>
    <td>Alpine 3.19</td>
    <td class="text-align-right">17.70</td>
  </tr>
  <tr>
    <td>3.9.19-alpine3.18</td>
    <td>Alpine 3.18</td>
    <td class="text-align-right">17.44</td>
  </tr>
</table>

.footnote[* All of these Python images (and most images in general!) are Linux-based.]

???

These are the various tags available for Python 3.9.19. The tag tells you about the Python version. But it also tells you about the underlying operating system.

The default tag corresponds to a Debian image. But you can also be specific about what version of Debian you want.

There are also "slim" versions of the Debian images which exclude all non-essential packages.

You can also use an image that's based on Alpine Linux, a very lightweight distribution.

---
class: middle

.big4x[How to choose the .deep-sky[right] tag?]

- What version of Python?
- What underlying operating system?
- How small do you need the image to be?

.footnote[* If unsure, go with `latest`. Lighter-weight images require more work to extend.]

???

How do you choose the right tag?

First decide what version of Python you need. If you don't have a strong preference then go with the latest version.

Then decide what operating system and version.
Your choice will determine how much work will be required if you want to extend the image with extra functionality.

<!-- ====================================================================== -->
<!-- ====================================================================== -->
<!-- ===                                                                === -->
<!-- ===                            JUPYTER                             === -->
<!-- ===                                                                === -->
<!-- ====================================================================== -->
<!-- ====================================================================== -->

---
class: section, center

.big12x.top[2]

.big5x[Shelter: .text-black[Jupyter]]

---
class: inverse, center

<img class="shadow" src="img/jupyter.png" width="950px">

???

Many people who use Python for working with data will operate in a Jupyter environment.

---
class: inverse

```bash
docker run jupyter/minimal-notebook
```

But that's .deep-sky[inaccessible]! ☹️

.footnote[* This is not an "official" image.]

--

Share the current working directory (so you can access your files).

```bash
docker run \
*  -v $(pwd):/home/jovyan/work \
   jupyter/minimal-notebook
```

--

Make it accessible via port 8888 (so you can access notebooks with your browser).

```bash
docker run \
*  -p 8888:8888 \
   -v $(pwd):/home/jovyan/work \
   jupyter/minimal-notebook
```

???

Not surprisingly, there's a Docker image for that. This is not an "official" image but is published by the `jupyter` organisation.

Simply creating a container is not very useful. As with the Python container, you won't be able to interact with it. To make that possible you need to share a port from the container with the host machine. You'll probably also want to share data files and notebooks with the container, and this is done via a volume mount.

---
class: inverse
background-image: url("data:image/png;base64,#img/jupyter-image-relationships.svg")

???

There's a hierarchy of Jupyter images. We've used the minimal notebook, but there are others below it in the hierarchy that have more functionality installed.
There's a compromise, though: more functionality leads to a larger image. So unless you specifically need something from one of these images, it's best to go with the simplest one possible.

---
class: inverse, center

<img class="shadow" src="img/jupyter-error.png" width="950px">

???

Because this image is _minimal_ it doesn't contain some of the most useful packages.

---
class: section-slide-orange

.big8x[Make<br>this<br>.underline[much more]<br>useful!]

---
background-color: #fdfbf7
background-image: url("data:image/png;base64,#img/dockerfile-image-container.webp")
background-repeat: no-repeat
background-size: 100% auto

.footnote[* Roll your own Docker image by creating a `Dockerfile`.]

---
class: inverse, middle

Create a `Dockerfile`:

```dockerfile
FROM jupyter/minimal-notebook:latest

RUN pip install numpy
```

Then build and run:

```bash
docker build -t jupyter-numpy .
docker run -p 8888:8888 jupyter-numpy
```

???

We can make this more useful by creating a derived image and installing NumPy on it.

<!-- ====================================================================== -->
<!-- ====================================================================== -->
<!-- ===                                                                === -->
<!-- ===                           TENSORFLOW                           === -->
<!-- ===                                                                === -->
<!-- ====================================================================== -->
<!-- ====================================================================== -->

---
class: section, center

.big12x.top[3]

.big5x[Build a Raft: .text-black[TensorFlow]]

???

Cranking up the level of difficulty a little more, the third image is for TensorFlow.

---
class: inverse

.center[<img src="img/install-tensorflow.png" height="550">]

.footnote[* Installing TensorFlow is notoriously challenging.]

???

Installing and configuring TensorFlow, especially when you have a GPU, is notoriously challenging. Typically you'd need to set aside a few hours to get this sorted.

Let's see how this can be accomplished quickly and easily using Docker.
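---
class: inverse

The TensorFlow images used in this section follow a `VERSION[-gpu][-jupyter]` tag pattern, and GPU containers also need `--gpus all`. As a minimal sketch (the `tensorflow_run_command()` helper is hypothetical, and the tag pattern is inferred from the tags used in these slides), the `docker run` commands can be assembled like this:

```python
import shlex


def tensorflow_run_command(version: str, gpu: bool = False, jupyter: bool = False) -> list[str]:
    """Build a `docker run` argument list for a tensorflow/tensorflow image.

    Hypothetical helper: assumes tags of the form VERSION[-gpu][-jupyter].
    """
    tag = version + ("-gpu" if gpu else "") + ("-jupyter" if jupyter else "")
    command = ["docker", "run"]
    if gpu:
        # Expose the host's NVIDIA GPUs (requires the NVIDIA Container Toolkit).
        command += ["--gpus", "all"]
    if jupyter:
        # Publish Jupyter's port so the notebook server is reachable from a browser.
        command += ["-p", "8888:8888"]
    else:
        # Keep STDIN open and allocate a terminal so you can interact with it.
        command += ["-it"]
    command.append(f"tensorflow/tensorflow:{tag}")
    return command


print(shlex.join(tensorflow_run_command("2.16.1")))
# → docker run -it tensorflow/tensorflow:2.16.1
print(shlex.join(tensorflow_run_command("2.16.1", gpu=True)))
# → docker run --gpus all -it tensorflow/tensorflow:2.16.1-gpu
```

Returning a list rather than a single string means the result can be passed straight to `subprocess.run()` without any shell quoting issues.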
---
background-image: url("data:image/png;base64,#img/chip-cpu.webp")
background-size: 100% 100%

???

We'll look at both CPU and GPU compute. Let's start with CPU.

---
class: inverse

## TensorFlow on CPU

```bash
docker run -it tensorflow/tensorflow:2.16.1
```

The tag reflects the TensorFlow version.

.footnote[* This too is not an "official" image.]

--

### 🚀 Test TensorFlow

```python
import tensorflow as tf

# Create 3D array of random numbers then calculate average.
tf.reduce_mean(tf.random.normal([100, 100, 100]))
```

???

There's a `tensorflow` image that's published by the `tensorflow` organisation.

---
background-image: url("data:image/png;base64,#img/chip-gpu.webp")
background-size: 100% 100%

---
class: inverse

## TensorFlow on GPU

First check that there is a GPU.

```bash
lspci | grep -i nvidia
```

```
00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
```

Need to install on host:

- NVIDIA Drivers 👍 and
- NVIDIA Container Toolkit 👍.

--

Don't need to worry about:

- Python and library compatibility 😭
- installing CUDA 😱 or
- configuration 🥱.

---
class: inverse

## TensorFlow on GPU

```bash
docker run --gpus all -it tensorflow/tensorflow:2.16.1-gpu
```

.footnote[* Also not an "official" image.]

--

### Check GPU

Use the NVIDIA System Management Interface (`nvidia-smi`) to check the GPU.

```bash
nvidia-smi
```

--

### 🚀 Test TensorFlow

```python
import tensorflow as tf

# Create 3D array of random numbers then calculate average.
tf.reduce_mean(tf.random.normal([100, 100, 100]))
```

---
class: middle

.big4x[All the cool kids use .deep-sky[Jupyter]!]

---
class: section-slide-orange

.big8x[Make<br>this<br>.underline[outrageously]<br>useful!]

---
class: inverse, middle

Create a `Dockerfile`:

```dockerfile
FROM tensorflow/tensorflow:2.16.1-gpu-jupyter

ENV TF_CPP_MIN_LOG_LEVEL=2

RUN apt-get update -q && apt-get install -y libpq-dev

COPY requirements.txt .
RUN pip install -r requirements.txt
```

And `requirements.txt`:

```
scipy==1.13.0
psycopg2==2.9.9
```

Then build and run (with `--gpus all` so that the container can see the GPU):

```bash
docker build -t jupyter-tensorflow .
docker run --gpus all -p 8888:8888 jupyter-tensorflow
```

???

So there you have a complete development environment, including Python, Jupyter notebooks, required packages and TensorFlow with GPU support.

That should definitely keep you busy until you're rescued.

---
class: middle

.leftcol60[
.big3x[Desert Island Docker picks:]

.big4x[
1. .deep-sky[Python]
2. .deep-sky[Jupyter]
3. .deep-sky[TensorFlow]
]
]

.rightcol40[
<div style="text-align: right;">
<img src="https://avatars.githubusercontent.com/u/6134409?v=4" width="200px"><br>
Andrew B. Collier
<code>andrew.b.collier@gmail.com</code>

.smaller.list-style-none[
- https://github.com/datawookie
- https://twitter.com/datawookie
- https://www.linkedin.com/in/datawookie
]

<img class="logo-full" src="img/fathom-logo-full.svg" width="40%">
</div>
]

<div style="clear: both;"></div>

Slides at https://bit.ly/pydata-london-desert-island-docker.

.footnote[* Add a mechanical keyboard and gaming mouse and I might never leave.]

???

As a Python developer, these are the images that I'd want to have with me on a desert island. They should keep me happy and productive until I'm rescued.

Throw in a mechanical keyboard and gaming mouse and I might never want to leave. But don't tell my wife.