Docker for research

(and fun)

J. Fernando Sánchez (jf.sanchez@upm.es)

2018

Intro

Before we begin

Code available at:

https://github.com/balkian/lab-in-a-box

Live demos at:

  • https://github.todevnull.com
  • https://lab.todevnull.com
  • https://hub.todevnull.com

Feel free to log in, but try not to break them for now :)

My name is Fernando and…

At Grupo de Sistemas Inteligentes

http://www.gsi.dit.upm.es
  • Big Data and Machine Learning
  • Natural Language Processing (NLP) and Sentiment Analysis (SA)
  • Social Network Analysis
  • Agents and Simulation
  • Linked Data and Semantic Technologies

And I ❤ Docker

  • Using it for research for 3+ years
  • Actively pushing it for ~2 years

For individuals

Experiment, publish, repeat

The scientific method

Hard to reproduce

  • Missing data
  • Bleeding-edge tools and libraries
  • No testing
  • Little to no documentation
  • Multiple languages
https://xkcd.com/1742/

Lack of experience

Is it a problem?

https://www.nature.com/

Jupyter notebooks

Jupyter architecture

http://jupyter.readthedocs.io

Docker to the rescue

https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5

Docker-stacks

Reproducible environment

Reproducible and friendly environment
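
A rough sketch of what "reproducible" can mean in practice with docker-stacks: the helper below builds the `docker run` command for one of the Jupyter Docker Stacks images. The image name `jupyter/scipy-notebook`, port 8888 and the `/home/jovyan/work` directory come from the docker-stacks project; the function name and the `<pinned-tag>` placeholder are illustrative assumptions, not part of the talk's repository.

```python
import shlex
from pathlib import Path


def notebook_command(image, workdir=".", host_port=8888):
    """Build a `docker run` command that starts a docker-stacks notebook server.

    The project directory is mounted into the container, so notebooks and
    data are kept on the host after the container is removed.
    """
    project = Path(workdir).resolve()
    return [
        "docker", "run", "--rm",
        "-p", f"{host_port}:8888",             # the notebook server listens on 8888
        "-v", f"{project}:/home/jovyan/work",  # default work directory in docker-stacks
        image,
    ]


if __name__ == "__main__":
    # "<pinned-tag>" is a placeholder: replace it with the exact tag used
    # for the experiment instead of relying on "latest".
    cmd = notebook_command("jupyter/scipy-notebook:<pinned-tag>")
    print(shlex.join(cmd))
```

Pinning the image tag is the part that matters here: the same tag should give collaborators the same libraries, whereas `latest` changes over time.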

Other tools

https://github.com/NVIDIA/nvidia-docker

Other tools

https://mybinder.org/

Other tools

https://cocalc.org/

For small groups

Features

  • Shared environments
  • Resource sharing
  • Easy configuration
  • Versioning
  • Backups

And little to no overhead

Isolation

JupyterHub

Authenticators

  • OAuth (GitHub, GitLab, Google)
  • LDAP
  • JWT

https://github.com/jupyterhub/jupyterhub/wiki/Authenticators
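
For illustration, selecting an authenticator usually comes down to a few lines in `jupyterhub_config.py`. The sketch below assumes the `oauthenticator` package and GitHub logins; the helper name, the example host and the credentials are hypothetical, and the `SimpleNamespace` object only stands in for the real config object (`c`) so that the snippet runs on its own.

```python
from types import SimpleNamespace


def use_github_login(c, callback_url, client_id, client_secret):
    """Point a JupyterHub config object at GitHub OAuth logins."""
    c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"
    c.GitHubOAuthenticator.oauth_callback_url = callback_url
    c.GitHubOAuthenticator.client_id = client_id
    c.GitHubOAuthenticator.client_secret = client_secret
    return c


if __name__ == "__main__":
    # Stand-in for the `c` object that JupyterHub provides in jupyterhub_config.py.
    c = SimpleNamespace(JupyterHub=SimpleNamespace(),
                        GitHubOAuthenticator=SimpleNamespace())
    use_github_login(
        c,
        callback_url="https://hub.example.org/hub/oauth_callback",  # hypothetical host
        client_id="my-client-id",          # hypothetical; issued by GitHub
        client_secret="my-client-secret",  # hypothetical; keep out of version control
    )
    print(c.JupyterHub.authenticator_class)
```

In a real deployment the function would be called with the `c` object from `jupyterhub_config.py`, and the credentials would typically be read from environment variables rather than written into the file.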

Spawners

  • Local
  • Docker
  • Kubernetes
  • Marathon

https://github.com/jupyterhub/jupyterhub/wiki/Spawners
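
As a minimal sketch of the Docker spawner, assuming the `dockerspawner` package: each user gets their own container, a named volume for their files, and a memory cap. The helper name, the image tag and the limit are illustrative assumptions; option names such as `image` and `remove` have changed between dockerspawner releases, so check them against the installed version. The `SimpleNamespace` object again stands in for the real config object.

```python
from types import SimpleNamespace


def use_docker_spawner(c, image, mem_limit="2G"):
    """Configure JupyterHub to start one Docker container per user."""
    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    c.DockerSpawner.image = image       # same image for every user
    c.DockerSpawner.remove = True       # delete the container when the server stops
    c.DockerSpawner.notebook_dir = "/home/jovyan/work"
    # One named volume per user; {username} is expanded by dockerspawner.
    c.DockerSpawner.volumes = {"jupyterhub-user-{username}": "/home/jovyan/work"}
    c.Spawner.mem_limit = mem_limit     # per-user memory cap
    return c


if __name__ == "__main__":
    # Stand-in for the `c` object that JupyterHub provides in jupyterhub_config.py.
    c = SimpleNamespace(JupyterHub=SimpleNamespace(),
                        DockerSpawner=SimpleNamespace(),
                        Spawner=SimpleNamespace())
    # "<pinned-tag>" is a placeholder for the tag the group agrees on.
    use_docker_spawner(c, image="jupyter/scipy-notebook:<pinned-tag>")
    print(c.DockerSpawner.volumes)
```

This is deliberately incomplete: a hub that itself runs in a container also needs network settings so that the hub and the user containers can reach each other.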

It’s demo time!

  • https://github.todevnull.com
  • https://github.com/balkian/lab-in-a-box

Conclusions

Benefits of Docker

  • Docker + Docker-compose
    • Reproducible environments (partially)
    • Less tooling and experience required
  • JupyterHub
    • Shared environments
    • Web interface (no prior knowledge required)

Thanks for listening!

https://github.com/balkian/lab-in-a-box

jf.sanchez@upm.es