Docker, Portainer, NVIDIA Container Toolkit


About

This application stack is composed of three pieces of software: Docker, the Portainer management GUI, and the NVIDIA Container Toolkit.

Installation

From the SHARON AI public web billing portal, choose your desired virtual machine product.  We have a wide array of CPU- and GPU-based virtual machines with dedicated resources that guarantee performance without contention.

 

After choosing your product, choose your Operating System and Application.  We recommend a recent Ubuntu LTS-based distribution such as Ubuntu 22.04 or 24.04 (released in 2022 and 2024 respectively, and each maintained with security patches for 5 years).  Older distributions may have problems or performance issues due to outdated versions of Python and hardware drivers, and are not recommended.

 

[Screenshot: Operating System and Application selection]

 

Configure the rest of the options to suit your needs, including your disk space, SSH public key, etc.

NOTE: The password you set here will be applied to the default `ubuntu` user.  You will need this if you log in to the instance over SSH later.

 

When happy with your configuration, complete your order and wait for your virtual machine to start.  This process can take several minutes as the application deployment collects the various applications and drivers necessary.  Output can be seen in the files `/var/log/cloud-init.log` and `/var/log/cloud-init-output.log`, and on newer distributions can be followed via the systemd journal using the command `sudo journalctl -f`.
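If you'd like to check progress over SSH while waiting, something like the following sketch works on most cloud-init based images (the log path is the one mentioned above; the `cloud-init` command is assumed to be present):

```shell
# Check first-boot provisioning progress on the VM (run over SSH)
LOG=/var/log/cloud-init-output.log
if command -v cloud-init >/dev/null 2>&1; then
  cloud-init status        # add --wait to block until provisioning finishes
fi
if [ -f "$LOG" ]; then
  tail -n 50 "$LOG"        # the last lines of the deployment output
else
  echo "no $LOG yet; try: sudo journalctl -f"
fi
```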

 

Using the application

Portainer allows simple management of Docker from a web GUI.   It exposes itself on TCP/9443 for HTTPS.  Find your VM’s IP on your product information page:

So, for example, if your product was assigned the IP “203.0.113.123”, you could connect to your Portainer service in your browser at “https://203.0.113.123:9443/”.  Note that Portainer uses a self-signed SSL certificate by default, so you will need to accept that in your browser when you first connect.
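You can also sanity-check the endpoint from a terminal before opening the browser. A dry-run sketch (flip `DRY_RUN` and substitute your own IP; `-k` tells curl to accept the self-signed certificate):

```shell
DRY_RUN=true                                   # set to false to actually probe
PORTAINER_URL="https://203.0.113.123:9443/"    # substitute your VM's IP
if [ "$DRY_RUN" = false ]; then
  # Prints just the HTTP status code; -k accepts the self-signed certificate
  curl -ksS -o /dev/null -w "%{http_code}\n" --max-time 5 "$PORTAINER_URL"
else
  echo "would probe: $PORTAINER_URL"
fi
```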

When first starting up, Portainer asks you to create a new user account which will be the administrator account.  Because these services are often live on the public Internet, a timeout is in place so that you don’t expose the system in an unconfigured state for too long.  If you ordered your VM instance and then got distracted by another task, you may come back to the following screen:

[Screenshot: Portainer timeout notice]

 

If you see this screen, you will need to restart the container manually.  You can do this by connecting to your instance via SSH, and running the command:

 

sudo systemctl restart docker

 

This will restart the Docker service as well as the Portainer container.  On reconnect, you should see the default screen prompting you to create an admin account:

[Screenshot: Portainer initial admin account creation]

 

Once logged in, we see our default management screen:

 

[Screenshot: Portainer default management screen]

 

First, let’s do some setup.  By default Portainer doesn’t allow containers to use GPUs, which isn’t a whole lot of fun at all.  So let’s enable that.

Clicking “Home” shows us our default “local” environment:

[Screenshot: Portainer Home view with the “local” environment]

 

Clicking on the environment name “local” itself shows us the dashboard for the “local” environment:

 

[Screenshot: the “local” environment dashboard]

 

In our left-hand menu, we navigate to Host -> Setup.  In the right hand window, scroll all the way to the bottom.  Enable “Show GPU in the UI”, and click “Save configuration”:

[Screenshot: Host → Setup with “Show GPU in the UI” enabled]

 

By default Portainer allows a number of public container registries from across the Internet.  One popular registry that’s missing however is the GitHub container registry, or `ghcr.io`.  Let’s add that in as a custom registry so we can easily run containers from there.  In our menu above, click the drop-down arrow next to “Hosts” and then “Registries”:

[Screenshot: the Registries screen]

Here we see that Portainer has set up the popular DockerHub container registry with free anonymous access.  Click the “Add registry” button:

 

[Screenshot: the “Add registry” screen]

 

We’re going to choose “Custom registry”, give it the name “ghcr.io”, and the URL “ghcr.io”.  Click “Add registry” to save the changes.

 

And that’s it!  We’re ready to start adding containers to our system.

 

Open WebUI and Ollama – testing containers with LLMs

As an interesting use case, let’s run a container that offers a fully open source, locally hosted, private LLM (Large Language Model).

Often we’ll hear concern over privacy issues when it comes to using third-party LLM services.  Running open source models on trusted infrastructure gives us the peace of mind to be able to test LLMs in safe environments.  We can add in strict firewall settings, monitor traffic, encrypt storage and communication, and destroy instances when we’re done testing.

 

Two excellent projects that can work hand-in-hand to test models are:

  • Open WebUI: An open source “chat” interface that can use a number of engines and LLMs behind the scenes
  • Ollama: An open source “engine” that can talk between user tools such as chat interfaces or software development tools, and an LLM

Thankfully the clever developers behind Open WebUI have provided a container with both of these tools configured and ready to go.  The Open WebUI GitHub page tells us to run the following:

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
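Broken down flag by flag, that command looks like this (a sketch with a dry-run guard; the image name and paths are from the Open WebUI instructions above):

```shell
# -d                               run detached in the background
# -p 3000:8080                     host port 3000 -> container port 8080
# --gpus=all                       give the container access to all GPUs
# -v ollama:/root/.ollama          named volume for downloaded models
# -v open-webui:/app/backend/data  named volume for chat history and users
# --restart always                 restart policy
IMAGE="ghcr.io/open-webui/open-webui:ollama"
DRY_RUN=true   # set to false on the VM to launch via the CLI instead of Portainer
if [ "$DRY_RUN" = false ]; then
  docker run -d -p 3000:8080 --gpus=all \
    -v ollama:/root/.ollama -v open-webui:/app/backend/data \
    --name open-webui --restart always "$IMAGE"
else
  echo "would run: docker run ... $IMAGE"
fi
```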

 

But we’re not going to do that.  Instead, we’re going to use the mighty Portainer container manager!

 

Make sure you’ve followed all the steps above – for this example to work, we need to have enabled GPU access in the Portainer GUI, and we need to have added the “ghcr.io” GitHub Container Registry.

Click on the “Dashboard” link under our “local” environment, and we can see what containers are running.  Right now there’s only Portainer itself, so we have a single running container, and a single volume which is the persistent disk storage for the Portainer container.

 

[Screenshot: the dashboard showing one running container and one volume]

 

Click the “Volume” button on the right (or alternatively the “Volumes” link in the left hand menu), and we’ll see the persistent storage volumes.   Again, with only one active container, all we see is the “portainer_data” volume that was created by the automation tool.

 

[Screenshot: the Volumes screen with the “portainer_data” volume]

 

Click “Add volume” to create a new volume.  The docker command line we looked at above wants to map a Docker volume called “ollama”, and present that to the container as the path “/root/.ollama” inside the running container.  So we’ll call our volume “ollama”.  Simply type in the name, leave all the other values default, and click “Create volume”:

[Screenshot: creating the “ollama” volume]

 

Once done, you should see a message saying it was successful, and you’ll return to the “Volumes” screen and see the new “ollama” volume listed, and tagged as “unused”.

Let’s repeat for the “open-webui” volume.  Volumes -> Add volume.  Name it “open-webui”, and click “Create the volume”.  Again, this should return us to the “Volumes” screen, showing our two new volumes, both in state “Unused”.
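For reference, the Docker CLI equivalent of those two “Add volume” steps is `docker volume create` (a dry-run sketch; flip the guard on the VM to execute):

```shell
DRY_RUN=true   # set to false on the VM to actually create the volumes
for vol in ollama open-webui; do
  if [ "$DRY_RUN" = false ]; then
    docker volume create "$vol"
  else
    echo "would run: docker volume create $vol"
  fi
done
```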

Time to create/pull the container itself.  Click on the “Containers” link in the left hand menu.  Again, we see just our Portainer container running and nothing else.  Click “Add container”:

[Screenshot: the Containers list and “Add container” button]

 

We’re going to copy-paste all the values from the Docker command supplied above and enter them in the relevant fields.

  • Name: open-webui
  • Registry: ghcr.io (Not seeing this?  You forgot to follow the steps to add it above)
  • Image: open-webui/open-webui:ollama
  • Port mapping:
    • Host (i.e.: what we expose on our VM to the world): 3000
    • Container (i.e.: what the service inside the container is configured to run): 8080
    • Note that your “host” ports must be unique: you can’t specify the same listening port twice on the host side.  Each container spawns in a separate network, so there are no problems with containers having conflicting ports internally, as they’re kept segregated.  This feature allows us to map any port we wish on the host into listening ports inside containers.

We’re not quite finished yet, but here’s what it should look like so far:

[Screenshot: the name, registry, image and port-mapping fields]

We still need to set a few more options.  Scroll down to “Advanced container settings” and click the “Volumes” button.  Click “Map additional volume”.  The settings should be:

  • Container (i.e.: the path inside the container): /root/.ollama
  • Volume (i.e.: the volume on our host that we created in the earlier step): Choose “ollama – local” from the drop down

Repeat once more for the second volume – click “Map additional volume”, and make the settings:

  • Container (i.e.: the path inside the container): /app/backend/data
  • Volume (i.e.: the volume on our host that we created in the earlier step): Choose “open-webui – local” from the drop down

 

[Screenshot: the two volume mappings]

 

Next, click “Restart policy”.  You can choose from four options:

 

  • Never – if the container is manually stopped, crashes, or the system reboots, this specific container never restarts
  • Always – no matter what happens, the container always restarts
  • On failure – the container only restarts if there’s an internal problem with it (it crashes, for example)
  • Unless stopped – a special option that allows you to manually stop a container, and that state will be remembered even after reboot.  However if the container was running at reboot time, it will be started up again after reboot.
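These four options map directly to Docker’s `--restart` policy values (`no`, `always`, `on-failure`, `unless-stopped`), and an existing container’s policy can also be changed later with `docker update`. A dry-run sketch, assuming a container named `open-webui`:

```shell
DRY_RUN=true              # set to false on the VM to apply
POLICY="unless-stopped"   # one of: no | always | on-failure | unless-stopped
if [ "$DRY_RUN" = false ]; then
  docker update --restart "$POLICY" open-webui
else
  echo "would run: docker update --restart $POLICY open-webui"
fi
```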

We’ve committed to matching the docker command supplied by the developers, so let’s set ours to “Always”:

[Screenshot: restart policy set to “Always”]

 

And finally, the reason we’re all here.  Click the “Runtime and resources” button.  The automation tool that deployed all of this for you was configured to make the NVIDIA Container Runtime the default runtime.  You don’t need to select it from the list, and can leave that option as “Default”.  However, you can see “nvidia” in the list should you wish to specify it manually for whatever reason.

Scroll down a little to “Enable GPU”, and enable that option.  By default it will select “Use All GPUs”.  If you have ordered a multi-GPU instance, you can decide here if you want to split your GPUs up amongst different containers, or present all GPUs to all containers.  How you configure that is up to you, but also note that not all applications can use multiple GPUs.  You’ll need to consult each application’s documentation individually to see what it can do.

 

The default “capability” options selected are “compute” and “utility”.  These are all we need for our particular container.  But the options on offer are:

 

  • Compute – use the CUDA compute component of the NVIDIA GPU
  • Utility – be able to access metrics and tools such as the NVIDIA command line tool `nvidia-smi` to query the state, status and driver level of the GPU
  • Compat32 – enable legacy 32-bit compatibility mode (unlikely to be necessary, as almost all modern tools are 64-bit)
  • Video – utilise the onboard transcode ASICs inside the GPU for hardware accelerated video encode / decode / transcode, and various transforms like colour correction, tone mapping, HDR processing, etc
  • Display – access the DRI/DRM display level hardware for producing graphical output, necessary if you were running a graphical application or desktop based on X11 or Wayland, virtual desktop, etc.
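On the Docker CLI these same capabilities can be requested through the `capabilities` option of `--gpus` (the NVIDIA Container Toolkit also honours the `NVIDIA_DRIVER_CAPABILITIES` environment variable). A dry-run sketch; the CUDA image tag is just an example:

```shell
DRY_RUN=true              # set to false on the GPU VM to run the probe
CAPS="compute,utility"    # the defaults Portainer selects
if [ "$DRY_RUN" = false ]; then
  # Running nvidia-smi inside a CUDA base image exercises the "utility" capability
  docker run --rm --gpus "all,\"capabilities=$CAPS\"" \
    nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
else
  echo "would request GPU capabilities: $CAPS"
fi
```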

 

[Screenshot: “Enable GPU” and the capability options]

 

Once done, scroll back up slightly and you’ll see the “Deploy container” button just above the “Advanced container settings” area we just configured.  Click it to save our settings and begin pulling the container image to run!

 

Note that this process can take a while.  Some containers are quite large, and can take some time to download from their respective Internet based container registries.  After a short wait, Portainer will tell us when the container is ready to use. You may see it sitting in the “starting” state for a few seconds.  Once “healthy”, it’s ready to use.

 

[Screenshot: the open-webui container in the “healthy” state]

 

Using our Open-WebUI container

We followed the instructions to launch the container and expose it on TCP/3000.    Find your VM’s IP on your product information page:

So, for example, if your product was assigned the IP “203.0.113.123”, you could connect to your Open-WebUI service in your browser at “http://203.0.113.123:3000/”.  Note that this is unencrypted, and adding SSL/TLS encryption to this instance is left up to the user.

On first connect, Open-WebUI will ask you to create a new user who will be the administrator user.  Enter in any details you like (feel free to use a fake email address if you want – no checking or validation is done for this local instance), and add a password.  Once done, you’ll be sent to the main screen with a “what’s new” popup.  Dismiss this, and you’ll see the chat interface:

 

[Screenshot: the Open WebUI chat interface]

 

What we have here is Open-WebUI (the chat interface we can see) and Ollama (the LLM engine/framework behind the scenes that we can’t see), but no model.  We can find out what models are available on the Ollama website (ollama.com).

 

 

These are all open source models that can be downloaded and used entirely privately.  In this example, I’m going to pick a nice small model, the Deepseek-R1 1.5b parameter model.   You don’t have to use that, and can use anything you like.  But do be aware of the following constraints:

 

  • This model will be downloaded to your VM instance and run locally, which is the whole point of this “private LLM” exercise.  This takes time.  Huge models will take a long time to download.
  • It needs to fit on the disk of your VM.  If you’ve ordered the smallest VM SHARON AI offers at 32GB of space (and factoring in things like the Linux operating system, NVIDIA drivers, Docker runtimes, container images, etc, etc), then you won’t be able to fit some of the larger 400GB models on the disk.
  • The models need to fit inside the GPU RAM, and have room to process information.  Our H100 fleet is blessed with an impressive 96GB of RAM, but other GPUs may have less.  In general, go for a model that is smaller than your GPU RAM.

For the sake of expedience, I will grab the smallest Deepseek-R1 model.

 

This 1.5 billion parameter model is a mere 1.1GB, and should download quite quickly.  The Ollama site tells us that the ollama command to pull the model down is:

 

ollama run deepseek-r1:1.5b

 

We’re going to take just the model name, “deepseek-r1:1.5b”, and copy that.  Back on our Open-WebUI instance, we can click “Select a model” in the top-left of the screen.  Pasting in our “deepseek-r1:1.5b” text, we see the drop-down say “Pull ‘deepseek-r1:1.5b’ from Ollama.com”.  Click that to begin downloading:

 

[Screenshot: pulling “deepseek-r1:1.5b” from the model selector]
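If you prefer the terminal, the same pull can be done by exec-ing into the container, since the bundled image ships the `ollama` binary (a dry-run sketch):

```shell
DRY_RUN=true              # set to false on the VM to pull via the CLI instead
MODEL="deepseek-r1:1.5b"
if [ "$DRY_RUN" = false ]; then
  docker exec open-webui ollama pull "$MODEL"
else
  echo "would run: docker exec open-webui ollama pull $MODEL"
fi
```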

 

A progress bar will tell us how long that’s expected to take, and a popup will show when complete.  Once again, click on “Select a model” in the top-left area, and now we should see our Deepseek-R1 1.5b model available.  Select it:

 

[Screenshot: selecting the downloaded Deepseek-R1 1.5b model]

 

Start interacting with the LLM by asking it questions.  You can verify that it’s using the GPU by logging on to your VM instance and running the command:

 

sudo nvidia-smi -l

 

This will run NVIDIA’s GPU query tool and refresh the output every few seconds (pass an interval, such as `-l 1`, to refresh every second).  You will see the ollama_llama_server process running, and how much GPU memory it’s consuming.  While the LLM is doing very little, the GPU load may be low or even 0%.  As you interact with it and ask detailed questions (the more complex and wordy, the better), you will see the GPU utilisation grow.  Other handy outputs include the power draw of the GPU in Watts.

 

[Screenshot: nvidia-smi output showing ollama_llama_server using the GPU]
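For a more focused readout than the full `nvidia-smi` screen, its query mode can print just the figures mentioned above. A dry-run sketch:

```shell
DRY_RUN=true   # set to false on the GPU VM
QUERY="utilization.gpu,memory.used,power.draw"
if [ "$DRY_RUN" = false ]; then
  nvidia-smi -l 1 --query-gpu="$QUERY" --format=csv   # Ctrl+C to stop
else
  echo "would poll: nvidia-smi --query-gpu=$QUERY"
fi
```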

 

Further exercises

From here, the sky’s the limit.  Some options to try:

 

  • Edit the Open-WebUI/Ollama container to expose the internal Ollama API port, then use the running LLM over that connection from a code editor running on a low-powered laptop
  • Try other models and see how they compare for performance and accuracy
  • Remember that this is just a single container!  Millions of different containers are published for free on the Internet.  Try them out across compute-heavy industries like data science, life sciences, engineering, medical research, genomics, remote sensing, computer vision, media processing, visual effects, and countless others.
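As a starting point for the first exercise: if you re-create the container with an extra port mapping for Ollama’s API (it listens on port 11434 inside the container), you can talk to the model over plain HTTP. A dry-run sketch using Ollama’s documented `/api/generate` endpoint; the IP is a placeholder:

```shell
DRY_RUN=true                                # set to false once 11434 is exposed
OLLAMA_URL="http://203.0.113.123:11434"     # substitute your VM's IP
if [ "$DRY_RUN" = false ]; then
  # One-shot, non-streaming completion against the Ollama REST API
  curl -s "$OLLAMA_URL/api/generate" \
    -d '{"model": "deepseek-r1:1.5b", "prompt": "Hello!", "stream": false}'
else
  echo "would POST to: $OLLAMA_URL/api/generate"
fi
```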

 

 

