Ollama Integrations


About

Ollama is an open source application that can run a wide range of open source LLMs (Large Language Models) and present them to other software through a simple HTTP API.

See more at their website: ollama.com.

Ollama itself is designed for other software to connect to and utilise the LLMs running within.  The types of software that can make use of these connections vary widely across application type, industry and use case.

To simplify the initial setup, SHARON AI offers a post-install application called "Ollama Integrations" that gets you up and running quickly with Ollama. A simple open source chatbot interface called Open-WebUI is included so you can verify that the LLM is working, as well as download and select new models.

 

Why host your own LLM?

See our KB article titled "The benefits of private LLMs".

Installation:

Log into your client area and order a new service. Select a GPU service (Ollama and the supported LLMs can run on CPUs, but they are an order of magnitude faster on GPUs). Select a recent Ubuntu LTS distro and the "Ollama Integrations" application:

[Screenshot: ordering a GPU service with an Ubuntu LTS distro and the "Ollama Integrations" application selected]

Configure the rest of the options to suit your needs, including your disk space, SSH public key, etc. Note that some models take considerable disk space, so ensure you allow enough room for both your operating system and the model. Check the Ollama models page (ollama.com/search) to see the models on offer and the space each one requires.

NOTE: The password you set here will be applied to the default `ubuntu` user. You will need it to log in to the tools later.

When happy with your configuration, complete the order process and wait for your virtual machine to start. This can take several minutes while the application deployment downloads the applications and drivers it needs. Output can be seen in the files `/var/log/cloud-init.log` and `/var/log/cloud-init-output.log`, and on newer distributions it can be followed live via the systemd journal with `sudo journalctl -f`.
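As a quick sanity check once the deployment has finished, you can tail the logs above and confirm the Ollama API is answering locally. A minimal sketch, assuming Ollama is installed as the standard `ollama` systemd service listening on its default port 11434:

```bash
# Follow the deployment output on newer distributions
sudo journalctl -f

# Or review the cloud-init logs directly
tail -n 50 /var/log/cloud-init-output.log

# Assuming the standard ollama systemd service and default port,
# check the service and the local API endpoint
systemctl status ollama --no-pager
curl http://localhost:11434    # should reply "Ollama is running"
```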

 

First boot, Open-WebUI

On first boot, once you’ve verified the applications have installed and are available (see the installation notes above on how to verify that), you can log into Open-WebUI to test Ollama and download your first model.

Find your system IP in your SHARON AI billing dashboard and browse to it on port 3000. For example, if your IP were 203.0.113.10, the URL would be http://203.0.113.10:3000.

 

You’ll be presented with the Open-WebUI new user screen.  Here you can create a completely private and offline user account:

[Screenshot: the Open-WebUI new user sign-up screen]

 

Once created, you’ll be logged in and presented with some information about recent Open-WebUI updates.

To download an LLM, head to the Ollama model search page at ollama.com/search.

Search for a model that you wish to use. For this example, I'll choose Meta's Llama 3.2 model and specify the smallest variant, at 1 billion parameters:

[Screenshot: the Llama 3.2 model page on the Ollama model search site]

 

The model page tells me that I could run it from the command line with `ollama run llama3.2:1b`. However, I'll download it via the Open-WebUI interface instead, simply by copying the `llama3.2:1b` string and, in the top-left corner of Open-WebUI, choosing "select a model" and pasting it in there.

 

[Screenshot: pasting "llama3.2:1b" into the Open-WebUI model selector]

I’ll be given the option to Pull the model from ollama.com, and on doing so Open-WebUI will download and install that model for me.   This particular model is tiny at only 1.3GB (some models can exceed 400GB), so it will download and install quickly.
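If you prefer the command line, the same model can also be pulled over SSH using the Ollama CLI. A minimal sketch, assuming the default `ubuntu` user and the example IP used earlier:

```bash
# Log in as the default ubuntu user (use the IP from your dashboard)
ssh ubuntu@203.0.113.10

# Pull the same model tag directly
ollama pull llama3.2:1b

# Confirm it is installed, then chat with it interactively
ollama list
ollama run llama3.2:1b
```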

Once installed, if I again click “select a model”, this time the installed model will show up:

 

[Screenshot: the installed llama3.2:1b model listed in the Open-WebUI model selector]

 

I can select it and ask the model to identify itself to prove it's working. Hovering over the "i" information icon below the response also shows some interesting statistics about the processing speed and the number of tokens the request and answer took, which is a great way to compare CPUs and GPUs:

[Screenshot: a model response in Open-WebUI with its processing statistics]
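The same figures are also available from the Ollama HTTP API, which is handy if you want to script a CPU-versus-GPU comparison. A rough sketch, run on the VM itself and assuming the default port and the model pulled above:

```bash
# Ask the model a question via the Ollama API. With "stream": false the reply
# is a single JSON object containing the answer plus timing fields such as
# eval_count (tokens generated) and eval_duration (nanoseconds).
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "In one sentence, what model are you?",
  "stream": false
}'
```

Dividing eval_count by eval_duration (converted to seconds) gives a tokens-per-second figure you can compare across instance types.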

 

 

Next steps

From here, you can start integrating other tools with Ollama. See our guides for the following examples (a minimal endpoint sketch follows the list):

 

  • Integrate Ollama with Microsoft VSCode as a private alternative to Copilot
  • Integrate Ollama with an in-browser assistant as a private website summary tool
  • Integrate Ollama with n8n, the enterprise open source process and workflow automation tool
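Whichever tool you choose, the integration usually just needs to point at the Ollama endpoint on your VM. As a rough illustration (assuming the default port, the model pulled above, and Ollama's OpenAI-compatible `/v1` route), the configuration typically boils down to a base URL and a model name:

```bash
# Many tools accept an OpenAI-compatible backend; Ollama exposes one at /v1.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:1b",
    "messages": [{"role": "user", "content": "Summarise what Ollama does."}]
  }'
```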

 

