
Run LLMs Locally on Your Own Machine

Introduction

In the realm of artificial intelligence and natural language processing, large language models (LLMs) are extraordinary tools. They have an almost magical ability to understand context and nuance and to generate remarkably human-like text. The exciting news is that you can now harness LLMs by running them on your own gaming desktop.

Gone are the days of complex setups and reliance on distant clouds. Running LLMs locally on your gaming desktop is now a straightforward process. Thanks to user-friendly installation packages and comprehensive documentation, you can seamlessly dive into the world of AI-driven language processing. Your desktop, once dedicated to gaming, can now double as a powerful AI workstation, crunching data with impressive efficiency.

Get ready to be amazed by the sheer computing muscle of your gaming desktop! Equipped with top-of-the-line GPUs and CPUs, you can perform rapid inference and effortlessly handle massive datasets and intricate language models. And the most compelling part? You can accomplish all this from the comfort of your own home.

But wait, there’s more! Customization possibilities abound! Fine-tune language models to suit specific domains and tailor them according to your needs. This level of personalization opens up a wealth of opportunities, from content generation to virtual assistants to sentiment analysis, and beyond.

So seize the moment, embrace the AI revolution, and run LLMs locally on your gaming desktop to embark on an exciting journey of language innovation and discovery.

Relevant Hardware

Running these models is rather hardware-intensive, and since I will be running them locally on my gaming desktop, I have listed my current hardware below. You may need to do your own research to determine which models your hardware can run; a quick sketch for checking your own capacity follows the list. Some models and environments allow a “CPU” mode, which runs the model from RAM instead of VRAM; speed is the trade-off, but it makes expansion possible, since you can cheaply add more RAM.

  • AMD Threadripper 2920x 12 core, 24 thread CPU
  • 128 GB DDR4 Corsair Vengeance Pro RAM
  • Nvidia RTX 3070 Ti 8GB
  • EVGA Nvidia GTX 1070 Ti FTW 8GB
  • 1TB Samsung M.2 Drive
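
If you are unsure what your machine can handle, a quick way to check is to print your GPU VRAM and system RAM from Python. This is a minimal sketch, assuming you have torch (with CUDA support) and psutil installed; it is not part of oobabooga.

Python
# Minimal sketch: report system RAM and per-GPU VRAM.
# Assumes torch (with CUDA) and psutil are installed; not part of oobabooga.
import torch
import psutil

# System RAM (what "CPU" mode runs from)
ram_gb = psutil.virtual_memory().total / 1024**3
print(f"System RAM: {ram_gb:.1f} GB")

# VRAM per CUDA device (what GPU inference runs from)
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA-capable GPU detected; CPU mode only.")

As a rough rule, a model’s weights need to fit in VRAM (or in RAM for CPU mode) with headroom left over for the context.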

Oobabooga Installation

Oobabooga is my tool of choice for both running a local LLM and hosting a local API for testing. It allows multiple models to be downloaded and loaded into a single application, and it includes a set of tools that make the process even easier.

  • Download oobabooga Text generation webui one click installer from their github
  • Extract .zip file into anywhere
  • Run the start_windows.bat file, either by double-clicking it or by running it in the terminal, and answer any prompts it asks about your computer
    • I was only asked what manufacturer GPU I had. It asked this so it could download and install the required CUDA Python packages.
  • Installation will take approximately 10 minutes
(Screenshot: the folder structure of the extracted Windows installer, as of July 23.)
(Screenshot: the start_windows.bat script prompting the user for information about their system.)

Downloading Your Model

For this post I am going to download WizardCoder from Hugging Face. You can follow along and download WizardCoder as well, or choose another model you would like. I will be using this project to explore local code completion and code generation. If you would like a model more like ChatGPT, I recommend Vicuna.

You do not want to manually download your model from Hugging Face. Instead, use the built-in oobabooga command: python download-model.py organization/model. For this command to work, you will need a terminal open in the text-generation-webui directory.

Python
python download-model.py WizardLM/WizardCoder-15B-V1.0

This is a threaded approach to downloading a model from Hugging Face, which will save you a significant amount of time.
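
To get a feel for why parallel downloads are faster, here is a rough sketch of the idea using Python’s standard library and requests. The URLs below are placeholders; oobabooga’s actual download-model.py additionally handles things like branches, resuming, and sharded checkpoints.

Python
# Rough sketch of threaded downloads; the URLs below are placeholders,
# not oobabooga's actual implementation.
import requests
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard URLs, for illustration only
urls = [
    "https://example.com/model-00001-of-00003.bin",
    "https://example.com/model-00002-of-00003.bin",
    "https://example.com/model-00003-of-00003.bin",
]

def download(url):
    filename = url.split("/")[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MB chunks
                f.write(chunk)
    return filename

# Network I/O overlaps across threads, so total time approaches
# the slowest single file rather than the sum of all files.
with ThreadPoolExecutor(max_workers=4) as pool:
    for name in pool.map(download, urls):
        print(f"finished {name}")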

Once you run the command, your terminal will show multiple files downloading at the same time.

Enabling API

Since I want to use this setup as a playground for other projects, I need to enable the API so that I can interact with it from other machines on the same network.

You will need to open the webui.py file and add the --api flag to CMD_FLAGS:

Python
CMD_FLAGS = '--chat --api'
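
With the server restarted, you can hit the API from any machine on your network. The sketch below targets the blocking API that text-generation-webui exposed on port 5000 around this time; the host IP is a placeholder, and the endpoint and payload fields may differ between versions, so verify them against the project’s API examples.

Python
# Minimal sketch of calling the text-generation-webui blocking API.
# The endpoint, port 5000, and payload fields reflect the mid-2023 API
# and may differ in your version; the host IP is a placeholder.
import requests

HOST = "192.168.1.50"  # replace with your desktop's LAN IP

payload = {
    "prompt": "Write a Python function that reverses a string.",
    "max_new_tokens": 200,
    "temperature": 0.7,
}

response = requests.post(f"http://{HOST}:5000/api/v1/generate", json=payload)
response.raise_for_status()
print(response.json()["results"][0]["text"])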

Starting Your Environment

So you have your model downloaded and oobabooga installed, and you’re ready to actually get started? Great! It’s pretty simple: in your file explorer, just re-run the start_windows.bat script.

Then in your browser you can go to http://localhost:7860/?__theme=dark

Setting up Oobabooga

Once you have oobabooga open in your browser you need to load your freshly downloaded model.

  • Navigate to the “Models” tab, select the downloaded model in the drop-down, then click “Load”.
    • Loading these models can take some time, and your system resources will spike. This is expected.
    • You can play with some settings to optimize performance (see the example flags after the log output below).
INFO:Loaded the model in 101.29 seconds.
Output generated in 106.88 seconds (1.60 tokens/s, 171 tokens, context 33, seed 1738948618)
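
If loading feels slow or you run out of VRAM, launch flags are the usual lever. The example below uses flags that text-generation-webui supported around this time, --load-in-8bit to quantize the weights and --gpu-memory to cap usage per GPU; treat the exact names and values as assumptions and check your version’s README.

Python
# Example flags only; verify --load-in-8bit and --gpu-memory against
# your version's README. "7 7" caps each of my two 8 GB cards at 7 GiB.
CMD_FLAGS = '--chat --api --load-in-8bit --gpu-memory 7 7'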

…And you’re done! Congratulations! You’ve successfully started your LLM journey. Enjoy the ever-expanding world of Large Language Models!
