How to start local inference for your model

Download the model

Download a (small) model. Good options are:

  * granite-7b-lab-Q4_K_M.gguf
  * Phi-3-mini-4k-instruct-q4.gguf

The model needs to be in GGUF format. You have two options:

  1. Filter for GGUF models on Hugging Face
  2. Download the model in Hugging Face (HF) format and convert it to GGUF
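
Whichever option you choose, it can help to confirm that the resulting file really is GGUF before pointing the server at it. GGUF files begin with the four ASCII bytes `GGUF`, so a quick check is to read the start of the file. The sketch below assumes a shell where `head -c` is available; `is_gguf` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: succeeds if the file starts with the GGUF magic bytes.
is_gguf() {
  [ -f "$1" ] || return 1
  # Read the first four bytes; strip NUL bytes so the shell can hold the value.
  magic=$(head -c 4 "$1" | tr -d '\000')
  [ "$magic" = "GGUF" ]
}

# Usage (the file name is one of the example models listed above):
# is_gguf ./Phi-3-mini-4k-instruct-q4.gguf && echo "GGUF header found"
```

This only inspects the header. It does not validate the rest of the file or tell you which quantization was used.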

Running the llama.cpp image

There is a script in the same directory as this document. The script runs the model inference server.
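
The script itself is not reproduced here. As a rough sketch of what launching the llama.cpp server in a container can look like, the helper below prints a container command built from the `MODEL_PATH`, `MODEL_NAME`, and `POD_CONTAINER` variables this section describes. The image name `ghcr.io/ggml-org/llama.cpp:server`, port 8080, the `/models` mount point, the `LLAMA_IMAGE` and `PORT` variables, and the helper name `print_server_cmd` are all assumptions for illustration. It also assumes `MODEL_PATH` is the directory that contains `MODEL_NAME`. The real script may differ.

```shell
# Hypothetical helper: prints (does not run) a container command for the
# llama.cpp server. Image, port, and mount point are assumptions.
print_server_cmd() {
  runtime="${POD_CONTAINER:-podman}"
  image="${LLAMA_IMAGE:-ghcr.io/ggml-org/llama.cpp:server}"
  port="${PORT:-8080}"
  # Mount the model directory at /models and pass the model file with -m.
  printf '%s run --rm -p %s:%s -v %s:/models %s -m /models/%s --host 0.0.0.0 --port %s\n' \
    "$runtime" "$port" "$port" "$MODEL_PATH" "$image" "$MODEL_NAME" "$port"
}

# Usage: inspect the command before running anything.
# print_server_cmd
```

Printing the command instead of executing it keeps the sketch safe to try and makes it easy to compare against what the actual script does.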

The script reads the model path and model name from environment variables. Define these variables before running the script:

export MODEL_PATH=/absolute_path/to/your/model

export POD_CONTAINER=podman   # or: docker

export MODEL_NAME=model_name.gguf
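
Before running the script, a small pre-flight check can catch a missing variable or a wrong path early. The following is a minimal sketch, not part of the script: `check_model_env` is a hypothetical helper, and it assumes that `MODEL_PATH` is the directory containing the file named by `MODEL_NAME`.

```shell
# Hypothetical pre-flight check for the three variables described above.
check_model_env() {
  missing=0
  for name in MODEL_PATH POD_CONTAINER MODEL_NAME; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  # Only the two container runtimes named in this document are accepted.
  case "$POD_CONTAINER" in
    podman|docker) ;;
    *) echo "POD_CONTAINER must be podman or docker" >&2; return 1 ;;
  esac

  # Assumption: MODEL_PATH is a directory and MODEL_NAME is a file inside it.
  if [ ! -f "$MODEL_PATH/$MODEL_NAME" ]; then
    echo "model file not found: $MODEL_PATH/$MODEL_NAME" >&2
    return 1
  fi
}

# Usage:
# check_model_env && echo "environment looks good"
```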


If a .env file is present, the script reads the environment variables from it. The .env file must be in the same directory as the script.

Rename the .env.example file to .env and set the environment variables in it.
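
For readers curious how a shell script can pick up a .env file, one common pattern is to source the file with automatic export switched on. This is a sketch of that general idiom, not a copy of what the script does. `load_env_file` is a hypothetical name, and the example values are the placeholders used earlier in this document.

```shell
# Example .env contents (placeholder values):
#   MODEL_PATH=/absolute_path/to/your/model
#   POD_CONTAINER=podman
#   MODEL_NAME=model_name.gguf

# Hypothetical helper: load <dir>/.env if it exists, exporting each variable.
load_env_file() {
  env_file="$1/.env"
  if [ -f "$env_file" ]; then
    set -a          # export every variable assigned while sourcing
    . "$env_file"
    set +a          # restore the default behaviour
  fi
}

# Usage from inside a script, loading the .env next to the script itself:
# load_env_file "$(dirname "$0")"
```

Because the file is sourced as shell code, it should contain only simple `NAME=value` lines and come from a trusted location.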