# Models Management

Sanctum lets you run open-source large language models (LLMs) on your computer using either the CPU or GPU.

### How to choose and download a model?

Sanctum makes it easy to discover and download models via its built-in manager, integrated with [Hugging Face](https://huggingface.co/). Visit **Models > Featured** for top picks, or **Models > Explore** to browse all GGUF models.

You'll see different versions of models, each with varying resource requirements. Sanctum highlights compatible models with a green checkmark and shows details like memory needs, disk space, and popularity to help you choose.

<div align="left"><figure><img src="/files/EpKBrOtAjsKZC5k2qyn0" alt="" width="563"><figcaption></figcaption></figure></div>

### How to change the models directory?

Need to change where models are stored? Go to **Settings > Storage** and select **Change Folder** to update the directory. This makes it easy to organize your model storage or move models to a different drive.

<div align="left"><figure><img src="/files/8Gm7LDsOOzWCTdYDs1ZU" alt="" width="563"><figcaption></figcaption></figure></div>

### How to remove a model?

To manage your downloaded models, head to **Models > My Models**. You can remove individual models by clicking the trash bin icon. Alternatively, go to **Settings > Storage** to remove all models at once if you need a clean slate.

<div align="left"><figure><img src="/files/h2xoZk4H27uP82iKd9NN" alt="" width="563"><figcaption></figcaption></figure></div>

### What is GGUF?

GGUF (GPT-Generated Unified Format) is a file format optimized for running large language models on standard CPUs, making AI accessible without specialized hardware. Key features:

* **CPU Optimization**: Runs models smoothly on desktop CPUs, with optional GPU support.
* **Reduced Resource Usage**: Uses quantization for efficiency.
* **Portability**: Minimal dependencies allow use across systems.

This makes transformer models available locally, without relying on the cloud.
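For the curious, a GGUF file starts with a small binary header: the magic bytes `GGUF`, a format version, and the number of tensors and metadata entries that follow. This is purely illustrative (Sanctum handles all of this for you), but a minimal Python sketch of parsing that header looks like:

```python
import struct

def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size header at the start of a GGUF file (little-endian)."""
    magic, version, tensor_count, kv_count = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": tensor_count, "metadata_kvs": kv_count}

# Build a synthetic header so the sketch runs without a real model file.
header = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(parse_gguf_header(header))  # {'version': 3, 'tensors': 291, 'metadata_kvs': 24}
```

Everything after the header (quantized tensor data and metadata such as the tokenizer vocabulary) is what makes a single GGUF file self-contained and portable.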

### How to customize a model?

In regular mode, you can configure the following settings:

* **Model Preset:** Select from predefined configurations optimized for different models.
* **Enable GPU:** Boost performance for computationally intensive tasks.
* **GPU Layers:** Control how many layers of the model are processed on the GPU, allowing you to balance performance and resource usage.
* **Context Length:** Set the maximum number of tokens (words, characters, or parts of words) the model can consider from the conversation history when generating a response. Keep in mind that longer context lengths may slow down [model performance](/models/model-performance.md).
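To see why context length matters: once a conversation exceeds the token budget, older messages must be dropped before the model can respond. The sketch below illustrates the idea with a toy whitespace tokenizer (real models count subword tokens, so actual numbers differ):

```python
def trim_history(messages: list[str], context_length: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget.

    Toy whitespace tokenizer for illustration; real models use subword tokens.
    """
    kept, used = [], 0
    for message in reversed(messages):  # walk newest-first
        tokens = len(message.split())
        if used + tokens > context_length:
            break  # budget exhausted: everything older is dropped
        kept.append(message)
        used += tokens
    return list(reversed(kept))  # restore chronological order

history = ["hello there", "tell me about GGUF", "GGUF is a model file format"]
print(trim_history(history, context_length=10))
# ['tell me about GGUF', 'GGUF is a model file format']  (oldest message dropped)
```

A larger context length keeps more history in play, at the cost of memory and speed.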

For more advanced customization, turn on dev mode and see [Advanced Model Settings](/dev-mode/advanced-model-settings.md) for further instructions.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.sanctum.ai/models/models-management.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present on the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
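As an example, the question just needs to be URL-encoded into the `ask` parameter before the GET request is made with any HTTP client. A minimal Python sketch (the question text is illustrative):

```python
from urllib.parse import quote

BASE = "https://help.sanctum.ai/models/models-management.md"

def build_ask_url(question: str) -> str:
    """URL-encode a natural-language question into the `ask` query parameter."""
    return f"{BASE}?ask={quote(question)}"

url = build_ask_url("How do I enable GPU acceleration?")
print(url)
# https://help.sanctum.ai/models/models-management.md?ask=How%20do%20I%20enable%20GPU%20acceleration%3F
```

The resulting URL can then be fetched with `urllib.request.urlopen(url)`, `curl`, or any other HTTP client.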
