Models Management
Sanctum lets you run open-source large language models (LLMs) on your computer using either the CPU or GPU.
Sanctum makes it easy to discover and download models via its built-in manager, integrated with Hugging Face. Visit Models > Featured for top picks, or Models > Explore to browse all GGUF models.
You'll see different versions of models, each with varying resource requirements. Sanctum highlights compatible models with a green checkmark and shows details like memory needs, disk space, and popularity to help you choose.
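Wondering where those memory numbers come from? A rough rule of thumb is parameter count × bits per weight ÷ 8, plus some overhead for context buffers. The sketch below is purely illustrative: the `estimate_gguf_ram` helper and the per-format bit-widths are assumptions for demonstration, not values Sanctum uses internally.

```python
# Back-of-the-envelope RAM estimate for a quantized GGUF model.
# The bit-widths are rough approximations and overhead varies by model.

QUANT_BITS = {       # approximate effective bits per weight (assumed values)
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0":   8.5,
    "F16":   16.0,
}

def estimate_gguf_ram(params_billions: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Approximate RAM needed, in GB: weights plus a flat overhead allowance."""
    weights_gb = params_billions * QUANT_BITS[quant] / 8
    return weights_gb + overhead_gb  # overhead covers the context cache and buffers

# A 7B model at Q4_K_M: 7 * 4.8 / 8 + 1 ≈ 5.2 GB
print(f"{estimate_gguf_ram(7, 'Q4_K_M'):.1f} GB")
```

If the estimate lands close to your machine's total RAM, consider a smaller model or a stronger quantization; the green checkmark already performs this compatibility check for you.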
Need to change where models are stored? Go to Settings > Storage and select "Change Folder" to update the directory. This makes it easy to organize your models or move them to a different drive.
To manage your downloaded models, head to Models > My Models. You can remove individual models by clicking the trash bin icon. Alternatively, go to Settings > Storage to remove all models at once if you need a clean slate.
GGUF (GPT-Generated Unified Format) is a file format optimized for running large language models on standard CPUs, making AI accessible without specialized hardware. Key features:
CPU Optimization: Runs models smoothly on desktop CPUs, with optional GPU support.
Reduced Resource Usage: Quantization shrinks model weights, lowering memory and disk requirements.
Portability: Minimal dependencies allow use across systems.
This makes transformer models available locally, without relying on the cloud.
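Sanctum loads models for you, but if you're curious what running a GGUF file looks like in code, here is a minimal sketch using the open-source llama-cpp-python library. This is a separate tool, not Sanctum's internals, and the model path is a placeholder.

```python
# Minimal GGUF inference sketch (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-7b.Q4_K_M.gguf",  # placeholder: any downloaded .gguf file
    n_ctx=2048,  # context window in tokens
)

# Run a single completion entirely on the CPU.
output = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```

The same file runs unchanged on Windows, macOS, or Linux, which is the portability point above.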
In regular mode, you can configure the following settings:
Model Preset: Select from predefined configurations optimized for different models.
Enable GPU: Boost performance for computationally intensive tasks.
GPU Layers: Control how many of the model's layers are processed on the GPU, letting you balance performance against resource usage (see the sketch after this list).
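To make the layer-offloading idea concrete, here is a hedged sketch using llama-cpp-python's n_gpu_layers parameter. Sanctum's GPU Layers setting exposes the same trade-off, though the numbers below are examples, not recommendations.

```python
# Offloading part of a model to the GPU with llama-cpp-python.
# Requires a build with GPU support (e.g., CUDA or Metal).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=16,  # e.g., 16 of a 7B model's 32 layers run on the GPU
)
# n_gpu_layers=-1 offloads every layer (fastest, most VRAM);
# n_gpu_layers=0 keeps everything on the CPU (slowest, no VRAM needed).
```

More offloaded layers generally means faster generation but higher VRAM use, which is exactly the balance this setting controls.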
For more advanced customization, turn on Dev Mode and check the Advanced Model Settings for further instructions.
Context Length: Set the maximum number of tokens (words, characters, or parts of words) the model can consider from the conversation history when generating a response. Keep in mind that longer context lengths may slow down responses.
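To put context length in perspective, a common approximate rule of thumb for English text is about 1.3 tokens per word; the ratio varies by language and content, so treat the calculation below as illustrative only.

```python
# Rough feel for how much conversation fits in a context window.
# The 1.3 tokens-per-word ratio is an assumed rule of thumb for English text.
context_tokens = 4096
approx_words = context_tokens / 1.3
print(f"A {context_tokens}-token context holds roughly {approx_words:.0f} words "
      "of history plus the new response.")
```

Raising the context length also grows the model's attention cache, which is why longer settings use more memory and respond more slowly.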