🛠️ Advanced Model Settings
Customize the model's behavior to meet your specific needs by configuring essential parameters such as response length, randomness, and repetition penalties. Take control of performance by fine-tuning options for GPU acceleration, sampling methods, and output diversity.
To access advanced model settings, activate Developer Mode using the toggle located in the bottom-left corner. Then, proceed to the chat page and open the right sidebar to configure the settings.
The table below explains each parameter in detail, helping you balance focus, consistency, and creativity in your model's responses.
You can click the Reset button at any time to restore parameters to their default values.
| Parameter | Description |
| --- | --- |
| Model preset | A set of optimized settings for various model types or tasks. Select from standard presets or use your own custom configuration. |
| Custom instruction prompt | A prompt that provides specific context and instructions to the model. |
| Prompt template | A predefined structure that guides how the model generates responses. It acts as a framework the model uses to shape and expand its outputs, and often contains placeholders and specific instructions that control how responses are formatted. |
| Stop strings | Words or phrases that signal the model to stop generating further output. |
| GPU acceleration (ngl) | The number of model layers to load onto the GPU for processing. Offloading more layers reduces response times and improves throughput, at the cost of GPU memory. |
| CPU threads (threads) | The number of CPU threads used for generation. |
| CPU threads batch (threads_batch) | The number of CPU threads used for batch and prompt processing. A larger value may improve prompt processing speed at the cost of higher CPU usage. |
| Response length (n_predict) | The maximum number of tokens the model will generate in a single response. Leave empty or set to -1 for the default setting. |
| Output randomness (temp) | Controls the randomness of the model's responses. Set lower for more consistent results, higher for more varied ones. The default value is 0.8. |
| Frequency penalty (frequency_penalty) | Penalizes tokens based on how often they appear in the output. A higher value increases the penalty on repeated terms, promoting more diverse language. The default value is 1.1. |
| Presence penalty (presence_penalty) | Penalizes tokens based on whether they have already appeared in the generated text. The higher the value, the less likely the same words or ideas will be reused, which encourages novelty in answers. |
| Repeat penalty | Penalizes the model for repeating the same information. Set higher for greater novelty. |
| Top-K sampling (top_k) | How many of the most likely next tokens are considered when generating text. A higher number increases diversity but may introduce less likely words. The default is 40. |
| Top-P sampling (top_p) | Controls the variety of the generated text by only considering tokens whose cumulative probability stays within a threshold. A higher threshold encourages more diverse outputs, while a lower one keeps the text focused. The default is 0.95. |
| Context length | The maximum number of tokens (words or parts of words) the model can consider from the conversation history when generating a response. |
| Frequency base (rope_freq_base) | The base frequency for the model's rotary position embeddings (RoPE). Adjusting it changes how token positions are encoded and can extend a model's usable context length; leave it at the model's default unless you have a specific reason to change it. |
| Frequency scale (rope_freq_scale) | The scaling factor applied to RoPE frequencies. Values below 1.0 stretch the position encoding, which can allow contexts longer than the model was trained on. |