Recommended models
smollm2-360m-instruct-q8_0.gguf
HF repo: ngxson/SmolLM2-360M-Instruct-Q8_0-GGUF
Size: 368.5 MB
llama-3.2-1b-instruct-q4_k_m.gguf
HF repo: hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF
Size: 770.3 MB
qwen2-1_5b-instruct-q4_k_m-(shards).gguf
HF repo: ngxson/wllama-split-models
Size: 940.4 MB
smollm2-1.7b-instruct-q4_k_m.gguf
HF repo: ngxson/SmolLM2-1.7B-Instruct-Q4_K_M-GGUF
Size: 1006.7 MB
gemma-2-2b-it-abliterated-Q4_K_M-(shards).gguf
HF repo: ngxson/wllama-split-models
Size: 1.6 GB
neuralreyna-mini-1.8b-v0.3.q4_k_m-(shards).gguf
HF repo: ngxson/wllama-split-models
Size: 1.1 GB
Phi-3.1-mini-128k-instruct-Q3_K_M-(shards).gguf
HF repo: ngxson/wllama-split-models
Size: 1.8 GB
meta-llama-3.1-8b-instruct-abliterated.Q2_K-(shards).gguf
HF repo: ngxson/wllama-split-models
Size: 3.0 GB
Large model; it may fail to load on devices with limited RAM.
Meta-Llama-3.1-8B-Instruct-Q2_K-(shards).gguf
HF repo: ngxson/wllama-split-models
Size: 3.0 GB
Large model; it may fail to load on devices with limited RAM.
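The repos above can be loaded directly in the browser. Below is a minimal sketch of doing so with wllama, the library this model list appears to come from. The package name, wasm config paths, and the exact `loadModelFromUrl`/`createCompletion` signatures are assumptions based on wllama's published API and may differ between versions; the model URL is built from the repo and filename listed above using the standard Hugging Face `resolve` URL pattern.

```ts
// Sketch: loading a recommended model in the browser with wllama.
// Package name, config paths, and option names are assumptions;
// check them against the wllama version you actually install.
import { Wllama } from '@wllama/wllama';

// Assumed layout of the wasm binaries shipped with the package.
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/esm/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/esm/multi-thread/wllama.wasm',
};

async function main(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);

  // Download and load the smallest recommended model from its HF repo.
  await wllama.loadModelFromUrl(
    'https://huggingface.co/ngxson/SmolLM2-360M-Instruct-Q8_0-GGUF/resolve/main/smollm2-360m-instruct-q8_0.gguf'
  );

  // Run a short completion with basic sampling parameters.
  const output = await wllama.createCompletion('Q: What is GGUF?\nA:', {
    nPredict: 64,
    sampling: { temp: 0.7, top_k: 40, top_p: 0.9 },
  });
  console.log(output);
}

main();
```

The "(shards)" entries are GGUF files pre-split into chunks (e.g. with llama.cpp's gguf-split tool) so that each chunk stays under the browser's 2 GB ArrayBuffer limit. With wllama, pointing `loadModelFromUrl` at the first shard should fetch the remaining shards automatically; this is assumed behavior, so verify it against the version you use.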