### G4F - Local Usage Guide

### Table of Contents
1. [Introduction](#introduction)
2. [Required Dependencies](#required-dependencies)
3. [Basic Usage Example](#basic-usage-example)
4. [Supported Models](#supported-models)
5. [Performance Considerations](#performance-considerations)
6. [Troubleshooting](#troubleshooting)

#### Introduction
This guide explains how to use g4f to run language models locally. G4F (GPT4Free) allows you to interact with various language models on your local machine, providing a flexible and private solution for natural language processing tasks.

#### Required Dependencies
**Make sure to install the required dependencies by running:**
```bash
pip install g4f[local]
```
or, to install just the gpt4all backend directly:
```bash
pip install -U gpt4all
```
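
To verify the install, a minimal sanity check that the local client imports and constructs:

```python
# Sanity check: the local backend should import and construct without a model.
from g4f.local import LocalClient

client = LocalClient()
print("g4f local backend ready")
```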

  

#### Basic Usage Example
```python
from g4f.local import LocalClient

client   = LocalClient()
response = client.chat.completions.create(
    model    = 'orca-mini-3b',
    messages = [{"role": "user", "content": "hi"}],
    stream   = True
)

# Print the streamed tokens as they arrive, without newlines between them.
for token in response:
    print(token.choices[0].delta.content or "", end="", flush=True)
```
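
If you do not need token-by-token output, a non-streaming call can be made the same way. This is a minimal sketch, assuming the response object mirrors the OpenAI-style shape used above:

```python
from g4f.local import LocalClient

client   = LocalClient()
response = client.chat.completions.create(
    model    = 'orca-mini-3b',
    messages = [{"role": "user", "content": "hi"}],
    stream   = False
)

# Assumption: with stream=False, the full reply is on the first choice's message.
print(response.choices[0].message.content)
```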

On first use, you will be prompted to download the model. If you respond with `y`, g4f downloads it for you.

You can also manually place supported models into `./g4f/local/models/`.
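
To see what is already in that directory, a quick sketch (the path is the one mentioned above; adjust it if your checkout keeps models elsewhere):

```python
from pathlib import Path

# List any GGUF model files already placed in the local models directory.
models_dir = Path("./g4f/local/models/")
for model_file in sorted(models_dir.glob("*.gguf")):
    size_gb = model_file.stat().st_size / 1e9
    print(f"{model_file.name}: {size_gb:.1f} GB")
```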
  

**You can get a list of the currently supported models by running:**
```python
from g4f.local import LocalClient

client   = LocalClient()
client.list_models()
``` 

The model registry looks like this (the `ram` values are approximate memory requirements in GB):

```python
{
    "mistral-7b": {
        "path": "mistral-7b-openorca.gguf2.Q4_0.gguf",
        "ram": "8",
        "prompt": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n",
        "system": "<|im_start|>system\nYou are MistralOrca, a large language model trained by Alignment Lab AI. For multi-step problems, write out your reasoning for each step.\n<|im_end|>"
    },
    "mistral-7b-instruct": {
        "path": "mistral-7b-instruct-v0.1.Q4_0.gguf",
        "ram": "8",
        "prompt": "[INST] %1 [/INST]",
        "system": None
    },
    "gpt4all-falcon": {
        "path": "gpt4all-falcon-newbpe-q4_0.gguf",
        "ram": "8",
        "prompt": "### Instruction:\n%1\n### Response:\n",
        "system": None
    },
    "orca-2": {
        "path": "orca-2-13b.Q4_0.gguf",
        "ram": "16",
        "prompt": None,
        "system": None
    },
    "wizardlm-13b": {
        "path": "wizardlm-13b-v1.2.Q4_0.gguf",
        "ram": "16",
        "prompt": None,
        "system": None
    },
    "nous-hermes-llama2": {
        "path": "nous-hermes-llama2-13b.Q4_0.gguf",
        "ram": "16",
        "prompt": "### Instruction:\n%1\n### Response:\n",
        "system": None
    },
    "gpt4all-13b-snoozy": {
        "path": "gpt4all-13b-snoozy-q4_0.gguf",
        "ram": "16",
        "prompt": None,
        "system": None
    },
    "mpt-7b-chat": {
        "path": "mpt-7b-chat-newbpe-q4_0.gguf",
        "ram": "8",
        "prompt": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n",
        "system": "<|im_start|>system\n- You are a helpful assistant chatbot trained by MosaicML.\n- You answer questions.\n- You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.\n- You are more than just an information source, you are also able to write poetry, short stories, and make jokes.<|im_end|>"
    },
    "orca-mini-3b": {
        "path": "orca-mini-3b-gguf2-q4_0.gguf",
        "ram": "4",
        "prompt": "### User:\n%1\n### Response:\n",
        "system": "### System:\nYou are an AI assistant that follows instruction extremely well. Help as much as you can.\n\n"
    },
    "replit-code-3b": {
        "path": "replit-code-v1_5-3b-newbpe-q4_0.gguf",
        "ram": "4",
        "prompt": "%1",
        "system": None
    },
    "starcoder": {
        "path": "starcoder-newbpe-q4_0.gguf",
        "ram": "4",
        "prompt": "%1",
        "system": None
    },
    "rift-coder-7b": {
        "path": "rift-coder-v0-7b-q4_0.gguf",
        "ram": "8",
        "prompt": "%1",
        "system": None
    },
    "all-MiniLM-L6-v2": {
        "path": "all-MiniLM-L6-v2-f16.gguf",
        "ram": "1",
        "prompt": None,
        "system": None
    },
    "mistral-7b-german": {
        "path": "em_german_mistral_v01.Q4_0.gguf",
        "ram": "8",
        "prompt": "USER: %1 ASSISTANT: ",
        "system": "Du bist ein hilfreicher Assistent. "
    }
}
```
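
The `prompt` field is a per-model template in which `%1` marks where your message is inserted; `system` is the accompanying system prompt. A minimal illustration of the substitution, using the template copied from the `orca-mini-3b` entry above:

```python
# %1 is the placeholder for the user's message in a model's prompt template.
template = "### User:\n%1\n### Response:\n"  # copied from the orca-mini-3b entry
print(template.replace("%1", "hi"))
```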

#### Performance Considerations
**When running language models locally, consider the following:**
   - RAM requirements vary by model size; the `ram` field in the model list above gives the approximate requirement in GB. A quick pre-flight check is sketched below this list.
   - CPU/GPU capabilities affect inference speed.
   - Disk space is needed to store the model files; the quantized GGUF files above range from roughly 2 to 8 GB each.
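
As a pre-flight check, the sketch below compares available memory against a model's `ram` value from the list above. It assumes `psutil` is installed (`pip install psutil`):

```python
import psutil  # assumption: psutil is installed for reading system memory


def fits_in_ram(required_gb: str) -> bool:
    """Return True if available RAM covers a model's 'ram' requirement (GB)."""
    available_gb = psutil.virtual_memory().available / 1e9
    return available_gb >= float(required_gb)


print(fits_in_ram("8"))  # e.g. mistral-7b lists "ram": "8"
```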

#### Troubleshooting
**Common issues and solutions:**
   1. **Model download fails**: Check your internet connection and try again.
   2. **Out of memory error**: Choose a smaller model or increase your system's RAM.
   3. **Slow inference**: Consider using a GPU or a more powerful CPU.



[Return to Home](/)