zephyr-7b-beta-awq Beta
Text Generation • theblokeZephyr 7B Beta AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Zephyr model variant.
Playground
Try out this model with Workers AI LLM Playground. It does not require any setup or authentication and an instant way to preview and test a model directly in the browser.
Launch the LLM PlaygroundUsage
Worker - Streaming
 export interface Env {  AI: Ai;}
export default {  async fetch(request, env): Promise<Response> {
    const messages = [      { role: "system", content: "You are a friendly assistant" },      {        role: "user",        content: "What is the origin of the phrase Hello, World",      },    ];
    const stream = await env.AI.run("@hf/thebloke/zephyr-7b-beta-awq", {      messages,      stream: true,    });
    return new Response(stream, {      headers: { "content-type": "text/event-stream" },    });  },} satisfies ExportedHandler<Env>;Worker
 export interface Env {  AI: Ai;}
export default {  async fetch(request, env): Promise<Response> {
    const messages = [      { role: "system", content: "You are a friendly assistant" },      {        role: "user",        content: "What is the origin of the phrase Hello, World",      },    ];    const response = await env.AI.run("@hf/thebloke/zephyr-7b-beta-awq", { messages });
    return Response.json(response);  },} satisfies ExportedHandler<Env>;Python
 import osimport requests
ACCOUNT_ID = "your-account-id"AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")
prompt = "Tell me all about PEP-8"response = requests.post(  f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@hf/thebloke/zephyr-7b-beta-awq",    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},    json={      "messages": [        {"role": "system", "content": "You are a friendly assistant"},        {"role": "user", "content": prompt}      ]    })result = response.json()print(result)curl
 curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@hf/thebloke/zephyr-7b-beta-awq \  -X POST \  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'Parameters
* indicates a required field
Input
-  Prompt object       -  prompt *string min 1 max 131072The input text prompt for the model to generate a response. 
-  rawbooleanIf true, a chat template is not applied and you must adhere to the specific model's expected formatting. 
-  streambooleanIf true, the response will be streamed back incrementally using SSE, Server Sent Events. 
-  max_tokensinteger default 256The maximum number of tokens to generate in the response. 
-  temperaturenumber default 0.6 min 0 max 5Controls the randomness of the output; higher values produce more random results. 
-  top_pnumber min 0 max 2Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. 
-  top_kinteger min 1 max 50Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. 
-  seedinteger min 1 max 9999999999Random seed for reproducibility of the generation. 
-  repetition_penaltynumber min 0 max 2Penalty for repeated tokens; higher values discourage repetition. 
-  frequency_penaltynumber min 0 max 2Decreases the likelihood of the model repeating the same lines verbatim. 
-  presence_penaltynumber min 0 max 2Increases the likelihood of the model introducing new topics. 
-  lorastringName of the LoRA (Low-Rank Adaptation) model to fine-tune the base model. 
 
-  
-  Messages object       -  messages *arrayAn array of message objects representing the conversation history. -  itemsobject-  role *stringThe role of the message sender (e.g., 'user', 'assistant', 'system', 'tool'). 
-  content *string max 131072The content of the message as a string. 
 
-  
 
-  
-  functionsarray-  itemsobject-  name *string
-  code *string
 
-  
 
-  
-  toolsarrayA list of tools available for the assistant to use. -  itemsone of-  0object-  name *stringThe name of the tool. More descriptive the better. 
-  description *stringA brief description of what the tool does. 
-  parameters *objectSchema defining the parameters accepted by the tool. -  type *stringThe type of the parameters object (usually 'object'). 
-  requiredarrayList of required parameter names. -  itemsstring
 
-  
-  properties *objectDefinitions of each parameter. -  additionalPropertiesobject-  type *stringThe data type of the parameter. 
-  description *stringA description of the expected parameter. 
 
-  
 
-  
 
-  
 
-  
-  1object-  type *stringSpecifies the type of tool (e.g., 'function'). 
-  function *objectDetails of the function tool. -  name *stringThe name of the function. 
-  description *stringA brief description of what the function does. 
-  parameters *objectSchema defining the parameters accepted by the function. -  type *stringThe type of the parameters object (usually 'object'). 
-  requiredarrayList of required parameter names. -  itemsstring
 
-  
-  properties *objectDefinitions of each parameter. -  additionalPropertiesobject-  type *stringThe data type of the parameter. 
-  description *stringA description of the expected parameter. 
 
-  
 
-  
 
-  
 
-  
 
-  
 
-  
 
-  
-  streambooleanIf true, the response will be streamed back incrementally. 
-  max_tokensinteger default 256The maximum number of tokens to generate in the response. 
-  temperaturenumber default 0.6 min 0 max 5Controls the randomness of the output; higher values produce more random results. 
-  top_pnumber min 0 max 2Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. 
-  top_kinteger min 1 max 50Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. 
-  seedinteger min 1 max 9999999999Random seed for reproducibility of the generation. 
-  repetition_penaltynumber min 0 max 2Penalty for repeated tokens; higher values discourage repetition. 
-  frequency_penaltynumber min 0 max 2Decreases the likelihood of the model repeating the same lines verbatim. 
-  presence_penaltynumber min 0 max 2Increases the likelihood of the model introducing new topics. 
 
-  
Output
-  0object-  responsestringThe generated text response from the model 
-  usageobjectUsage statistics for the inference request -  prompt_tokensnumber 0Total number of tokens in input 
-  completion_tokensnumber 0Total number of tokens in output 
-  total_tokensnumber 0Total number of input and output tokens 
 
-  
-  tool_callsarrayAn array of tool calls requests made during the response generation -  itemsobject-  argumentsobjectThe arguments passed to be passed to the tool call request 
-  namestringThe name of the tool to be called 
 
-  
 
-  
 
-  
-  1string
API Schemas
The following schemas are based on JSON Schema
{    "type": "object",    "oneOf": [        {            "title": "Prompt",            "properties": {                "prompt": {                    "type": "string",                    "minLength": 1,                    "maxLength": 131072,                    "description": "The input text prompt for the model to generate a response."                },                "raw": {                    "type": "boolean",                    "default": false,                    "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting."                },                "stream": {                    "type": "boolean",                    "default": false,                    "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events."                },                "max_tokens": {                    "type": "integer",                    "default": 256,                    "description": "The maximum number of tokens to generate in the response."                },                "temperature": {                    "type": "number",                    "default": 0.6,                    "minimum": 0,                    "maximum": 5,                    "description": "Controls the randomness of the output; higher values produce more random results."                },                "top_p": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."                },                "top_k": {                    "type": "integer",                    "minimum": 1,                    "maximum": 50,                    "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises."                },                "seed": {                    "type": "integer",                    "minimum": 1,                    "maximum": 9999999999,                    "description": "Random seed for reproducibility of the generation."                },                "repetition_penalty": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Penalty for repeated tokens; higher values discourage repetition."                },                "frequency_penalty": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Decreases the likelihood of the model repeating the same lines verbatim."                },                "presence_penalty": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Increases the likelihood of the model introducing new topics."                },                "lora": {                    "type": "string",                    "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model."                }            },            "required": [                "prompt"            ]        },        {            "title": "Messages",            "properties": {                "messages": {                    "type": "array",                    "description": "An array of message objects representing the conversation history.",                    "items": {                        "type": "object",                        "properties": {                            "role": {                                "type": "string",                                "description": "The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool')."                            },                            "content": {                                "type": "string",                                "maxLength": 131072,                                "description": "The content of the message as a string."                            }                        },                        "required": [                            "role",                            "content"                        ]                    }                },                "functions": {                    "type": "array",                    "items": {                        "type": "object",                        "properties": {                            "name": {                                "type": "string"                            },                            "code": {                                "type": "string"                            }                        },                        "required": [                            "name",                            "code"                        ]                    }                },                "tools": {                    "type": "array",                    "description": "A list of tools available for the assistant to use.",                    "items": {                        "type": "object",                        "oneOf": [                            {                                "properties": {                                    "name": {                                        "type": "string",                                        "description": "The name of the tool. More descriptive the better."                                    },                                    "description": {                                        "type": "string",                                        "description": "A brief description of what the tool does."                                    },                                    "parameters": {                                        "type": "object",                                        "description": "Schema defining the parameters accepted by the tool.",                                        "properties": {                                            "type": {                                                "type": "string",                                                "description": "The type of the parameters object (usually 'object')."                                            },                                            "required": {                                                "type": "array",                                                "description": "List of required parameter names.",                                                "items": {                                                    "type": "string"                                                }                                            },                                            "properties": {                                                "type": "object",                                                "description": "Definitions of each parameter.",                                                "additionalProperties": {                                                    "type": "object",                                                    "properties": {                                                        "type": {                                                            "type": "string",                                                            "description": "The data type of the parameter."                                                        },                                                        "description": {                                                            "type": "string",                                                            "description": "A description of the expected parameter."                                                        }                                                    },                                                    "required": [                                                        "type",                                                        "description"                                                    ]                                                }                                            }                                        },                                        "required": [                                            "type",                                            "properties"                                        ]                                    }                                },                                "required": [                                    "name",                                    "description",                                    "parameters"                                ]                            },                            {                                "properties": {                                    "type": {                                        "type": "string",                                        "description": "Specifies the type of tool (e.g., 'function')."                                    },                                    "function": {                                        "type": "object",                                        "description": "Details of the function tool.",                                        "properties": {                                            "name": {                                                "type": "string",                                                "description": "The name of the function."                                            },                                            "description": {                                                "type": "string",                                                "description": "A brief description of what the function does."                                            },                                            "parameters": {                                                "type": "object",                                                "description": "Schema defining the parameters accepted by the function.",                                                "properties": {                                                    "type": {                                                        "type": "string",                                                        "description": "The type of the parameters object (usually 'object')."                                                    },                                                    "required": {                                                        "type": "array",                                                        "description": "List of required parameter names.",                                                        "items": {                                                            "type": "string"                                                        }                                                    },                                                    "properties": {                                                        "type": "object",                                                        "description": "Definitions of each parameter.",                                                        "additionalProperties": {                                                            "type": "object",                                                            "properties": {                                                                "type": {                                                                    "type": "string",                                                                    "description": "The data type of the parameter."                                                                },                                                                "description": {                                                                    "type": "string",                                                                    "description": "A description of the expected parameter."                                                                }                                                            },                                                            "required": [                                                                "type",                                                                "description"                                                            ]                                                        }                                                    }                                                },                                                "required": [                                                    "type",                                                    "properties"                                                ]                                            }                                        },                                        "required": [                                            "name",                                            "description",                                            "parameters"                                        ]                                    }                                },                                "required": [                                    "type",                                    "function"                                ]                            }                        ]                    }                },                "stream": {                    "type": "boolean",                    "default": false,                    "description": "If true, the response will be streamed back incrementally."                },                "max_tokens": {                    "type": "integer",                    "default": 256,                    "description": "The maximum number of tokens to generate in the response."                },                "temperature": {                    "type": "number",                    "default": 0.6,                    "minimum": 0,                    "maximum": 5,                    "description": "Controls the randomness of the output; higher values produce more random results."                },                "top_p": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Controls the creativity of the AI's responses by adjusting how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses."                },                "top_k": {                    "type": "integer",                    "minimum": 1,                    "maximum": 50,                    "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises."                },                "seed": {                    "type": "integer",                    "minimum": 1,                    "maximum": 9999999999,                    "description": "Random seed for reproducibility of the generation."                },                "repetition_penalty": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Penalty for repeated tokens; higher values discourage repetition."                },                "frequency_penalty": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Decreases the likelihood of the model repeating the same lines verbatim."                },                "presence_penalty": {                    "type": "number",                    "minimum": 0,                    "maximum": 2,                    "description": "Increases the likelihood of the model introducing new topics."                }            },            "required": [                "messages"            ]        }    ]}{    "oneOf": [        {            "type": "object",            "contentType": "application/json",            "properties": {                "response": {                    "type": "string",                    "description": "The generated text response from the model"                },                "usage": {                    "type": "object",                    "description": "Usage statistics for the inference request",                    "properties": {                        "prompt_tokens": {                            "type": "number",                            "description": "Total number of tokens in input",                            "default": 0                        },                        "completion_tokens": {                            "type": "number",                            "description": "Total number of tokens in output",                            "default": 0                        },                        "total_tokens": {                            "type": "number",                            "description": "Total number of input and output tokens",                            "default": 0                        }                    }                },                "tool_calls": {                    "type": "array",                    "description": "An array of tool calls requests made during the response generation",                    "items": {                        "type": "object",                        "properties": {                            "arguments": {                                "type": "object",                                "description": "The arguments passed to be passed to the tool call request"                            },                            "name": {                                "type": "string",                                "description": "The name of the tool to be called"                            }                        }                    }                }            }        },        {            "type": "string",            "contentType": "text/event-stream",            "format": "binary"        }    ]}