OpenAI Text Generation Model Guide
- 861 Words
- 4 Minutes
- 27 Jun, 2024
OpenAI’s text generation models (often referred to as Generative Pre-trained Transformers or Large Language Models) are trained to understand natural language, code, and images. These models generate text outputs based on inputs, commonly known as “prompts.” Designing prompts is essentially how you “program” large language models, typically by providing instructions.
Application Scenarios
Using OpenAI’s text generation models, you can build the following applications:
- Drafting documents
- Writing computer code
- Answering questions about knowledge bases
- Analyzing text
- Providing natural language interfaces for software
- Tutoring in various disciplines
- Translating languages
- Simulating game characters
With the release of gpt-4-turbo, you can now also build systems that process and understand images.
How to Use the Models
To use these models via the OpenAI API, you need to send a request containing the input and API key and receive a response containing the model’s output. The latest models, such as gpt-4o and gpt-4, can be accessed via the Chat Completions API.
Models and APIs
Model | API |
---|---|
New Models (2023 and later): gpt-4o, gpt-4, gpt-3.5-turbo | https://api.openai.com/v1/chat/completions |
Updated Legacy Models (2023): gpt-3.5-turbo-instruct, babbage-002, davinci-002 | https://api.openai.com/v1/completions |
You can experiment with various models in the Chat Playground. If unsure which model to use, you can start with gpt-3.5-turbo or gpt-4o.
Chat Completions API
Chat models accept a series of messages as input and return messages generated by the model as output. Although the chat format is designed for multi-turn conversations, it is equally suitable for single-turn tasks without any dialogue.
Example Call
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
```
The main input is the `messages` parameter. Messages must be an array of message objects, each with a role ("system", "user", or "assistant") and content. Conversations can be as short as one message or span multiple turns.
JSON Mode
A common usage is to instruct the model, via the system message, to always return a JSON object suitable for your use case. You can enable JSON mode by setting `response_format` to `{ "type": "json_object" }`.
Example Call
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    response_format={ "type": "json_object" },
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ]
)
print(response.choices[0].message.content)
```
In this example, the response contains a JSON object as follows:
```
"content": "{\"winner\": \"Los Angeles Dodgers\"}"
```
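Because the returned `content` is itself a JSON string, it can be parsed directly with the standard library. A minimal sketch, using the string from the example above in place of a live API response:

```python
import json

# In JSON mode, the message content is a JSON string, not a Python object.
content = '{"winner": "Los Angeles Dodgers"}'

# Parse it into a dict before using the fields.
data = json.loads(content)
print(data["winner"])  # Los Angeles Dodgers
```

Parsing immediately after the call is a good habit: a `json.loads` failure is the quickest way to detect a malformed response.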
Managing Tokens
Language models read and write text in chunks called tokens. In English, a token can be as short as one character or as long as one word (e.g., “a” or “apple”). The total number of tokens in an API call affects:
- The cost of the API call, as you pay per token
- The time of the API call, as writing more tokens takes more time
- Whether the API call succeeds, as the total number of tokens must be below the model’s maximum limit (4,097 tokens for gpt-3.5-turbo)
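Because all three factors scale with token count, it is useful to estimate a prompt's size before sending it. A rough heuristic sketch (the exact count depends on the model's tokenizer; OpenAI's tiktoken library gives precise counts):

```python
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token for English text.
    # For exact counts, tokenize with the model's actual encoding
    # (e.g., via the tiktoken library).
    return max(1, len(text) // 4)

print(estimate_tokens("Write a tagline for an ice cream shop."))
```

A check like this lets you truncate or summarize inputs before they exceed the model's context limit.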
Parameter Details
Frequency and Presence Penalties
The `frequency_penalty` and `presence_penalty` parameters can be used to reduce the likelihood of sampling repetitive token sequences. Reasonable penalty coefficients are around 0.1 to 1 if the goal is to slightly reduce repetition. To strongly suppress repetition, the coefficient can be increased to 2, though this may significantly reduce sample quality.
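Conceptually, both penalties subtract from a token's logit before sampling: the frequency penalty scales with how many times the token has already appeared, while the presence penalty is a flat deduction once it has appeared at all. A minimal sketch of that adjustment (an illustration, not OpenAI's internal implementation):

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    # logits: dict mapping candidate token -> raw score
    # generated: tokens sampled so far
    counts = Counter(generated)
    return {
        token: score
        - frequency_penalty * counts[token]          # grows with repetition count
        - presence_penalty * (1 if counts[token] else 0)  # flat, once seen
        for token, score in logits.items()
    }

logits = {"ice": 2.0, "cream": 1.5, "shop": 1.0}
adjusted = apply_penalties(logits, ["ice", "ice"],
                           frequency_penalty=0.5, presence_penalty=0.1)
# "ice" is penalized twice over by frequency plus once by presence;
# unseen tokens keep their original scores.
```

This is why large coefficients can hurt quality: heavily penalized tokens may include ones the text legitimately needs again.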
Completions API
The `completions` API endpoint received its final update in July 2023 and has a different interface from the newer `chat completions` endpoint. The input is a free-form text string called a prompt.
Example Call
```python
from openai import OpenAI

client = OpenAI()

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a tagline for an ice cream shop."
)
```
Chat Completions vs. Completions
The Chat Completions format can be made similar to the `completions` format by constructing a request with a single user message. The difference lies in the availability of underlying models: the Chat Completions API is the interface to the most powerful models (gpt-4o) and the most cost-effective models (gpt-3.5-turbo).
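To make the correspondence concrete, here are the two request bodies side by side as plain dicts (request shapes only, no network call is made):

```python
prompt = "Write a tagline for an ice cream shop."

# Legacy completions-style request: a free-form prompt string.
completions_request = {
    "model": "gpt-3.5-turbo-instruct",
    "prompt": prompt,
}

# Equivalent chat-completions-style request: one user message.
chat_request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": prompt}],
}
```

The payloads carry the same text; only the wrapper differs, which is why single-turn tasks port easily to the chat format.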
Which Model to Use?
We generally recommend gpt-4o or gpt-3.5-turbo; which to use depends on the complexity of your task. gpt-4o performs better across a wide range of evaluations, especially at carefully following complex instructions, whereas gpt-3.5-turbo is more likely to follow only part of a complex multi-part instruction. gpt-4o is also less likely to fabricate information (a behavior known as "hallucination") and has a larger context window: up to 128,000 tokens, compared to 4,096 for gpt-3.5-turbo. However, gpt-3.5-turbo returns outputs with lower latency and costs much less per token.
Frequently Asked Questions
How to Set the Temperature Parameter?
Lower temperature values (e.g., 0.2) produce more consistent outputs, while higher temperature values (e.g., 1.0) generate more diverse and creative results. Choose a temperature value based on your specific application to balance coherence and creativity. The temperature range is from 0 to 2.
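Temperature works by rescaling the model's output distribution before sampling. A small self-contained illustration of the effect (a sketch of the standard softmax-with-temperature formulation, not OpenAI's internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then apply softmax.
    # Low temperature sharpens the distribution toward the top token;
    # high temperature flattens it, spreading probability more evenly.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # strongly favors the top token
high = softmax_with_temperature(logits, 1.0)  # more even spread
```

This is why low temperatures give consistent, repeatable answers and high temperatures give more varied ones.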
Can the Latest Models Be Fine-Tuned?
Currently, gpt-3.5-turbo and base models (babbage-002 and davinci-002) can be fine-tuned.
Should I Use ChatGPT or the API?
ChatGPT provides a chat interface for the models and comes with a range of built-in features such as integrated browsing, code execution, plugins, etc. In contrast, using OpenAI’s API offers more flexibility but requires writing code or programmatically sending requests to the model.