OpenAI Text Generation Model Guide
- 861 Words
- 4 Minutes
- 27 Jun, 2024
OpenAI’s text generation models (often referred to as Generative Pre-trained Transformers or Large Language Models) are trained to understand natural language, code, and images. These models generate text outputs based on inputs, commonly known as “prompts.” Designing prompts is essentially how you “program” large language models, typically by providing instructions.
Application Scenarios
Using OpenAI’s text generation models, you can build the following applications:
- Drafting documents
- Writing computer code
- Answering questions about knowledge bases
- Analyzing text
- Providing natural language interfaces for software
- Tutoring in various disciplines
- Translating languages
- Simulating game characters
With the release of gpt-4-turbo, you can now also build systems that process and understand images.
How to Use the Models
To use these models via the OpenAI API, you need to send a request containing the input and API key and receive a response containing the model’s output. The latest models, such as gpt-4o and gpt-4, can be accessed via the Chat Completions API.
Models and APIs
Model | API |
---|---|
New Models (2023 and later): gpt-4o, gpt-4, gpt-3.5-turbo | https://api.openai.com/v1/chat/completions |
Updated Legacy Models (2023): gpt-3.5-turbo-instruct, babbage-002, davinci-002 | https://api.openai.com/v1/completions |
You can experiment with various models in the Chat Playground. If unsure which model to use, you can start with gpt-3.5-turbo or gpt-4o.
Chat Completions API
Chat models accept a series of messages as input and return messages generated by the model as output. Although the chat format is designed for multi-turn conversations, it is equally suitable for single-turn tasks without any dialogue.
Example Call
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
```
The main input is the `messages` parameter. Messages must be an array of message objects, each with a role ("system", "user", or "assistant") and content. Conversations can be as short as one message or span multiple turns.
JSON Mode
A common usage is to instruct the model, via the system message, to always return a JSON object suitable for your use case. You can enable JSON mode by setting `response_format` to `{ "type": "json_object" }`.
Example Call
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    response_format={ "type": "json_object" },
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ]
)
print(response.choices[0].message.content)
```
In this example, the response contains a JSON object as follows:
```
"content": "{\"winner\": \"Los Angeles Dodgers\"}"
```
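Because the returned `content` is itself a JSON string, it can be parsed directly with the standard library. A minimal sketch, using the string from the example above in place of a live API response:

```python
import json

# In JSON mode, the message content is a JSON string, not a Python object.
content = '{"winner": "Los Angeles Dodgers"}'

# Parse it into a dict before using the fields.
data = json.loads(content)
print(data["winner"])  # Los Angeles Dodgers
```

Parsing immediately after the call is a good habit: a `json.loads` failure is the quickest way to detect a malformed response.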
Managing Tokens
Language models read and write text in chunks called tokens. In English, a token can be as short as one character or as long as one word (e.g., “a” or “apple”). The total number of tokens in an API call affects:
- The cost of the API call, as you pay per token
- The time of the API call, as writing more tokens takes more time
- Whether the API call succeeds, as the total number of tokens must be below the model’s maximum limit (4,097 tokens for gpt-3.5-turbo)
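Because all three factors scale with token count, it is useful to estimate a prompt's size before sending it. A rough heuristic sketch (the exact count depends on the model's tokenizer; OpenAI's tiktoken library gives precise counts):

```python
def estimate_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token for English text.
    # For exact counts, tokenize with the model's actual encoding
    # (e.g., via the tiktoken library).
    return max(1, len(text) // 4)

print(estimate_tokens("Write a tagline for an ice cream shop."))
```

A check like this lets you truncate or summarize inputs before they exceed the model's context limit.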
Parameter Details
Frequency and Presence Penalties
The `frequency_penalty` and `presence_penalty` parameters can be used to reduce the likelihood of sampling repetitive token sequences. Reasonable penalty coefficients are around 0.1 to 1 if the goal is to slightly reduce repetition. To strongly suppress repetition, the coefficient can be increased to 2, though this may significantly reduce sample quality.
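Conceptually, both penalties subtract from a token's logit before sampling: the frequency penalty scales with how many times the token has already appeared, while the presence penalty is a flat deduction once it has appeared at all. A minimal sketch of that adjustment (an illustration, not OpenAI's internal implementation):

```python
from collections import Counter

def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    # logits: dict mapping candidate token -> raw score
    # generated: tokens sampled so far
    counts = Counter(generated)
    return {
        token: score
        - frequency_penalty * counts[token]          # grows with repetition count
        - presence_penalty * (1 if counts[token] else 0)  # flat, once seen
        for token, score in logits.items()
    }

logits = {"ice": 2.0, "cream": 1.5, "shop": 1.0}
adjusted = apply_penalties(logits, ["ice", "ice"],
                           frequency_penalty=0.5, presence_penalty=0.1)
# "ice" is penalized twice over by frequency plus once by presence;
# unseen tokens keep their original scores.
```

This is why large coefficients can hurt quality: heavily penalized tokens may include ones the text legitimately needs again.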
Completions API
The `completions` API endpoint received its final update in July 2023 and has a different interface from the newer `chat completions` endpoint. The input is a free-form text string called a prompt.
Example Call
```python
from openai import OpenAI

client = OpenAI()

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a tagline for an ice cream shop."
)
```
Chat Completions vs. Completions
The Chat Completions format can be made similar to the `completions` format by constructing a request with a single user message. The difference lies in the availability of underlying models: the Chat Completions API is the interface to the most powerful models (gpt-4o) and the most cost-effective models (gpt-3.5-turbo).
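To make the correspondence concrete, here are the two request bodies side by side as plain dicts (request shapes only, no network call is made):

```python
prompt = "Write a tagline for an ice cream shop."

# Legacy completions-style request: a free-form prompt string.
completions_request = {
    "model": "gpt-3.5-turbo-instruct",
    "prompt": prompt,
}

# Equivalent chat-completions-style request: one user message.
chat_request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": prompt}],
}
```

The payloads carry the same text; only the wrapper differs, which is why single-turn tasks port easily to the chat format.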
Which Model to Use?
We generally recommend gpt-4o or gpt-3.5-turbo; which to use depends on the complexity of your task. gpt-4o performs better across a wide range of evaluations, especially at carefully following complex instructions, whereas gpt-3.5-turbo is more likely to follow only part of a complex multi-part instruction. gpt-4o is also less likely to fabricate information (a behavior known as "hallucination") and has a larger context window: up to 128,000 tokens, compared to 4,096 for gpt-3.5-turbo. However, gpt-3.5-turbo returns outputs with lower latency and costs much less per token.
Frequently Asked Questions
How to Set the Temperature Parameter?
Lower temperature values (e.g., 0.2) produce more consistent outputs, while higher temperature values (e.g., 1.0) generate more diverse and creative results. Choose a temperature value based on your specific application to balance coherence and creativity. The temperature range is from 0 to 2.
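Temperature works by rescaling the model's output distribution before sampling. A small self-contained illustration of the effect (a sketch of the standard softmax-with-temperature formulation, not OpenAI's internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then apply softmax.
    # Low temperature sharpens the distribution toward the top token;
    # high temperature flattens it, spreading probability more evenly.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # strongly favors the top token
high = softmax_with_temperature(logits, 1.0)  # more even spread
```

This is why low temperatures give consistent, repeatable answers and high temperatures give more varied ones.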
Can the Latest Models Be Fine-Tuned?
Currently, gpt-3.5-turbo and base models (babbage-002 and davinci-002) can be fine-tuned.
Should I Use ChatGPT or the API?
ChatGPT provides a chat interface for the models and comes with a range of built-in features such as integrated browsing, code execution, plugins, etc. In contrast, using OpenAI’s API offers more flexibility but requires writing code or programmatically sending requests to the model.