Large language models (LLMs) offer a number of parameters that let you fine-tune their behavior and control how they generate responses. If a model isn't producing the desired output, the issue often lies in how these parameters are configured. In this tutorial, we'll explore some of the most commonly used ones (max_completion_tokens, temperature, top_p, presence_penalty, and frequency_penalty) and understand how each influences the model's output.
Installing the dependencies
pip install openai pandas matplotlib
Loading OpenAI API Key
import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
Initializing the Model
from openai import OpenAI
model = "gpt-4.1"
client = OpenAI()
Max Tokens
Max tokens is the maximum number of tokens the model can generate during a run. The model will try to stay within this limit across all turns. If it exceeds the specified amount, the run will stop and be marked as incomplete.
A smaller value (like 16) limits the model to very short answers, while a higher value (like 80) allows it to generate more detailed and complete responses. Increasing this parameter gives the model more room to elaborate, explain, or format its output more naturally.
prompt = "What is the most popular French cheese?"

for tokens in [16, 30, 80]:
    print(f"\n--- max_completion_tokens = {tokens} ---")
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "developer", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        max_completion_tokens=tokens
    )
    print(response.choices[0].message.content)
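To detect truncation programmatically, you can check finish_reason on the returned choice: the API reports "length" when generation stopped because it hit the token cap, and "stop" when the model finished naturally. A small optional check:

# Run after any of the calls above
if response.choices[0].finish_reason == "length":
    print("Output was truncated by max_completion_tokens.")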
Temperature
In Large Language Models (LLMs), the temperature parameter controls the diversity and randomness of generated outputs. Lower temperature values make the model more deterministic and focused on the most probable responses, ideal for tasks that require accuracy and consistency. Higher values, on the other hand, introduce creativity and variety by allowing the model to explore less likely options. Technically, temperature scales the token logits inside the softmax function: increasing it flattens the probability distribution (more diverse outputs), while decreasing it sharpens the distribution (more predictable outputs).
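To make that scaling concrete, here is a minimal, self-contained sketch (pure Python on made-up logits, not an API call) showing how dividing logits by the temperature reshapes the softmax distribution:

import math

def softmax_with_temperature(logits, temperature):
    # T < 1 sharpens the distribution; T > 1 flattens it
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative token logits
for t in [0.2, 1.0, 1.5]:
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

At 0.2, almost all of the probability mass sits on the first token, while at 1.5 the three tokens end up much closer to equally likely.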
In this code, we prompt the LLM to produce 10 different responses (n_choices = 10) for the same question, "What is one intriguing place worth visiting?", across a range of temperature values. By doing this, we can observe how the diversity of answers changes with temperature. Lower temperatures will likely produce similar or repeated responses, while higher temperatures will show a broader and more varied distribution of places.
prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."

temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10

results = {}

for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices
    )
    # Collect all n responses in a list
    results[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

# Display results
for temp, responses in results.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)
As we can see, once the temperature reaches 0.6 the responses become more diverse, moving beyond the repeated single answer "Petra." At a higher temperature of 1.5, the distribution shifts further, and we see responses like Kyoto and Machu Picchu as well.
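Since we installed matplotlib earlier, we can optionally quantify this effect by counting the distinct answers at each temperature (a small sketch built on the results dictionary above):

from collections import Counter
import matplotlib.pyplot as plt

# Number of distinct answers (out of n_choices) at each temperature
diversity = {temp: len(Counter(responses)) for temp, responses in results.items()}

plt.bar([str(t) for t in diversity.keys()], list(diversity.values()))
plt.xlabel("temperature")
plt.ylabel("distinct answers (out of 10)")
plt.title("Answer diversity vs. temperature")
plt.show()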
Top P
Top P (also known as nucleus sampling) is a parameter that controls how many tokens the model considers based on a cumulative probability threshold. It helps the model focus on the most likely tokens, often improving coherence and output quality.
In the following experiment, we first set a temperature value and then apply Top P = 0.5 (50%), meaning only the top 50% of the probability mass is kept. Note that when temperature = 0, the output is deterministic, so Top P has no effect.
The generation process works as follows (a toy sketch of these steps appears after the list):
- Apply the temperature to adjust the token probabilities.
- Use Top P to retain only the most probable tokens that together make up 50% of the total probability mass.
- Renormalize the remaining probabilities before sampling.
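Here is a minimal sketch of those steps on a toy distribution (pure Python with made-up, already temperature-scaled probabilities; the API does this internally):

# Toy next-token distribution (illustrative values)
probs = {"Petra": 0.55, "Kyoto": 0.20, "Machu Picchu": 0.15, "Iceland": 0.10}

top_p = 0.5
kept, cumulative = {}, 0.0
# Keep the most probable tokens until the cumulative mass reaches top_p
for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    kept[token] = p
    cumulative += p
    if cumulative >= top_p:
        break

# Renormalize the surviving probabilities before sampling
total = sum(kept.values())
print({token: round(p / total, 3) for token, p in kept.items()})  # {'Petra': 1.0}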
We'll observe how the distribution of sampled answers changes across different temperature values for the question:
"What is one intriguing place worth visiting?"
prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."

temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10

results_ = {}

for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices,
        top_p=0.5
    )
    # Collect all n responses in a list
    results_[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

# Display results
for temp, responses in results_.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)
Since Petra consistently accounted for more than 50% of the total response probability, applying Top P = 0.5 filters out all other options. As a result, the model selects "Petra" as the final output in every case.
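If you want to inspect the underlying token probabilities directly rather than infer them from repeated sampling, the Chat Completions API can also return log probabilities. A small sketch (assuming the standard logprobs response shape):

import math

response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ],
    max_completion_tokens=1,
    logprobs=True,
    top_logprobs=5  # top alternatives for the generated token
)

# Convert logprobs back to probabilities for the first generated token
for cand in response.choices[0].logprobs.content[0].top_logprobs:
    print(f"{cand.token!r}: p = {math.exp(cand.logprob):.3f}")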
Frequency Penalty
Frequency Penalty controls how much the model avoids repeating the same words or phrases in its output.
- Range: -2 to 2
- Default: 0
When the frequency penalty is higher, the model gets penalized for using words it has already used before. This encourages it to choose new and different words, making the text more varied and less repetitive.
In simple terms: a higher frequency penalty means less repetition and more creativity.
We'll test this using the prompt:
"List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."

frequency_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]

results = {}

for fp in frequency_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        frequency_penalty=fp,
        temperature=0.2
    )
    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[fp] = items

# Display results
for fp, items in results.items():
    print(f"\n--- frequency_penalty = {fp} ---")
    print(items)
- Low frequency penalties (-2 to 0): Titles tend to repeat, with familiar patterns like "The Shadow Weaver's Oath", "Crown of Ember and Ice", and "The Last Dragon's Heir" appearing frequently.
- Moderate penalties (0.5 to 1.5): Some repetition remains, but the model starts producing more varied and creative titles.
- High penalty (2.0): The first three titles are still the same, but after that, the model produces diverse, unique, and imaginative book names (e.g., "Whisperwind Chronicles: Rise of the Phoenix Queen", "Ashes Beneath the Willow Tree").
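One quick way to confirm this pattern is to count how often each title recurs across the penalty settings we just tried, reusing the results dictionary built above (an optional check):

from collections import Counter

# Titles that appear at several penalty settings indicate the model
# falling back on the same high-probability patterns.
title_counts = Counter(title for titles in results.values() for title in titles)
print(title_counts.most_common(5))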
Presence Penalty
Presence Penalty controls how much the model avoids repeating words or phrases that have already appeared in the text.
- Range: -2 to 2
- Default: 0
A higher presence penalty encourages the model to use a wider variety of words, making the output more diverse and creative.
Unlike the frequency penalty, which accumulates with each repetition, the presence penalty is applied once to any word that has already appeared, reducing the chance it will be repeated in the output. This helps the model produce text with more variety and originality. A sketch of how the two penalties adjust token logits follows below.
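OpenAI's parameter documentation describes both penalties as a per-token adjustment to the logits before sampling. Here is a minimal sketch of that rule (illustrative function and values, not library code):

def penalized_logit(logit, count, frequency_penalty, presence_penalty):
    # count: how many times this token has already appeared in the output so far.
    # The frequency penalty grows with the count; the presence penalty is a
    # one-time deduction for any token that has appeared at least once.
    return logit - count * frequency_penalty - (1 if count > 0 else 0) * presence_penalty

# A token already used 3 times loses 3 * 0.5 + 0.5 = 2.0 from its logit:
print(penalized_logit(2.0, count=3, frequency_penalty=0.5, presence_penalty=0.5))  # 0.0
# An unused token is untouched:
print(penalized_logit(2.0, count=0, frequency_penalty=0.5, presence_penalty=0.5))  # 2.0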
prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."

presence_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]

results = {}

for pp in presence_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        presence_penalty=pp,
        temperature=0.2
    )
    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[pp] = items

# Display results
for pp, items in results.items():
    print(f"\n--- presence_penalty = {pp} ---")
    print(items)
- Low to moderate penalty (-2.0 to 0.5): Titles are somewhat varied, with some repetition of common fantasy patterns like "The Shadow Weaver's Oath", "The Last Dragon's Heir", and "Crown of Ember and Ice".
- Medium penalty (1.0 to 1.5): The first few popular titles remain, while later titles show more creativity and unique combinations. Examples: "Ashes of the Fallen Kingdom", "Secrets of the Starbound Forest", "Daughter of Storm and Stone".
- Maximum penalty (2.0): The top three titles stay the same, but the rest become highly diverse and imaginative. Examples: "Moonfire and Thorn", "Veil of Starlit Ashes", "The Midnight Blade".