Soft Prompts vs. Hard Prompts

Categories: AI, NLP, Prompt Engineering
Author: Ravi Kalia
Published: March 30, 2025

Soft Prompts vs. Hard Prompts: A Deep Dive

Large language models (LLMs) rely on prompts to perform tasks like text generation, classification, and question answering. But not all prompts are created equal. In this post, we’ll explore the key differences between soft prompts and hard prompts, how they work, and when to use each.


What Are Hard Prompts?

A hard prompt is a manually written instruction in natural language. For example:

“Classify the sentiment of this review as Positive or Negative: ‘This movie was amazing!’”

Hard prompts work well for zero-shot or few-shot learning, where you guide the LLM by providing clear instructions. However, they have limitations:

  • Manually crafted: Requires prompt engineering expertise.
  • Fixed format: Cannot adapt dynamically to new tasks.
  • May be suboptimal: Slight wording changes can significantly impact performance.
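
Concretely, a hard prompt is just a string handed to the model. The sketch below assumes the Hugging Face transformers library and uses GPT-2 purely as a placeholder; any instruction-following LLM or hosted API could stand in its place.

from transformers import pipeline

# Placeholder model; a real sentiment task would use a stronger, instruction-tuned LLM.
generator = pipeline("text-generation", model="gpt2")

review = "This movie was amazing!"
hard_prompt = (
    "Classify the sentiment of this review as Positive or Negative: "
    f"'{review}'\nSentiment:"
)

# The prompt is plain text; its exact wording is what steers the model.
result = generator(hard_prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"])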

What Are Soft Prompts?

A soft prompt is a trainable embedding vector prepended to the input text. Unlike hard prompts, soft prompts are not human-readable and live in the model’s embedding space. During training, the soft prompt is optimized to elicit the desired behavior from the LLM, and the training data consists of examples of the task at hand, such as sentiment classification.

Instead of:

“Classify the sentiment of this review as Positive or Negative: …”

We prepend a learned embedding (a matrix of shape p × d, where p is the number of prompt tokens and d is the embedding dimension). The model then learns to associate these embeddings with task-specific behavior.

Soft prompts are optimized using gradient descent, but the LLM remains frozen — only the soft prompt embeddings are updated.

The benefit of soft prompts is that they can be trained on a small amount of data, making them a powerful alternative to fine-tuning large models. They are particularly useful for domain-specific tasks where hard prompts may not perform well.
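
To make the mechanics concrete, here is a minimal PyTorch sketch of prompt tuning. It is not tied to any particular LLM: a tiny Transformer encoder stands in for the frozen backbone, and the class name, dimensions, and training step are illustrative assumptions rather than a reference implementation.

import torch
import torch.nn as nn


class PromptTunedClassifier(nn.Module):
    """Toy stand-in for a frozen LLM with a trainable soft prompt prepended."""

    def __init__(self, vocab_size: int, d_model: int, prompt_len: int, num_classes: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)
        # The soft prompt: a p x d matrix of continuous "virtual token" embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        # Freeze everything except the soft prompt.
        for name, param in self.named_parameters():
            if name != "soft_prompt":
                param.requires_grad = False

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(input_ids)                                         # (B, T, d)
        prompt = self.soft_prompt.unsqueeze(0).expand(tokens.size(0), -1, -1)  # (B, p, d)
        x = torch.cat([prompt, tokens], dim=1)                                 # prepend the prompt
        hidden = self.encoder(x)
        return self.head(hidden.mean(dim=1))


model = PromptTunedClassifier(vocab_size=1000, d_model=32, prompt_len=8, num_classes=2)
optimizer = torch.optim.Adam([model.soft_prompt], lr=1e-3)  # only the prompt is updated

# One illustrative training step on dummy data.
input_ids = torch.randint(0, 1000, (4, 12))
labels = torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(input_ids), labels)
loss.backward()
optimizer.step()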


Key Differences: Soft Prompts vs. Hard Prompts

Feature          | Soft Prompt                                                 | Hard Prompt
-----------------|-------------------------------------------------------------|---------------------------------------
Definition       | A trainable embedding (vector) prepended to input           | A manually written text prompt
Model Changes?   | No, the LLM remains frozen                                  | No, the LLM remains frozen
Optimization     | Learned via gradient descent                                | Manually engineered
Format           | Continuous vectors in embedding space (p × d)               | Discrete text (natural language)
Interpretability | Hard to interpret (not human-readable)                      | Easy to understand
Performance      | Usually better for domain-specific tasks                    | Can work well but is often suboptimal
Use Cases        | Few-shot learning, task adaptation, fine-tuning alternative | General prompting, zero-shot inference

Why Do Soft Prompts Work?

Soft prompts work because LLMs already contain vast amounts of general knowledge. Instead of fine-tuning the entire model, we only train a small embedding vector to steer the model’s behavior.
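For example, with p = 20 prompt tokens and an embedding dimension of d = 4096 (typical of a 7B-parameter model), the soft prompt contributes only 20 × 4096 ≈ 82,000 trainable parameters, versus the billions of weights updated in full fine-tuning.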

They are particularly useful when:

  • Fine-tuning is too expensive (e.g., large LLMs like GPT-4).
  • We need task-specific adaptation without modifying model weights.
  • Hard prompts underperform, and we want an optimized alternative.

Example
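
The toy example below does not call a real LLM; it simply trains a p × d prompt matrix toward a fixed target with mean-squared error, which is enough to illustrate the optimization mechanics: a learnable prompt parameter, a fixed objective, and gradient descent on the prompt alone.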

import torch
import torch.nn as nn
import torch.optim as optim


# Define a simple model with learnable soft prompts
class SoftPromptModel(nn.Module):
    def __init__(self, prompt_size: int, embedding_dim: int):
        super().__init__()
        self.soft_prompt = nn.Parameter(torch.randn(prompt_size, embedding_dim))

    def forward(self):
        return self.soft_prompt


# Define parameters
prompt_size = 5  # Number of soft prompt tokens
embedding_dim = 10  # Embedding size per token
num_epochs = 100  # Number of training epochs
learning_rate = 0.01  # Learning rate

# Initialize model
model = SoftPromptModel(prompt_size, embedding_dim)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
loss_fn = nn.MSELoss()

# Dummy target soft prompt embeddings
target_embeddings = torch.randn(prompt_size, embedding_dim)

# Training loop
for epoch in range(num_epochs):
    optimizer.zero_grad()
    output = model()
    loss = loss_fn(output, target_embeddings)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}")

# Final learned soft prompt embeddings
print("Final Soft Prompt Embeddings:", model().detach().numpy())
Epoch [10/100], Loss: 1.6211
Epoch [20/100], Loss: 1.4284
Epoch [30/100], Loss: 1.2571
Epoch [40/100], Loss: 1.1054
Epoch [50/100], Loss: 0.9715
Epoch [60/100], Loss: 0.8537
Epoch [70/100], Loss: 0.7501
Epoch [80/100], Loss: 0.6591
Epoch [90/100], Loss: 0.5791
Epoch [100/100], Loss: 0.5087
Final Soft Prompt Embeddings: [[ 1.4910809  -0.8699683  -1.9884871  -0.35277116  0.12150019 -0.16390091
   0.47803414  1.0293046   0.16990474 -0.46258494]
 [-0.07308722  0.3423777  -0.04903882 -0.00638362  1.1009207  -0.07041186
  -1.0775605   0.69498384 -0.4386929  -0.18202302]
 [-0.4229598   0.77623296 -0.22888693 -0.8386606  -0.00771936  0.27615315
   0.38049632 -0.04124581 -1.1192888   0.21972553]
 [ 0.04477229  0.7526956   0.888023    0.86666435 -0.5246883  -0.01704139
  -1.3241758  -0.7773695  -0.05395351  0.91333693]
 [-0.4218152   1.4064169   0.5676974   0.04455263  0.51899564 -1.6439441
   0.3903599   0.699438    0.9776204   0.9386464 ]]

Practical Takeaways

  • Use hard prompts for general tasks where simple instructions work well.
  • Use soft prompts when fine-tuning is infeasible but you need better performance.
  • Soft prompts are better for domain-specific adaptation without modifying the LLM.

Both approaches have their place, but soft prompts provide a powerful, low-cost alternative to fine-tuning. As LLMs continue to evolve, expect soft prompting techniques to become increasingly important in real-world applications.


Got thoughts on soft vs. hard prompts? Write me a note.