Despite your best efforts, the LLM still isn’t behaving as expected. What should you try next? Do you edit the prompt? Change the model? Fine-tune? Any of these can be a valid option, and there is a sensible order in which to try them.
Principle V: Follow the Prompt-Fix Escalation Ladder
(This is part of an ongoing Principles of AI Engineering series; see the earlier posts for the previous principles.)
When a prompt doesn’t work as expected, I try the following fixes in order of preference:
- Expanding and rephrasing instructions.
- Adding examples.
- Adding an explanation field.
- Using a different model.
- Breaking up a single prompt into multiple prompts.
- Fine-tuning the model.
- Throwing the laptop out the window in frustration.
In some cases, the order of things to try will be different; nonetheless, having a default path saves time and preserves mental capacity for debugging. The list isn’t designed as a rigid ladder, but as a guide rope intended to keep you moving forward.
Now let’s skim over each approach. The first three fall into the bucket of Prompt Engineering and will be covered in more depth in the next chapter. Multi-prompt and Fine-tuning approaches will each have dedicated chapters.
Lightweight Approaches
Adding Instructions
The first thing to try is re-explaining to the LLM what to do via prompt instructions: add clearer directions, rephrase them, or move them around within the prompt.
Don’t hesitate to repeat or reformulate statements multiple times in different parts of the prompt; LLMs don’t get annoyed by repetition. For particularly important directives, add them at the beginning or end of the prompt for maximum effect.
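As a rough sketch (using the openai Python package; the model name, task, and instruction text here are made up for illustration), a prompt that states its key directive up front and repeats it at the end might be assembled like this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def build_prompt(message: str) -> str:
    # The most important rule is stated first and repeated at the end.
    key_rule = 'Respond with valid JSON only, in the form {"sentiment": "<label>"}.'
    return (
        f"{key_rule}\n\n"
        "You will receive a customer support message. Classify its sentiment as "
        '"positive", "neutral", or "negative".\n\n'
        f"Remember: {key_rule}\n\n"
        f"Message: {message}"
    )


def classify(message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whichever model you are working with
        messages=[{"role": "user", "content": build_prompt(message)}],
    )
    return response.choices[0].message.content


print(classify("The package arrived two weeks late and the box was crushed."))
```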
Adding Examples
LLMs respond very well to in-context learning (input-output examples). Examples are particularly important if you are using smaller models, which are not as naturally “intelligent” and therefore need more guidance.
Example of a Prompt with 2-shot Inference (Language Detection):
Detect the language of the text and output it in the JSON format: {"language": "name_of_language"}. If you don't know the language, output "unknown" in the language field.
Example I:
Input: Hello
Output: {"language": "English"}
Example II:
Input: EjnWcn
Output: {"language": "unknown"}
Text: {{text}}
Typically you would use 1-3 examples, though in some cases you could add more; there is evidence that performance keeps improving with a higher number of examples.
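When calling a chat model through an API, the same few-shot examples can also be supplied as alternating user/assistant messages rather than inlined into a single string. A minimal sketch using the openai Python package (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "Detect the language of the text and output it in the JSON format: "
    '{"language": "name_of_language"}. If you don\'t know the language, '
    'output "unknown" in the language field.'
)


def detect_language(text: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            # The two examples from the prompt above, expressed as message pairs.
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": '{"language": "English"}'},
            {"role": "user", "content": "EjnWcn"},
            {"role": "assistant", "content": '{"language": "unknown"}'},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content


print(detect_language("Bonjour tout le monde"))  # expected: {"language": "French"}
```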
Adding an Explanation Field
LLMs, like humans, benefit from having to explain their thinking. Add an “explanation” field to your output JSON and the output will usually get better. This will also help you identify why the model is making certain decisions and adjust instructions and examples.
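For instance, the language-detection prompt above could be extended along these lines, with the explanation placed before the answer so the model has to reason first and commit to the label second:
Detect the language of the text and output it in the JSON format: {"explanation": "one sentence on how you identified the language", "language": "name_of_language"}.
Input: Bonjour
Output: {"explanation": "'Bonjour' is a common French greeting.", "language": "French"}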
In cases where the prompt uses internal documentation, ask the LLM to output the sections of the documentation it used to construct its answer. This reduces hallucinations.
You can also attempt to use a reasoning model, which produces this kind of step-by-step thinking on its own before answering.
Changing the Model
Different models excel at different types of tasks. OpenAI’s o3, for example, is strong at analyzing code, but good old 4o tends to produce better writing despite being cheaper per token. Part of the job of an AI engineer is keeping up with the strengths and weaknesses of available models as they are released and updated.
Frequently try out different models for the same task. This experimentation is much faster and safer when you have automated tests and metrics to measure each model’s “fitness” for the task.
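As a sketch of what such a check might look like, assuming the detect_language helper from the examples section and a handful of hand-labeled test cases (the cases and model names below are purely illustrative):

```python
import json

# Hypothetical hand-labeled test cases for the language-detection task.
TEST_CASES = [
    ("Hola, ¿cómo estás?", "Spanish"),
    ("Guten Morgen", "German"),
    ("qwzrtp", "unknown"),
]


def accuracy(model: str) -> float:
    """Fraction of test cases where the model returns the expected language."""
    correct = 0
    for text, expected in TEST_CASES:
        raw = detect_language(text, model=model)
        try:
            language = json.loads(raw).get("language", "")
        except json.JSONDecodeError:
            language = ""  # malformed JSON counts as a miss
        correct += language.lower() == expected.lower()
    return correct / len(TEST_CASES)


for candidate in ["gpt-4o-mini", "gpt-4o"]:
    print(candidate, accuracy(candidate))
```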
Heavyweight Approaches
Every approach until now has been relatively low cost to try. Now we are getting into the heavyweight fixes.
Breaking Up the Prompt
If one prompt can’t get the job done, why not try a system of two or more prompts? This can work effectively in some cases; the two common approaches are:
- Splitting the prompt by area of responsibility.
- Using a new prompt as a guardrail that reviews the output of the previous one.
Both approaches are introduced in the dedicated chapter on multi-prompt systems.
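A minimal sketch of the guardrail variant, again using the openai Python package with placeholder prompts and model names:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    """A single LLM call; the model name is a placeholder."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def answer_with_guardrail(question: str) -> str:
    # Prompt 1: generate a draft answer.
    draft = ask(f"Answer the customer question concisely.\n\nQuestion: {question}")

    # Prompt 2: review the draft instead of generating new content.
    verdict = ask(
        "You are reviewing a draft answer for policy violations and unsupported claims. "
        'Reply with exactly "OK" if the draft is acceptable; otherwise state the problem briefly.\n\n'
        f"Question: {question}\n\nDraft answer: {draft}"
    )
    if verdict.strip() == "OK":
        return draft
    return "Sorry, I can't answer that reliably."  # or retry, or escalate to a human
```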
Fine-Tuning
Fine-tuning is an even heavier approach than using multiple prompts. For most problems, I use it as a last resort.
Why am I hesitant to recommend fine-tuning in most cases? Fine-tuning is fundamentally a machine learning approach: it requires collecting and labeling training data, running and monitoring training jobs, and maintaining the resulting model, which is far more effort than editing a prompt. A sketch of what that workflow looks like follows the list below.
Consider fine-tuning when:
- Other techniques failed to achieve the objective.
- The problem is highly complex and specialized, and default LLM knowledge is insufficient.
- You have a high-volume use case and want to save money by using a lower-end model.
- Low latency is needed, so multiple prompts cannot be executed in sequence.
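To give a sense of the extra machinery involved, here is a rough sketch of the workflow using OpenAI’s fine-tuning API (the data is hypothetical, and the details vary by provider):

```python
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical labeled examples, e.g. collected from production traffic.
examples = [
    {"text": "Hello", "language": "English"},
    {"text": "Bonjour", "language": "French"},
    # ...ideally hundreds more
]

# Fine-tuning data is a JSONL file with one chat transcript per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Detect the language and answer with JSON."},
                {"role": "user", "content": ex["text"]},
                {"role": "assistant", "content": json.dumps({"language": ex["language"]})},
            ]
        }
        f.write(json.dumps(record) + "\n")

# Upload the dataset and start a fine-tuning job on a lower-end base model.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # an example of a small, fine-tunable model
)
print(job.id)
```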
Conclusion
Hopefully, this article clarifies the order of steps to take when prompts don’t work as intended. First, you would typically try a prompt engineering approach. If that doesn’t work, switch the model and see if that helps. The next step is using multiple interacting prompts. Finally, consider fine-tuning if all other methods have failed.
If you’ve enjoyed this post, subscribe for more.