Structured Prompting techniques

Local LLMs and JSON output
A few things that worked for me

diegobit
17 Oct 2025

I was listening to DHH and Jason Fried talk about how they handle communication at 37signals, and I heard what I think is really good advice: just talk about what you are working on, without great expectations.

They say that documenting the problems you encounter and how you solve them is easier and more interesting than trying to craft content specifically for publication. So, something in between a journal you write for yourself and a technical blog. Was it obvious? Probably for personal writing, probably not for companies.

Let's start today with some prompting techniques I've been using recently while building agents.

Premise

Some facts and good practices:

  • It's almost always better to use the structured output support of your agentic framework of choice. It's easier and cleaner to write, e.g., a Pydantic BaseModel than to hand-write a JSON schema and append it to the prompt (see the sketch after this list).
  • You have to give the LLM everything it needs to answer effectively (a rule of thumb is to put yourself in its shoes: what would you need to know to solve the task if you were seeing the task and the domain for the first time?).
  • But filling the context with a million tokens is just going to confuse it.
  • Chain of Thought is powerful: making the model reason about the problem space before actually answering guides its sampling toward better solutions. If you ask the model to reason after the answer, you are just asking it to rationalize whatever happened to be generated.
  • Yet if your structured output is too dense (say, a JSON with 20 fields), you are implicitly asking the model to be concise and to condense too much into too little space.
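
For the first point, here is a minimal sketch of what that looks like. TicketTriage, its fields, and the descriptions are hypothetical; most agentic frameworks accept a Pydantic model like this directly as the structured-output type.

```python
from pydantic import BaseModel, Field

# Hypothetical schema for illustration. Most agentic frameworks accept a
# Pydantic model like this directly as the structured-output type.
class TicketTriage(BaseModel):
    reasoning: str = Field(description="Brief reasoning before the verdict")
    summary: str = Field(description="One-sentence summary of the ticket")
    severity: int = Field(ge=1, le=5, description="1 = cosmetic, 5 = outage")

# The equivalent JSON schema, if you ever need to append it to the prompt:
schema_for_prompt = TicketTriage.model_json_schema()
```

Putting reasoning as the first field is a cheap way to honor the Chain of Thought point above: with structured output the fields are usually generated in schema order, so the model reasons before it commits to the answer.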

Problem 1

The answers in the JSON output are not detailed enough, or the model fails to pick up all the details in the context.

This happens because you are giving the model too much context, or asking too much of it in a single generation.

Technique 1. Split the output generation into steps, each with its own tweaked prompt, and give the model the previous step's generation as additional context. You can then manually concatenate the produced JSON fields.
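
A sketch of Technique 1. The Analysis and Verdict models are placeholders, and call_llm is a hypothetical helper standing in for your framework's structured-output call.

```python
from pydantic import BaseModel

# Hypothetical models for illustration.
class Analysis(BaseModel):
    key_facts: list[str]
    open_questions: list[str]

class Verdict(BaseModel):
    decision: str
    justification: str

# Placeholder: wire this to your framework's structured-output call.
def call_llm(prompt: str, output_model: type[BaseModel]) -> BaseModel:
    raise NotImplementedError("wire this to your framework's structured-output call")

def two_step_generation(context: str) -> dict:
    # Step 1: a prompt tweaked for the analysis fields only.
    analysis = call_llm(
        f"Extract the key facts and open questions.\n\nContext:\n{context}",
        Analysis,
    )
    # Step 2: a prompt tweaked for the verdict, with step 1 as extra context.
    verdict = call_llm(
        "Given the analysis below, produce a decision with a justification.\n\n"
        f"Analysis:\n{analysis.model_dump_json()}\n\nContext:\n{context}",
        Verdict,
    )
    # Manually concatenate the JSON fields produced by the two steps.
    return {**analysis.model_dump(), **verdict.model_dump()}
```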

Technique 2. Make the model produce the overall JSON first, then call the model again with a dedicated prompt for each field you need it to expand (ideally without structured output). This way the model uses its whole capacity to produce the most effective answer for that particular field.
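
A sketch of Technique 2 under the same assumptions: the Report model is a placeholder, and call_structured and call_free_text are hypothetical wrappers around your framework.

```python
from pydantic import BaseModel

class Report(BaseModel):
    summary: str
    risks: str
    next_steps: str

# Hypothetical wrappers: one structured call, one free-text call.
def call_structured(prompt: str, output_model: type[BaseModel]) -> BaseModel:
    raise NotImplementedError

def call_free_text(prompt: str) -> str:
    raise NotImplementedError

def expand_fields(context: str) -> dict:
    # First pass: the overall JSON; fields are allowed to be terse here.
    draft = call_structured(f"Produce a report for:\n{context}", Report)
    # Second pass: one dedicated, unconstrained call per field to expand it.
    expanded = {}
    for field_name, terse_value in draft.model_dump().items():
        expanded[field_name] = call_free_text(
            f"Context:\n{context}\n\n"
            f"Draft value for '{field_name}': {terse_value}\n\n"
            f"Rewrite and expand the '{field_name}' section in full detail."
        )
    return expanded
```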

Problem 2

The model fails to produce a valid JSON.

This means you are asking the model too much. Try Technique 1 again, or isolate the problematic field and generate it freely. Other options:

Technique 3. Check whether your framework supports grammar-constrained generation. In this mode, sampling is constrained to valid continuations only: e.g. inside a JSON string, the model can't close a bracket prematurely. Unfortunately, not all frameworks and LLM engines support it.
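
One sketch of what this can look like, assuming llama-cpp-python (recent versions accept a response_format with an embedded JSON schema and compile it into a grammar; check your version's docs). The model path and the Verdict model are placeholders.

```python
from llama_cpp import Llama
from pydantic import BaseModel

class Verdict(BaseModel):
    decision: str
    justification: str

# Model path and n_ctx are placeholders for your local setup.
llm = Llama(model_path="./models/your-model.gguf", n_ctx=8192)

# The schema is turned into a grammar, so sampling can only produce
# tokens that keep the JSON valid.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the case and give a verdict."}],
    response_format={
        "type": "json_object",
        "schema": Verdict.model_json_schema(),
    },
)
print(result["choices"][0]["message"]["content"])
```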

Technique 4. Implement a smart retry strategy in which you tune model parameters on every retry. The most obvious thing is to change the temperature. Another (not very effective) technique is to add a line at the end of your prompt telling the model to be extra careful about producing valid JSON. You could also try a different model, or keep an alternative version of the system prompt. Mixing all of these can bring good results, but you still have no guarantee of success.
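
A sketch of such a retry loop. call_llm and both system prompts are placeholders for whatever your setup uses; the attempt list is just one possible escalation order.

```python
import json
from pydantic import BaseModel, ValidationError

class Verdict(BaseModel):
    decision: str
    justification: str

SYSTEM_PROMPT = "You are a careful assistant that answers in JSON."       # placeholder
SYSTEM_PROMPT_ALT = "Answer with a single JSON object and nothing else."  # placeholder

# Hypothetical free-text call into your framework or engine.
def call_llm(prompt: str, temperature: float, system: str) -> str:
    raise NotImplementedError

# Each retry tweaks something: temperature, an extra warning line, an
# alternative system prompt. None of this guarantees success.
ATTEMPTS = [
    {"temperature": 0.2, "suffix": "", "system": SYSTEM_PROMPT},
    {"temperature": 0.0, "suffix": "\nReturn ONLY valid JSON.", "system": SYSTEM_PROMPT},
    {"temperature": 0.7, "suffix": "\nReturn ONLY valid JSON.", "system": SYSTEM_PROMPT_ALT},
]

def generate_with_retries(prompt: str) -> Verdict:
    last_error = None
    for attempt in ATTEMPTS:
        raw = call_llm(prompt + attempt["suffix"], attempt["temperature"], attempt["system"])
        try:
            return Verdict.model_validate_json(raw)
        except (ValidationError, json.JSONDecodeError) as exc:
            last_error = exc  # tune parameters and try the next attempt
    raise RuntimeError(f"All retries failed: {last_error}")
```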

Technique 5. The last resort is to put the JSON schema directly in the prompt (you should be able to export it, e.g. with Pydantic's model_json_schema()) and let the LLM generate a string with no validation. Then ask a second model, with a dedicated prompt, to validate the JSON output and repair it.
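
A sketch of Technique 5, with hypothetical call_generator and call_repair_model helpers standing in for the two models.

```python
from pydantic import BaseModel, ValidationError

class Verdict(BaseModel):
    decision: str
    justification: str

# Hypothetical free-text helpers: the generator and a second (possibly
# different) model used only to validate and repair.
def call_generator(prompt: str) -> str:
    raise NotImplementedError

def call_repair_model(prompt: str) -> str:
    raise NotImplementedError

def last_resort(task: str) -> Verdict:
    schema = Verdict.model_json_schema()  # schema exported from the Pydantic model
    raw = call_generator(
        f"{task}\n\nAnswer with a JSON object matching this schema:\n{schema}"
    )
    try:
        return Verdict.model_validate_json(raw)
    except ValidationError as exc:
        repaired = call_repair_model(
            "The JSON below does not validate against the schema. "
            "Fix it and return only the corrected JSON.\n\n"
            f"Schema:\n{schema}\n\nErrors:\n{exc}\n\nJSON:\n{raw}"
        )
        return Verdict.model_validate_json(repaired)
```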