Knowledge-Sharing

Why OpenAI API's JSON Mode Might Not Always Yield JSON

Dec 15, 2023

In previous posts, we discussed parallel function calling and reproducible outputs. In the same 1106 API update, OpenAI also introduced JSON mode, a feature that is highly beneficial for developers because its output is structured and easily parseable.

Activating JSON mode is straightforward: follow the instructions in the API reference and set response_format in your request to { "type": "json_object" }.
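As a minimal sketch, the request body looks like the following. The model name and messages are illustrative, and actually sending the request requires the official openai client (or an HTTP call) plus an API key; only the payload shape is shown here:

```python
import json

# Minimal Chat Completions payload with JSON mode enabled. The
# response_format entry is what activates JSON mode; everything else
# is an ordinary chat request.
payload = {
    "model": "gpt-3.5-turbo-1106",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user",
         "content": "Summarize this post in one sentence."},
    ],
}

body = json.dumps(payload)  # serialized request body
```

Note that the system message explicitly mentions JSON; as discussed below, setting response_format alone is not enough.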

Here's an example of a response in JSON mode:

JSON mode in Prompter

But why might you still encounter outputs that are not in JSON format?

Several reasons could be at play. Check the following:

  1. Model Compatibility: Ensure you are using gpt-4-1106-preview or gpt-3.5-turbo-1106; at the time of writing, these are the only models that support this feature. For legacy models, you can instruct the model in the system prompt to output JSON, but they may not reliably produce valid JSON. In particular, double-quote characters inside string values must be escaped with backslashes.

  2. Explicit Instructions: Even with response_format set to { "type": "json_object" }, you must also state in the prompt that the model should produce JSON. Otherwise, the model may generate an endless stream of whitespace, running until it hits the token limit.

    Errors like the following may occur in GPT models:

    no explicit instruction to output JSON in system prompt

    To address this, include directives in your system message like:

    • You are a helpful assistant designed to output JSON.

    • Provide output in JSON format.

    • Output a JSON object.

    • Provide your output in JSON format with the keys: aaa and bbb.

  3. Incomplete JSON Text: If the JSON output is incomplete and therefore unparseable, check the finish_reason field in the response. If it is "length", your prompt may be too long, or max_tokens may be set too low, truncating the output. You can check max_tokens in Prompter’s Parameters tab.

  4. Schema Matching: OpenAI does not guarantee that the output matches a specific schema. JSON mode ensures the output is syntactically valid, parseable JSON, but not that it conforms to a predefined structure. However, you can use few-shot techniques to suggest the schema with examples, though this does not guarantee complete compliance.

    Respond in JSON format. The schema should be like this:
    {
        "title": "xxxx",
        "summary": "xxxx",
        "body":
        [
            {
                "subtitle": "xxxx",
                "content": "xxxx"
            },
            {
                "subtitle": "xxxx",
                "content": "xxxx"
            },
            ...
        ],
        "tag":
        [
            "xxx",
            "xxx",
            "xxx",
            "xxx"
        ]
    }
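Since JSON mode only guarantees syntax, a lightweight check of the parsed object against the expected top-level keys is a reasonable safeguard. The key set below mirrors the example schema above and is purely illustrative:

```python
import json

# Top-level keys from the example schema; adjust to your own prompt.
EXPECTED_KEYS = {"title", "summary", "body", "tag"}

def matches_schema(text: str) -> bool:
    """Return True if `text` parses as JSON and is an object
    containing all of the expected top-level keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and EXPECTED_KEYS <= obj.keys()
```

For stricter validation (types, nesting, array contents), a JSON Schema validator is a better fit than hand-rolled checks.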

Regardless, it’s vital to note that LLMs still cannot guarantee 100% valid JSON output. Appropriate error catching and handling in your program is essential whenever a response fails to parse.
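A sketch of such handling, using hand-built dicts that mimic the shape of a Chat Completions response (the field names follow the OpenAI API, but the objects here are illustrative, not real API replies):

```python
import json

def parse_json_reply(response: dict) -> dict:
    """Extract and parse the JSON content of a chat completion,
    guarding against truncated or malformed output."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        # Hit the token limit: the JSON is almost certainly cut off.
        raise ValueError("truncated output; raise max_tokens or shorten the prompt")
    try:
        return json.loads(choice["message"]["content"])
    except json.JSONDecodeError as exc:
        raise ValueError(f"unparseable JSON: {exc}") from exc

# Hand-built examples for illustration.
ok = {"choices": [{"finish_reason": "stop",
                   "message": {"content": '{"title": "demo"}'}}]}
truncated = {"choices": [{"finish_reason": "length",
                          "message": {"content": '{"title": "de'}}]}
```

In production code you would typically retry or fall back to a repair step when ValueError is raised, rather than letting it propagate.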

Prompter now fully supports JSON mode. The screenshots above were taken in Prompter: choose gpt-4-1106-preview or gpt-3.5-turbo-1106 and turn on the JSON toggle in the Parameters tab. Prompter also offers a variety of examples specifically for JSON mode. We welcome your feedback and encourage you to give it a try. Thank you.
