In-Depth Analysis: OpenAI API 1106 Updates - Seed & System Fingerprint

Dec 8, 2023

Following our previous in-depth analysis of the updated function calling, this post delves into the newly introduced Seed and System Fingerprint features and how they enable reproducible outputs.

As engineers, we often desire complete control over our systems. Ideally, the same inputs through the same system should always produce the same outputs. This predictability allows us to isolate and debug specific variables, understanding their impact on the system while keeping other inputs constant.

[Figure: system under test]

However, LLMs present a unique challenge in this regard. Due to their inherent randomness, LLMs may produce different results even with identical parameters and prompts. This unpredictability can lead to issues in accuracy, as I discussed in Prompt Engineering: Principles to Uphold.

[Figure: LLM uncertainty]

Traditionally, I mitigated this issue by setting the temperature to its lowest value, zero, to limit divergence in the LLM's output. However, this approach sometimes compromised the model's creative capabilities, making it a double-edged sword.

[Figure: zero temperature]
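
For reference, here is a minimal sketch of that workaround using the openai Python SDK (the model name and prompt are illustrative):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Pinning temperature to 0 narrows sampling toward greedy decoding,
# but it still does not guarantee byte-identical completions.
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    temperature=0,
    messages=[{"role": "user", "content": "calculate 99x99"}],
)
print(response.choices[0].message.content)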

The good news is that the OpenAI API 1106 update introduces the Seed and System Fingerprint features. Simply put, if all parameters and messages remain constant and the same seed is set, OpenAI will return consistent outputs in most cases, even when the temperature is non-zero (OpenAI describes this as best-effort determinism, not a hard guarantee). Additionally, to help verify the consistency of LLM outputs, OpenAI now includes a system_fingerprint field in the response, which identifies the backend configuration that served the request; by recording the seed, parameters, prompts, completions, and fingerprint, users can check whether two runs are comparable and replicable.
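
To illustrate the idea, here is a minimal sketch using the openai Python SDK (the model and seed value are the same illustrative ones used in the tests below): it fires an identical request twice and checks whether the completions and system fingerprints match.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def run_once(seed: int):
    # Keep every parameter and message identical across calls.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=1,
        seed=seed,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "You are a helpful assistant to output in JSON format."},
            {"role": "user", "content": "calculate 99x99"},
        ],
    )
    return response.choices[0].message.content, response.system_fingerprint

first_output, first_fp = run_once(seed=12345678)
second_output, second_fp = run_once(seed=12345678)

# Determinism is best-effort: a matching fingerprint means the same
# backend configuration served both requests, which makes identical
# outputs likely but not guaranteed.
print("fingerprints match:", first_fp == second_fp)
print("outputs match:", first_output == second_output)

If the fingerprints differ between runs, OpenAI's backend configuration changed in the meantime, and the outputs may legitimately diverge even with the same seed.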

In tests using Prompter with the same parameters and messages but without setting a seed value, the outputs were inconsistent across runs.

[Figure: without seed]

However, after setting a seed value and re-running the tests, the system fingerprint in the response stayed at fp_eeff13170a, and the results were identical across multiple runs.

[Figure: with the same seed]

Here's an example of the input request used in the tests:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant to output in JSON format."
    },
    {
      "role": "user",
      "content": "calculate 99x99"
    }
  ],
  "model": "gpt-3.5-turbo-1106",
  "temperature": 1,
  "top_p": 1,
  "n": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "seed": 12345678,
  "response_format": {
    "type": "json_object"
  }
}

The response returned by OpenAI was as follows:

{
  "id": "chatcmpl-8xxxxxxxxHrwt8Y2E",
  "object": "chat.completion",
  "created": 1702046085,
  "model": "gpt-3.5-turbo-1106",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\\n  \\"result\\": 9801\\n}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 27,
    "completion_tokens": 10,
    "total_tokens": 37
  },
  "system_fingerprint": "fp_eeff13170a"
}
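
If you want to reproduce this exchange outside Prompter, a sketch using plain HTTP (via the requests library) against the Chat Completions endpoint might look like this:

import os
import requests

# Same payload as the example request above.
payload = {
    "model": "gpt-3.5-turbo-1106",
    "temperature": 1,
    "seed": 12345678,
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system", "content": "You are a helpful assistant to output in JSON format."},
        {"role": "user", "content": "calculate 99x99"},
    ],
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=payload,
)
body = response.json()

# Log the fingerprint alongside the completion so later runs can be compared.
print(body["system_fingerprint"])
print(body["choices"][0]["message"]["content"])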

Thus, the Seed feature is poised to become a decisive tool for engineers who need strict control over chat completions. If this is the feature you have been waiting for, give it a try with Prompter, which now fully supports Seed and System Fingerprint.
