ChatGPT is built through a two-step process. The first step is a "pre-training" phase, in which the model is trained on a large dataset drawn from parts of the Internet. The model learns to predict what comes next in a sentence, and in doing so picks up grammar, facts, some reasoning ability, and also some of the biases present in the data.
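The next-word-prediction idea can be illustrated with a toy model. This is only a minimal sketch: real pre-training uses a neural network over tokens, not the word-frequency counts assumed here, and the tiny corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for "parts of the Internet".
corpus = "the cat sat on the mat and the cat slept".split()

# Count which word follows which: a minimal bigram language model.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    followers = next_word_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # "cat": it follows "the" most often here
```

Even this crude counter "learns" patterns from its data, which is why biases in the training corpus carry over into the model's predictions.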
The second step is fine-tuning, performed on a narrower dataset with human reviewers who follow guidelines provided by OpenAI. The reviewers rate possible model outputs for a range of example inputs, and these ratings help shape the model's behavior. The process is iterative, so the model improves over time as feedback accumulates.
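The reviewer-rating step can be sketched as aggregating scores into a preference ranking. This is a simplification under stated assumptions: the output names and numeric scores below are hypothetical, and OpenAI's actual fine-tuning uses these preferences as a training signal rather than a simple sort.

```python
# Hypothetical reviewer scores (higher is better) for three candidate
# outputs to the same prompt; values are illustrative only.
ratings = {
    "output_a": [4, 5, 4],
    "output_b": [2, 3, 2],
    "output_c": [5, 4, 5],
}

def rank_outputs(ratings):
    """Average each output's reviewer scores and return them best-first."""
    averaged = {name: sum(scores) / len(scores) for name, scores in ratings.items()}
    return sorted(averaged, key=averaged.get, reverse=True)

print(rank_outputs(ratings))  # best-rated candidate first
```

A ranking like this is the kind of preference data that reward models are trained on, which in turn steer the fine-tuned model toward highly rated behavior.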
OpenAI maintains a strong feedback loop with its reviewers, holding weekly meetings to answer questions, clarify the guidelines, and learn from reviewer expertise. This feedback is an invaluable source for improving the model's behavior.
OpenAI takes user feedback seriously and has multiple safeguards in place. They use the Moderation API to warn about or block certain types of unsafe content. They are also working to expand user feedback channels and are actively researching ways to let public input shape the system's rules and default behavior.
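The idea of moderating content before it reaches users can be sketched with a simple local filter. This is an illustrative stand-in only: the real Moderation API is a hosted classifier reached over HTTP, not a keyword list, and the blocked terms below are made up.

```python
# Hypothetical blocklist; the real Moderation API classifies text into
# policy categories rather than matching literal terms.
BLOCKED_TERMS = {"spamword", "slurword"}

def moderate(text):
    """Return (allowed, matched_terms) for a piece of text."""
    words = set(text.lower().split())
    matches = words & BLOCKED_TERMS
    return (not matches, sorted(matches))

print(moderate("hello world"))       # allowed, no matches
print(moderate("buy spamword now"))  # flagged, with the matched term
```

A production system would call the hosted classifier and act on its category scores, but the warn-or-block decision flow is the same shape as this sketch.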