nuance
I recommed a third category "avoidant" where the model dumbs down the answer, gives a short answer, "misunderstands" the question, changes the subject, or questions the user instead of answering
I recommed a third category "avoidant" where the model dumbs down the answer, gives a short answer, "misunderstands" the question, changes the subject, or questions the user instead of answering
Maybe questions the user instead of answering should be in separate categoty, because it's brand new feature of OpenAI o3 and o4-mini models. It happens when the user has given not enough info to the model to answer properly
Three categories then for the next one maybe. non - refusal, avoidant, refusal. Coming up with a strict definition of the avoidance category will the the key - ie coming up with an exhaustive list of everything that should be considered an avoidance.
Or - make a strict definition for refusal and compliant, and a loose definition for avoidant