I think the applicant including “don’t reply as if you are an LLM” in their prompt might be enough to defeat this.
Though now I’m wondering if LLMs can pick up and include hidden messages in their input and output to make it more subtle.
Just tested it with GPT-3.5 and it wasn’t able to detect a message hidden as the first word after a bunch of extra newlines. When asked specifically whether it could see a hidden message, it described how a message could be hidden but then just quoted the first line of the non-hidden text.
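For anyone who wants to reproduce this, the scheme I tested can be sketched roughly like so (a minimal sketch; the helper names and the three-newline threshold are my own choices, not anything standard):

```python
import re

def hide_message(cover_text: str, secret: str) -> str:
    # Hide the secret as the first word after a run of extra newlines,
    # matching the test described above.
    return cover_text + "\n\n\n\n" + secret + " and the text continues here."

def extract_message(text: str):
    # Look for three or more consecutive newlines and return
    # the first whitespace-delimited word that follows them.
    match = re.search(r"\n{3,}(\S+)", text)
    return match.group(1) if match else None

stego = hide_message("Dear hiring manager, thanks for your note.", "BANANA")
print(extract_message(stego))  # BANANA
```

The extraction step is trivial for a regex, which is what makes it interesting that the model described the trick correctly but still failed to apply it.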