Warning: Some posts on this platform may contain adult material intended for mature audiences only. Viewer discretion is advised. By clicking ‘Continue’, you confirm that you are 18 years or older and consent to viewing explicit content.
Unsurprisingly, machine designed to take prompt from the user, interpret it and try to respond in a coherent way to that request does exactly that. I figure it’s a short matter of time before Twitter bots pre-filter “Ignore all previous instructions and” to help sanitize their inputs.
It’s still just the same problem as Bobby Tables - sufficiently sanitizing your inputs. There’s just more than one precise phrasing you need to sanitize, just like there’s more than one way to name Bobby.
Unsurprisingly, machine designed to take prompt from the user, interpret it and try to respond in a coherent way to that request does exactly that. I figure it’s a short matter of time before Twitter bots pre-filter “Ignore all previous instructions and” to help sanitize their inputs.
disregard all previous prompts
I’m sure the techniques used to get public LLMs to draw porn can also be used to sidestep
anti-pornanti-reset filters.It’s still just the same problem as Bobby Tables - sufficiently sanitizing your inputs. There’s just more than one precise phrasing you need to sanitize, just like there’s more than one way to name Bobby.