Conference Paper

"Prompter Says": A Linguistic Approach to Understanding and Detecting Jailbreak Attacks Against Large-Language Models

Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis

Publication Date

November 19, 2024

Author(s)

Dylan Lee, Shaoyuan Xie, Shagoto Rahman, Kenneth Pat, David Lee, Qi Alfred Chen

Suggested Citation

Dylan Lee, Shaoyuan Xie, Shagoto Rahman, Kenneth Pat, David Lee, and Qi Alfred Chen (2024) "'Prompter Says': A Linguistic Approach to Understanding and Detecting Jailbreak Attacks Against Large-Language Models", in Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis. CCS '24: ACM SIGSAC Conference on Computer and Communications Security, Salt Lake City, UT, USA: ACM, pp. 77–87. Available at: https://doi.org/10.1145/3689217.3690618.