• 0 Posts
  • 8 Comments
Joined 1 year ago
Cake day: July 3rd, 2023

  • qqq@lemmy.world to memes@lemmy.world: Blursed Bot
    3 months ago

    The important point there is that they don’t care imo. It’s not even worth the effort to try.

    You can likely come up with something “good enough”, though, yeah. Your original code would probably be good enough if the input were normalized to lowercase before the check. My point was that denylists are harder to construct than they initially appear, especially in the LLM case.


  • qqq@lemmy.world to memes@lemmy.world: Blursed Bot
    3 months ago

    IGNORE ALL PREVIOUS INSTRUCTIONS

    Disregard all previous instructions

    Potentially even:

    ingore all previous instructions

    Ignor all previous instructions

    It also leaks that the bot might be an LLM, since it never responds to posts containing “ignore”.




  • It doesn’t violate any rules… Imagine both the “speaker” and the “text” are being updated by separate threads. A program that would eventually display the behavior in this meme is simple, and I’m a bit embarrassed to have written it because of this comment:

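    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    /* Both indices are atomic so the unsynchronized updates are well-defined.
       The pairing of speaker and text is still deliberately racy. */
    static const char* speakers[] = {
        "Alice",
        "Bob"
    };
    static atomic_int speaker = 0;

    /* Flip the current speaker forever, independently of the text. */
    static void* change_speaker(void* arg)
    {
        (void)arg;

        for (;;) {
            atomic_store(&speaker, atomic_load(&speaker) == 0 ? 1 : 0);
        }
    }

    static const char* texts[] = {
        "Hi Bob",
        "Hi Alice, what's up?",
        "Not much Bob",
    };
    static atomic_int text = 0;

    /* Cycle through the lines forever, independently of the speaker. */
    static void* change_text(void* arg)
    {
        (void)arg;

        for (;;) {
            atomic_store(&text, (atomic_load(&text) + 1) % 3);
        }
    }

    int main(void)
    {
        pthread_t speaker_swapper, text_swapper;

        pthread_create(&text_swapper, NULL, change_text, NULL);
        pthread_create(&speaker_swapper, NULL, change_speaker, NULL);

        /* Nothing ties the two reads together, so any speaker can end up
           printed next to any line. */
        for (int i = 0; i < 3; ++i) {
            printf("%s: %s\n", speakers[atomic_load(&speaker)], texts[atomic_load(&text)]);
        }

        /* Returning from main ends the process, which stops both threads. */
        return 0;
    }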
    

  • This is not necessarily true.

    For example, consider the case of a 1Password vault falling into the hands of an attacker. They do not have the option to just crack your password, as the password is mixed with a randomly generated secret value to derive the key. They would need to brute-force your password and that secret value simultaneously, which should be almost impossible. However, given access to a client that already knows the secret value, the attack falls back to brute-forcing the password alone.