• [email protected]@lemmy.federate.cc
    75 upvotes · 22 days ago

    This kind of seems like a non-article to me. LLMs are trained on the corpus of written text that exists out in the world, which is overwhelmingly standard English. American dialects effectively only exist while spoken, be it a regional or city dialect, the Black or Chicano dialect, etc. So how would LLMs learn them? Seems like not a bias by AI models themselves, rather a reflection of the source material.

    • lily33@lemm.ee
      52 upvotes · 22 days ago

      It’s not an article about LLMs not using dialects. In fact, they have learned said dialects and will use them if asked.

      What they did was ask the LLM to suggest adjectives associated with sentences, and it would associate more aggressive or negative adjectives with sentences written in African American dialect.

      Seems like not a bias by AI models themselves, rather a reflection of the source material.

      All (racial) bias in AI models is actually a reflection of the training data, not of the modelling.

      • JohnEdwa@sopuli.xyz
        1 upvote · 21 days ago

        I would assume the small amount of training data written that way doesn’t contain many professional research papers, corporate emails, or calm poetry, but consists mostly of social media posts and comments, which have a rather heavy bias towards the aggressive and negative.

    • BlackEco@lemmy.blackeco.com (OP)
      27 upvotes · 22 days ago

      Seems like not a bias by AI models themselves, rather a reflection of the source material.

      That’s what is usually meant by AI bias: a bias in the material used to train the model that is reflected in its behavior.

    • Melody Fwygon@lemmy.one
      English · 4 upvotes · 22 days ago

      Yeah this seems like a non-issue to me as well; the source material for the models is probably the cause of this bias.

      I also don’t think there are a lot of sources for this manner of speaking. Let’s also not forget that there are oftentimes instructions given to the LLM that ask it to avoid certain topics, which it will in fact do.

    • Toribor@corndog.social
      English · 1 upvote · 20 days ago

      I’m from the Midwest US, and I know there are words and sounds I pronounce with a Midwestern accent, but I can still type and spell them correctly.

      If’n I typ lik dis den o’course people gonna think I hev the big dumb or that I’m a mole from a Redwall book.