What LLM is everyone using for HA voice when self-hosting Ollama? I’ve tried Llama and Qwen, and they understand my commands to varying degrees. I’m currently on Llama as it seems a little better. I just wanted to see if anyone has found a better model.

Edit: as pointed out, this is more of a speech-to-text issue than an LLM issue. I’m looking into alternatives to Whisper.

  • kaaskop@feddit.nl
    1 month ago

    I used the Llama 3.2 3B model for a while. It ran okay-ish on a laptop with a GTX 1050 (response times of about 10 seconds to 2 minutes). I’ve personally opted to go without Ollama for now though, as the automations and built-in functions in the Voice Preview Edition are more than enough for me at the moment, especially with recent updates.
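
    For anyone wanting to compare models on their own hardware, here is a rough sketch of timing a single prompt against a local Ollama server. It assumes Ollama is listening on its default port (11434) and that the `llama3.2:3b` tag has been pulled; the helper names (`build_request`, `summarize`, `time_prompt`) are made up for illustration and are not part of Home Assistant or Ollama.

```python
import json
import time
import urllib.request

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3.2:3b"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(reply, wall_seconds):
    """Reduce an Ollama reply to the text plus two timings in seconds."""
    return {
        "text": reply.get("response", "").strip(),
        # Time measured on the client, including network overhead.
        "wall_s": round(wall_seconds, 2),
        # Ollama reports total_duration in nanoseconds.
        "model_s": round(reply.get("total_duration", 0) / 1e9, 2),
    }


def time_prompt(prompt, model="llama3.2:3b"):
    """Send one prompt and report how long the round trip took."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(prompt, model), timeout=180) as resp:
        reply = json.load(resp)
    return summarize(reply, time.perf_counter() - start)
```

    Calling `time_prompt("Turn on the kitchen lights.")` a few times per model gives a crude comparison. The first call is usually slower because the model has to be loaded into memory.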

      • kaaskop@feddit.nl
        1 month ago

        If you go that route too, you should use Speech-to-Phrase instead of Whisper. I’ve found it to be more reliable when you’re using it with automation commands instead of an LLM. In your automations you can set up phrases as a trigger. It supports aliases as well, and I believe it also supports templating for sentences.
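
        To illustrate what sentence templating means here: Home Assistant's sentence triggers accept optional parts in square brackets and alternatives in parentheses, for example `[it's ]party time`. The sketch below is only a simplified expander to show which spoken phrases one template covers. It is not the parser Home Assistant or Speech-to-Phrase actually uses, and the function name `expand` is hypothetical.

```python
import re

# Matches an innermost [optional] or (alt|alt) group, i.e. one that
# contains no further brackets or parentheses.
GROUP = re.compile(r"\[([^\[\]()]*)\]|\(([^\[\]()]*)\)")


def expand(template):
    """Return every plain sentence a simplified template can produce."""
    match = GROUP.search(template)
    if match is None:
        # No groups left: normalise whitespace and stop recursing.
        return [" ".join(template.split())]

    if match.group(1) is not None:
        # [a|b] is optional: each alternative, or nothing at all.
        options = match.group(1).split("|") + [""]
    else:
        # (a|b) is required: exactly one of the alternatives.
        options = match.group(2).split("|")

    sentences = []
    for option in options:
        candidate = template[:match.start()] + option + template[match.end():]
        for sentence in expand(candidate):
            if sentence not in sentences:  # keep order, drop duplicates
                sentences.append(sentence)
    return sentences
```

        With this sketch, `expand("(turn|switch) on [the] lights")` yields four phrases, so one template line can stand in for several spoken variations.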

        • spitfire@lemmy.world
          1 month ago

          So basically for people who have graphics cards with 24 GB of VRAM (or more). While I do, most people probably don’t ;)

            • spitfire@lemmy.world
              1 month ago

              I could probably run something on my gaming PC with a 3090, but that would be costly. Instead I’ve just put my old 2070 in an existing server and am using it for more lightweight stuff (TTS, Obico, Frigate, Ollama with a small model).

  • just_another_person@lemmy.world
    1 month ago

    None. They’re pretty awful for this purpose. I’m working out a build for something a bit different for voice commands that I may release in the next couple of weeks.