It’s getting a bit ridiculous out here. I’m using DuckDuckGo but since it aggregates its search from other sources, it’s also gotten bad recently. Is there a search out there that blocks domains that spam AI? Extra points if there’s something like Ublock Origin that filters things based on a community-made list.

Edit: I’m aware of Kagi but it’s pretty expensive and I’m not a fan that they, too, host their own AI tools.

  • rumba@lemmy.zip
    link
    fedilink
    English
    arrow-up
    0
    ·
    2 days ago

    I tried doing some of this. I trained on a corpus of data I wanted it to read, with such a small amount of training data, I found it was overall too lossy. If I asked it a question about something that was in there and it responded there was a really good chance that it was in there. But there was a lot of not knowing something that was definitely in there. It wasn’t completely useless but I wouldn’t say that it was at the level of being truly helpful.

    I worry that there’s not enough verified data out there to set up for proper training.

    • FourPacketsOfPeanuts@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      2 days ago

      I suspect such a model would have to be far more attuned to its data being smaller but trustworthy. Something like chatGPT for example requires a huge volume because it’s weakly affected by any particular datum going in. It’s designed to adapt to general conversation norms, rather than specific facts. If you could take a generalist like chatGPT and combine it with an expert model that’s been told everything it’s told has a huge weighting then that would probably be a big step forward.