It’s getting a bit ridiculous out here. I’m using DuckDuckGo but since it aggregates its search from other sources, it’s also gotten bad recently. Is there a search out there that blocks domains that spam AI? Extra points if there’s something like Ublock Origin that filters things based on a community-made list.
Edit: I’m aware of Kagi but it’s pretty expensive and I’m not a fan that they, too, host their own AI tools.
I tried doing some of this. I trained on a corpus of data I wanted it to read, with such a small amount of training data, I found it was overall too lossy. If I asked it a question about something that was in there and it responded there was a really good chance that it was in there. But there was a lot of not knowing something that was definitely in there. It wasn’t completely useless but I wouldn’t say that it was at the level of being truly helpful.
I worry that there’s not enough verified data out there to set up for proper training.
I suspect such a model would have to be far more attuned to its data being smaller but trustworthy. Something like chatGPT for example requires a huge volume because it’s weakly affected by any particular datum going in. It’s designed to adapt to general conversation norms, rather than specific facts. If you could take a generalist like chatGPT and combine it with an expert model that’s been told everything it’s told has a huge weighting then that would probably be a big step forward.