I feel like every day I come across 15-20 “AI-powered tools” that “analyze” something, and none of them clearly state how they use data. This one seems harmless enough: put a profile in, and it will scrape everything about that person, including all their personal information, their location, and every post they ever made… Nothing can possibly go wrong aggregating all that personal info, right? There’s no indication of where this data is sent, where it’s stored, or who it’s sold to. Kinda alarming.

  • General_Effort@lemmy.world
    2 months ago

    A toy like that is easy to create and not that expensive to offer. It costs more to run than some JavaScript or CSS, but in the end it’s not that different.

    I think people don’t really understand this whole scraping thing. For example, you can torrent all of Reddit up until the API change: all the comments, profiles, and usernames, including now-deleted stuff. There is a lot of outrage here over Reddit cracking down on these 3rd-party tools. It’s difficult to see how that outrage over cracking down on 3rd-party tools fits with this outrage over not cracking down on them.

    Anyway, if someone wants to archive all of Bluesky, they don’t need to offer some AI toy. They can just download the content via the API.
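    For illustration, here is a minimal Python sketch of what downloading the content via the API can look like. It assumes Bluesky’s public, unauthenticated AppView endpoint `app.bsky.feed.getAuthorFeed` at `public.api.bsky.app`, with cursor-based pagination returning `feed` and `cursor` fields. The helper names (`build_url`, `fetch_page`, `iter_posts`) are hypothetical, and this is not how any particular tool works:

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Assumption: Bluesky's public, unauthenticated AppView endpoint for a user's posts.
BASE = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"


def build_url(actor: str, cursor: Optional[str] = None, limit: int = 50) -> str:
    """Build the request URL for one page of an account's public feed."""
    params = {"actor": actor, "limit": str(limit)}
    if cursor:
        params["cursor"] = cursor  # opaque token returned by the previous page
    return BASE + "?" + urllib.parse.urlencode(params)


def fetch_page(actor: str, cursor: Optional[str] = None) -> dict:
    """Fetch one page over HTTP and parse the JSON body."""
    with urllib.request.urlopen(build_url(actor, cursor), timeout=30) as resp:
        return json.load(resp)


def iter_posts(actor: str, fetch: Callable[..., dict] = fetch_page) -> Iterator[dict]:
    """Yield every post by following the cursor until the API stops returning one."""
    cursor: Optional[str] = None
    while True:
        page = fetch(actor, cursor)
        items = page.get("feed", [])
        for item in items:
            yield item["post"]
        cursor = page.get("cursor")
        # Stop when there is no next cursor, or when a page comes back empty.
        if not cursor or not items:
            break
```

    Usage would be something like `for post in iter_posts("someone.bsky.social"): print(post["record"].get("text"))`, where the handle is a placeholder and the post text is assumed to live under `record.text`. The fetcher is a parameter so the pagination logic can be exercised without network access.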

    • DuckWrangler9000@lemmy.worldOP
      2 months ago

      A toy like that is easy to create and not that expensive to offer.

      Right, and the developers of Bsky didn’t think to maybe block something that scrapes all that personal information?

      • Scipitie@lemmy.dbzer0.com
        2 months ago

        By definition, that would block all third parties.

        Think of the Reddit example from the person you replied to: there was a huge outcry when Reddit announced it was shutting down its lower API tiers.

        Either information is free to flow or not at all, there is no middle ground.

        With that in mind, I’m sure they thought about it and decided to prioritize transparency and flexibility over security. Personally, I support that decision.

        • DuckWrangler9000@lemmy.worldOP
          2 months ago

          I know how APIs on Reddit work, but you can block people who misuse the API if they’re doing something nefarious. Some of these AI tools are, in my honest opinion, very taxing on hardware. Having to retrieve millions of posts, comments, pictures, and text on demand… and send that to who knows where for AI scraping… Sounds very costly.
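          Here is a minimal sketch of the kind of per-client throttling described above, written as a token bucket in Python. It is a generic illustration, not Bluesky’s actual mechanism; the `RateLimiter` class, its `rate`/`capacity` parameters, and the `block` method are hypothetical:

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Bucket:
    tokens: float   # requests the client may still make right now
    updated: float  # clock time of the last refill


class RateLimiter:
    """Hypothetical per-client token bucket: each client may burst up to
    `capacity` requests, then sustain `rate` requests per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, Bucket] = {}
        self.blocked: Set[str] = set()

    def block(self, client_id: str) -> None:
        """Cut a client off entirely, e.g. after detecting abuse."""
        self.blocked.add(client_id)

    def allow(self, client_id: str) -> bool:
        """Return True if this request may proceed, consuming one token."""
        if client_id in self.blocked:
            return False
        now = self.clock()
        bucket = self.buckets.get(client_id)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = Bucket(tokens=self.capacity, updated=now)
            self.buckets[client_id] = bucket
        # Refill in proportion to elapsed time, never above capacity.
        bucket.tokens = min(self.capacity,
                            bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

          A service would call `allow(client_id)` before handling each request and reject it (for example with HTTP 429 Too Many Requests) when it returns `False`; `block()` covers cutting a client off outright. The clock is injectable so the behavior can be checked without waiting in real time.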