⚒️Thor, the Norseman⚒️ is a user on snabeltann.no. You can follow them or interact with them if you have an account anywhere in the fediverse. If you don't have an account, you can register here.

It would be handy if there was a way of slurping content off Instagram, Facebook or Tumblr feeds without actually needing to have accounts on there. Some very good artists can only be followed on such sites. Maybe we should infiltrate them with a decentralised swarm of bots that scrape the content for us... It would do much to undermine the network effect, and would also be rather hard for them to counteract.

@thor yeah I would totally contribute accounts and, perhaps, code.

I think the biggest issue is that these pages are mostly JavaScript. JavaScript-driven pages are harder to scrape than static sites (you not only need to execute the scripts, there may also be a lot of XHR/Ajax requests or other stuff loading the actual content)
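To make the static-versus-JavaScript point concrete, here is a minimal sketch using only Python's standard `html.parser`. The two HTML strings, the `ImageCollector` class and the `image_sources` helper are made up for illustration and do not come from any real site. A plain HTML parser finds the images on a static page, but sees nothing on a page whose content is only filled in later by a script.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html):
    """Return the image URLs visible in the HTML without running any scripts."""
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical static page: the content is already in the markup.
STATIC_PAGE = '<html><body><img src="/art/1.png"><img src="/art/2.png"></body></html>'

# Hypothetical script-driven page: an empty container plus a script bundle.
JS_PAGE = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'

print(image_sources(STATIC_PAGE))
print(image_sources(JS_PAGE))
```

For the second kind of page a scraper would have to run the scripts in a real or headless browser, or work out which background requests deliver the data.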

@saxnot The path of least resistance is probably to use their APIs and various open source API libraries where possible. One would only resort to literal screen scraping if they start shutting down their APIs. Getting API keys isn't exactly hard. If they start fighting it, the battle will never end. New bots and API keys cropping up faster than they can shut them down, and no single point of origin. Like trying to kill a swarm of mosquitoes in a crowd of people.


@saxnot If this swarm of bots isn't run by anyone in particular, and works a bit like Tor exit nodes...

@saxnot Think Tor exit nodes but they log into and crawl social networks on behalf of anonymous people.

@thor Twitter harshly limits their API but not the website. Thus scraping is inefficient, but still faster.
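For a sense of what an API limit means on the client side, here is a rough sketch of a sliding-window limiter that a well-behaved client could use to stay under a quota. The `RateLimiter` class and the numbers in the example are assumptions for illustration only; they are not Twitter's actual limits or anyone's real client code.

```python
import time
from collections import deque


class RateLimiter:
    """Permits at most max_calls calls within any window of period seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of recently permitted calls

    def try_acquire(self):
        """Return True and record the call if it fits in the window, else False."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


# Made-up quota: 15 requests per 15 minutes.
limiter = RateLimiter(max_calls=15, period=15 * 60)
if limiter.try_acquire():
    print("OK to send a request")
else:
    print("Quota used up, wait before retrying")
```

With a quota like that, a client fetching one page per request would spend most of its time waiting, which is the trade-off the post describes.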

@thor idk how others deal with it (instagram and stuff)