There is an post describing the possibility of an organised campaign against arc...

leonidasv · 2026-02-21T00:20:42 1771633242

If they're under an organised defamation campaign, they're not helping themselves by DDoSing someone else's blog and editing archived pages.

behringer · 2026-02-21T00:41:24 1771634484

Is that, itself, true or disinformation?

ndiddy · 2026-02-21T12:34:08 1771677248

They did edit archived pages. They temporarily did a find/replace on their archive to replace "Nora Puchreiner" (an alias the site operator uses) with "Jani Patokallio" (the name of the blogger who wrote about archive.today's owner). https://megalodon.jp/2026-0219-1634-10/https://archive.ph:44...

They also tampered with their archive for a few of the social media sites (Twitter, Instagram, Blogger) by changing the name of the signed in account to Jani Patokallio. https://megalodon.jp/2026-0220-0320-05/https://archive.is:44...

I think Wikipedia made the right decision, you can't trust an archival service for citations if every time the sysop gets in a row they tamper with their database.

UqWBcuFx6NV4r · 2026-02-21T14:43:35 1771685015

This is so ‘early internet beef’ quaint. What next? Are they going to G-line each other?

behringer · 2026-02-22T06:18:55 1771741135

It it utterly stupid when you consider that the host needed to replace their username with something to conceal their user accounts.

amiga386 · 2026-02-23T12:37:52 1771850272

Reminds me of https://www.huffingtonpost.co.uk/entry/reddit-ceo-edits-user...

direwolf20 · 2026-02-27T20:23:41 1772223821

The Reddit CEO's life isn't in danger from people knowing who he is, to be fair.

stuffoverflow · 2026-02-21T06:12:35 1771654355

I've not seen any evidence of them editing archived pages BUT the DDOSing of gyrovague.com is true and still actively taking place. The author of that blog is Finnish leading archive.today to ban all Finnish IPs by giving them endless captcha loops. After solving the first captcha, the page reloads and a javascript snippet appears in the source that attempts to spam gyrovague.com with repeated fetches.

verteu · 2026-02-23T20:33:53 1771878833

> I've not seen any evidence of them editing archived pages

There is evidence of this in the article you're commenting on.

mmooss · 2026-02-21T06:27:39 1771655259

How do you know that? Did you see it (do you have a Finnish IP?)?

stuffoverflow · 2026-02-21T08:17:17 1771661837

Yes I have Finnish IP and just before I wrote that post I tested it to make sure it was still happening.

I assume it must be a blanket ban on Finnish IPs as there has been comments about it on Reddit and none of my friends can get it to work either. 5 different ISPs were tried. So at the very least it seems to affect majority of Finnish residential connections.

mmooss · 2026-02-21T17:03:40 1771693420

> just before I wrote that post I tested it to make sure it was still happening

That's awesome. I wish everyone made sure of their facts. Thanks.

delusional · 2026-02-21T07:23:50 1771658630

This is quite an interesting question. For a single datapoint, I happen to have access to a VPN that's supposedly in Finland, and connecting through that didn't make any captcha loop appear on archive.today. The page worked fine.

Now it's obviously possible that my VPN was whitelisted somehow, or that the GeoIP of it is lying. This is just a singular datapoint.

fear-anger-hate · 2026-02-21T08:22:46 1771662166

As another datapoint with Finnish IP from Mullvad VPN: CAPTCHA loop and indeed after solving first CAPTCHA this can be found in page source:

setInterval(function(){fetch("https://gyrovague.com/tag/"+Math.random().toString(36).subst...",{ referrerPolicy:"no-referrer",mode:"no-cors" });},1400);

hnlmorg · 2026-02-21T08:25:48 1771662348

It’s also pretty common for VPNs to have exit nodes physically located in different counties to where they report those IPs (to GeoIP databases) as having originated from.

BoredPositron · 2026-02-21T09:21:14 1771665674

VPNs usually don't tell you much about residential experiences.

drum55 · 2026-02-21T00:55:16 1771635316

It was true and visible when reported, yeah.

daymanstep · 2026-02-21T17:20:39 1771694439

I've also noticed archive.today injecting suspicious looking ads into archived pages that originally did not have ads.

thefilmore · 2026-02-21T14:54:45 1771685685

It's true.

https://archive-is.tumblr.com/post/808911640210866176/people...

ouhamouch · 2026-02-21T00:32:46 1771633966

it gives them a voice.

duskwuff · 2026-02-21T00:33:32 1771634012

And that voice is practically shouting, "I AM UNTRUSTWORTHY".

ouhamouch · 2026-02-21T00:37:43 1771634263

that is not the worst scream (especially after FBI and Russian trail). better to shout anything than to die in silence

eddythompson80 · 2026-02-21T00:47:08 1771634828

What kinda logic is that? If you don't want to die in silence, then shout something sensical. But if you're gonna shout garbage, just die in silence.

tolerance · 2026-02-21T03:49:15 1771645755

People say they want the old weird web back. Well there’s this.

ouhamouch · 2026-02-21T00:59:08 1771635548

The property of the medium: no one would repost or discuss "something sensical".

tolerance · 2026-02-21T03:51:35 1771645895

Or some shrewd sort of tactician.

8cvor6j844qw_d6 · 2026-02-21T01:55:52 1771638952

archive.today works surprisingly well for me, often succeeding where archive.org fails.

archive.org also complies with takedown requests, so it's worth asking: could the organised campaign against archive.today have something to do with it preserving content that someone wants removed?

wolvoleo · 2026-02-21T07:10:53 1771657853

They preserve a lot of paywalled content so yeah I'm sure there's enough financial incentives to bother them :(

iamnothere · 2026-02-20T23:51:32 1771631492

There was also the recent news about sites beginning to block the Internet Archive. Feels like we are gearing up for the next phase of the information war.

pyuser583 · 2026-02-21T04:03:39 1771646619

Was that written by AI? It sounds like AI, spends lots of time summarizing other posts, and has no listed author. My AI alarm is going off.

KennyBlanken · 2026-02-21T06:29:08 1771655348

Ars was caught recently using AI to write articles when the AI hallucinated about a blogger getting harassed by someone using AI agents. The article quoted his blog and all the quotes were nonsense.

mrweasel · 2026-02-21T11:23:57 1771673037

Even if something is AI generated the author, and the editor, should at least attempt to read back the article. English isn't my native language, so that obviously plays in, but very frequently I find that articles I struggle to read are AI generated, they certainly have that AI feel.

It would be interesting to run the numbers, but I get the feeling that AI generated articles may have a higher LIX number. Authors are then less inclined to "fix" the text, because longer word makes them seem smarter.

moron4hire · 2026-02-21T15:02:09 1771686129

"Should" and "will" are completely different things. My kids "should" brush their teeth every night without me having to tell them. But they won't.

mrweasel · 2026-02-21T15:05:46 1771686346

Sounds like you're suggesting an RFC for journalists and editors :-)

lambda · 2026-02-21T04:51:38 1771649498

Yeah, wow. Definitely setting off my AI summary alarm.

girvo · 2026-02-21T05:57:15 1771653435

Yeah nearly certainly.

robotnikman · 2026-02-21T19:37:53 1771702673

A big fear of mine is something happening to archive.is

There is so much is archived there, to lose it all would be a tragedy.

ouhamouch · 2026-02-21T01:18:59 1771636739

There are number of blog posts like

owner-archive-today . blogspot . com

2 years old, like J.P's first post on AT

bdhcuidbebe · 2026-02-21T02:40:37 1771641637

They are able to scrape paywalled sites at random, so im guessing a residential botnet is used.

khannn · 2026-02-22T15:42:56 1771774976

It's funny that residential VPN botnets aren't uncommon now. "Free VPN" if you allow your computer/phone to be an exit point.

pingou · 2026-02-21T09:16:57 1771665417

But how do they bypass the paywall? They can't just pretend to be Google by changing the user-agent, this wouldn't work all the time, as some websites also check IPs, and others don't even show the full content to Google.

They also cannot hijack data with a residential botnet or buy subscriptions themselves. Otherwise, the saved page would contain information about the logged-in user. It would be hard to remove this information, as the code changes all the time, and it would be easy for the website owner to add an invisible element that identifies the user. I suppose they could have different subscriptions and remove everything that isn't identical between the two, but that wouldn't be foolproof.

wbmva · 2026-02-21T19:13:24 1771701204

On the network layer, I don't know. But on the WWW layer, archive.today operates accounts that are used to log into websites when they are snapshotted. IIRC, the archive.today manipulates the snapshots to hide the fact that someone is logged in, but sometimes fails miserably:

https://megalodon.jp/2026-0221-0304-51/https://d914s229qk4kj...

https://archive.is/Y7z4E

The second shows volth's Github notifications. Volth was a major nix-pkgs contributor, but his Github account disappeared.

https://github.com/orgs/community/discussions/58164

seanhly · 2026-02-21T10:09:53 1771668593

There are some pretty robust browser addons for bypassing article paywalls, notably https://gitflic.ru/project/magnolia1234/bypass-paywalls-fire...

This particular addon is blocked on most western git servers, but can still be installed from Russian git servers. It includes custom paywall-bypassing code for pretty much every news websites you could reasonably imagine, or at least those sites that use conditional paywalls (paywalls for humans, no paywalls for big search engines). It won't work on sites like Substack that use proper authenticated content pages, but these sorts of pages don't get picked up by archive.today either.

My guess would be that archive.today loads such an addon with its headless browser and thus bypasses paywalls that way. Even if publishers find a way to detect headless browsers, crawlers can also be written to operate with traditional web browsers where lots of anti-paywall addons can be installed.

wuschel · 2026-02-21T13:50:21 1771681821

Wow, did not know about the regional blocking of git servers! Makes me wonder what else is kept from the western audience, and for what reason this blocking is happening.

Thanks for sketching out their approach and for the URI.

pingou · 2026-02-21T10:51:03 1771671063

But don't news websites check for ip addresses to make sure they really are from Google bots?

seanhly · 2026-02-21T11:51:29 1771674689

Most of them don’t check the IP, it would seem. Google acquires new IPs all the time, plus there are a lot of other search systems that news publishers don’t want to accidentally miss out on. It’s mostly just client side JS hiding the content after a time delay or other techniques like that. I think the proportion of the population using these addons is so low, it would cost more in lost SEO for news publishers to restrict crawling to a subset of IPs.

expedition32 · 2026-02-21T11:11:56 1771672316

I use this add on. It does get blocked sometimes but they update the rules every couple of weeks.

rkagerer · 2026-02-21T17:47:53 1771696073

I thought saved pages sometimes do contain users' IP's?

https://www.reddit.com/r/Advice/comments/5rbla4/comment/dd5x...

The way I (loosely) understand it, when you archive a page they send your IP in the X-Forwarded-For header. Some paywall operators render that into the page content served up, which then causes it to be visible to anyone who clicks your archived link and Views Source.

bdhcuidbebe · 2026-02-21T10:49:08 1771670948

> But how do they bypass the paywall?

I’m guessing by using a residential botnet and using existing credentials by unknowingly ”victims” by automating their browsers.

> Otherwise, the saved page would contain information about the logged-in user.

If you read this article, theres plenty of evidence they are manipulating the scraped data.

But I’m just speculating here…

pingou · 2026-02-21T10:57:51 1771671471

But in the article they talk about manipulating users devices to do a DDOS, not scrape websites. And the user going to the archive website is probably not gonna have a subscription, and anyway I'm not sure that simply visiting archive.today will make it able to exfiltrate much information from any other third party website since cookies will not be shared.

I guess if they can control a residential botnet more extensively they would be able to do that, but it would still be very difficult to remove login information from the page, the fact that they manipulated the scraped data for totally unrelated reasons a few times proves nothing in my opinion.

notpushkin · 2026-02-21T12:57:16 1771678636

They do remove the login information for their own accoubts (e.g. the one they use for LinkedIn sign-up wall). Their implementation is not perfect, though, which is how the aliases were leaked in the first place.