View Issue Details

ID: 0011044
Project: phpList 3 application
Category: General
View Status: public
Last Update: 21-06-18 14:00
Reporter: bhugh
Assigned To:
Priority: normal
Severity: text
Reproducibility: always
Status: new
Resolution: open
Product Version: 2.10.2
Summary: 0011044: Web crawlers hit "unsubscribe" link & blacklist subscribers
Description: A couple of subscribers to my phpList system have this problem. (Unfortunately, I am one of them.)

For one reason or another, one or more of the emails we have received through the phpList system have been posted to the WWW somewhere.

When the email message was posted, it must have included the "unsubscribe" link that phpList automatically adds at the end of each message.

Now whenever a web crawler or spider crawls that page, it hits the link and we suddenly become blacklisted.

I can tell it is a web crawler doing it because the subscriber history always shows something like "msnbot" or "googlebot" or "slurp" as the HTTP_USER_AGENT of the culprit.
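As a rough illustration of that check, the sketch below flags history entries whose user-agent string contains one of the three names mentioned above. The function name, the marker list, and the sample history entries are hypothetical and are not part of phpList.

```python
# Substrings taken from the report; a real crawler list would be much longer.
CRAWLER_MARKERS = ("msnbot", "googlebot", "slurp")


def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


# Hypothetical history entries, not real phpList data.
history = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0",
]
flagged = [ua for ua in history if looks_like_crawler(ua)]
```

Substring matching like this only catches crawlers that identify themselves, so it is useful for diagnosis rather than as a defence.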

This sort of thing happens more often than you might imagine--some people forward a message they received through phpList to another mailing list, that other list has online archives, and the archives get crawled by bots periodically.

The solution seems pretty simple--on the unsubscription/blacklisting page, just add a "confirm unsubscription" button that the person must click on before the unsubscribe goes through. People will click there but bots won't . . .
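A minimal sketch of the confirmation idea, assuming the page only changes anything on a form submission (POST) and merely shows a button on a plain link visit (GET). The `SubscriberStore` class and `handle_unsubscribe` function are hypothetical stand-ins, not phpList code.

```python
import html
from dataclasses import dataclass, field


@dataclass
class SubscriberStore:
    """Hypothetical in-memory stand-in for the subscriber database."""
    blacklisted: set = field(default_factory=set)


def handle_unsubscribe(method: str, uid: str, store: SubscriberStore) -> str:
    """Blacklist only on POST; a plain GET just renders the confirmation form."""
    if method.upper() == "POST":
        store.blacklisted.add(uid)
        return "You have been unsubscribed."
    # GET is what a crawler sends when it follows a link: nothing is changed.
    return (
        '<form method="post">'
        f'<input type="hidden" name="uid" value="{html.escape(uid, quote=True)}">'
        '<button type="submit">Confirm unsubscription</button>'
        "</form>"
    )
```

This relies on crawlers following links but not submitting forms, which is the usual behaviour but not a guarantee.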
Tags: documentation

Activities

bhugh

15-08-07 05:52

reporter   ~0030573

BTW a short-term fix to this is to create a robots.txt that has a section like this:

User-agent: *
Disallow: /phplist/?p=unsubscribe
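One way to sanity-check a rule like this before deploying it is Python's standard-library `urllib.robotparser`. The host name and the `uid` values below are placeholders, and other crawlers may interpret query strings in rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule from the comment above, fed to the parser as text.
rules = """\
User-agent: *
Disallow: /phplist/?p=unsubscribe
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "http://example.com/phplist/"  # placeholder host
unsubscribe_allowed = parser.can_fetch("googlebot", base + "?p=unsubscribe&uid=abc123")
preferences_allowed = parser.can_fetch("googlebot", base + "?p=preferences&uid=abc123")
```

With this parser the unsubscribe URL should come back as disallowed while the preferences URL is still allowed, so the preferences link needs its own rule. Note that robots.txt is only honoured by well-behaved crawlers.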

It may also be a good idea to find some way to exclude spiders/crawlers from this page, which is typically linked in every message's footer:

  http://XXXXXX.com/phplist/?p=preferences

Again, if someone inadvertently posts a phpList message to the WWW, then their personal "preferences" link is available for anyone to click.

That is bad enough, but if a crawler/spider finds it, then their personal info is now indexed by a search engine--not good.

Again it can be foiled by some simple means like including a "click here to change your personal info" button that a person will click but a webcrawler won't follow.

michiel

19-04-10 20:51

administrator   ~0050933


hmm, for starters I would make sure that the content of emails doesn't end up on a page.

But I do think the robots file is a better option to avoid this. Alternatively, you can require a password to unsubscribe.

michiel

04-06-13 19:22

administrator   ~0052094


I can't think of a technical solution to this, apart from making sure the message doesn't end up on a web page, which is outside phpList's control.

It would be best to mention this in the docs somewhere.

The only possible thing to do is to add rel=nofollow to the HTML version of the unsubscribe links, but that won't fix it for the text version.
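For illustration, an HTML footer link carrying that attribute could be built as in the sketch below. The `unsubscribe_link` helper, the host name, and the `uid` value are hypothetical, and this is not how phpList itself generates the footer.

```python
import html


def unsubscribe_link(url: str, label: str = "Unsubscribe") -> str:
    """Build an HTML anchor marked rel="nofollow" for the message footer."""
    return (
        f'<a href="{html.escape(url, quote=True)}" rel="nofollow">'
        f"{html.escape(label)}</a>"
    )


link = unsubscribe_link("http://example.com/phplist/?p=unsubscribe&uid=abc123")
```

The attribute is only a hint to crawlers, and it helps only if whatever page the message ends up on preserves it.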

The main thing is to either ask for a password to unsubscribe, or to have JUMP_OFF false.