Archive for February, 2005

Catching new referrers

Saturday, February 19th, 2005

I had a discussion with one of the regulars here about how we find new referrers, which means either new people linking to us or new spammers. Shrugs…

Anyway, he used to look through the full stats, while I only check the first 10 rather quickly.

In my opinion, the problem with website stats is that they cover the whole month. If you want to check out what’s happened since yesterday, you’ll have to slog through the whole list, going googly-eyed in the process, trying to remember which ones are new.

So, here’s what I do:

I download my raw log files. Not necessarily the whole file. I might grep for the last two days and download gzipped versions of those. If you’re on a cPanel webhost without shell access, use cron for that. Here are some pointers that can be adapted. But really, it’s as simple as:
grep '19/Feb/' /path/to/yourlog | gzip -9 > /home/username/19feb.gz
Remember that paths are different from host to host, and you may need some time to figure out yours.
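
If you’d rather download the whole log and do the slicing on your own machine, here’s a minimal Python sketch of the same idea as the grep line above. The function name and the paths are made up for illustration, and it assumes the date shows up in each line as plain text like 19/Feb/.

```python
import gzip

def save_day(log_path, day, out_path):
    """Copy every line containing `day` (e.g. '19/Feb/') into a gzipped file.

    Returns the number of lines written.
    """
    kept = 0
    with open(log_path, "r", encoding="utf-8", errors="replace") as src, \
         gzip.open(out_path, "wt", encoding="utf-8", compresslevel=9) as dst:
        for line in src:
            if day in line:
                dst.write(line)
                kept += 1
    return kept

# Hypothetical usage; adjust the paths to your own setup:
# save_day("/path/to/yourlog", "19/Feb/", "19feb.gz")
```

Like grep, it’s a plain substring match, so a request URL that happens to contain 19/Feb/ would be kept too.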

OK, so, then I unzip them and copy the contents into one file.

And then I fire up TextHarvest
(this only works on Windows machines. For *NIX and Mac I recommend grep and batch files, though it requires more coding).

I start by removing anything from the /Keep list,
and then add, one by one, the referrers I don’t need to be reminded of to the /Delete list.

Start each keyword with \
I think the default is /, but that doesn’t work with log files, because there are too many instances of the /. \ is my favorite. It hasn’t broken yet with log files.

The trick here is to keep the list in a text doc, because it will grow over time. TextHarvest manages a very large list of exclusions, but if you enter several K worth of keywords, it’ll barf.

When you’ve run the query and browsed the results, you can add more keywords to the list. Here’s a small part of mine:
\annelisabeth.com\""\"-"\W3CRobot\metafilter\403 \kuro5hin

What you want to filter out depends on what you’re looking for: new linkers or spammers. I like to look for anything I haven’t seen before. So almost everything gets added to my list with time.
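
For the *NIX and Mac folks, the exclusion list idea can be sketched in a few lines of Python. This is only a rough stand-in for the /Delete list, not a description of how TextHarvest works inside. The function names are my own, and it assumes the keywords are stored in one string with a \ in front of each.

```python
def parse_keywords(spec, sep="\\"):
    """Split a separator-prefixed keyword string into a list.

    Trailing spaces inside a keyword (like '403 ') are kept on purpose.
    """
    return [k for k in spec.split(sep) if k]

def new_lines(lines, keywords):
    """Return only the lines that contain none of the keywords."""
    return [ln for ln in lines if not any(k in ln for k in keywords)]

# Hypothetical usage:
# keywords = parse_keywords(r'\metafilter\403 \kuro5hin')
# with open("combined.log", encoding="utf-8", errors="replace") as f:
#     for ln in new_lines(f, keywords):
#         print(ln, end="")
```

Keeping the keyword string in its own text file and reading it in works the same way, and the list can grow as large as you like.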

But the beauty of keeping this list in a text doc is that at any time you can delete the list from TextHarvest and just search for, say, the error code 403. Remember to put a space afterwards, or you’ll get a lot of false positives. Most of our .htaccess blocks produce 403 errors, so it’s a nice way of keeping track of the spamming activities of the Bulgarians and Alexander.
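
Even with the trailing space, a plain text search can still trip on a byte count such as 1403. If you script this instead, one option is to read the status field itself. Here’s a small Python sketch, assuming Apache’s combined log format, where the status code comes right after the quoted request. The names are made up for illustration.

```python
import re

# In the combined log format the status follows the quoted request:
# ... "GET /page HTTP/1.1" 403 1234 "referrer" "user agent"
STATUS = re.compile(r'" (\d{3}) ')

def status_of(line):
    """Return the three-digit status code as a string, or None if not found."""
    m = STATUS.search(line)
    return m.group(1) if m else None

def lines_with_status(lines, code="403"):
    """Keep only the lines whose status field equals `code`."""
    return [ln for ln in lines if status_of(ln) == code]
```

A line with status 200 and a size of 1403 bytes would match a search for "403 " but is skipped here.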

Any questions?

New clueless reffy spammer

Saturday, February 19th, 2005

We got a new clueless reffy spammer in the logs yesterday.

Say hello to
Matt

Who’s clueless enough to use his own (?) DSL line to spam his adult-related domains.

Abuse reports sent to both ISP and webhosts.

Heh, he must be very happy about the way he looks. Didn’t take me more than a minute to find a picture of him. All his whois info has different addresses, but he puts a picture online? Tell me what the logic is in that?

Bulgarians trackbacking again

Friday, February 18th, 2005

I’d turned off some of my .htaccess protections, so when the Bulgarians started sending trackbacks, one came through right away.

This just started over at my place.

I don’t know if it was a one-off, or a new attack starting.

I reinstalled the old blocks, and so far no error messages.

More about the Bulgarians

Running down the Bulgarians

Friday, February 18th, 2005

The third entry on February 16 for joatBlog is an interesting lesson in tracing, with our ‘favorite’ spammers as the subject.

New tactic from the Bulgarian spammers

Friday, February 18th, 2005

We’ve talked about the bait and switch before. The spammers put up an account terminated notice while doing a spam run, then switch it to the real site after the spam run is finished.

And that’s what I thought we were seeing with the nutzu spam run too. But I didn’t look deeply enough.

Michael commented below that the real page is actually already there. It’s the JavaScript that loads the termination page, which means Google will never see the termination notice. It’ll only see the page the spammers intended it to see.

Which in turn means that from now on, we have to check spamvertized pages even more carefully than before. We know they’ve been cloaking their pages with JavaScript for a while. But now they’ve taken it to new heights.
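
One way to check a spamvertized page is to look at the raw HTML the way a crawler would, without running any scripts. The Python sketch below is only a rough heuristic. The list of suspicious calls is my own guess at what such a script might use, not something taken from the actual spam pages, and the names are made up.

```python
from html.parser import HTMLParser

# Assumed markers of a script that swaps or redirects the page (a guess).
SUSPICIOUS = ("location.replace", "location.href", "window.location", "document.write")

class ScriptCollector(HTMLParser):
    """Collect inline script bodies and the text outside of scripts."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []  # one string per <script> element
        self.text = []     # text a script-less visitor would get

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data
        else:
            self.text.append(data)

def inspect_page(html):
    """Return (static_text, flagged_scripts) for a page's raw HTML."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    flagged = [s for s in parser.scripts if any(k in s for k in SUSPICIOUS)]
    static_text = " ".join(" ".join(parser.text).split())
    return static_text, flagged
```

If the static text is a full sales page while a flagged script writes out a termination notice, you’re probably looking at this trick. External scripts loaded through src are not fetched here, so they’d need a separate look.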

UPDATE: I realized after thinking about it for a while that they’re going to have to remove the JavaScript at some point. After all, the goal is to have humans eyeballing the site eventually, once people have found it through Google.

Referrer spam: Many things can be misused, it seems

Thursday, February 17th, 2005

LeechGet was misused by a spammer today.

Sam Spade can even be misused.

And Elliot Bäck even made a free reffy clone.

Also, note that I caught someone in my logs today who was even using the user agents of well-known search engine spiders. IP address: 213.23.176.235

So, why am I posting the links, knowing full well people can misuse these things?

Simple: only stupid people will referrer spam from here on out.

I fully expect the domains spamvertized in this way will be banned. Maybe not today, maybe not tomorrow, but unless you run a campaign with a lifespan of one month, and expect to trash the domain after that, you’ll eventually be bested by Google.

It’s just a matter of time.

After all, see what happened to M0nkey…

New offender - referrer spam

Thursday, February 17th, 2005

I have a new offender, spamvertizing a slew of different domains. He’s using software he doesn’t completely understand.

69.225.183.82

My guess is he’s using his own IP address.

Investigation is underway, and I’ll add to this post as I slog through whatever evidence I find…

OK, I’ve gone through the domains he’s spamvertizing, and it seems to me this is either a test run, or someone trying to sully the name of respected sites. One of these is a university that’s been online since 1985. It’s of course possible these companies have hired an incredibly clueless SEM (Search Engine Marketing) “Expert”, but I somehow doubt it. The software might be Reffy run without proxies.

You should be able to recognize him by his convention of not including http:// in front of his URLs.
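
If you want to pull those entries out of a log automatically, here’s a minimal Python sketch. It assumes the combined log format, where each line ends with the quoted referrer followed by the quoted user agent. The function name is made up for illustration.

```python
import re

# Combined log format lines end with: "referrer" "user agent"
TAIL = re.compile(r'"([^"]*)" "([^"]*)"\s*$')

def schemeless_referrer(line):
    """Return the referrer if it is non-empty and lacks http:// or https://.

    Returns None for empty referrers ("-" or ""), for normal URLs,
    and for lines that don't match the expected layout.
    """
    m = TAIL.search(line)
    if not m:
        return None
    ref = m.group(1)
    if ref in ("", "-"):
        return None
    if ref.lower().startswith(("http://", "https://")):
        return None
    return ref
```

Real browsers send a full URL as the referrer, so anything this returns deserves a closer look.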

Trying out some affiliates

Thursday, February 17th, 2005

For those who are wondering why there’s a link and a banner on the right menu here, I’m trying out a few affiliates through shareasale:

Naked Zombie has funny and idiotic t-shirts:

Hacker stickers have stickers, shirts and stuff geared towards hackers and geeks.

I guess my favorite must be the white hat caps.
I’d be tempted to send M0nkey a black hat, though. Heh, wonder what his parents would say if someone actually did? Oh, and of course accidentally sent it to his parents, with his name on the inside?

Not sure how long I’ll keep the affiliates. I obviously don’t agree with everything they sell… We’ll see. Chalk it up to an experiment.

You know, I first got the taste for these kinds of shirts when I visited a store here in Oslo. They’ve got a catalog full of stupid stuff. Had a grand ole time pointing out how the guy who did most of the shirts couldn’t spell English worth anything too. And then this lady comes out and spoils my fun by saying his typos had been discovered before the shirts went to print. Heh, spoilsport… Actually, I’d love to be an affiliate for that store. Their stuff is better than most I’ve seen.

UPDATE: This is amazing. I checked my stats at shareasale just after writing this post. Up to 4-5 visits already. And a few minutes afterwards it’s 6-7. I had about 1 before. Obviously I’ve got a lot more syndication readers than people visiting the main page of the blog.

Help Puppy Pile!

Thursday, February 17th, 2005

Found this frustrated rant from The Puppy Pile today.

Her referrer spam situation seems way worse than mine, and she’s even dealing with fake search engine queries!

Someone tell her about mod_rewrite!

Oh, crap (excuse me). She says her blog has been added to an update list for reffy. No wonder she’s getting hammered.

Kate, you wanna give him a wedgie? You’ll have to go to Norway first, though.

UPDATE: One reason for lewd search engine queries in your logs is being slow to clean up comments and trackbacks. You see, the spammers might try searching for their own or their competitors’ words, in an attempt to find poorly moderated blogs. Not saying that’s what happened to you, Kate, just mentioning possible explanations.

The Norwegian spammer tries to cover his tracks

Thursday, February 17th, 2005

I found the Norwegian spammer looking at my blog yesterday. And what did he check out?

Heh, maybe he thought I wouldn’t see it if he checked out the Google cache of one of my posts?

But my, he just gave me more information…

His search term was:
reffy william indre

Which means he was looking for people talking about his little program, and also mentioning the name in the whois of some of his domains: William Indre.
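
Search terms like these usually arrive in the query string of the referrer URL. Here’s a minimal Python sketch for pulling them out, assuming the search engine puts the query in a parameter called q. The function name and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referrer, param="q"):
    """Return the decoded search query from a referrer URL, or None if absent."""
    query = urlsplit(referrer).query
    values = parse_qs(query).get(param)
    return values[0] if values else None

# Hypothetical usage with an illustrative referrer:
# search_terms("http://www.google.com/search?q=reffy+william+indre&hl=no")
```

Other engines use other parameter names, so pass a different param where needed.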

Which, in spammer logic, probably means he’s changed the whois info on some earlier unprotected domains. His most central domains are whois-protected.

Yep, one of his other domains now sports this info:

Quirin
Quirin Stocker (a.r.k.i.t.e.k.t at home.se)
+1.5555555555
Fax: +1.5555555555
Schoneggstrasse 11
Zurich, ZU 8004
CH

Can you say fake?

UPDATE: By the way M0nkey, you should check reffy in this tool. Looks to me like Google banned your domain!!!