Spam and the logbook
In an interesting and somewhat twisted endeavour, I am attempting to receive more spam. That's right -- I WANT MORE SPAM! Well, I obviously don't want it in my own inbox, but I want a good constant source of it to play with so that I can adjust our corporate spam filters.
One of the most common causes of excessive spam is that people advertise their email addresses on their websites so that potential customers can easily contact them. Spambots are programs that surf the internet much like search engine crawlers do, but the only information they keep is email addresses. They then use these addresses to populate their mailing lists.
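To make the harvesting step concrete, here is a minimal Python sketch of how a bot might pull addresses out of a page it has already downloaded. It is not taken from any real spambot. The `harvest_addresses` function and its deliberately simplified pattern (far narrower than what RFC 5322 allows) are assumptions for illustration, and example.com stands in for a real site.

```python
import re

# Simplified, illustrative pattern (an assumption, not RFC 5322):
# a local part, "@", then a dotted domain ending in a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest_addresses(page_text: str) -> set:
    """Return every distinct, lowercased string in page_text that looks like an email address."""
    return {match.lower() for match in EMAIL_RE.findall(page_text)}

# Hypothetical page content; both the mailto: link and the visible text match.
html = '<p>Contact sales: <a href="mailto:sales@example.com">sales@example.com</a></p>'
print(harvest_addresses(html))  # {'sales@example.com'}
```

Anything a pattern like this can match in a page's source, including `mailto:` links, is exposed to harvesting, which is exactly what this entry relies on.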
Because I need source emails for my autotraining Bayesian filters, I decided to break my own rules about publishing email addresses on websites and leave a couple of addresses out in the open for spambots to harvest. I've chosen addresses and a domain that are brand new and could not have been tainted by any previous use on the internet, so that I can be sure any traffic that results comes solely from this logbook entry.
The first address will simply be logbook_simple@pleasedonotsend.com, and this is the one and only time I will enter that address anywhere. Any mail that reaches that address will have come through spambots (or perhaps the occasional curious reader).
The second address will be logbook_reply@pleasedonetsend.com, which will have an autoreply message that automatically responds to any email it receives. This replicates a problem faced by companies that use autoresponders on the email accounts of employees who have left the company. They want to inform their clients that the person is no longer available, but spammers see the reply as confirmation that the address is live.
When the opportunity arises to enter my email address into a website form, I will create a unique alias at pleasedonotsend.com and track any unrequested mail that I receive in return.
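One way to do the per-site alias tracking is sketched below in Python. The naming scheme (the site's hostname turned into a slug and used as the local part) and the `alias_for`, `register` and `source_of` helpers are hypothetical. The sketch also assumes a catch-all mailbox on pleasedonotsend.com, so that any alias is deliverable without being set up in advance.

```python
import re
from typing import Dict, Optional

DOMAIN = "pleasedonotsend.com"

# Which alias was handed to which site (alias -> site), kept in memory for the sketch.
issued: Dict[str, str] = {}

def alias_for(site: str) -> str:
    """Hypothetical scheme: lowercase the hostname and turn non-alphanumeric runs into '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", site.lower()).strip("-")
    return f"{slug}@{DOMAIN}"

def register(site: str) -> str:
    """Create the alias for a site, record it, and return it for use in the web form."""
    alias = alias_for(site)
    issued[alias] = site
    return alias

def source_of(recipient: str) -> Optional[str]:
    """Return the site an incoming recipient address was given to, or None if never issued."""
    return issued.get(recipient.lower())

print(register("shop.example.com"))                      # shop-example-com@pleasedonotsend.com
print(source_of("Shop-Example-Com@pleasedonotsend.com"))  # shop.example.com
```

Deriving the alias from the hostname makes the source readable at a glance, but it is guessable; a random token per site would be harder to forge, at the cost of relying entirely on the lookup table.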
In the spirit of experimentation, I'll create two more addresses that behave as above but use a different TLD (top-level domain). What are the odds that spammers and their spambots use market segmentation and concentrate their efforts on certain markets? There's only one way to find out: logbook_simple@pleasedonotsend.ca and logbook_reply@pleasedonetsend.ca
I am aware that there are large-scale projects on this topic (Project Honey Pot, etc.), but I don't put this experiment in the same category as those. They are mammoth projects that seek to put an end to spambots and the people who control them. I'm more curious about how one single logbook entry could balloon into a spam nightmare.
Once I have some data from this little experiment, I will post results in another logbook entry to show the differences between each of the addresses that I've used.
