Title  : Sapient whitelisting
Author : sapient.fridge (at) spamsights.org
Draft  : 1.1 (16th July 2003)
Draft  : 1.7 (5th August 2003)
Draft  : 1.8 (13th August 2003)
Draft  : 1.9 (12th October 2003)

 1) What is the problem?
 2) What is Sapient whitelisting?
 3) Why does Sapient whitelisting help?
 4) What properties should a good solution have?
 5) What are the problems with other solutions?
 6) How will Sapient whitelisting work for incoming mail?
 7) How will Sapient whitelisting work for outgoing mail?
 8) Limits on keys
 9) What will the user see?
10) Is it a good solution?
11) What problems will this cause spammers?
12) Possible spammer attacks
13) Technical questions to resolve
14) FAQ

------------------------------------------------------------------------

1) What is the problem?

The problem is simple: the problem is spam. Mail users around the world are deluged with garbage every day. Some users get hundreds of useless adverts in their E-mail each and every day of the year, and the volume is rapidly increasing. E-mail addresses and complete domains are being abandoned in a vain attempt to control the spam influx.

Originally it was thought that the problem would be solved by complaining to the spammer's ISP and asking them to throw the spammers off. Unfortunately enough ISPs ignore complaints, either because they don't care or because their abuse desks are understaffed, to ruin the net for everyone else. Entire countries appear to ignore the spam problem completely and allow "bulletproof" hosting of spammers.

As time goes on the spammers mail more and more addresses but get fewer and fewer responses, as the percentage of undeliverable addresses goes up and the number of people willing to buy things from spam goes down. One would think that this would stop them, but all it appears to mean is that spammers simply send out more spam to make up for the loss of responses.

------------------------------------------------------------------------

2) What is Sapient whitelisting?

The basic idea is to embed a short alphanumeric key into E-mail addresses. If a valid key is present then the E-mail is delivered as normal, but if the key is missing or invalid then the mail will be challenged or rejected.

The key is five characters long, delimited by {} characters, and is not considered part of the E-mail address for internal delivery purposes. So for example if my E-mail address was something like:

    sapient.fridge{7df3h}@spamsites.org

it still gets delivered as if it were:

    sapient.fridge@spamsites.org

The key will be generated randomly by software installed by the ISP and will be checked to ensure it has not been generated before. A record of valid keys will be kept in a shared database at the ISP, along with information about each key such as when it was created and who has used it.

The user will be able to control which keys are valid via a web page. Controls include if and when keys expire and what happens when a key fails, e.g. whether to fall back to challenge response or simply reject the mail.

The challenge response could be as simple as sending out a token which needs to be returned to free the mail. The recipient of the challenge would just reply to the challenge to release their mail.

Any E-mail address that the user sends E-mail to will (normally) be automatically whitelisted, as will the E-mail address of anyone who uses a valid key for incoming E-mail. Once an E-mail address is whitelisted, keys will no longer be needed for incoming mail from that sender.

Because the user has complete control over which keys are allowed and can turn off any key at any time, the E-mail address becomes useless to spammers. If a keyed E-mail address starts getting spam then all the user has to do is turn off that key and blacklist any undesired addresses that used the key whilst it was active.

The system allows users to reset their E-mail address once spammers start to abuse it.
They can do this without going to all the trouble of changing their E-mail address and telling everyone who was using it what the new address is, because mail from whitelisted addresses gets through even if it is sent to an address with an old key.

All the filtering is done by the ISP using rules set up by the user via a web page, so no client software is needed. Each user would have a different set of personal rules that they would configure via the web page after logging into it.

------------------------------------------------------------------------

3) Why does Sapient whitelisting help?

Imagine you could change your E-mail address every time spammers started mis-using it. That would solve your spam problem, wouldn't it? The problem is that all the people you wanted to communicate with would no longer be able to contact you at the old address.

Sapient whitelisting solves this problem by giving you a new E-mail address whenever you want, while also remembering who you communicated with in the past so they can continue to use the old addresses.

The only people who suffer are the spammers and the E-mail list sellers.

------------------------------------------------------------------------

4) What properties should a good solution have?

A good solution would:

a) Make it hard for spammers to E-mail users
b) Allow strangers to E-mail users
c) Not require changes to global software or protocols
d) Be usable by businesses who need to have openly listed contacts
e) Be easy for users to use
f) Work at a low level so users don't have to change their software
g) Genuine opt-in mailing lists should not be overly burdened
h) It should be hard to sell on a working E-mail address
i) The user should be in control of their E-mail addresses
j) Should not discard genuine E-mail

The problem of course is that a) and b) seem to contradict each other, but if one looks at it more closely there is a difference.
The genuine stranger is likely to have found your address recently on a web site, whereas the spammer is likely to have picked up your mail address from a CD bought from another spammer, so the address is likely to be relatively old.

The other difference is that in the worst case the stranger is likely to put up with some inconvenience such as a challenge response, but a spammer won't.

These differences can be exploited.

------------------------------------------------------------------------

5) What are the problems with other solutions?

a) Whitelisting / Blacklisting

Whitelisting is when users only accept E-mail from certain addresses. This is fine if you have a tight group of users who want to E-mail each other and want no E-mail from anyone else, but the whitelists need continual tweaking and strangers can't send E-mail to the user. It's useless for businesses who need to have open contact addresses.

Blacklisting is where mail from individuals is blocked by their E-mail address. This doesn't work because the spammers use random forged E-mail addresses as the sender of their junk.

See: http://pubpages.unh.edu/notes/whitelist-blacklist-filter.html

b) Greylisting

This is where mail is delayed by the MTA (Mail Transport Agent) first saying it is too busy and then allowing the E-mail through at the next attempt. It is a very nice idea (and influenced ideas in this document) but it slows down the E-mail, and spammers could work around it by either hijacking legitimate mail servers or writing spamware which makes the second attempt.

See: http://projects.puremagic.com/greylisting/

c) Blocklisting

Blocklisting is probably the most successful anti-spam mechanism so far. This is when an ISP uses a list of IP addresses of spammers provided by an external agency and rejects E-mail from those addresses.
Blocklists are working well in that they put pressure on ISPs to get rid of their spammers, but there is too much collateral damage and many ISPs won't use them for fear of accidentally discarding wanted E-mail.

Another problem is that blocklists are reliant on the people building them (who are often anonymous). They are hard to maintain, and the blocklists themselves are under continual attack from spammers trying to close them down.

See: http://www.spews.org/

d) Client side filters

Filters (such as Bayesian) are fine for individual users but are CPU intensive because they analyse the entire E-mail contents, and they have a fairly high risk of occasionally flagging genuine E-mail. This is especially true as people tighten the filters to catch spammers who put garbage into their mail to get past the filter.

See: http://www.paulgraham.com/spam.html

Even worse are keyword filters, which block mail on individual words. There are people who really do want to talk about mortgages and viagra.

e) Challenge response

This is where mail is accepted but held, and a confirmation token is automatically sent out in reply if the sender isn't whitelisted. The mail is only released when the token is sent back in again.

This works well from the user's point of view but it can annoy the person who gets the challenge, especially if the user had specifically asked for a response, e.g. in a request for help in a newsgroup.

Challenge responses also don't work well for mailing lists, where the administrator has to find and respond to all the challenges. This would be impossible for a large list.

See: http://cnews.canoe.ca/CNEWS/TechNews/2003/06/08/106782-ap.html

f) Time-limited E-mail addresses

Users who have control of their E-mail addresses can set up addresses which only work for a set amount of time and then expire.
These work well for uses such as newsgroup postings but aren't any use for giving to family and friends, nor are they of use for signing up to mailing lists. Anyone using them would have to be notified each time a new address was generated and the old address abandoned.

See: http://www.spamgourmet.com/

g) Tagged E-mail addresses

E-mail addresses are generated which contain a unique section identifying who the address was given to. This solves the problem of tracking down who sold the address but doesn't solve the problem of incoming spam.

See: http://www.spammotel.com/
     http://sneakemail.com/

------------------------------------------------------------------------

6) How will Sapient whitelisting work for incoming mail?

The most important part of the system will be the engine that generates the keys and tracks them in the database, along with the whitelist of E-mail addresses that passed the tests. All information will be keyed by the user's real (keyless) E-mail address.

Whenever an E-mail connection comes in, the system has three pieces of information that can be gleaned from the connection before the main data section of the E-mail arrives:

1) The IP address of the sender
2) The E-mail address of the sender (easily forged)
3) The E-mail address it is being sent to.

The engine will first check whether the mail is a valid reply to a challenge response (identified by the embedded key). If so then it will release the original E-mail and whitelist the sender.

Then it will check whether the IP is within any white/blacklisted ranges. If it is whitelisted then the E-mail is let through (any embedded key will be stripped off). If the IP is blacklisted then the mail will be rejected.

If the IP is neither whitelisted nor blacklisted then the engine will check whether the sender E-mail address is white/blacklisted. If it is whitelisted then again the mail is let through (minus key). If the address is blacklisted then again the mail is rejected.
The address the E-mail is being sent to is then checked to see if it has a key embedded in it. If it does and the key is valid then the mail is let through (minus key) and the sender is whitelisted, provided it looks like a valid address, e.g. it has an @ in the middle and passes some other simple checks.

Note that for white/blacklist storage and comparisons any keys in any E-mail addresses are ignored, as they are not considered part of the E-mail address.

When E-mail is sent on from the engine to the internal E-mail system the keys must be stripped off so the rest of the E-mail system can deal with it as normal.

Finally, if the E-mail fails the above tests then there are several options:

a) The mail could be rejected.

b) The mail could be greylisted, where the MTA (Mail Transport Agent) says it is busy, then accepts the mail at the next attempt.

c) There could be a challenge response. The E-mail will be stored (for a period of time) and a reply sent out with a code that needs to be sent back to release the E-mail. This reply will have a one-shot timed key in the sender E-mail address which can be used to identify the stored E-mail when the reply comes back. When the reply arrives the E-mail will be released. Note that the system will have to watch for and ignore replies from empty senders (indicating that the challenge bounced).

Because the system whitelists addresses whenever it can, it should cause very little inconvenience to normal users but be a nightmare for spammers, since they rely on being able to send massive amounts of E-mail without any effort to get it through.

------------------------------------------------------------------------

7) How will Sapient whitelisting work for outgoing mail?

When an E-mail is sent from the system the address that the mail is being sent to will automatically be whitelisted. This means that replies from that person will not generate a challenge.
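
Taken together with the incoming checks from section 6, the outgoing whitelisting step just described can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the names `split_key`, `UserRules`, `note_outgoing` and `check_incoming` are invented for this sketch, IP lists are matched exactly rather than by range, the key is assumed to sit immediately before the @ as in the example address in section 2, and the challenge-reply step is left out.

```python
import re
from dataclasses import dataclass, field

# Assumption: the key sits in braces at the end of the local part, as in
# the example address in section 2.
KEY_RE = re.compile(r"\{([a-z0-9]+)\}(?=@)")


def split_key(address):
    """Return (keyless address, key or None), lowercased."""
    address = address.strip().lower()
    match = KEY_RE.search(address)
    if match is None:
        return address, None
    return address[:match.start()] + address[match.end():], match.group(1)


@dataclass
class UserRules:
    """Hypothetical per-user record, stored under the keyless address."""
    valid_keys: set = field(default_factory=set)
    ip_whitelist: set = field(default_factory=set)
    ip_blacklist: set = field(default_factory=set)
    sender_whitelist: set = field(default_factory=set)
    sender_blacklist: set = field(default_factory=set)
    fallback: str = "challenge"  # or "reject" / "greylist"


def note_outgoing(rules, recipient):
    """Section 7: whitelist every address the user sends mail to."""
    rules.sender_whitelist.add(split_key(recipient)[0])


def check_incoming(rules, ip, sender, recipient):
    """Apply the section 6 checks in order; return the action to take."""
    sender_plain, _ = split_key(sender)   # keys are ignored for list lookups
    _, key = split_key(recipient)
    if ip in rules.ip_whitelist:
        return "accept"
    if ip in rules.ip_blacklist:
        return "reject"
    if sender_plain in rules.sender_whitelist:
        return "accept"
    if sender_plain in rules.sender_blacklist:
        return "reject"
    if key is not None and key in rules.valid_keys:
        # Crude "looks like an address" test: an @ that is not at either end.
        if "@" in sender_plain.strip("@"):
            rules.sender_whitelist.add(sender_plain)
        return "accept"
    return rules.fallback
```

With `valid_keys={"7df3h"}`, a stranger writing to the keyless address would get the fallback action, while a stranger writing to `sapient.fridge{7df3h}@spamsites.org` would be accepted and whitelisted, so their later mail no longer needs the key.
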
In addition to this a one-shot, time-limited key will be generated and embedded in the E-mail address being used as the sender/return address. This means that bounces will come back through the system without being blocked (because the bounce message will have a valid key).

This system also has the advantage that if the recipient forwards the mail to another address (say from home to work) then they will still be able to reply to it without being challenged, because the reply address they use will still have a key embedded in it.

Optional: The outgoing system could also check to see if the E-mail being sent by the user already has a key in it. This will be picked up and registered as a valid key. This allows the client software to generate keys and even allows the user to "invent" their own keys. If this idea is extended and the keys are allowed to be any length then it allows the user in effect to add passwords to their E-mail addresses in the form of keys.

------------------------------------------------------------------------

8) Limits on keys

A key is simply five alphanumeric characters within {} characters in the local part of the E-mail address. To make them easy to type in by hand they use only the lowercase characters a-z and the digits 0-9. With 36 possible characters in each of five positions this gives a total of 60,466,176 different possible keys. Lowercase was chosen because some mail systems lowercase E-mail addresses automatically.

Five characters was chosen because it gives enough combinations to make it hard to brute force but is small enough to be typed in reliably. Actually any size of key would work and there is no reason why different keys couldn't be different sizes.

There are two kinds of limits that can be put onto keys:

a) Counted keys. These only work a certain number of times and are then de-activated. Typically they will actually be one-shot keys which will be used for bounces, for signing up for mailing lists or for registering as contact addresses for goods ordered over the internet.
In the case of a one-shot key only one E-mail sender can use the key (once they have done so they will be whitelisted, so future E-mails from them will continue to work).

b) Time-limited keys. These will allow any number of E-mails to be sent but only up to a certain date/time. Any sender who sends E-mail during that time will be whitelisted.

It will also be possible to combine both of these, so for example you could set up one-shot, time-limited keys.

Keys without limits are the ones you would use for E-mail addresses on web sites etc. When the spam starts arriving the user can create a new key and disable (or time limit) the old one the spam is being sent to. Any users who previously sent E-mail using the key will be whitelisted so they can continue to send mail in, but no new sender can use that key without being challenged.

When a key is disabled the user will have control over what happens to E-mail sent using that key, i.e. rejected, delayed (greylisted) or challenge response.

Optional: A limit could specify how the key is disabled when it runs out, e.g. one key might be time limited for 3 months and then start rejecting E-mail from new senders, and another could be a one-shot key that turns into a challenge response after it is used.

------------------------------------------------------------------------

9) What will the user see?

The most important thing about the system is that the user gets to keep their E-mail address; they no longer have to abandon it each time the spam load gets too high. To do this they need a control panel of some kind to configure their E-mail addresses and keys.

The best way to do this will probably be a web page. The web page code should store the key data in the same database that the mail system accesses when processing the keys. It will probably be accessed via a set URL that each ISP provides for its users. The user will log in using a user name and password for security.
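
As an illustration of the key data such a control panel might store, the sketch below generates five-character keys and applies the counted and time limits from section 8. It is a minimal sketch under assumptions: the `Key` record, its field names and the `new_key` helper are invented here, an in-memory dict stands in for the shared database, and a counted key is decremented per accepted E-mail rather than per distinct sender.

```python
import secrets
import string
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ALPHABET = string.ascii_lowercase + string.digits  # a-z and 0-9: 36 symbols
KEY_LENGTH = 5
KEY_SPACE = len(ALPHABET) ** KEY_LENGTH            # 36**5 possible keys


@dataclass
class Key:
    """Hypothetical database row for one key."""
    value: str
    created: datetime
    uses_left: Optional[int] = None     # None means no count limit
    expires: Optional[datetime] = None  # None means no time limit
    enabled: bool = True                # the user can switch a key off

    def usable(self, now):
        if not self.enabled:
            return False
        if self.expires is not None and now >= self.expires:
            return False
        return self.uses_left is None or self.uses_left > 0

    def use(self, now):
        """Consume one use; return True if the key was accepted."""
        if not self.usable(now):
            return False
        if self.uses_left is not None:
            self.uses_left -= 1
        return True


def new_key(existing, now, uses=None, lifetime=None):
    """Draw random keys until one is not already in `existing` (a dict)."""
    while True:
        value = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
        if value not in existing:
            break
    expires = now + lifetime if lifetime is not None else None
    existing[value] = Key(value, now, uses, expires)
    return existing[value]
```

A one-shot, time-limited key for a bounce address would then be `new_key(keys, now, uses=1, lifetime=timedelta(days=30))`, and an unlimited key for a web page simply `new_key(keys, now)`. What happens when a key fails (reject, greylist or challenge) would be a further per-key setting, omitted here.
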
The web page will:

a) Allow the creation of multiple new E-mail addresses containing keys.
b) Control limits on keys created (one-shot, timed) and what happens when the limits are hit.
c) Allow keys to be disabled, re-activated or have their type changed.
d) Control whether addresses are automatically whitelisted or not.
e) Control the white/blacklists, allowing addresses to be added/removed.
f) Track who was using each key, i.e. who is white/blacklisted and when.

If enhancements can be made to the client software then functions can be added to bring up the control page and possibly to generate a new key (and E-mail address) for the user at the click of a button.

------------------------------------------------------------------------

10) Is it a good solution?

Above it was stated that a good solution would:

a) Make it hard for spammers to E-mail users

Without a key a spammer will find it very difficult to mail any user because answering the challenge will take too much effort. Although it is not impossible for a spammer to do so, it would slow them down to the point where they couldn't send enough mail to make it worthwhile.

b) Allow strangers to E-mail users

If a stranger wants to mail a user they will either have to have a key (typically within an E-mail address from a web page or recent newsgroup posting) or answer a challenge response. If the key they use fails (because it has hit its limit) they can still use the challenge response system to get the mail through, but a spammer won't go to that trouble.

c) Not require changes to global software or protocols

Sapient whitelisting will only need to be implemented on the receiving ISP's mail system. Because it uses normal E-mail addresses no global changes are needed to E-mail protocols, but any ISP who uses the system would benefit by being able to offer their users E-mail addresses that are completely under the user's control.
d) Be usable by businesses who need to have openly listed contacts

A keyed E-mail address could be put on a web site. When it starts to get too much spam it could be turned into a time-limited address (e.g. valid for another month or so) and be replaced by a new keyed address on the web page. This gives a break in the E-mail address which causes problems for the spammers and anyone they are selling the addresses to, but won't affect normal users trying to get in touch.

e) Be easy for users to use

The default mode for the system will simply be a challenge response system with automatically generated, time-limited, one-shot keys embedded into outgoing mail. This means that most users won't notice the difference, though people mailing them will get challenged if they E-mail the user before the user E-mails them.

The system to generate and control keys for more advanced use, such as generating a keyed address to put on a web site (so incoming mail to it won't be challenged), will have to be slightly more complex. A web page will be needed to generate new keys and to control old keys. Hopefully these pages will be made simple enough for average users to understand how to use them.

f) Work at a low level so users don't have to change their software

The system will work at the SMTP sending and receiving level at the ISP the user gets their mail from, so no changes to their software should be needed. Almost all users nowadays have web browsers that can be used to access the configuration pages.

g) Genuine opt-in mailing lists should not be overly burdened

If a time or usage limited keyed E-mail address is given as an opt-in address to a mailing list then it will work fine. It just won't work for anyone else if the list owner decides to try to sell the address at a later date. A two-shot E-mail address may be needed if the mailing list sends a confirmation request from a different address to the one the normal mailing list traffic comes from.
This stops challenge responses from going to the mailing list and annoying the other recipients.

h) It should be hard to sell on a working E-mail address

Because the key is embedded into the E-mail address, if the user turns the key off then that E-mail address will not work for anyone who hasn't already been whitelisted. It becomes next to useless for spamming. Also, if a list owner realises that they can be tracked down as the source of the leaked address then they may be less willing to sell it.

i) The user should be in control of their E-mail addresses

A combination of the above points means that the user can control who is using their E-mail addresses and what for. If an address gets too much spam then it can be rescued by simply changing the key.

The system also has the side benefit that if a different key is used for signing up for different things it will be very obvious who is selling the addresses on to "trusted partners" without permission.

j) Should not discard genuine E-mail

Since the default fallback of the system is challenge response, E-mail will never actually be lost. The user can set the response to reject the mail but that is the user's responsibility and should only be set for a badly polluted key.

------------------------------------------------------------------------

11) What problems will this cause spammers?

Since E-mail addresses will be completely under the control of the people who own them there will be little that spammers can do to abuse an address, as the user will simply disable a key as soon as it starts to be spammed. This makes the address useless to the spammer and anyone they sell the address to.

Buying keyed addresses would be pointless because the addresses will become obsolete far more rapidly than normal addresses, as the spam victims will simply turn off the keys that are being spammed.
On top of this it means that the spammers' lists will start clogging up with useless E-mail addresses because they will probably keep adding the same address with different (but disabled) keys.

If enough people used Sapient whitelisting it would be a disaster for spammers, but any individual or ISP using it should see the benefit straight away.

------------------------------------------------------------------------

12) Possible spammer attacks

a) Brute force guessing of keys

This probably won't be very effective because spammers rely on volume to get their responses. Given that for a five character key there are over sixty million possible combinations for each user, the amount of mail they would have to send out to get the same response as for the current system is multiplied by that. Even if each user had 100 active keys it would still take around 600,000 E-mails per person on average (60,466,176 / 100) to get an E-mail through.

The system should spot this happening in case spammers try to use the challenges for a DOS attack.

b) Automated challenge responses

It would be possible to build an E-mail system to answer the challenges by simply replying to them. That would take a fair amount of effort (something that spammers appear to avoid) and would rely on the return address being valid and staying up.

Of course if the spammer sent out enough mail and attempted to receive the responses then it would be a self mailbomb and might bring their server down.

c) Guessing whitelisted addresses

Theoretically it would be possible to send spam to someone by guessing what addresses they have whitelisted, but given the number of E-mail addresses out there that would be even harder to do than brute forcing the key.

The two exceptions are the user's own address (users commonly E-mail themselves) and mailing list addresses, i.e. everyone on a single mailing list will have whitelisted the sending address of that mailing list.
The problem of users mailing themselves can be solved either by whitelisting the IP addresses of the local network (which would be done by the hosting ISP, rather than by the user) or by the system spotting that the user is mailing themselves and adding a key on the fly as the outgoing mail is processed.

The mailing list problem is a little more tricky but unlikely to matter in practice, since the spammer would have to get hold of the mailing list's subscriber list before being able to send the mail out with the from line forged to match the one the mailing list would use. Too much work for a spammer.

d) Harvesting current valid addresses

It would still be possible to harvest fresh addresses from newsgroups and web pages, but spammers don't do that often because it takes time and effort and they don't collect enough addresses that way. Instead they buy E-mail lists from spam support sites and other spammers.

As the spam recipients realise keys are being mis-used they will simply generate new keys and disable the old ones, thus increasing the number of undeliverables in the address list far faster than normal.

If a valid key is harvested the user will have to disable any keys being mis-used by spammers and clear out any accidentally whitelisted spammer addresses, but at least it means they can re-claim their E-mail address without having to change it completely (along with all the hassle that causes).

------------------------------------------------------------------------

13) Technical questions to resolve

There are a few points that I'm not sure about; these will probably become clear during implementation:

a) Bounces may be a problem if the MTA at the receiving end uses the "From:" header in the mail rather than the "MAIL FROM:" address in the SMTP envelope. This could be fixed by re-writing the headers, replacing the "From:" address with one containing a key. Messy, but may be unavoidable if bounces are to work properly.
b) Preferably, to reduce bandwidth, hooks are needed during the SMTP negotiation phase, before the data section starts arriving. Later hooks would also be needed to cope with the challenge response phase. At the moment I'm not sure what hooks are available in the current mail systems.

c) Are {} suitable characters for the delimiter? They are not commonly seen in E-mail addresses but this may be because they are not actually valid characters for all E-mail systems.

d) Is the idea of allowing the client to generate a key which is then picked up by the system as the E-mail is sent out a good one? The advantage is that the user could simply press a button to generate a new E-mail address and the first time it was used it would "self-register". The disadvantage is that there would be no easy way for the client to check that the new key was unique.

e) Not all systems receive their mail on the same mail servers that they send mail from, and there may be multiple mail servers receiving the mail. That means that they are going to have to talk to the central database somehow. External MXs are not going to be able to do this but they should deliver to one of the internal servers eventually (so the key checking can be done there). It's not clear whether this causes a problem or not.

------------------------------------------------------------------------

14) FAQ

This section is for the answers to specific questions that people have asked when making comments.

a) What about businesses that can't change E-mail addresses often?
b) It's not very secure is it?
c) Won't it be too complex for normal users?
d) Won't it slow the E-mail system down?
e) How can you retrofit it?
f) If I only have one E-mail address, how can I have many keys?
g) How can I use it to track who is selling my E-mail address?
h) How can a genuine user of an old key still E-mail me unchallenged?
i) Isn't there a patent on challenge responses?
Answers:

a) What about businesses that can't change E-mail addresses often?

Businesses commonly print their addresses on business cards etc. so can't change the keys in their E-mail addresses very often. This isn't really a problem because E-mail addresses containing new keys could simply be generated whenever the literature containing them is printed (typically every year or so). The old keys could be time limited at that point so would continue to work for some period afterwards.

In addition, business cards are probably not a very common source of addresses for spammers, so if the cards were given a key which wasn't used anywhere else it would probably take a long time for the amount of spam to build up to annoying levels.

b) It's not very secure is it?

It doesn't have to be secure, because spammers send out so much E-mail that even a small barrier on each will cause them real problems.

c) Won't it be too complex for normal users?

The system can default either to doing nothing or to a challenge response system. Users only need to get involved with generating keys etc. if the spam load becomes too high and they want to do something about it. They can gradually start using the features without having to learn a lot up front. Hopefully the system is clear enough to be fairly easily understood.

d) Won't it slow the E-mail system down?

The key processing should take very little time. The database access is probably the slowest part, but given the speed of modern computers and databases this shouldn't be a problem. It is certainly a lot cheaper than trying to filter on the contents of the E-mail.

e) How can you retrofit it?

The system is designed so it can be introduced at an ISP without having to change how the internal E-mail system works or how other ISPs communicate with it. It should be possible to implement Sapient whitelisting as a wrapper around an existing E-mail system, leaving the internal system unchanged.
f) If I only have one E-mail address, how can I have many keys?

The keys are embedded in the E-mail addresses. From the point of view of the external E-mail system they are part of the address, but from the point of view of the internal E-mail system they aren't. The keys are stripped off before the mail is passed on to the internal system. Users will still only have one E-mail address but to anyone outside it looks as if they have many addresses (each containing a different key).

g) How can I use it to track who is selling my E-mail address?

If you give an E-mail address containing a different key to each entity then anyone else using that key must have got it from the entity you gave the key to. That means you can track who sold your address on. Obviously this only works for keys given out one at a time, so you wouldn't be able to track the users of a key embedded in an E-mail address on a web page or in a newsgroup article.

h) How can a genuine user of an old key still E-mail me unchallenged?

Once a user is whitelisted they no longer need a key to send E-mail in. When E-mail comes in from them the key part is stripped off and ignored. It doesn't matter whether they continue to use an old key or not.

i) Isn't there a patent on challenge responses?

There appear to be patents covering challenge responses which require the sender to answer questions to get the mail through. The simple challenge response described here only requires a token to be returned, in the same way that mailing list confirmations are done, and is not patented as far as is known.

See: http://zdnet.com.com/2100-1105_2-1016250.html
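
Returning to answer g), the tracking idea can be sketched as a small lookup over the per-key usage records the system already keeps. This is only an illustration under assumptions: the `KeyLedger` class, its method names and the labels are invented for this sketch, and it presumes the user has already identified which sender addresses were spam.

```python
from collections import defaultdict


class KeyLedger:
    """Hypothetical record of who each key was given to and who used it."""

    def __init__(self):
        self.issued_to = {}                    # key -> label chosen by the user
        self.seen_senders = defaultdict(set)   # key -> sender addresses seen

    def issue(self, key, label):
        """Note that `key` was handed out to the party named `label`."""
        self.issued_to[key] = label

    def record_use(self, key, sender):
        """Note that mail from `sender` arrived using `key`."""
        self.seen_senders[key].add(sender.lower())

    def suspects(self, spam_senders):
        """Labels of the parties whose keys were used by known spam senders."""
        spam = {s.lower() for s in spam_senders}
        return sorted(
            self.issued_to[key]
            for key, senders in self.seen_senders.items()
            if key in self.issued_to and senders & spam
        )
```

If key `a1b2c` was given only to a bookshop and later turns up on mail from a spam sender, `suspects` would return `["bookshop"]`. As the answer notes, this says nothing useful for a key published on a web page, since anyone could have harvested it.
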