Merge pull request #583 from EgoAleSum/email-patch
normalizeEmail improvements
chriso authored Sep 27, 2016
2 parents 3266767 + e12cd97 commit 9c2a506
Showing 6 changed files with 538 additions and 38 deletions.
13 changes: 12 additions & 1 deletion README.md
@@ -105,7 +105,18 @@ Passing anything other than a string is an error.
 - **escape(input)** - replace `<`, `>`, `&`, `'`, `"` and `/` with HTML entities.
 - **unescape(input)** - replaces HTML encoded entities with `<`, `>`, `&`, `'`, `"` and `/`.
 - **ltrim(input [, chars])** - trim characters from the left-side of the input.
-- **normalizeEmail(email [, options])** - canonicalize an email address. `options` is an object which defaults to `{ lowercase: true, remove_dots: true, remove_extension: true }`. With `lowercase` set to `true`, the local part of the email address is lowercased for all domains; the hostname is always lowercased and the local part of the email address is always lowercased for hosts that are known to be case-insensitive (currently only GMail). Normalization follows special rules for known providers: currently, GMail addresses have dots removed in the local part and are stripped of extensions (e.g. `some.one+extension@gmail.com` becomes `someone@gmail.com`) and all `@googlemail.com` addresses are normalized to `@gmail.com`.
+- **normalizeEmail(email [, options])** - canonicalize an email address. `options` is an object with the following keys and default values:
+  - *all_lowercase: true* - Transforms the local part (before the @ symbol) of all email addresses to lowercase. Note that this may violate RFC 5321, which allows providers to treat the local part of an email address as case-sensitive (in practice most, but not all, providers do not). The domain part of the email address is always lowercased, as it is case-insensitive per RFC 1035.
+  - *gmail_lowercase: true* - GMail addresses are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. When *all_lowercase* is true, GMail addresses are lowercased regardless of the value of this setting.
+  - *gmail_remove_dots: true* - Removes dots from the local part of the email address, as GMail ignores them (e.g. "john.doe" and "johndoe" are considered equal).
+  - *gmail_remove_subaddress: true* - Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "foo+bar@gmail.com" becomes "foo@gmail.com").
+  - *gmail_convert_googlemaildotcom: true* - Converts addresses with domain @googlemail.com to @gmail.com, as they're equivalent.
+  - *outlookdotcom_lowercase: true* - Outlook.com addresses (including Windows Live and Hotmail) are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. When *all_lowercase* is true, Outlook.com addresses are lowercased regardless of the value of this setting.
+  - *outlookdotcom_remove_subaddress: true* - Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "foo+bar@outlook.com" becomes "foo@outlook.com").
+  - *yahoo_lowercase: true* - Yahoo Mail addresses are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. When *all_lowercase* is true, Yahoo Mail addresses are lowercased regardless of the value of this setting.
+  - *yahoo_remove_subaddress: true* - Normalizes addresses by removing "sub-addresses", which is the part following the last "-" sign (e.g. "foo-bar@yahoo.com" becomes "foo@yahoo.com").
+  - *icloud_lowercase: true* - iCloud addresses (including MobileMe) are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. When *all_lowercase* is true, iCloud addresses are lowercased regardless of the value of this setting.
+  - *icloud_remove_subaddress: true* - Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "foo+bar@icloud.com" becomes "foo@icloud.com").
 - **rtrim(input [, chars])** - trim characters from the right-side of the input.
 - **stripLow(input [, keep_new_lines])** - remove characters with a numerical value < 32 and 127, mostly control characters. If `keep_new_lines` is `true`, newline characters are preserved (`\n` and `\r`, hex `0xA` and `0xD`). Unicode-safe in JavaScript.
 - **toBoolean(input [, strict])** - convert the input string to a boolean. Everything except for `'0'`, `'false'` and `''` returns `true`. In strict mode only `'1'` and `'true'` return `true`.
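In this commit's code, the GMail switches documented above are applied in a fixed order. The sub-address is stripped first, then dots are removed, then an empty local part is rejected, then the local part is lowercased, and finally the domain is rewritten. The following is a minimal standalone sketch of that sequence. It assumes the address has already passed a syntax check: the library calls `isEmail` first, but this sketch does not. `normalizeGmail` and `GMAIL_DEFAULTS` are hypothetical names and are not part of validator's API.

```javascript
// Hypothetical sketch of the documented GMail options; not validator's own code.
// Assumes `email` is already a syntactically valid address (no isEmail check here).
const GMAIL_DEFAULTS = {
  all_lowercase: true,
  gmail_lowercase: true,
  gmail_remove_dots: true,
  gmail_remove_subaddress: true,
  gmail_convert_googlemaildotcom: true,
};

function normalizeGmail(email, options = {}) {
  // Keys supplied by the caller override the defaults.
  const opts = { ...GMAIL_DEFAULTS, ...options };
  let [local, domain] = email.split('@', 2);
  domain = domain.toLowerCase(); // the domain is always lowercased
  if (domain !== 'gmail.com' && domain !== 'googlemail.com') {
    return null; // not a GMail address; out of scope for this sketch
  }
  if (opts.gmail_remove_subaddress) {
    local = local.split('+')[0]; // drop everything from the first "+"
  }
  if (opts.gmail_remove_dots) {
    local = local.replace(/\./g, ''); // GMail ignores dots in the local part
  }
  if (!local.length) {
    return false; // nothing addressable is left
  }
  if (opts.all_lowercase || opts.gmail_lowercase) {
    local = local.toLowerCase();
  }
  if (opts.gmail_convert_googlemaildotcom) {
    domain = 'gmail.com';
  }
  return `${local}@${domain}`;
}

console.log(normalizeGmail('Some.One+News@GoogleMail.com')); // someone@gmail.com
console.log(normalizeGmail('Some.One+News@GoogleMail.com', { gmail_remove_dots: false })); // some.one@gmail.com
```

Returning `null` for non-GMail domains is a choice made only for this sketch. The library instead falls through to its other provider branches.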
105 changes: 96 additions & 9 deletions lib/normalizeEmail.js
@@ -16,32 +16,119 @@ var _merge2 = _interopRequireDefault(_merge);
 function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { default: obj }; }
 
 var default_normalize_email_options = {
-  lowercase: true,
-  remove_dots: true,
-  remove_extension: true
+  // The following options apply to all email addresses
+  // Lowercases the local part of the email address.
+  // Please note this may violate RFC 5321 (see http://stackoverflow.com/a/9808332/192024).
+  // The domain is always lowercased, as per RFC 1035
+  all_lowercase: true,
+
+  // The following conversions are specific to GMail
+  // Lowercases the local part of the GMail address (known to be case-insensitive)
+  gmail_lowercase: true,
+  // Removes dots from the local part of the email address, as that's ignored by GMail
+  gmail_remove_dots: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  gmail_remove_subaddress: true,
+  // Converts the googlemail.com domain to gmail.com
+  gmail_convert_googlemaildotcom: true,
+
+  // The following conversions are specific to Outlook.com / Windows Live / Hotmail
+  // Lowercases the local part of the Outlook.com address (known to be case-insensitive)
+  outlookdotcom_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  outlookdotcom_remove_subaddress: true,
+
+  // The following conversions are specific to Yahoo
+  // Lowercases the local part of the Yahoo address (known to be case-insensitive)
+  yahoo_lowercase: true,
+  // Removes the subaddress (e.g. "-foo") from the email address
+  yahoo_remove_subaddress: true,
+
+  // The following conversions are specific to iCloud
+  // Lowercases the local part of the iCloud address (known to be case-insensitive)
+  icloud_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  icloud_remove_subaddress: true
 };
 
+// List of domains used by iCloud
+var icloud_domains = ['icloud.com', 'me.com'];
+
+// List of domains used by Outlook.com and its predecessors
+// This list is likely incomplete.
+// Partial reference:
+// https://blogs.office.com/2013/04/17/outlook-com-gets-two-step-verification-sign-in-by-alias-and-new-international-domains/
+var outlookdotcom_domains = ['hotmail.at', 'hotmail.be', 'hotmail.ca', 'hotmail.cl', 'hotmail.co.il', 'hotmail.co.nz', 'hotmail.co.th', 'hotmail.co.uk', 'hotmail.com', 'hotmail.com.ar', 'hotmail.com.au', 'hotmail.com.br', 'hotmail.com.gr', 'hotmail.com.mx', 'hotmail.com.pe', 'hotmail.com.tr', 'hotmail.com.vn', 'hotmail.cz', 'hotmail.de', 'hotmail.dk', 'hotmail.es', 'hotmail.fr', 'hotmail.hu', 'hotmail.id', 'hotmail.ie', 'hotmail.in', 'hotmail.it', 'hotmail.jp', 'hotmail.kr', 'hotmail.lv', 'hotmail.my', 'hotmail.ph', 'hotmail.pt', 'hotmail.sa', 'hotmail.sg', 'hotmail.sk', 'live.be', 'live.co.uk', 'live.com', 'live.com.ar', 'live.com.mx', 'live.de', 'live.es', 'live.eu', 'live.fr', 'live.it', 'live.nl', 'msn.com', 'outlook.at', 'outlook.be', 'outlook.cl', 'outlook.co.il', 'outlook.co.nz', 'outlook.co.th', 'outlook.com', 'outlook.com.ar', 'outlook.com.au', 'outlook.com.br', 'outlook.com.gr', 'outlook.com.pe', 'outlook.com.tr', 'outlook.com.vn', 'outlook.cz', 'outlook.de', 'outlook.dk', 'outlook.es', 'outlook.fr', 'outlook.hu', 'outlook.id', 'outlook.ie', 'outlook.in', 'outlook.it', 'outlook.jp', 'outlook.kr', 'outlook.lv', 'outlook.my', 'outlook.ph', 'outlook.pt', 'outlook.sa', 'outlook.sg', 'outlook.sk', 'passport.com'];
+
+// List of domains used by Yahoo Mail
+// This list is likely incomplete
+var yahoo_domains = ['rocketmail.com', 'yahoo.ca', 'yahoo.co.uk', 'yahoo.com', 'yahoo.de', 'yahoo.fr', 'yahoo.in', 'yahoo.it', 'ymail.com'];
+
 function normalizeEmail(email, options) {
   options = (0, _merge2.default)(options, default_normalize_email_options);
+
   if (!(0, _isEmail2.default)(email)) {
     return false;
   }
   var parts = email.split('@', 2);
+
+  // The domain is always lowercased, as it's case-insensitive per RFC 1035
   parts[1] = parts[1].toLowerCase();
+
   if (parts[1] === 'gmail.com' || parts[1] === 'googlemail.com') {
-    if (options.remove_extension) {
+    // Address is GMail
+    if (options.gmail_remove_subaddress) {
       parts[0] = parts[0].split('+')[0];
     }
-    if (options.remove_dots) {
+    if (options.gmail_remove_dots) {
       parts[0] = parts[0].replace(/\./g, '');
     }
     if (!parts[0].length) {
       return false;
     }
-    parts[0] = parts[0].toLowerCase();
-    parts[1] = 'gmail.com';
-  } else if (options.lowercase) {
-    parts[0] = parts[0].toLowerCase();
+    if (options.all_lowercase || options.gmail_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+    parts[1] = options.gmail_convert_googlemaildotcom ? 'gmail.com' : parts[1];
+  } else if (~icloud_domains.indexOf(parts[1])) {
+    // Address is iCloud
+    if (options.icloud_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.icloud_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~outlookdotcom_domains.indexOf(parts[1])) {
+    // Address is Outlook.com
+    if (options.outlookdotcom_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.outlookdotcom_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~yahoo_domains.indexOf(parts[1])) {
+    // Address is Yahoo
+    if (options.yahoo_remove_subaddress) {
+      var components = parts[0].split('-');
+      parts[0] = components.length > 1 ? components.slice(0, -1).join('-') : components[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.yahoo_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else {
+    // Any other address
+    if (options.all_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
   }
   return parts.join('@');
 }
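The reworked `normalizeEmail` merges the caller's `options` with `default_normalize_email_options`, so callers need to pass only the keys they want to change. The sketch below assumes that this merge fills in a default only where the caller left a key `undefined`. `mergeDefaults` is a hypothetical stand-in for `util/merge`, and it copies its input first so the caller's object is left untouched. The sketch also shows a consequence of the provider switches. Setting `all_lowercase: false` alone still lowercases GMail addresses. This happens because each provider branch tests `options.all_lowercase || options.<provider>_lowercase`, and `gmail_lowercase` keeps its default of `true`.

```javascript
// Hypothetical stand-in for the library's merge helper: copy a default only
// when the caller did not supply that key (or supplied `undefined`).
function mergeDefaults(obj = {}, defaults) {
  const result = { ...obj }; // copy so the caller's object is not mutated
  for (const key of Object.keys(defaults)) {
    if (typeof result[key] === 'undefined') {
      result[key] = defaults[key];
    }
  }
  return result;
}

// A subset of the commit's defaults, for illustration.
const defaults = { all_lowercase: true, gmail_lowercase: true, yahoo_lowercase: true };

// Only all_lowercase is overridden; the provider switches keep their defaults.
const opts = mergeDefaults({ all_lowercase: false }, defaults);
console.log(opts); // { all_lowercase: false, gmail_lowercase: true, yahoo_lowercase: true }

// Mirrors the lowercasing condition used in each provider branch of the diff.
// Pass null for domains that have no provider-specific switch.
function shouldLowercase(options, providerKey) {
  return Boolean(options.all_lowercase || (providerKey && options[providerKey]));
}

console.log(shouldLowercase(opts, 'gmail_lowercase')); // true
console.log(shouldLowercase(opts, null)); // false
```

To keep a GMail local part exactly as typed, a caller would therefore have to turn off both `all_lowercase` and `gmail_lowercase`.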
201 changes: 192 additions & 9 deletions src/lib/normalizeEmail.js
@@ -2,32 +2,215 @@ import isEmail from './isEmail';
 import merge from './util/merge';
 
 const default_normalize_email_options = {
-  lowercase: true,
-  remove_dots: true,
-  remove_extension: true,
+  // The following options apply to all email addresses
+  // Lowercases the local part of the email address.
+  // Please note this may violate RFC 5321 (see http://stackoverflow.com/a/9808332/192024).
+  // The domain is always lowercased, as per RFC 1035
+  all_lowercase: true,
+
+  // The following conversions are specific to GMail
+  // Lowercases the local part of the GMail address (known to be case-insensitive)
+  gmail_lowercase: true,
+  // Removes dots from the local part of the email address, as that's ignored by GMail
+  gmail_remove_dots: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  gmail_remove_subaddress: true,
+  // Converts the googlemail.com domain to gmail.com
+  gmail_convert_googlemaildotcom: true,
+
+  // The following conversions are specific to Outlook.com / Windows Live / Hotmail
+  // Lowercases the local part of the Outlook.com address (known to be case-insensitive)
+  outlookdotcom_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  outlookdotcom_remove_subaddress: true,
+
+  // The following conversions are specific to Yahoo
+  // Lowercases the local part of the Yahoo address (known to be case-insensitive)
+  yahoo_lowercase: true,
+  // Removes the subaddress (e.g. "-foo") from the email address
+  yahoo_remove_subaddress: true,
+
+  // The following conversions are specific to iCloud
+  // Lowercases the local part of the iCloud address (known to be case-insensitive)
+  icloud_lowercase: true,
+  // Removes the subaddress (e.g. "+foo") from the email address
+  icloud_remove_subaddress: true,
 };
 
+// List of domains used by iCloud
+const icloud_domains = [
+  'icloud.com',
+  'me.com',
+];
+
+// List of domains used by Outlook.com and its predecessors
+// This list is likely incomplete.
+// Partial reference:
+// https://blogs.office.com/2013/04/17/outlook-com-gets-two-step-verification-sign-in-by-alias-and-new-international-domains/
+const outlookdotcom_domains = [
+  'hotmail.at',
+  'hotmail.be',
+  'hotmail.ca',
+  'hotmail.cl',
+  'hotmail.co.il',
+  'hotmail.co.nz',
+  'hotmail.co.th',
+  'hotmail.co.uk',
+  'hotmail.com',
+  'hotmail.com.ar',
+  'hotmail.com.au',
+  'hotmail.com.br',
+  'hotmail.com.gr',
+  'hotmail.com.mx',
+  'hotmail.com.pe',
+  'hotmail.com.tr',
+  'hotmail.com.vn',
+  'hotmail.cz',
+  'hotmail.de',
+  'hotmail.dk',
+  'hotmail.es',
+  'hotmail.fr',
+  'hotmail.hu',
+  'hotmail.id',
+  'hotmail.ie',
+  'hotmail.in',
+  'hotmail.it',
+  'hotmail.jp',
+  'hotmail.kr',
+  'hotmail.lv',
+  'hotmail.my',
+  'hotmail.ph',
+  'hotmail.pt',
+  'hotmail.sa',
+  'hotmail.sg',
+  'hotmail.sk',
+  'live.be',
+  'live.co.uk',
+  'live.com',
+  'live.com.ar',
+  'live.com.mx',
+  'live.de',
+  'live.es',
+  'live.eu',
+  'live.fr',
+  'live.it',
+  'live.nl',
+  'msn.com',
+  'outlook.at',
+  'outlook.be',
+  'outlook.cl',
+  'outlook.co.il',
+  'outlook.co.nz',
+  'outlook.co.th',
+  'outlook.com',
+  'outlook.com.ar',
+  'outlook.com.au',
+  'outlook.com.br',
+  'outlook.com.gr',
+  'outlook.com.pe',
+  'outlook.com.tr',
+  'outlook.com.vn',
+  'outlook.cz',
+  'outlook.de',
+  'outlook.dk',
+  'outlook.es',
+  'outlook.fr',
+  'outlook.hu',
+  'outlook.id',
+  'outlook.ie',
+  'outlook.in',
+  'outlook.it',
+  'outlook.jp',
+  'outlook.kr',
+  'outlook.lv',
+  'outlook.my',
+  'outlook.ph',
+  'outlook.pt',
+  'outlook.sa',
+  'outlook.sg',
+  'outlook.sk',
+  'passport.com',
+];
+
+// List of domains used by Yahoo Mail
+// This list is likely incomplete
+const yahoo_domains = [
+  'rocketmail.com',
+  'yahoo.ca',
+  'yahoo.co.uk',
+  'yahoo.com',
+  'yahoo.de',
+  'yahoo.fr',
+  'yahoo.in',
+  'yahoo.it',
+  'ymail.com',
+];
+
 export default function normalizeEmail(email, options) {
   options = merge(options, default_normalize_email_options);
+
   if (!isEmail(email)) {
     return false;
   }
   const parts = email.split('@', 2);
+
+  // The domain is always lowercased, as it's case-insensitive per RFC 1035
   parts[1] = parts[1].toLowerCase();
+
   if (parts[1] === 'gmail.com' || parts[1] === 'googlemail.com') {
-    if (options.remove_extension) {
+    // Address is GMail
+    if (options.gmail_remove_subaddress) {
       parts[0] = parts[0].split('+')[0];
     }
-    if (options.remove_dots) {
+    if (options.gmail_remove_dots) {
       parts[0] = parts[0].replace(/\./g, '');
     }
     if (!parts[0].length) {
       return false;
     }
-    parts[0] = parts[0].toLowerCase();
-    parts[1] = 'gmail.com';
-  } else if (options.lowercase) {
-    parts[0] = parts[0].toLowerCase();
+    if (options.all_lowercase || options.gmail_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+    parts[1] = options.gmail_convert_googlemaildotcom ? 'gmail.com' : parts[1];
+  } else if (~icloud_domains.indexOf(parts[1])) {
+    // Address is iCloud
+    if (options.icloud_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.icloud_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~outlookdotcom_domains.indexOf(parts[1])) {
+    // Address is Outlook.com
+    if (options.outlookdotcom_remove_subaddress) {
+      parts[0] = parts[0].split('+')[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.outlookdotcom_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else if (~yahoo_domains.indexOf(parts[1])) {
+    // Address is Yahoo
+    if (options.yahoo_remove_subaddress) {
+      let components = parts[0].split('-');
+      parts[0] = (components.length > 1) ? components.slice(0, -1).join('-') : components[0];
+    }
+    if (!parts[0].length) {
+      return false;
+    }
+    if (options.all_lowercase || options.yahoo_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
+  } else {
+    // Any other address
+    if (options.all_lowercase) {
+      parts[0] = parts[0].toLowerCase();
+    }
   }
   return parts.join('@');
 }
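Two idioms in the provider branches are easy to misread. The first is `~list.indexOf(domain)`. It is truthy exactly when the domain is in the list, because `indexOf` returns `-1` on a miss and `~(-1) === 0`. The second is the Yahoo branch, which splits on `-` and drops only the last segment. A local part with several hyphens therefore keeps everything before the final one. The sketch below isolates both behaviors. `isListed`, `stripYahooSubaddress` and `yahooDomains` are hypothetical names chosen for illustration, and `Array.prototype.includes` is shown as an equivalent, more explicit membership test.

```javascript
// A subset of the commit's yahoo_domains list, for illustration only.
const yahooDomains = ['rocketmail.com', 'yahoo.com', 'ymail.com'];

// `~` maps -1 (not found) to 0 (falsy) and any index >= 0 to a nonzero (truthy) value.
function isListed(domains, domain) {
  return Boolean(~domains.indexOf(domain));
}

// Same logic as the Yahoo branch: drop only the text after the last "-".
function stripYahooSubaddress(local) {
  const components = local.split('-');
  return components.length > 1 ? components.slice(0, -1).join('-') : components[0];
}

console.log(isListed(yahooDomains, 'rocketmail.com')); // true (index 0, and ~0 === -1)
console.log(isListed(yahooDomains, 'yahoo.co.jp')); // false
console.log(yahooDomains.includes('yahoo.com')); // true (equivalent check)
console.log(stripYahooSubaddress('foo-bar')); // foo
console.log(stripYahooSubaddress('foo-bar-baz')); // foo-bar
console.log(stripYahooSubaddress('foo')); // foo
console.log(stripYahooSubaddress('-bar')); // prints an empty line; normalizeEmail would then return false
```

The first element is the case most likely to trip up a reader. `indexOf` returns `0` there, and `~0` is `-1`, which is still truthy. That is why the bitwise form works where a plain `if (list.indexOf(domain))` would not.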