Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalizeEmail improvements #583

Merged
merged 5 commits into from
Sep 27, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 12 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,18 @@ Passing anything other than a string is an error.
- **escape(input)** - replace `<`, `>`, `&`, `'`, `"` and `/` with HTML entities.
- **unescape(input)** - replaces HTML encoded entities with `<`, `>`, `&`, `'`, `"` and `/`.
- **ltrim(input [, chars])** - trim characters from the left-side of the input.
- **normalizeEmail(email [, options])** - canonicalize an email address. `options` is an object which defaults to `{ lowercase: true, remove_dots: true, remove_extension: true }`. With `lowercase` set to `true`, the local part of the email address is lowercased for all domains; the hostname is always lowercased and the local part of the email address is always lowercased for hosts that are known to be case-insensitive (currently only GMail). Normalization follows special rules for known providers: currently, GMail addresses have dots removed in the local part and are stripped of extensions (e.g. `[email protected]` becomes `[email protected]`) and all `@googlemail.com` addresses are normalized to `@gmail.com`.
- **normalizeEmail(email [, options])** - canonicalizes an email address. `options` is an object with the following keys and default values:
- *all_lowercase: true* - Transforms the local part (before the @ symbol) of all email addresses to lowercase. Please note that this may violate RFC 5321, which gives providers the possibility to treat the local part of email addresses in a case sensitive way (although in practice most - yet not all - providers don't). The domain part of the email address is always lowercased, as it's case insensitive per RFC 1035.
- *gmail_lowercase: true* - GMail addresses are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, GMail addresses are lowercased regardless of the value of this setting.
- *gmail_remove_dots: true*: Removes dots from the local part of the email address, as GMail ignores them (e.g. "john.doe" and "johndoe" are considered equal).
- *gmail_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "[email protected]" becomes "[email protected]").
- *gmail_convert_googlemaildotcom: true*: Converts addresses with domain @googlemail.com to @gmail.com, as they're equivalent.
- *outlookdotcom_lowercase: true* - Outlook.com addresses (including Windows Live and Hotmail) are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, Outlook.com addresses are lowercased regardless of the value of this setting.
- *outlookdotcom_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "[email protected]" becomes "[email protected]").
- *yahoo_lowercase: true* - Yahoo Mail addresses are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, Yahoo Mail addresses are lowercased regardless of the value of this setting.
- *yahoo_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "-" sign (e.g. "[email protected]" becomes "[email protected]").
- *icloud_lowercase: true* - iCloud addresses (including MobileMe) are known to be case-insensitive, so this switch allows lowercasing them even when *all_lowercase* is set to false. Please note that when *all_lowercase* is true, iCloud addresses are lowercased regardless of the value of this setting.
- *icloud_remove_subaddress: true*: Normalizes addresses by removing "sub-addresses", which is the part following a "+" sign (e.g. "[email protected]" becomes "[email protected]").
- **rtrim(input [, chars])** - trim characters from the right-side of the input.
- **stripLow(input [, keep_new_lines])** - remove characters with a numerical value < 32 and 127, mostly control characters. If `keep_new_lines` is `true`, newline characters are preserved (`\n` and `\r`, hex `0xA` and `0xD`). Unicode-safe in JavaScript.
- **toBoolean(input [, strict])** - convert the input string to a boolean. Everything except for `'0'`, `'false'` and `''` returns `true`. In strict mode only `'1'` and `'true'` return `true`.
Expand Down
105 changes: 96 additions & 9 deletions lib/normalizeEmail.js
Original file line number Diff line number Diff line change
Expand Up @@ -16,32 +16,119 @@ var _merge2 = _interopRequireDefault(_merge);
function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { default: obj }; }

var default_normalize_email_options = {
lowercase: true,
remove_dots: true,
remove_extension: true
// The following options apply to all email addresses
// Lowercases the local part of the email address.
// Please note this may violate RFC 5321 as per http://stackoverflow.com/a/9808332/192024).
// The domain is always lowercased, as per RFC 1035
all_lowercase: true,

// The following conversions are specific to GMail
// Lowercases the local part of the GMail address (known to be case-insensitive)
gmail_lowercase: true,
// Removes dots from the local part of the email address, as that's ignored by GMail
gmail_remove_dots: true,
// Removes the subaddress (e.g. "+foo") from the email address
gmail_remove_subaddress: true,
// Conversts the googlemail.com domain to gmail.com
gmail_convert_googlemaildotcom: true,

// The following conversions are specific to Outlook.com / Windows Live / Hotmail
// Lowercases the local part of the Outlook.com address (known to be case-insensitive)
outlookdotcom_lowercase: true,
// Removes the subaddress (e.g. "+foo") from the email address
outlookdotcom_remove_subaddress: true,

// The following conversions are specific to Yahoo
// Lowercases the local part of the Yahoo address (known to be case-insensitive)
yahoo_lowercase: true,
// Removes the subaddress (e.g. "-foo") from the email address
yahoo_remove_subaddress: true,

// The following conversions are specific to iCloud
// Lowercases the local part of the iCloud address (known to be case-insensitive)
icloud_lowercase: true,
// Removes the subaddress (e.g. "+foo") from the email address
icloud_remove_subaddress: true
};

// List of domains used by iCloud
var icloud_domains = ['icloud.com', 'me.com'];

// List of domains used by Outlook.com and its predecessors
// This list is likely incomplete.
// Partial reference:
// https://blogs.office.com/2013/04/17/outlook-com-gets-two-step-verification-sign-in-by-alias-and-new-international-domains/
var outlookdotcom_domains = ['hotmail.at', 'hotmail.be', 'hotmail.ca', 'hotmail.cl', 'hotmail.co.il', 'hotmail.co.nz', 'hotmail.co.th', 'hotmail.co.uk', 'hotmail.com', 'hotmail.com.ar', 'hotmail.com.au', 'hotmail.com.br', 'hotmail.com.gr', 'hotmail.com.mx', 'hotmail.com.pe', 'hotmail.com.tr', 'hotmail.com.vn', 'hotmail.cz', 'hotmail.de', 'hotmail.dk', 'hotmail.es', 'hotmail.fr', 'hotmail.hu', 'hotmail.id', 'hotmail.ie', 'hotmail.in', 'hotmail.it', 'hotmail.jp', 'hotmail.kr', 'hotmail.lv', 'hotmail.my', 'hotmail.ph', 'hotmail.pt', 'hotmail.sa', 'hotmail.sg', 'hotmail.sk', 'live.be', 'live.co.uk', 'live.com', 'live.com.ar', 'live.com.mx', 'live.de', 'live.es', 'live.eu', 'live.fr', 'live.it', 'live.nl', 'msn.com', 'outlook.at', 'outlook.be', 'outlook.cl', 'outlook.co.il', 'outlook.co.nz', 'outlook.co.th', 'outlook.com', 'outlook.com.ar', 'outlook.com.au', 'outlook.com.br', 'outlook.com.gr', 'outlook.com.pe', 'outlook.com.tr', 'outlook.com.vn', 'outlook.cz', 'outlook.de', 'outlook.dk', 'outlook.es', 'outlook.fr', 'outlook.hu', 'outlook.id', 'outlook.ie', 'outlook.in', 'outlook.it', 'outlook.jp', 'outlook.kr', 'outlook.lv', 'outlook.my', 'outlook.ph', 'outlook.pt', 'outlook.sa', 'outlook.sg', 'outlook.sk', 'passport.com'];

// List of domains used by Yahoo Mail
// This list is likely incomplete
var yahoo_domains = ['rocketmail.com', 'yahoo.ca', 'yahoo.co.uk', 'yahoo.com', 'yahoo.de', 'yahoo.fr', 'yahoo.in', 'yahoo.it', 'ymail.com'];

function normalizeEmail(email, options) {
options = (0, _merge2.default)(options, default_normalize_email_options);

if (!(0, _isEmail2.default)(email)) {
return false;
}
var parts = email.split('@', 2);

// The domain is always lowercased, as it's case-insensitive per RFC 1035
parts[1] = parts[1].toLowerCase();

if (parts[1] === 'gmail.com' || parts[1] === 'googlemail.com') {
if (options.remove_extension) {
// Address is GMail
if (options.gmail_remove_subaddress) {
parts[0] = parts[0].split('+')[0];
}
if (options.remove_dots) {
if (options.gmail_remove_dots) {
parts[0] = parts[0].replace(/\./g, '');
}
if (!parts[0].length) {
return false;
}
parts[0] = parts[0].toLowerCase();
parts[1] = 'gmail.com';
} else if (options.lowercase) {
parts[0] = parts[0].toLowerCase();
if (options.all_lowercase || options.gmail_lowercase) {
parts[0] = parts[0].toLowerCase();
}
parts[1] = options.gmail_convert_googlemaildotcom ? 'gmail.com' : parts[1];
} else if (~icloud_domains.indexOf(parts[1])) {
// Address is iCloud
if (options.icloud_remove_subaddress) {
parts[0] = parts[0].split('+')[0];
}
if (!parts[0].length) {
return false;
}
if (options.all_lowercase || options.icloud_lowercase) {
parts[0] = parts[0].toLowerCase();
}
} else if (~outlookdotcom_domains.indexOf(parts[1])) {
// Address is Outlook.com
if (options.outlookdotcom_remove_subaddress) {
parts[0] = parts[0].split('+')[0];
}
if (!parts[0].length) {
return false;
}
if (options.all_lowercase || options.outlookdotcom_lowercase) {
parts[0] = parts[0].toLowerCase();
}
} else if (~yahoo_domains.indexOf(parts[1])) {
// Address is Yahoo
if (options.yahoo_remove_subaddress) {
var components = parts[0].split('-');
parts[0] = components.length > 1 ? components.slice(0, -1).join('-') : components[0];
}
if (!parts[0].length) {
return false;
}
if (options.all_lowercase || options.yahoo_lowercase) {
parts[0] = parts[0].toLowerCase();
}
} else {
// Any other address
if (options.all_lowercase) {
parts[0] = parts[0].toLowerCase();
}
}
return parts.join('@');
}
Expand Down
201 changes: 192 additions & 9 deletions src/lib/normalizeEmail.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,32 +2,215 @@ import isEmail from './isEmail';
import merge from './util/merge';

const default_normalize_email_options = {
lowercase: true,
remove_dots: true,
remove_extension: true,
// The following options apply to all email addresses
// Lowercases the local part of the email address.
// Please note this may violate RFC 5321 as per http://stackoverflow.com/a/9808332/192024).
// The domain is always lowercased, as per RFC 1035
all_lowercase: true,

// The following conversions are specific to GMail
// Lowercases the local part of the GMail address (known to be case-insensitive)
gmail_lowercase: true,
// Removes dots from the local part of the email address, as that's ignored by GMail
gmail_remove_dots: true,
// Removes the subaddress (e.g. "+foo") from the email address
gmail_remove_subaddress: true,
// Conversts the googlemail.com domain to gmail.com
gmail_convert_googlemaildotcom: true,

// The following conversions are specific to Outlook.com / Windows Live / Hotmail
// Lowercases the local part of the Outlook.com address (known to be case-insensitive)
outlookdotcom_lowercase: true,
// Removes the subaddress (e.g. "+foo") from the email address
outlookdotcom_remove_subaddress: true,

// The following conversions are specific to Yahoo
// Lowercases the local part of the Yahoo address (known to be case-insensitive)
yahoo_lowercase: true,
// Removes the subaddress (e.g. "-foo") from the email address
yahoo_remove_subaddress: true,

// The following conversions are specific to iCloud
// Lowercases the local part of the iCloud address (known to be case-insensitive)
icloud_lowercase: true,
// Removes the subaddress (e.g. "+foo") from the email address
icloud_remove_subaddress: true,
};

// List of domains used by iCloud
const icloud_domains = [
'icloud.com',
'me.com',
];

// List of domains used by Outlook.com and its predecessors
// This list is likely incomplete.
// Partial reference:
// https://blogs.office.com/2013/04/17/outlook-com-gets-two-step-verification-sign-in-by-alias-and-new-international-domains/
const outlookdotcom_domains = [
'hotmail.at',
'hotmail.be',
'hotmail.ca',
'hotmail.cl',
'hotmail.co.il',
'hotmail.co.nz',
'hotmail.co.th',
'hotmail.co.uk',
'hotmail.com',
'hotmail.com.ar',
'hotmail.com.au',
'hotmail.com.br',
'hotmail.com.gr',
'hotmail.com.mx',
'hotmail.com.pe',
'hotmail.com.tr',
'hotmail.com.vn',
'hotmail.cz',
'hotmail.de',
'hotmail.dk',
'hotmail.es',
'hotmail.fr',
'hotmail.hu',
'hotmail.id',
'hotmail.ie',
'hotmail.in',
'hotmail.it',
'hotmail.jp',
'hotmail.kr',
'hotmail.lv',
'hotmail.my',
'hotmail.ph',
'hotmail.pt',
'hotmail.sa',
'hotmail.sg',
'hotmail.sk',
'live.be',
'live.co.uk',
'live.com',
'live.com.ar',
'live.com.mx',
'live.de',
'live.es',
'live.eu',
'live.fr',
'live.it',
'live.nl',
'msn.com',
'outlook.at',
'outlook.be',
'outlook.cl',
'outlook.co.il',
'outlook.co.nz',
'outlook.co.th',
'outlook.com',
'outlook.com.ar',
'outlook.com.au',
'outlook.com.br',
'outlook.com.gr',
'outlook.com.pe',
'outlook.com.tr',
'outlook.com.vn',
'outlook.cz',
'outlook.de',
'outlook.dk',
'outlook.es',
'outlook.fr',
'outlook.hu',
'outlook.id',
'outlook.ie',
'outlook.in',
'outlook.it',
'outlook.jp',
'outlook.kr',
'outlook.lv',
'outlook.my',
'outlook.ph',
'outlook.pt',
'outlook.sa',
'outlook.sg',
'outlook.sk',
'passport.com',
];

// List of domains used by Yahoo Mail
// This list is likely incomplete
const yahoo_domains = [
'rocketmail.com',
'yahoo.ca',
'yahoo.co.uk',
'yahoo.com',
'yahoo.de',
'yahoo.fr',
'yahoo.in',
'yahoo.it',
'ymail.com',
];

export default function normalizeEmail(email, options) {
options = merge(options, default_normalize_email_options);

if (!isEmail(email)) {
return false;
}
const parts = email.split('@', 2);

// The domain is always lowercased, as it's case-insensitive per RFC 1035
parts[1] = parts[1].toLowerCase();

if (parts[1] === 'gmail.com' || parts[1] === 'googlemail.com') {
if (options.remove_extension) {
// Address is GMail
if (options.gmail_remove_subaddress) {
parts[0] = parts[0].split('+')[0];
}
if (options.remove_dots) {
if (options.gmail_remove_dots) {
parts[0] = parts[0].replace(/\./g, '');
}
if (!parts[0].length) {
return false;
}
parts[0] = parts[0].toLowerCase();
parts[1] = 'gmail.com';
} else if (options.lowercase) {
parts[0] = parts[0].toLowerCase();
if (options.all_lowercase || options.gmail_lowercase) {
parts[0] = parts[0].toLowerCase();
}
parts[1] = options.gmail_convert_googlemaildotcom ? 'gmail.com' : parts[1];
} else if (~icloud_domains.indexOf(parts[1])) {
// Address is iCloud
if (options.icloud_remove_subaddress) {
parts[0] = parts[0].split('+')[0];
}
if (!parts[0].length) {
return false;
}
if (options.all_lowercase || options.icloud_lowercase) {
parts[0] = parts[0].toLowerCase();
}
} else if (~outlookdotcom_domains.indexOf(parts[1])) {
// Address is Outlook.com
if (options.outlookdotcom_remove_subaddress) {
parts[0] = parts[0].split('+')[0];
}
if (!parts[0].length) {
return false;
}
if (options.all_lowercase || options.outlookdotcom_lowercase) {
parts[0] = parts[0].toLowerCase();
}
} else if (~yahoo_domains.indexOf(parts[1])) {
// Address is Yahoo
if (options.yahoo_remove_subaddress) {
let components = parts[0].split('-');
parts[0] = (components.length > 1) ? components.slice(0, -1).join('-') : components[0];
}
if (!parts[0].length) {
return false;
}
if (options.all_lowercase || options.yahoo_lowercase) {
parts[0] = parts[0].toLowerCase();
}
} else {
// Any other address
if (options.all_lowercase) {
parts[0] = parts[0].toLowerCase();
}
}
return parts.join('@');
}
Loading