Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: support multi-dot file extensions #23416

Closed
wants to merge 4 commits into from

Conversation

GeoffreyBooth
Copy link
Member

@GeoffreyBooth GeoffreyBooth commented Oct 11, 2018

Following up from #23228 and nodejs/citgm#604.

This PR adds support for registering “multi-dot” file extensions (like .coffee.md) to the module loader. Previously userland modules like CoffeeScript were patching Module.prototype.load to support code like require.extensions['.foo.bar'] = .... After this PR, such patching is no longer necessary. This also resolves the original issue that the reverted #23228 was addressing, fixing inconsistency between load and _findPath.

Written with @jdalton.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@addaleax
Copy link
Member

addaleax commented Oct 11, 2018

startIndex = index + 1;
if (index === 0) continue; // Skip dotfiles like .gitignore
currentExtension = name.slice(index);
if (Module._extensions[currentExtension]) return currentExtension;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we just store and return the function instead to avoid a second, unnecessary lookup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex Would you clarify your question (maybe with pseudo code)?

Copy link
Contributor

@mscdex mscdex Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of Module._extensions[currentExtension] is thrown away here. However after this function returns, Module._extensions[extension] is done a second time. It would be nice if we could avoid this duplicate lookup.

So instead of:

currentExtension = name.slice(index);
if (Module._extensions[currentExtension]) return currentExtension;

we could do:

const loader = Module._extensions[name.slice(index)];
if (loader)
  return loader;

and change the calling code appropriately:

findExtension(filename)(this, filename);

The function name and the default return value would need to be changed of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote some code to do a performance test:

const path = require('path');

// Mock Module for testing
const Module = {
  _extensions: {
    '.js': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    },
    '.json': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    },
    '.node': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    }
  }
}

// Generate 100000 filenames
const filenames = [];
for (let i = 0; i < 100000; ++i) {
  filenames.push(`${i}.${Module._extensions[i % 3]}`);
}


// Current version
function findExtension(filename) {
  const name = path.basename(filename);
  let currentExtension;
  let index;
  let startIndex = 0;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // Skip dotfiles like .gitignore
    currentExtension = name.slice(index);
    if (Module._extensions[currentExtension]) return currentExtension;
  }
  return '.js';
}

// Altered version that returns the loader itself rather than the extension,
// so that there’s only one lookup
function findLoaderByExtension(filename) {
  const name = path.basename(filename);
  let currentExtension;
  let index;
  let startIndex = 0;
  let loader;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // Skip dotfiles like .gitignore
    currentExtension = name.slice(index);
    if (loader = Module._extensions[currentExtension]) return loader;
  }
  return Module._extensions['.js'];
}


const run = {
  test1: function() {
    console.time('test1');
    filenames.forEach((filename) => {
      var extension = findExtension(filename);
      Module._extensions[extension](this, filename);
    });
    console.timeEnd('test1');
  },
  test2: function() {
    console.time('test2');
    filenames.forEach((filename) => {
      var loader = findLoaderByExtension(filename);
      loader(this, filename);
    });
    console.timeEnd('test2');
  }
}

run[process.argv[2]]();

Here are my results:

 ✦ node -v
v10.12.0
 ✦ node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1
test1: 85.108ms
test1: 85.046ms
test1: 87.216ms
test1: 85.767ms
test1: 85.926ms
test1: 85.650ms
test1: 84.461ms
test1: 85.011ms
test1: 83.525ms
test1: 87.337ms
 ✦ node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2
test2: 90.877ms
test2: 86.902ms
test2: 86.268ms
test2: 85.143ms
test2: 90.064ms
test2: 86.636ms
test2: 84.882ms
test2: 86.778ms
test2: 89.278ms
test2: 85.963ms
  • Test 1 average time: 85.505ms
  • Test 2 average time: 87.279ms

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth My suggestion was not so much about a huge performance improvement as it is removing duplicated effort.

Also, if you really want to do benchmarks, they should be done via the official benchmark mechanism for the best comparison.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first version has two lookups, but the second version has an extra assignment (loader = Module._extensions[currentExtension]). So the “return the loader” version saves a second lookup at the cost of an extra assignment, which I think explains why the benchmarks are more or less equivalent between the two versions. Personally I think a helper that returns an extension is easier to grasp (and more potentially useful in the future) than a helper that returns a loader, but that’s a subjective call.

let startIndex = 0;
while ((index = name.indexOf('.', startIndex)) !== -1) {
startIndex = index + 1;
if (index === 0) continue; // Skip dotfiles like .gitignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are to skip the dotfiles, shouldn't this just break out of here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thefourtheye

If we are to skip the dotfiles, shouldn't this just break out of here?

No, because it's not skipping the whole dotfile, just the first dot of the dot file.

path.extname(".dotfile")
// => ""
path.extname(".dotfile.ext")
// => ".ext"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test was added for this case.

// find the longest (possibly multi-dot) extension registered in
// Module._extensions
function findExtension(filename) {
const name = path.basename(filename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're only interested in the extension, is calling path.basename() necessary? You could just scan backwards for dots, checking that there are no path separators in between:

let dot = -1;
let sep = -1;

for (let i = filename.length; --i > 0; /* empty */) {
  const c = filename[i];
  if (c === '.')
    dot = i;
  else if (c === '/') { // FIXME handle backslash on windows
    sep = i;
    break;
  }
}

if (dot > sep + 1) {
  const ext = filename.slice(dot);
  if (Module._extensions[ext]) return ext;
}

return '.js';  // dot file or no extension

Only one slice == less garbage to collect.

Copy link
Member Author

@GeoffreyBooth GeoffreyBooth Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current algorithm scans forwards to short-circuit at the longest matched extension. So say you have a file like mr.robot.coffee.md, and a loader registered for .coffee.md. The current algorithm will iterate like:

  • Is .robot.coffee.md registered? No, continue.
  • Is .coffee.md registered? Yes, break.

This way, even if .md is also registered, the .coffee.md loader takes precedence. This allows multi-dot extensions like .es6.js, for example.

If I’m reading your code correctly, it’s checking only the longest possible multi-dot extension, which for this example would be .robot.coffee.md, a false match. We want multi-dot extensions to work without prohibiting periods elsewhere in the filename.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps naming the function findRegisteredExtension would be more self-descriptive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardlau Yes, or perhaps findLongestRegisteredExtension would be even more descriptive (because we’re returning .coffee.md instead of .md, if both of those happened to be registered).

@GeoffreyBooth GeoffreyBooth force-pushed the multi-ext branch 3 times, most recently from efbf50a to 6908da4 Compare October 15, 2018 04:29
@GeoffreyBooth
Copy link
Member Author

@addaleax I rewrote the commit messages to get the build to pass. Does this PR need anything else?

@jasnell jasnell added module Issues and PRs related to the module subsystem. semver-minor PRs that contain new features and should be released in the next minor version. labels Oct 25, 2018
@jdalton
Copy link
Member

jdalton commented Oct 25, 2018

@jdalton
Copy link
Member

jdalton commented Oct 25, 2018

I believe this is author ready. Please feel free to remove the label for any reason.

@jdalton jdalton added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Oct 25, 2018
@addaleax
Copy link
Member

@Trott
Copy link
Member

Trott commented Nov 2, 2018

Landed in 22f7d0a

@Trott Trott closed this Nov 2, 2018
Trott pushed a commit to Trott/io.js that referenced this pull request Nov 2, 2018
Support multi-dot file extensions like '.coffee.md' in Module.load.

PR-URL: nodejs#23416
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: John-David Dalton <[email protected]>
@GeoffreyBooth GeoffreyBooth deleted the multi-ext branch November 2, 2018 03:58
targos pushed a commit that referenced this pull request Nov 2, 2018
Support multi-dot file extensions like '.coffee.md' in Module.load.

PR-URL: #23416
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: John-David Dalton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. module Issues and PRs related to the module subsystem. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants