Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Handling IDNA by adding Punycode encoding in urlParse. #1149 #1174

Closed
wants to merge 7 commits into from
Closed

Handling IDNA by adding Punycode encoding in urlParse. #1149 #1174

wants to merge 7 commits into from

Conversation

jeremys
Copy link

@jeremys jeremys commented Jun 13, 2011

  • Adding punycode implementation from http://stackoverflow.com/questions/183485/useless/301287#301287
  • Always encode during urlParse.
  • Adding test cases to test this new encoding.
  • Code pass JSLint.
  • Disabled 2 tests cases. Because of IDNA encoding and the fact that the hostname is not clearly separated. We encode part of the path. This probably could be fixed somewhat but I'm not even sure we're supposed to support this syntax. For example Safari and Chrome consider those 2 URLs incorrect.

We could also export both functions toASCII and toUnicode.

@ry
Copy link

ry commented Jun 13, 2011

Looking good. Punycode stuff should be moved to a separate file with its own copyright header and the LICENSE file should be updated.

@isaacs
Copy link

isaacs commented Jun 13, 2011

I would like to not remove those two tests. They were added because people use the url.parse function to screen out "valid" urls from user input, so keeping invalid characters out of the hostname is important.

I agree with @ry that punycode should be in lib/punycode.js. We can update the invalid host chars regexps to only screen out the invalid characters that are in the ascii range.

…k 2 test cases and update urlParse to make sure we skip only invalid chars in the ASCII range.
@jeremys
Copy link
Author

jeremys commented Jun 14, 2011

I've enable back the 2 test cases. I've tweak urlParse to skip only chars within the ASCII range, maybe the way I did it is a little bit crappy? By doing this I can apply IDNA encode after the validation and I'm sure the encoding is done on the hostname only.

@ry
Copy link

ry commented Jun 14, 2011

I'm uncomfortable with pulling code off a random website like this and attaching a random copyright header to it. Especially since the author says he created it by modifying a python version. Who created the python version - what's its license? Does it allow that?

Can we please rewrite this from scratch using only the RFC?

@jeremys
Copy link
Author

jeremys commented Jun 14, 2011

He derived his work from the C algorithm that come from the RFC (see: http://tools.ietf.org/html/rfc3492#appendix-C)
About headers, I've put the same headers as the one in the C implementation of Punycode in libidn, see: http://www.gnu.org/software/libidn/doxygen/punycode_8c_source.html

If you think it's not enough, I'll rewrite the whole implementation of this based on the RFC and the C sample provided in it.

@bnoordhuis
Copy link
Member

@ry Punycode or full IDNA?

@bnoordhuis
Copy link
Member

@jeremys I have some C code I could tidy up and publish but I'll hold off if you pick up this one.

@bnoordhuis
Copy link
Member

The one-stop MIT-licensed solution to all your JS punycode encoding and decoding needs: https://gist.github.com/1035853

@ry
Copy link

ry commented Jun 20, 2011

@bnoordhuis++

@jeremys
Copy link
Author

jeremys commented Jun 20, 2011

Ok, I'll replace the concerned code with your gist into my pull request. Thank you very much @bnoordhuis.

@jeremys
Copy link
Author

jeremys commented Jun 20, 2011

So I've replaced the punycode lib with @bnoordhuis's one. I've just added one more test case. And updated your code to pass JSLint. I've also moved the parsing of the domain and detection to use punycode into url.js.

Let me know if it needs anything else.

@jeremys
Copy link
Author

jeremys commented Jun 27, 2011

Hey there, is there something wrong with my pull request that I should change?

@bnoordhuis
Copy link
Member

@jeremys: Ryan is on holiday so it's quiet on the node front.

@jeremys
Copy link
Author

jeremys commented Jun 27, 2011

Ok @bnoordhuis, thanks for the info, I was just wondering if I was holding up something :)

@jeremys
Copy link
Author

jeremys commented Jul 6, 2011

@ry with the release of 0.5.0 should I update my pull request? or is there something wrong with it?

@isaacs
Copy link

isaacs commented Jul 6, 2011

@jeremys Nothing wrong with it. Just been buried under some other things, sorry.

@jeremys
Copy link
Author

jeremys commented Jul 6, 2011

No worries!

@ry
Copy link

ry commented Jul 6, 2011

I haven't tested - but after a cursory glance it looks good to me.

@bnoordhuis or @isaacs can test and land at will.

@isaacs
Copy link

isaacs commented Jul 6, 2011

merging this now.

@isaacs isaacs closed this in 2a848fa Jul 6, 2011
@isaacs
Copy link

isaacs commented Jul 6, 2011

Squashed into 2a848fa

isaacs pushed a commit to isaacs/node-v0.x-archive that referenced this pull request Jul 19, 2011
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants