Improve handling of invalid URI characters


Currently, characters with a character code >= 128 are not recognized as illegal characters and are not hex-encoded. However, legal URI characters must come from the US-ASCII set, which covers only character codes 0-127. The proper way to encode any other character is to convert it to UTF-8 first and then hex-encode the resulting bytes one by one.
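To illustrate the idea, here is a minimal sketch of the byte-wise hex-encoding step. It is not the code from the attached patch or from StringUtils.cpp: the function name PercentEncodeUtf8 is hypothetical, the input is assumed to already be UTF-8, and the set of characters left untouched (letters, digits, "-", ".", "_", "~" and the path separator "/") is an assumption that may differ from the set the existing code uses.

```cpp
#include <string>

// Hypothetical helper for illustration only; not taken from StringUtils.cpp.
// Assumes `utf8` already contains UTF-8 encoded bytes.
std::string PercentEncodeUtf8(const std::string& utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(utf8.size());
    for (unsigned char c : utf8) {
        // Assumed "safe" set: ASCII letters, digits, a few marks, and '/'.
        const bool keep =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
        if (keep) {
            out += static_cast<char>(c);
        } else {
            // Every other byte, including all bytes >= 128, becomes %XX.
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With this sketch, the letter ä (U+00E4), whose UTF-8 form is the two bytes 0xC3 0xA4, would be encoded as %C3%A4.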

The encoding error becomes apparent when converting files with German Umlauts (äöüÄÖÜ).

I am proposing the attached patch for the StringUtils.cpp file.


Closed Dec 23, 2016 at 7:17 PM by clechasseur
Sorry for the long delay, but it seems you retracted it before I could take a look at it. I will thus close this.


Tauris wrote Nov 23, 2016 at 2:53 PM

I would like to retract this patch proposal. I have since learned that it would create problems with Internet Explorer, as described here: http://stackoverflow.com/questions/23679236/unicode-characters-in-a-url-all-ok-except-for-ie
Internet Explorer would not convert back to UTF-8.