Use multiple flags #88

Open
wants to merge 2 commits into
base: master
8 changes: 6 additions & 2 deletions README.md
@@ -47,11 +47,11 @@ will output:
Fédération Camerounaise de Football
Fédération Camerounaise de Football

Options:
Flags:
========
By default, `Encoding::fixUTF8` will use the `Encoding::WITHOUT_ICONV` flag, signalling that iconv should not be used to fix garbled UTF8 strings.

This class also provides options for iconv processing, such as `Encoding::ICONV_TRANSLIT` and `Encoding::ICONV_IGNORE` to enable these flags when the iconv class is utilized. The functionality of such flags are documented in the [PHP iconv documentation](http://php.net/manual/en/function.iconv.php).
This class also provides flags for iconv processing, such as `Encoding::ICONV_TRANSLIT` and `Encoding::ICONV_IGNORE`, which take effect when iconv is used. The behavior of these flags is documented in the [PHP iconv documentation](http://php.net/manual/en/function.iconv.php).

Examples:

@@ -61,12 +61,14 @@ Examples:
echo Encoding::fixUTF8($str); // Will break U+2014
echo Encoding::fixUTF8($str, Encoding::ICONV_IGNORE); // Will preserve U+2014
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT); // Will preserve U+2014
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT | Encoding::ICONV_IGNORE); // Will preserve U+2014

will output:

Fédération Camerounaise?de?Football
Fédération Camerounaise—de—Football
Fédération Camerounaise—de—Football
Fédération Camerounaise—de—Football

while:

@@ -76,12 +78,14 @@ while:
echo Encoding::fixUTF8($str); // Will break invalid characters
echo Encoding::fixUTF8($str, Encoding::ICONV_IGNORE); // Will remove invalid characters, keep those present in Win1252
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT); // Will transliterate invalid characters, keep those present in Win1252
echo Encoding::fixUTF8($str, Encoding::ICONV_TRANSLIT | Encoding::ICONV_IGNORE); // Will transliterate invalid (but legal) characters, remove illegal characters in input string, keep those present in Win1252

will output:

????????
šž
ceeišuuž
ceeišuuž


Install via composer:
24 changes: 17 additions & 7 deletions src/ForceUTF8/Encoding.php
@@ -41,9 +41,9 @@

class Encoding {

const ICONV_TRANSLIT = "TRANSLIT";
const ICONV_IGNORE = "IGNORE";
const WITHOUT_ICONV = "";
const ICONV_TRANSLIT = 1;
const ICONV_IGNORE = 2;
const WITHOUT_ICONV = 4;

protected static $win1252ToUtf8 = array(
128 => "\xe2\x82\xac",
@@ -199,7 +199,7 @@ static function toUTF8($text){
$c3 = $i+2 >= $max? "\x00" : $text[$i+2];
$c4 = $i+3 >= $max? "\x00" : $text[$i+3];
if($c1 >= "\xc0" & $c1 <= "\xdf"){ //looks like 2 bytes UTF8
if($c2 >= "\x80" && $c2 <= "\xbf"){ //yeah, almost sure it's UTF8 already
if($c2 < "\x80"){ //yeah, almost sure it's UTF8 already
$buf .= $c1 . $c2;
$i++;
} else { //not valid UTF8. Convert it.
@@ -337,14 +337,24 @@ public static function encode($encodingLabel, $text)
return self::toUTF8($text);
}

protected static function utf8_decode($text, $option = self::WITHOUT_ICONV)
protected static function utf8_decode($text, $flags = self::WITHOUT_ICONV)
{
if ($option == self::WITHOUT_ICONV || !function_exists('iconv')) {
if ($flags & self::WITHOUT_ICONV || !function_exists('iconv')) {
$o = utf8_decode(
str_replace(array_keys(self::$utf8ToWin1252), array_values(self::$utf8ToWin1252), self::toUTF8($text))
);
} else {
$o = iconv("UTF-8", "Windows-1252" . ($option === self::ICONV_TRANSLIT ? '//TRANSLIT' : ($option === self::ICONV_IGNORE ? '//IGNORE' : '')), $text);
$outCharsetParams = '';

if ($flags & self::ICONV_TRANSLIT) {
$outCharsetParams .= '//TRANSLIT';
}

if ($flags & self::ICONV_IGNORE) {
$outCharsetParams .= '//IGNORE';
}

$o = iconv("UTF-8", "Windows-1252" . $outCharsetParams, $text);
}
return $o;
}
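
To illustrate how the integer flags in this change combine, here is a minimal standalone sketch. The `FlagSketch` class and its `suffix()` helper are hypothetical names used only for illustration and are not part of the library; the constant values mirror those in the diff above, and the suffix logic follows the new branch of `utf8_decode`.

```php
<?php
// Hypothetical illustration only: FlagSketch is not part of ForceUTF8.
final class FlagSketch
{
    // Powers of two, so each flag occupies its own bit and can be OR-ed.
    const ICONV_TRANSLIT = 1;
    const ICONV_IGNORE   = 2;
    const WITHOUT_ICONV  = 4;

    // Build the suffix appended to the iconv output charset name.
    public static function suffix($flags)
    {
        $suffix = '';
        if ($flags & self::ICONV_TRANSLIT) {
            $suffix .= '//TRANSLIT';
        }
        if ($flags & self::ICONV_IGNORE) {
            $suffix .= '//IGNORE';
        }
        return $suffix;
    }
}

// Prints "//TRANSLIT//IGNORE"
echo FlagSketch::suffix(FlagSketch::ICONV_TRANSLIT | FlagSketch::ICONV_IGNORE), "\n";
```

The previous string constants could not be combined, whereas distinct bit values make `|` meaningful. In the diff's `utf8_decode`, the `WITHOUT_ICONV` bit is checked first, so when it is set the iconv branch is skipped regardless of the other flags.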
3 changes: 3 additions & 0 deletions test/ForceUTF8Test.php
@@ -97,5 +97,8 @@ function test_double_encoded_arrays_fix(){
Test::identical("fixUTF8() Example 4 still working.",
Encoding::fixUTF8("Fédération Camerounaise de Football\n"),
"Fédération Camerounaise de Football\n");
Test::identical("fixUTF8() Example 5 still working.",
Encoding::fixUTF8("À \n"),
"À \n");

Test::totals();