• 11
name

A PHP Error was encountered

Severity: Notice

Message: Undefined index: userid

Filename: views/question.php

Line Number: 191

Backtrace:

File: /home/prodcxja/public_html/questions/application/views/question.php
Line: 191
Function: _error_handler

File: /home/prodcxja/public_html/questions/application/controllers/Questions.php
Line: 433
Function: view

File: /home/prodcxja/public_html/questions/index.php
Line: 315
Function: require_once

name Punditsdkoslkdosdkoskdo

UTF-8 problems while reading CSV file with fgetcsv

I try to read a CSV and echo the content. But the content displays the characters wrong.

Mäx Müstermänn -> Mäx Müstermänn

Encoding of the CSV file is UTF-8 without BOM (checked with Notepad++).

This is the content of the CSV file:

"Mäx";"Müstermänn"

My PHP script

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
</body>
</html>

I tried to use setlocale(LC_ALL, 'de_DE.utf8'); as suggested here without success. The content is still wrong displayed.

What I'm missing?

Edit:

An echo mb_detect_encoding($data[$c],'UTF-8'); gives me UTF-8 UTF-8.

echo file_get_contents("specialchars.csv"); gives me "Mäx";"Müstermänn".

And

print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))

gives me

Array ( [0] => Mäx [1] => Müstermänn )

What does it mean?

      • 2
    • What happens when you do echo file_get_contents("specialchars.csv")? What happens when you do print_r(str_getcsv(reset(explode(" ", file_get_contents("specialchars.csv"))), ';'))?

Try this:

<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $data = array_map("utf8_encode", $data); //added
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
  • 53
Reply Report

Encountered similar problem: parsing CSV file with special characters like é, è, ö etc ...

The following worked fine for me:

To represent the characters correctly on the html page, the header was needed :

header('Content-Type: text/html; charset=UTF-8');

In order to parse every character correctly, I used:

utf8_encode(fgets($file));

Dont forget to use in all following string operations the 'Multibyte String Functions', like:

mb_strtolower($value, 'UTF-8');
  • 15
Reply Report

Try putting this into the top of your file (before any other output):

<?php

header('Content-Type: text/html; charset=UTF-8');

?>
  • 7
Reply Report
      • 2
    • Perhaps I should mention that I upload the csv file through a form with enctype="multipart/form-data" accept-charset="utf-8". If I put your code into the example than it seems to work.
      • 2
    • @testing that made a difference for me. Had 2 CSV's I was parsing, one had the accept-charset="utf-8" and the other didn't, and it didn't display correctly until I used this.

The problem is that the function returns UTF-8 (it can check using mb_detect_encoding), but do not convert, and these characters takes as UTF-8. ?herefore, it's necessary to do the reverse-convert to initial encoding (Windows-1251 or CP1251) using iconv. But since by the fgetcsv returns an array, I suggest to write a custom function: [Sorry for my english]

function customfgetcsv(&$handle, $length, $separator = ';'){
    if (($buffer = fgets($handle, $length)) !== false) {
        return explode($separator, iconv("CP1251", "UTF-8", $buffer));
    }
    return false;
}
  • 5
Reply Report

In my case the source file has windows-1250 encoding and iconv prints tons of notices about illegal characters in input string...

So this solution helped me a lot:

/**
 * getting CSV array with UTF-8 encoding
 *
 * @param   resource    &$handle
 * @param   integer     $length
 * @param   string      $separator
 *
 * @return  array|false
 */
private function fgetcsvUTF8(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false)
    {
        $buffer = $this->autoUTF($buffer);
        return str_getcsv($buffer, $separator);
    }
    return false;
}

/**
 * automatic convertion windows-1250 and iso-8859-2 info utf-8 string
 *
 * @param   string  $s
 *
 * @return  string
 */
private function autoUTF($s)
{
    // detect UTF-8
    if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
        return $s;

    // detect WINDOWS-1250
    if (preg_match('#[\x7F-\x9F\xBC]#', $s))
        return iconv('WINDOWS-1250', 'UTF-8', $s);

    // assume ISO-8859-2
    return iconv('ISO-8859-2', 'UTF-8', $s);
}

Response to @manvel's answer - use str_getcsv instead of explode - because of cases like this:

some;nice;value;"and;here;comes;combinated;value";and;some;others

explode will explode string into parts:

some
nice
value
"and
here
comes
combinated
value"
and
some
others

but str_getcsv will explode string into parts:

some
nice
value
and;here;comes;combinated;value
and
some
others
  • 5
Reply Report
      • 2
    • Great answer ! This is the only one that actually deals with wrong character encoding issue when manipulating CSV data with PHP. Either you properly encode your data before manipulating it, otherwise you do it on the fly upon reading. In my case, fgetcsv was returning a broken output (nothing - even NULL nor FALSE - was returned !) without any PHP notice, because of misencoding issue.. you just saved me precious time with fgetcsvUTF8 because I had noway to re-encode original data, I hate encoding issue.. Thanks for sharing !

Now I got it working (after removing the header command). I think the problem was that the encoding of the php file was in ISO-8859-1. I set it to UTF-8 without BOM. I thought I already have done that, but perhaps I made an additional undo.

Furthermore, I used SET NAMES 'utf8' for the database. Now it is also correct in the database.

  • 2
Reply Report

Warm tip !!!

This article is reproduced from Stack Exchange / Stack Overflow, please click

Trending Tags

Related Questions