View Issue Details

ID: 0001644
Project: phplist application
Category: HTML Email Support
View Status: public
Last Update: 17-05-11 14:48
Reporter: kiang
Priority: high
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 2.8.11
Target Version: 4.0.x
Fixed in Version: 2.11.6
Summary: 0001644: Subject error with UTF-8 encode in Traditional Chinese
Description: I'm trying to use phplist in a UTF-8 environment with Traditional Chinese. Everything seems OK except the subject: some of its characters are turned into meaningless ones. Could anyone tell me which scripts I should look at to fix this problem? I'm still looking into the code.
Additional Information: Version: 2.8.11
I've changed the encoding of the language file, and the config for both text and HTML emails, to UTF-8.
Tags: No tags attached.

Relationships

related to 0003139 (resolved, michiel): Hebrew support
related to 0003721 (closed): phplist 2.10.x
related to 0004079 (resolved, michiel): corrupted russian message subject and from fields when editing
related to 0011562 (resolved, michiel): Random Character Encoding Bug in SHIFT-JIS Japanese emails body & Subject
related to 0011585 (resolved, michiel): Custom Placeholders / Attributes with special characters in HTML-area
related to 0005528 (resolved, michiel): Overwriten config value
parent of 0013382 (resolved, michiel): Encoding problems
parent of 0013291 (resolved, support): HTML Email Support and character entity encoding
parent of 0015536 (resolved, michiel): Wrong encoding for text version through PhpMailer
has duplicate 0015245 (resolved, user4402): Message footer does not display special characters, like é ó ö ü etc.
has duplicate 0014238 (resolved, user4402): Wrong encoding of pages
has duplicate 0015250 (resolved, user4402): RSS feeds encoded in ISO-8859-1 do not display correctly in UTF-8 encoded messages
has duplicate 0015258 (resolved, user4402): email body being sended with UTF-8 encoding
has duplicate 0009309 (resolved, user4402): Special characters (ä, ö, é, ç, ã etc.) do not display correctly with UTF-8 charset selected
has duplicate 0015159 (resolved, user4402): can dispaly chinese properly, and I also upload simplfied chinese , pls update it. thanks.
related to 0008134 (resolved, michiel): Send Message - After "Save Changes" - Hebrew Subject broken
related to 0015241 (resolved, user4391): Subject will empty when we edit the message
related to 0015324 (resolved, michiel): Subject and From turn to Gibberish when saved not in English
related to 0015362 (resolved, michiel): overall handling of charsets
related to 0015407 (resolved, michiel): pagetop seems not to be included
related to 0015298 (resolved, michiel): userdata substitution in URL not working for UTF databases

Activities

michiel

04-10-04 12:36

manager   ~0002132

As I wouldn't know how to solve that, do you have any tips on how to make that work?

kiang

07-08-05 19:12

reporter   ~0005995

I found the problem.

On lines 869 and 875 of the file 'lists/admin/send_core.php', the script uses the htmlentities function to process characters. In my environment, it works better if the function is called in the following form:

htmlentities($subject, ENT_QUOTES, $_SESSION['adminlanguage']['charset'])

I don't know whether this causes problems in other environments. I can solve the same problem on other pages in the same way. :)
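To illustrate why the charset argument matters here, a minimal sketch (not phplist code), assuming the subject arrives as UTF-8 bytes. The sample string and variable names are made up for illustration. On PHP versions before 5.4, omitting the third argument meant ISO-8859-1, which is roughly what the first call reproduces:

```php
<?php
// Sketch only: compares htmlentities() with a wrong and a correct charset.
// The sample subject is two Traditional Chinese characters written as UTF-8
// byte escapes, so the encoding of this file does not matter.
$subject = "\xE6\xB8\xAC\xE8\xA9\xA6";

// Wrong charset: every UTF-8 byte is treated as a separate ISO-8859-1
// character and replaced by an unrelated entity, which corrupts the subject.
$asLatin1 = htmlentities($subject, ENT_QUOTES, 'ISO-8859-1');

// Correct charset: multi-byte characters are recognised and left intact,
// since they have no named HTML entity.
$asUtf8 = htmlentities($subject, ENT_QUOTES, 'UTF-8');

echo $asLatin1, "\n";
echo $asUtf8, "\n";
```

The second call returns the subject unchanged, while the first returns a string of entities such as &aelig; that no longer represents the original text.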

user1177

13-02-06 16:53

  ~0010629

Did this get resolved in 2.10?

michiel

04-10-06 19:21

manager   ~0019579

Instead of using the admin language from the session, I've hardcoded UTF-8, because someone may well have the interface in English but want to send Chinese. Let's see if that sorts it.

nordblad

18-03-08 20:49

reporter   ~0043080

htmlentities($subject,ENT_QUOTES,'UTF-8')

gives me problems with Swedish special characters (åäö). The message subject just disappears when I click "Save Changes". I guess my input is in ISO-8859-1, because

htmlentities($subject,ENT_QUOTES,'ISO-8859-1')

fixes it. So does

htmlspecialchars($subject,ENT_QUOTES),

which looks even nicer to me. Wouldn't that work for Chinese too?
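A small sketch of the behaviour described above, assuming the form really does submit ISO-8859-1 bytes. The byte strings are illustrative. The charset is passed explicitly to htmlspecialchars because newer PHP versions default to UTF-8 when it is omitted, which would make the call behave differently from PHP 5.2:

```php
<?php
// Sketch only: "åäö" as ISO-8859-1 bytes, as a Latin-1 form would submit them.
$latin1 = "\xE5\xE4\xF6";

// These bytes are not valid UTF-8, so htmlentities() returns an empty string
// (no ENT_IGNORE / ENT_SUBSTITUTE flag is set). This matches the vanishing subject.
$emptied = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// With the matching charset the characters become named entities.
$entities = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// htmlspecialchars() only rewrites & < > " ' and leaves other bytes alone.
$untouched = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

// The same holds for UTF-8 input such as Chinese: bytes >= 0x80 pass through.
$chinese = "\xE6\xB8\xAC\xE8\xA9\xA6";
$chineseOut = htmlspecialchars($chinese, ENT_QUOTES, 'ISO-8859-1');

var_dump($emptied, $entities, $untouched === $latin1, $chineseOut === $chinese);
```

So htmlspecialchars should leave Chinese text intact as well, as long as the page that displays the form declares the charset the bytes are actually in.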

h2b2

17-03-09 22:06

manager   ~0050558

A somewhat similar issue involving PHP 5.2.5 was discussed in the PHP bug tracker: http://bugs.php.net/bug.php?id=43549

This discussion seems to indicate that if you set htmlentities to UTF-8, you'll need to make sure that the charset for the html page containing the form is also set to UTF-8.

Currently the 'send a message' page produced by phplist is set to:
   <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

Instead of:
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />

h2b2

20-03-09 06:22

manager   ~0050565

On my system, entering special characters in the subject field did not make the subject field disappear completely. Only the special characters disappeared, while normal characters were still displayed, e.g.: "This is a Tést" would display as "This is a Tst".

So I ran a quick test, which confirms my previous note: you need to set the content-type of the HTML page holding the input form fields to UTF-8 if you want htmlentities($subject,ENT_QUOTES,'UTF-8') to work correctly. If <meta http-equiv="content-type" content="text/html; charset=utf-8" /> is used, the subject field "This is a Tést" displays in full in the received text and html message.

To do so I had to change $strCharSet in english.inc from ISO-8859-1 to utf-8. Language files other than english.inc would probably need all their special characters re-encoded to UTF-8.

My test system is configured as follows.
Configuration page:
 - Charset for HTML messages: UTF-8
 - Charset for Text messages: UTF-8

config.php:
 - $language_module = "english.inc"; # with this change in english.inc: $strCharSet = 'utf-8';
 - define("HTMLEMAIL_ENCODING","quoted-printable");
 - define("TEXTEMAIL_ENCODING",'7bit');

Server info:
phplist 2.10.9
Linux/Apache
PHP 5.2.3
MySQL 4.1.12 - with *database encoding* set to: utf8_unicode_ci


Some remarks:
- I expected the charset for the backend's html pages to be defined by the settings in languages.php, e.g.:
    "en" => array("English ","iso-8859-1","iso-8859-1, windows-1252 "),
This is not the case, and I'm not sure what exactly the languages.php charset settings are used for.
- I wonder whether it is a good idea to hardcode UTF-8 anywhere in the code. It would seem more flexible to have the charset configurable, e.g. through the charset defined on the configuration page.
- Inclusion of UTF-8 encoded .inc language files should perhaps be considered for future phplist releases, along with the existing iso-* encoded files.
- Installation procedures (and documentation) could perhaps include giving the user a choice of charsets to use for database encoding.

h2b2

22-03-09 20:50

manager   ~0050577

I found an interesting article which identifies different aspects that come into play in a PHP/MySQL/UTF-8 application. These are the principal ones:
- the database (individual tables + any text columns) should be set to UTF-8
- the PHP server should send a header telling the browser to expect UTF-8, e.g.:
     header('Content-Type: text/html; charset=utf-8' );
- the HTML page's Content-Type should be set to UTF-8, i.e.:
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
- and, the PHP-MySQL connection should be set to UTF-8, since it will otherwise default to latin1

The article also provides a useful solution (based on SET NAMES) and some code examples.
For more info, please see: http://www.adviesenzo.nl/examples/php_mysql_charset_fix/
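As a rough sketch of the header and connection points in the list above (not taken from the article or from phplist): the function names and connection parameters are placeholders, and the mysqli extension is assumed to be available. mysqli's set_charset() is used in place of a raw SET NAMES query because it also tells the client library which charset is in use.

```php
<?php
// Sketch only. Both function names are hypothetical.

// Returns the HTTP header that tells the browser to expect UTF-8.
function utf8_content_type(): string
{
    return 'Content-Type: text/html; charset=utf-8';
}

// Opens a connection and switches it to UTF-8; otherwise it may default to latin1.
// Host, user, password and database name are supplied by the caller.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    // 'utf8' is MySQL's 3-byte UTF-8; 'utf8mb4' exists only from MySQL 5.5.3 on.
    $link->set_charset('utf8');
    return $link;
}

// Send the header before any output is produced.
if (!headers_sent()) {
    header(utf8_content_type());
}
```

The database tables and the HTML meta tag still have to be set to UTF-8 separately, as listed above.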

h2b2

23-03-09 01:06

manager   ~0050579

Additional server related info from my test system, collected by using the following query in phpMyAdmin: SHOW VARIABLES LIKE 'character_set_%'

character_set_client utf8
character_set_connection utf8
character_set_database utf8
character_set_results utf8
character_set_server latin1
character_set_system utf8

michiel

23-03-09 02:32

manager   ~0050580

great, that's very helpful research. thanks for that.

h2b2

23-03-09 04:02

manager   ~0050581

Glad to help.

Just a few additional comments regarding hardcoding the charset. Currently UTF-8 has been hardcoded in the following files (v2.10.9):

processbounces.php
  $message = html_entity_decode($message,ENT_QUOTES,'UTF-8');
  
processqueue.php
  $line = html_entity_decode($line,ENT_QUOTES,'UTF-8');
      
sendemaillib.php
  $text = html_entity_decode ( $text , ENT_QUOTES , 'UTF-8' );
        
send_core.php
  value="'.htmlentities($subject,ENT_QUOTES,'UTF-8').'"
  value="'.htmlentities($from,ENT_QUOTES,'UTF-8').'"
  value="'.htmlentities($forwardsubject,ENT_QUOTES,'UTF-8').'"
    
class.phplistmailer.php
  $this->Body = html_entity_decode($text ,ENT_QUOTES, 'UTF-8' ); #$text;
  $this->AltBody .= html_entity_decode($text ,ENT_QUOTES, 'UTF-8' );#$text;
  $this->Body .= html_entity_decode($text ,ENT_QUOTES, 'UTF-8' );#$text;
                
So, the problem will probably not only occur in the subject line, but most likely also in the From: (name) line, and the forward subject line. (Haven't checked the forum for reports on this yet).

It seems to me that instead of hardcoding UTF-8, it might be an improvement if all hardcoded instances of 'UTF-8' were replaced by something like $GLOBALS['strCharSet'].
In that case, it is the .inc language file's encoding that will determine the charset for the whole phplist system (frontend, backend, and backend input fields) except for the Charset settings for HTML and Text messages on the configuration page, and except for the encoding for the database and the database connection.

While this would be an improvement, it is not yet an ideal situation, considering that things may still go wrong if the database encoding and/or database connection isn't compatible with the charset, or if the user forgets to change the charset used for message encoding (configuration page) for instance.

I guess the best way to solve this, would be to have a phplist installation script give the user a choice from a number of charsets. The installation script should then make sure the whole system, including the database/db connection, is made ready for use under the chosen charset.
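A minimal sketch of the $GLOBALS['strCharSet'] idea, assuming the language file has already set that variable. The helper names phplist_charset() and field_value() are hypothetical and do not exist in phplist:

```php
<?php
// Normally set by the .inc language file; set here so the sketch runs on its own.
$GLOBALS['strCharSet'] = 'utf-8';

// Hypothetical helper: the single place that decides which charset is in use.
function phplist_charset(): string
{
    return isset($GLOBALS['strCharSet']) ? $GLOBALS['strCharSet'] : 'UTF-8';
}

// Hypothetical helper: escapes a value for an input field with that charset.
function field_value(string $value): string
{
    return htmlentities($value, ENT_QUOTES, phplist_charset());
}

// "Tést" written with UTF-8 byte escapes, so the file encoding does not matter.
$subject = "This is a T\xC3\xA9st";
echo '<input type=text name="msgsubject" value="' . field_value($subject) . '" size=40>';
```

With one helper like this, a later move to a configurable charset would only need a change in one place instead of in every file listed above.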

h2b2

02-04-09 21:49

manager   ~0050612

Probably related to:
http://mantis.phplist.com/view.php?id=5017
http://mantis.phplist.com/view.php?id=9309
http://mantis.phplist.com/view.php?id=13382
http://mantis.phplist.com/view.php?id=13291
http://mantis.phplist.com/view.php?id=14238
http://mantis.phplist.com/view.php?id=15241
http://mantis.phplist.com/view.php?id=15245

Possibly related to:
http://mantis.phplist.com/view.php?id=15250


Also found a report on the forum which seems to confirm the issue also occurs when the name in the From: line contains special characters.
See: http://forums.phplist.com/viewtopic.php?p=61323#61323

h2b2

09-04-09 07:09

manager   ~0050620

A useful suggestion from the forum:

==== Start Quote ====

to support encoding properly you should add these lines:

mysql_query("SET CHARACTER_SET_CLIENT=utf8");
mysql_query("SET CHARACTER_SET_RESULTS=utf8");
mysql_query("SET CHARACTER_SET_CONNECTION=utf8");


to mysql.inc

==== End Quote ====

21-01-10 10:13

 

utf8_fix_for_svn_r1703.diff (3,378 bytes)
diff -urN svn_r1703_sin_punto_svn_cumulus_stage_7_newembargo_fixed/phplist/public_html/lists/admin/class.phplistmailer.php svn_r1703_sin_punto_svn_cumulus_stage_8_utf8/phplist/public_html/lists/admin/class.phplistmailer.php
--- svn_r1703_sin_punto_svn_cumulus_stage_7_newembargo_fixed/phplist/public_html/lists/admin/class.phplistmailer.php	2009-11-29 16:50:16.000000000 +0100
+++ svn_r1703_sin_punto_svn_cumulus_stage_8_utf8/phplist/public_html/lists/admin/class.phplistmailer.php	2010-01-21 10:50:00.000000000 +0100
@@ -137,6 +137,16 @@
     }
 
     function send($to_name = "", $to_addr, $from_name, $from_addr, $subject = '', $headers = '',$envelope = '') {
+
+      // utf8 workaround fix - begin
+      $tmpfrom_name = iconv ("ISO-8859-1","UTF-8", $from_name);
+      $from_name = $tmpfrom_name;
+      $tmpfrom_addr = iconv ("ISO-8859-1","UTF-8", $from_addr);
+      $from_addr = $tmpfrom_addr;
+      $tmpsubject = iconv ("ISO-8859-1","UTF-8", $subject);
+      $subject = $tmpsubject;
+      // utf8 workaround fix - end
+
       $this->From = $from_addr;
       $this->FromName = $from_name;
       if (strstr(VERSION, "dev")) {
diff -urN svn_r1703_sin_punto_svn_cumulus_stage_7_newembargo_fixed/phplist/public_html/lists/admin/connect.php svn_r1703_sin_punto_svn_cumulus_stage_8_utf8/phplist/public_html/lists/admin/connect.php
--- svn_r1703_sin_punto_svn_cumulus_stage_7_newembargo_fixed/phplist/public_html/lists/admin/connect.php	2010-01-19 18:47:41.000000000 +0100
+++ svn_r1703_sin_punto_svn_cumulus_stage_8_utf8/phplist/public_html/lists/admin/connect.php	2010-01-21 10:51:00.000000000 +0100
@@ -36,6 +36,7 @@
 if (isset($message_envelope))
   $envelope = "-f$message_envelope";
 
+Sql_Query("SET NAMES 'utf8'");
 $database_schema = '';
 $database_connection = Sql_Connect($database_host,$database_user,$database_password,$database_name);
 Sql_Set_Search_Path($database_schema);
diff -urN svn_r1703_sin_punto_svn_cumulus_stage_7_newembargo_fixed/phplist/public_html/lists/admin/send_core.php svn_r1703_sin_punto_svn_cumulus_stage_8_utf8/phplist/public_html/lists/admin/send_core.php
--- svn_r1703_sin_punto_svn_cumulus_stage_7_newembargo_fixed/phplist/public_html/lists/admin/send_core.php	2009-11-29 16:51:22.000000000 +0100
+++ svn_r1703_sin_punto_svn_cumulus_stage_8_utf8/phplist/public_html/lists/admin/send_core.php	2010-01-21 10:55:34.000000000 +0100
@@ -1073,13 +1073,13 @@
   $maincontent .= '
   <tr><td>'.Help("subject").' '.$GLOBALS['I18N']->get("Subject").':</td>
     <td><input type=text name="msgsubject"
-    value="'.htmlentities($subject,ENT_QUOTES,'UTF-8').'" size=40></td></tr>
+    value="'.$subject.'" size=40></td></tr>
   <tr>
     <td colspan=2>
     </td></tr>
   <tr><td>'.Help("from").' '.$GLOBALS['I18N']->get("fromline").':</td>
     <td><input type=text name=from
-   value="'.htmlentities($from,ENT_QUOTES,'UTF-8').'" size=40></td></tr>
+   value="'.$from.'" size=40></td></tr>
   <tr><td colspan=2>
 
   </td></tr>';
@@ -1089,7 +1089,7 @@
   " the friend will receive this message instead of the one on the content tab.").
   '<tr><td>'.Help("subject").' '.$GLOBALS['I18N']->get("Subject").':</td>
     <td><input type=text name="forwardsubject"
-    value="'.htmlentities($forwardsubject,ENT_QUOTES,'UTF-8').'" size=40></td></tr>
+    value="'.$forwardsubject.'" size=40></td></tr>
   <tr>
     <td colspan=2>
     </td></tr>

adrian15

21-01-10 10:17

reporter   ~0050837

A similar problem appears in revision 1703 from the svn.
I am trying to write in Spanish but I think it applies to other languages also.
I attach a patch (I think it is not a definitive patch but a workaround) to solve this problem.

In my opinion, most of the problems I have might come from the fact that, for whatever reason, the pagetop page is not included in any of the admin pages.

But I am not quite sure, because I am not an expert on these utf8 issues.

adrian15
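One caveat with the iconv calls in the attached workaround: they convert unconditionally, so a value that is already UTF-8 would be encoded a second time. A hedged sketch of a guarded conversion follows. The function name to_utf8() is hypothetical, and ISO-8859-1 as the fallback charset is an assumption carried over from the patch:

```php
<?php
// Sketch only: convert to UTF-8 unless the value already is valid UTF-8.
function to_utf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    // An empty pattern with the /u modifier matches only if the subject
    // is well-formed UTF-8; otherwise preg_match() returns false.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    $converted = iconv($fallback, 'UTF-8', $value);
    // Keep the original bytes if the conversion itself fails.
    return $converted === false ? $value : $converted;
}

$fromLatin1 = to_utf8("\xE5\xE4\xF6");              // "åäö" as ISO-8859-1 bytes
$alreadyUtf8 = to_utf8("\xC3\xA5\xC3\xA4\xC3\xB6"); // "åäö" as UTF-8 bytes
var_dump($fromLatin1 === $alreadyUtf8);
```

This is a heuristic: a short ISO-8859-1 string whose bytes happen to form valid UTF-8 would be left unconverted.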

haipo

01-08-10 19:34

reporter   ~0051065

Hello,
After carefully reading all the explanations and trying everything described here,
I still have the same problem described above.
When writing the subject in Hebrew, strange signs appear and the text is corrupted.
The same happens with the "From".
I have chosen the language text "hebrew-utf8"
I set the Charset for HTML messages = UTF-8, Charset for Text messages = UTF-8
And still no use.
Please advise,
Yaron, Haipo.co.il

h2b2

06-10-10 03:11

manager   ~0051115

For more in-depth info on MySQL encoding pitfalls and solutions, see: http://mysql.rjweb.org/doc.php/charcoll

h2b2

06-10-10 03:52

manager   ~0051116

@haipo: since v2.10.11, you will also need to set the charset of admin interface pages to UTF-8.
See this forum post for more info: http://forums.phplist.com/viewtopic.php?p=80089#p80089