I have a field in MySQL of type text, using the following collation: utf8_general_ci.
This XML field is populated using a variable built with DOMDocument:
function ed_audit_node($dom, $field, $new, $old){
//create audit_detail node
$ad = $dom->createElement('audit_detail');
$fn = $dom->createElement('fieldname');
$fn->appendChild($dom->createTextNode($field));
$ad->appendChild($fn);
$ov = $dom->createElement('old_value');
$ov->appendChild($dom->createTextNode($old));
$ad->appendChild($ov);
$nv = $dom->createElement('new_value');
$nv->appendChild($dom->createTextNode($new));
$ad->appendChild($nv);
//append to document
return $ad;
}
Here's how it's saved in the db ($xml comes from $dom->saveXML()):
function ed_audit_insert($ed, $xml){
global $visitor;
$sql = <<<EOF
INSERT INTO ed.audit
(employee_id, audit_date, audit_action, audit_data, user_id)
VALUES (
{$ed[emp][employee_id]},
now(),
'{$ed[audit_action]}',
'{$xml}',
{$visitor[user_id]}
);
EOF;
$req = mysql_query($sql,$ed['db']) or die(db_query_error($sql,mysql_error(),__FUNCTION__));
//snip
}
See an older, parallel, slightly related thread on how I’m creating this XML: Another PHP XML parsing error: Input is not proper UTF-8, indicate encoding !
What works: - querying the database, selecting the field and outputting it using jQuery (.ajax()) and populating a textarea. Firebug and the textarea match what's in the database (confirmed with Toad).
Here’s a screenshot of the data in the db:
What doesn't work: - outputting the text from the database into an HTML page. This HTML page has the content-type ISO-8859-1, which I cannot change.
Here's a small sample of the rendered HTML:
Here's the code that outputs that screen:
$xmlData = simplexml_load_string($d['audit_data']);
foreach ($xmlData->audit_detail as $a){
echo "<p> straight from db = ".$a->new_value."</p>";
echo "<p> utf8_decode() = ".utf8_decode($a->new_value)."</p>";
}
I've also used a charset changer extension for Firefox: I tried ISO-8859-1, UTF-8 and 1252, without success.
If it's UTF-8, shouldn't I be seeing diamonds with question marks inside (since the content-type is ISO-8859-1)? If it's not UTF-8, what is it?
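One way to reason about this question is to simulate the mismatch directly. The sketch below is illustrative only and not taken from the application; the helper name `view_as_latin1` is hypothetical, and it assumes the mbstring extension is available. It shows what a reader treating the page as ISO-8859-1 would make of UTF-8 bytes, and that a raw ISO-8859-1 byte is not valid UTF-8.

```php
<?php
// Hypothetical helper: reinterpret raw bytes as ISO-8859-1 and return them as UTF-8 text,
// which mimics how a browser set to ISO-8859-1 would display those bytes.
function view_as_latin1(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$utf8   = "\xC3\xA9"; // "é" encoded as UTF-8 (two bytes)
$latin1 = "\xE9";     // "é" encoded as ISO-8859-1 (one byte)

// UTF-8 bytes read as ISO-8859-1 come out as two separate characters,
// not as a replacement diamond.
echo bin2hex($utf8), ' viewed as ISO-8859-1: ', view_as_latin1($utf8), PHP_EOL;

// A lone ISO-8859-1 byte is not valid UTF-8; a UTF-8 viewer is where
// the diamond with a question mark would typically appear.
var_dump(mb_check_encoding($latin1, 'UTF-8'));
var_dump(mb_check_encoding($utf8, 'UTF-8'));
```

Under these assumptions, replacement diamonds would point to non-UTF-8 bytes being read as UTF-8, rather than UTF-8 bytes being read as ISO-8859-1.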
Edit #1
Here's a snapshot of other tests I've done:
Here's the code for that snapshot:
$xmlData = simplexml_load_string($d['audit_data']);
foreach ($xmlData->audit_detail as $a){
echo "<p>encoding is, straight from db, using mb_detect_encoding: ".mb_detect_encoding($a->new_value)."</p>";
echo "<p>encoding is, with utf8_decode, using mb_detect_encoding: ".mb_detect_encoding(utf8_decode($a->new_value))."</p>";
echo "<hr/>";
echo "<p> straight from db = <pre>".$a->new_value."</pre></p>";
echo "<p> utf8_decode() = <pre>".utf8_decode($a->new_value)."</pre></p>";
echo "<hr/>";
$iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $a->new_value);
$iso88591_3 = mb_convert_encoding($a->new_value, 'ISO-8859-1', 'UTF-8');
echo "<p> iconv() = ".$iso88591_2."</p>";
echo "<p> mb_convert_encoding() = ".$iso88591_3."</p>";
}
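The conversion tests above all assume the input really is valid UTF-8. A minimal sketch of a guarded conversion follows; `to_latin1_or_null` is a hypothetical name, mbstring is assumed to be available, and the sample strings are made up for illustration rather than taken from the database.

```php
<?php
// Hypothetical helper: convert to ISO-8859-1 only when the input is valid UTF-8.
// Returns null otherwise, so a wrong source encoding is reported instead of
// being silently turned into "?" characters.
function to_latin1_or_null(string $value): ?string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        return null;
    }
    return mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
}

$valid   = "fran\xC3\xA7ais"; // "français" encoded as UTF-8
$invalid = "fran\xE7ais";     // "français" encoded as ISO-8859-1, not valid UTF-8

// Show the converted bytes as hex so the result does not depend on the terminal charset.
var_dump(bin2hex((string) to_latin1_or_null($valid)));
var_dump(to_latin1_or_null($invalid));
```

If the helper returns null for a value read from the database, the stored bytes are probably not UTF-8 to begin with, and converting them again would not help.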
Edit #2
I added the FF-specific tag XMP:
Code:
$xmlData = simplexml_load_string($d['audit_data']);
foreach ($xmlData->audit_detail as $a){
echo "<p>encoding is, straight from db, using mb_detect_encoding: ".mb_detect_encoding($a->new_value)."</p>";
echo "<p>encoding is, with utf8_decode, using mb_detect_encoding: ".mb_detect_encoding(utf8_decode($a->new_value))."</p>";
echo "<hr/>";
echo "<p> straight from db = <pre>".$a->new_value."</pre></p>";
echo "<p> utf8_decode() = <pre>".utf8_decode($a->new_value)."</pre></p>";
echo "<hr/>";
$iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $a->new_value);
$iso88591_3 = mb_convert_encoding($a->new_value, 'ISO-8859-1', 'UTF-8');
echo "<p> iconv() = ".$iso88591_2."</p>";
echo "<p> mb_convert_encoding() = ".$iso88591_3."</p>";
echo "<hr/>";
echo "<p>straight from db, using <xmp> = <xmp>".$a->new_value."</xmp></p>";
echo "<p>utf8_decode(), using <xmp> = <xmp>".utf8_decode($a->new_value)."</xmp></p>";
}
Here are some meta tags from the page:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="dc.language" scheme="ISO639-2/T" content="eng" />
IMO, the last meta tag has no bearing.
Edit #3
Snapshot of the view source in FF:
Source code:
<p>encoding is, straight from db, using mb_detect_encoding: UTF-8</p><p>encoding is, with utf8_decode, using mb_detect_encoding: ASCII</p><hr/><p> straight from db = <pre>Ro马eç ³é ¥n franê¡©s</pre></p><p> utf8_decode() = <pre>Ro?e??n fran?s</pre></p><hr/><p> iconv() = Ro</p><p> mb_convert_encoding() = Ro?e??n fran?s</p><hr/><p>straight from db, using <xmp> = <xmp>Ro马eç ³é ¥n franê¡©s</xmp></p><p>utf8_decode(), using <xmp> = <xmp>Ro?e??n fran?s</xmp></p>
Edit #4
Here's the SQL statement going into the db:
INSERT INTO ed.audit
(employee_id, audit_date, audit_action, audit_data, user_id)
VALUES (
75,
now(),
'u',
'<?xml version="1.0"?>
<audit><audit_detail><fieldname>role_fra</fieldname><old_value>aRo马e砳頥n franꡩs</old_value><new_value>bRo马e砳頥n franꡩs</new_value></audit_detail></audit>
',
333
);
Note: the text in this XML does not necessarily match the screenshot provided above.
Edit #5
Here's my new function, which wraps CDATA tags around my values for the old_value and new_value nodes:
function ed_audit_node($dom, $field, $new, $old){
//create audit_detail node
$ad = $dom->createElement('audit_detail');
$fn = $dom->createElement('fieldname');
$fn->appendChild($dom->createTextNode($field));
$ad->appendChild($fn);
$ov = $dom->createElement('old_value');
$ov->appendChild($dom->createCDATASection($old));
$ad->appendChild($ov);
$nv = $dom->createElement('new_value');
$nv->appendChild($dom->createCDATASection($new));
$ad->appendChild($nv);
//append to document
return $ad;
}
I also added the encoding to the XML document:
$dom = new DomDocument('1.0', 'UTF-8');
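To illustrate what declaring the encoding changes, here is a small self-contained sketch, assuming the dom and simplexml extensions are loaded. The element names mirror the ones used above, but the sample value is invented. Note that DOMDocument expects the strings passed to createCDATASection() to already be UTF-8.

```php
<?php
// Build a tiny audit document with an explicit UTF-8 declaration.
$dom   = new DOMDocument('1.0', 'UTF-8');
$audit = $dom->appendChild($dom->createElement('audit'));
$nv    = $audit->appendChild($dom->createElement('new_value'));

// Illustrative value only; it must be UTF-8 when handed to DOM.
$value = "R\xC3\xB4le en fran\xC3\xA7ais"; // "Rôle en français"
$nv->appendChild($dom->createCDATASection($value));

// Serialize; the XML declaration now carries encoding="UTF-8".
$xml = $dom->saveXML();
echo $xml;

// Round trip: parse the serialized XML again and read the value back.
$parsed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
var_dump((string) $parsed->new_value === $value);
```

If the value handed to DOM is ISO-8859-1 instead, the declaration alone does not convert it, so the input encoding still has to be checked separately.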
Here's my new SimpleXML call:
$xmlData = simplexml_load_string($d['audit_data'], "SimpleXMLElement", LIBXML_NOENT | LIBXML_NOCDATA);
I can see the CDATA tags in Toad as well. However, I still get an error:
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 2: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE9 0xE9 0x6C 0x65 in <snip>
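The bytes quoted in the warning can be inspected in isolation. The sketch below, assuming mbstring is available, checks whether that four-byte sequence is valid UTF-8 and what it spells when read as ISO-8859-1. It only examines the bytes from the message; it does not establish where in the pipeline they were produced.

```php
<?php
// The four bytes reported by the parser warning: 0xE9 0xE9 0x6C 0x65
$bytes = "\xE9\xE9\x6C\x65";

// 0xE9 would start a three-byte UTF-8 sequence, but the byte that follows
// is not a continuation byte (0x80-0xBF), so the sequence is malformed.
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

// Read as ISO-8859-1, the same bytes are ordinary characters ("é", "é", "l", "e").
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

var_dump($isUtf8);
echo bin2hex($asLatin1), PHP_EOL;
```

This is consistent with single-byte accented characters reaching the parser inside a document that is declared as UTF-8.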
XML in Toad (with encoding specified):
Edit #6
I just noticed that the jQuery call returns the proper accented characters in the CDATA: