Parsing a page that is not UTF-8 encoded into a DOMDocument: some web pages put the title tag before the meta tag that declares the encoding.
Assume you have just received the HTML content from curl_exec:
// ...
$htmlContent = curl_exec($ch);
$doc = new DOMDocument('1.0', 'ENCODING'); // create a new DOMDocument object
$doc->loadHTML($htmlContent);              // you will probably get warnings here
$doc->save('test.html');
Open test.html in any text editor and you may find that the HTML body is gone and the header is incomplete.
To resolve this problem, you have to move the title so that it comes after the meta tag that declares the encoding.
Here is a simple trick to do that:
$htmlContent = curl_exec($ch); $pattern="/(
Now you should get the proper document content without losing anything.
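Since the pattern in the snippet above is cut off, here is a minimal sketch of the same idea rather than the original regex. The function name moveCharsetMetaBeforeTitle and both patterns are hypothetical, and it assumes the page declares its encoding in a single meta tag containing the word "charset":

```php
<?php
// Hypothetical helper (not the original pattern from this note): if the
// charset-declaring <meta> tag comes after <title>, move it in front of it.
function moveCharsetMetaBeforeTitle(string $html): string
{
    // Assumption: the encoding is declared in one <meta ...charset...> tag.
    $metaPattern  = '/<meta\b[^>]*charset[^>]*>/i';
    $titlePattern = '/<title\b[^>]*>.*?<\/title>/is';

    if (!preg_match($metaPattern, $html, $meta, PREG_OFFSET_CAPTURE)
        || !preg_match($titlePattern, $html, $title, PREG_OFFSET_CAPTURE)) {
        return $html; // one of the tags is missing, nothing to reorder
    }

    $metaTag  = $meta[0][0];  // matched meta tag text
    $metaPos  = $meta[0][1];  // byte offset of the meta tag
    $titlePos = $title[0][1]; // byte offset of the title tag

    if ($metaPos < $titlePos) {
        return $html; // meta already precedes the title
    }

    // Cut the meta tag out first. It sits after the title, so the title
    // offset stays valid. Then re-insert the tag just before <title>.
    $html = substr_replace($html, '', $metaPos, strlen($metaTag));
    return substr_replace($html, $metaTag, $titlePos, 0);
}

// Made-up sample page with the title ahead of the charset declaration.
$sample = '<html><head><title>Test</title>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=big5">'
        . '</head><body><p>content</p></body></html>';
$fixed = moveCharsetMetaBeforeTitle($sample);
```

Pass $fixed to $doc->loadHTML() instead of the raw curl_exec result. The offsets are byte offsets, so the helper does not need to know the page's encoding.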