I'm working on converting an HTML file to XML. It mostly works. The problem I'm having is with links: right now the link in my test file seems to be ignored completely.
Here is the conversion code:
<?php
ini_set('display_errors', 1);
ini_set('log_errors', 1);
ini_set('error_log', dirname(__FILE__) . '/error_log.txt');
error_reporting(E_ALL);
function convertToXML()
{
    $titleLength = 35;
    $output = "";
    $date = date("D, j M Y G:i:s T");
    $fi = fopen( "../newsTEST.htm", "r" );
    $fo = fopen( "../newsfeed.xml", "w" );
    //This is the first part of the XML
    $output .= "<?xml version=\"1.0\"?>\n";
    $output .= "<rss version=\"2.0\">\n";
    $output .= "<channel>\n";
    $output .= "\t<title>Wiggle 100 News</title>\n";
    $output .= "\t<link>http://www.wiggle100.com/news.php</link>\n";
    $output .= "\t<description>Wiggle 100 Daily News</description>\n";
    $output .= "\t<language>en-us</language>\n";
    $output .= "\t<pubDate>". $date ."</pubDate>\n";
    $output .= "\t<managingEditor>wiggle100@gmail.com</managingEditor>\n";
    $output .= "\t<webMaster>josh@jacurren.com</webMaster>\n";
    $article = "";
    $skip = true; //if false will continue to put lines into output until </p>
    $newArticle = false;
    while( !feof($fi) )
    {
        $line = fgets($fi);
        $link = "";
        if( strpos( $line, "<p" ) !== false)
        {
            $pos = strpos( $line, "<p" );
            $line = substr( $line, $pos );
            $pos = strpos( $line, ">" );
            $line = substr( $line, $pos + 1 );
            $skip = false;
        }
        if( strpos( $line, "</p>" ) !== false )
        {
            $pos = strpos( $line, "</p>" );
            $line = substr( $line, 0, $pos - 1 );
            $newArticle = true;
        }
        //This adds the line to the article
        if( !$skip )
        {
            $article .= $line;
        }
        //This mixes the article, title, link, and date with
        // XML and puts it into the output
        if( $newArticle )
        {
            //This if is to get rid of stuff like <p> </p>
            if( (strlen($article) > 10) )
            {
                $link = findLink( $article );
                //$article = strip_tags($article);
                $title = substr( $article, 0, $titleLength ) . "...";
                $output .= "\t<item>\n";
                $output .= "\t\t<title>". $title ."</title>\n";
                $output .= "\t\t<link>". $link ."</link>\n";
                $output .= "\t\t<description>". $article . "</description>\n";
                $output .= "\t\t<pubDate>". $date . "</pubDate>\n";
                $output .= "\t</item>\n\n";
            }
            $article = "";
            $line = "";
            $skip = true;
        }
    }
    $output .= "</channel>\n";
    $output .= "</rss>\n";
    fwrite( $fo, $output );
    fclose($fi);
    fclose($fo);
    echo "<br /><br /> News converted to XML";
}
//*****************************************************************************
//*****************************************************************************
//Find and return a link in the input.
//Otherwise use the default
function findLink( $input )
{
    $link = "http://www.wiggle100.com/news.php";
    if( strpos( $input, "<a" ) !== false )
    {
        $startpos = strpos( $input, "href" );
        $link = substr( $input, $startpos + 5 );
        $endpos = strpos( $link, ">" );
        $link = substr( $link, 0, $endpos - 2 );
    }
    return $link;
}
?>
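To check the link extraction on its own, separately from the file loop, a small standalone sketch can help. The helper name extractHref and the regular expression are assumptions for illustration, not part of the code above, and the pattern only handles quoted href values. As a side note, the substr offsets in findLink appear to keep the opening quote and drop the last character of the URL, which is another reason to test this step in isolation.

```php
<?php
// Hypothetical helper (not in the original code): returns the first quoted
// href value found in $html, or $default when no such href exists.
function extractHref(string $html, string $default = "http://www.wiggle100.com/news.php"): string
{
    // Capture the text between the matching quotes that follow href=
    if (preg_match('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        return $m[2];
    }
    return $default;
}

// Sample input modelled on the last paragraph of the test page.
$sample = '<font size="6">This is the news for today.</font>' . "\n"
        . '<a href="http://www.thedailyreview.com/news/">' . "\n"
        . 'http://www.thedailyreview.com/news/</a>';

echo extractHref($sample), "\n";
echo extractHref('No link here.'), "\n";
```

If this returns the expected URL for the sample but the feed still shows the default link, the article text handed to the link lookup probably never contained the anchor tag in the first place.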
Here is the HTML test code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Test Page</title>
<meta name="GENERATOR" content="MSHTML 8.00.6001.18812">
<meta content="text/html; charset=unicode" http-equiv="Content-Type"></head>
<body bgcolor="#ffffff">
<p> </p>
<p>This is an article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.</p>
<p> </p>
<p>This is another article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.</p>
<p>This is the 3rd article. Blah. Blah. Blah. Blah. Blah. Blah. Blah.</p>
<p> </p>
<p align="center"><font size="6">This is the news for today. Blah Blah Blah!</font>
<a href="http://www.thedailyreview.com/news/">
http://www.thedailyreview.com/news/</a></p>
</body>
</html>
Here is the XML output:
<rss version="2.0">
<channel>
<title>Wiggle 100 News</title>
<link>http://www.wiggle100.com/news.php</link>
<description>Wiggle 100 Daily News</description>
<language>en-us</language>
<pubDate>Fri, 23 Oct 2009 23:49:04 EDT</pubDate>
<managingEditor>wiggle100@gmail.com</managingEditor>
<webMaster>josh@jacurren.com</webMaster>
<item>
<title>This is an article. Blah. Blah. Bla...</title>
<link>http://www.wiggle100.com/news.php</link>
<description>This is an article. Blah. Blah. Blah. Blah. Blah. Blah. Blah</description>
<pubDate>Fri, 23 Oct 2009 23:49:04 EDT</pubDate>
</item>
<item>
<title>This is another article. Blah. Blah...</title>
<link>http://www.wiggle100.com/news.php</link>
<description>This is another article. Blah. Blah. Blah. Blah. Blah. Blah. Blah</description>
<pubDate>Fri, 23 Oct 2009 23:49:04 EDT</pubDate>
</item>
<item>
<title>This is the 3rd article. Blah. Blah...</title>
<link>http://www.wiggle100.com/news.php</link>
<description>This is the 3rd article. Blah. Blah. Blah. Blah. Blah. Blah. Blah</description>
<pubDate>Fri, 23 Oct 2009 23:49:04 EDT</pubDate>
</item>
<item>
<title><font size="6">This is the news for...</title>
<link>http://www.wiggle100.com/news.php</link>
<description><font size="6">This is the news for today. Blah Blah Blah!</font>
</description>
<pubDate>Fri, 23 Oct 2009 23:49:04 EDT</pubDate>
</item>
</channel>
</rss>
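One possible reading of this output, judging only from the code shown: $newArticle is set to true at the first </p> and is never set back to false, so from then on every line is treated as the end of an article. That would explain why the last description stops at the end of the <font> line, and why the following <a> line is skipped. The descriptions are also missing their final period, which suggests that $pos - 1 cuts one character too many. The sketch below is a simplified rewrite of the loop under those assumptions; the function name collectParagraphs and the array-of-lines input are hypothetical.

```php
<?php
// Hypothetical, simplified version of the read loop: it takes an array of
// lines instead of a file handle and returns the collected paragraph bodies.
function collectParagraphs(array $lines): array
{
    $articles = [];
    $article = "";
    $skip = true;

    foreach ($lines as $line) {
        $newArticle = false; // reset on every line, unlike the original loop

        $pos = strpos($line, "<p");
        if ($pos !== false) {
            // Drop everything up to and including the end of the <p ...> tag
            $line = substr($line, $pos);
            $line = substr($line, strpos($line, ">") + 1);
            $skip = false;
        }

        $pos = strpos($line, "</p>");
        if ($pos !== false) {
            // Keep everything before </p> (no "- 1", so the last character stays)
            $line = substr($line, 0, $pos);
            $newArticle = true;
        }

        if (!$skip) {
            $article .= $line;
        }

        if ($newArticle) {
            // Ignore near-empty paragraphs such as <p> </p>
            if (strlen($article) > 10) {
                $articles[] = $article;
            }
            $article = "";
            $skip = true;
        }
    }
    return $articles;
}

$lines = [
    "<p> </p>\n",
    "<p>This is an article. Blah. Blah.</p>\n",
    "<p align=\"center\"><font size=\"6\">This is the news for today.</font>\n",
    "<a href=\"http://www.thedailyreview.com/news/\">\n",
    "http://www.thedailyreview.com/news/</a></p>\n",
];
$articles = collectParagraphs($lines);
echo count($articles), " articles\n";
echo $articles[1], "\n";
```

With the flag reset per line, the multi-line paragraph should come back as a single article that still contains the <a> tag, so a link lookup on it has something to find.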
When I uncomment the strip_tags() line, the font tag disappears.
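On the strip_tags point: strip_tags() removes the <a> element along with <font>, so any link has to be pulled out before the tags are stripped. Here is a minimal sketch of that ordering, which also escapes the remaining text so stray markup cannot break the XML. The helper name buildItem is hypothetical, the href pattern is a simplifying assumption (double-quoted values only), and the text is assumed to be single-byte with no HTML entities.

```php
<?php
// Hypothetical helper: builds one <item> element from raw paragraph HTML.
function buildItem(string $articleHtml, string $defaultLink, int $titleLength = 35): string
{
    // 1. Look for a link while the <a> tag is still present.
    $link = $defaultLink;
    if (preg_match('/href\s*=\s*"([^"]*)"/i', $articleHtml, $m)) {
        $link = $m[1];
    }

    // 2. Strip the markup, then collapse whitespace left by line breaks.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($articleHtml)));

    // 3. Escape for XML so characters such as & and < stay well-formed.
    $flags = ENT_QUOTES | ENT_XML1;
    $title = htmlspecialchars(substr($text, 0, $titleLength), $flags, 'UTF-8') . "...";
    $desc  = htmlspecialchars($text, $flags, 'UTF-8');
    $link  = htmlspecialchars($link, $flags, 'UTF-8');

    return "\t<item>\n"
         . "\t\t<title>$title</title>\n"
         . "\t\t<link>$link</link>\n"
         . "\t\t<description>$desc</description>\n"
         . "\t</item>\n";
}

$html = '<font size="6">This is the news for today. Blah Blah Blah!</font>' . "\n"
      . '<a href="http://www.thedailyreview.com/news/">' . "\n"
      . 'http://www.thedailyreview.com/news/</a>';

echo buildItem($html, "http://www.wiggle100.com/news.php");
```

In the original code the findLink call already comes before the commented-out strip_tags line, so that order can stay as it is. The escaping step is the main addition here.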