Question

I have a PHP program that, at some point, needs to analyze a large amount of HTML + JavaScript text to extract information. The parsing has to happen in two parts:

Separate all the "HTML groups"
Parse each HTML group to get the needed information.

The 1st parse needs to find:

<div id="myHome"

and start capturing after this tag. Then it has to stop capturing before

<span id="nReaders"

and capture the number that comes after this tag, then stop.

In the 2nd parse, use capture no. 1 (0 holds the whole match and 2 holds the number) from the previous parse and then find .

I already have code that does this, and it works. Is there a way to improve it so that it is easier for the machine to parse?

preg_match_all('%<div id="myHome"[^>]>(.*?)<span id="nReaders[^>]>([0-9]+)<"%msi', $data, $results, PREG_SET_ORDER);
foreach ($results as $result) {
    preg_match_all('%<div class="myplacement".*?[.]php[?]((?:next|before))=([0-9]+).*?<tbody.*?<td[^>]>.*?[0-9]+"%msi', $result[1], $mydata, PREG_SET_ORDER);
    // takes care of the data and finishes the program
}
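
A minimal sketch of the first pass using only built-in string functions (no regex backtracking, no extensions), in case that turns out cheaper than the lazy `(.*?)` capture. The function name `extractGroups` and the array keys `html` and `readers` are made up for illustration. It takes the first `nReaders` span after each `myHome` div and skips a group when no digits follow the span tag, so it is not an exact equivalent of the pattern above, and it has not been benchmarked.

```php
<?php
// Hypothetical helper: collects the HTML between each <div id="myHome" ...>
// tag and the next <span id="nReaders" ...> tag, plus the number that
// directly follows that span tag.
function extractGroups($data)
{
    $groups = [];
    $offset = 0;
    while (($start = stripos($data, '<div id="myHome"', $offset)) !== false) {
        // End of the opening <div ...> tag; capturing starts right after it.
        $tagEnd = strpos($data, '>', $start);
        if ($tagEnd === false) {
            break;
        }
        // The next nReaders span ends the captured group.
        $span = stripos($data, '<span id="nReaders"', $tagEnd);
        if ($span === false) {
            break;
        }
        $spanEnd = strpos($data, '>', $span);
        if ($spanEnd === false) {
            break;
        }
        // Count the digits that come immediately after the span tag.
        $len = strspn($data, '0123456789', $spanEnd + 1);
        if ($len > 0) {
            $groups[] = [
                'html'    => substr($data, $tagEnd + 1, $span - $tagEnd - 1),
                'readers' => substr($data, $spanEnd + 1, $len),
            ];
        }
        // Continue scanning after this span.
        $offset = $spanEnd + 1 + $len;
    }
    return $groups;
}
```

Each entry's `html` value could then be passed to the second `preg_match_all` in place of `$result[1]`.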

Note: If possible, this should not use PHP extensions, because I need it for a free program and so it has to be as general as possible.

ADD: I omitted some parts here because I didn't expect answers like those. There is also a need to parse text inside one of the tags that is in the document. It may be the 6th, 7th or 8th tag, but I know it comes after a certain tag. The parser I've checked (thanks, profitphp) does work to find the script tag. What now? There is more than one tag with the same class, and I want them all, but only those that also have one of a list of classes. Where can I find instructions, demos and limitations of DOM parsers (like the one at http://simplehtmldom.sourceforge.net/)? I need something that will work on at least a large number of free servers. Another thing: how do I parse this part: "php?=([0-9]+)" with those HTML parsers?
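
To make the last two questions concrete, here is a rough sketch of how the class filtering and the `php?...=([0-9]+)` part might be combined, assuming PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) is available on the server. That is an extension, although it is normally enabled by default. The function name `findPlacementLinks`, the extra class names in the usage line, and the idea that the URL sits in the `href` of an `<a>` element are all assumptions made for illustration. The DOM part only locates the elements; the number is still pulled out of the attribute value with a small regex.

```php
<?php
// Hypothetical helper: finds links inside <div> elements that carry the
// class "myplacement" AND at least one class from $extraClasses, then
// extracts the direction and number from each link's href.
function findPlacementLinks($html, array $extraClasses)
{
    if (count($extraClasses) === 0) {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build one "has this class" test per allowed class.
    // Assumes class names contain no quotes or spaces.
    $tests = [];
    foreach ($extraClasses as $class) {
        $tests[] = "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
    }
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' myplacement ')]"
           . '[' . implode(' or ', $tests) . ']//a[@href]';

    $xpath = new DOMXPath($doc);
    $found = [];
    foreach ($xpath->query($query) as $link) {
        // The attribute value is plain text, so a short regex is enough here.
        if (preg_match('%[.]php[?](next|before)=([0-9]+)%i', $link->getAttribute('href'), $m)) {
            $found[] = [$m[1], $m[2]];
        }
    }
    return $found;
}
```

A call such as `findPlacementLinks($result[1], ['featured', 'recent'])` would return one `[direction, number]` pair per matching link, in document order.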

Improve a regex expression to be as efficient as possible

0 Answers
