PHP ereg vs preg


I noticed that PHP's regex library offers a choice between ereg and preg. What is the difference? Is one faster than the other, and if so, why isn't the slower one deprecated?

Are there any situations where it is better to use one over the other?

5 Answers

Visiting php.net/ereg displays the following:

Warning

This function has been DEPRECATED as of PHP 5.3.0 and REMOVED as of PHP 6.0.0. Relying on this feature is highly discouraged.

Just a little further down the page we read this:

Note: preg_match(), which uses a Perl-compatible regular expression syntax, is often a faster alternative to ereg().

Emphasis mine.

preg is the Perl Compatible Regex library
ereg is the POSIX compliant regex library

They have slightly different syntax, and preg is slightly faster in some cases. ereg is deprecated (and will be removed in PHP6), so I wouldn't recommend using it.
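To make the syntax difference concrete, here is a minimal sketch of porting two POSIX-style calls to preg_match(). The sample strings and variable names are made up for illustration. The legacy calls appear only in comments, because the ereg functions no longer exist in current PHP (they were dropped in 7.0).

```php
<?php
// Legacy POSIX calls, shown only for comparison:
//   ereg('([0-9]{4})/([0-9]{2})/([0-9]{2})', $line, $regs);
//   eregi('^mime-version', $header);

// PCRE patterns need delimiters. '#' is used here because the
// pattern itself contains '/' characters.
$line  = 'Logged 2009/03/14 at noon';   // hypothetical sample input
$found = preg_match('#([0-9]{4})/([0-9]{2})/([0-9]{2})#', $line, $regs);
// On a match, $regs[0] holds the whole match and $regs[1..3] the groups.

// eregi() has no direct twin: case-insensitivity becomes the 'i'
// modifier placed after the closing delimiter.
$header = 'MIME-Version: 1.0';          // hypothetical sample input
$isMime = preg_match('/^mime-version/i', $header);
```

preg_match() returns 1 on a match, 0 on no match, and false on error, so a strict comparison such as `=== 1` is the safer check.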

There is a lot of debate about which is faster and better.

If you plan on moving to PHP6 some day, the decision is made for you. Otherwise:

The general consensus is that PCRE is the better all-around solution, but if you have a specific page with a lot of traffic and you don't need PHP6, it may be worth some testing. For example, from the PHP manual comments:

Deprecating POSIX regex in PHP for Perl searching is like substituting wooden boards and brick for a house with pre-fabricated rooms and walls. Sure, you may be able to mix and match some of the parts but it's a lot easier to modify with all the pieces laid out in front of you.

PCRE faster than POSIX RE? Not always. In a recent search-engine project here at Cynergi, I had a simple loop with a few cute ereg_replace() functions that took 3min to process data. I changed that 10-line loop into a 100-line hand-written code for replacement and the loop now took 10s to process the same data! This opened my eyes to what can IN SOME CASES be very slow regular expressions.

Lately I decided to look into Perl-compatible regular expressions (PCRE). Most pages claim PCRE are faster than POSIX, but a few claim otherwise. I decided on benchmarks of my own. My first few tests confirmed PCRE to be faster, but... the results were slightly different than others were getting, so I decided to benchmark every case of RE usage I had on an 8000-line secure (and fast) Webmail project here at Cynergi to check it out.

The results? Inconclusive! Sometimes PCRE are faster (sometimes by a factor greater than 100x!), but some other times POSIX RE are faster (by a factor of 2x). I still have to find a rule on when one or the other is faster. It's not only about search data size, amount of data matched, or "RE compilation time" which would show when you repeated the function often: one would always be faster than the other. But I didn't find a pattern here. But truth be said, I also didn't take the time to look into the source code and analyse the problem.

I can give you some examples, though. The POSIX RE ([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+ ([0-9]{2}):([0-9]{2}):([0-9]{2}) is 30% faster in POSIX than when converted to PCRE (even if you use \d and \D and non-greedy matching). On the other hand, a similarly complex PCRE pattern /[0-9]{1,2}[ \t]+[a-zA-Z]{3}[ \t]+[0-9]{4}[ \t]+[0-9]{1,2}:[0-9]{1,2}(:[0-9]{1,2})?[ \t]+[+-][0-9]{4}/ is 2.5x faster in PCRE than in POSIX RE. Simple replacement patterns like ereg_replace( "[^a-zA-Z0-9-]+", "", $m ); are 2x faster in POSIX RE than PCRE.

And then we get confused again because a POSIX RE pattern like (^|\n|\r)begin-base64[ \t]+[0-7]{3,4}[ \t]+...... is 2x faster as POSIX RE, but the case-insensitive PCRE /^Received[ \t]:[ \t]by[ \t]+([^ \t]+)[ \t]/i is 30x faster than its POSIX RE version!

When it comes to case sensitivity, PCRE has so far seemed to be the best option. But I found some really strange behaviour from ereg/eregi. On a very simple POSIX RE (^|\r|\n)mime-version[ \t]: I found eregi() taking 3.60s (just a number in a test benchmark), while the corresponding PCRE took 0.16s! But if I used ereg() (case-sensitive) the POSIX RE time went down to 0.08s! So I investigated further. I tried to make the POSIX RE case-insensitive itself. I got as far as this: (^|\r|\n)[mM][iI][mM][eE]-vers[iI][oO][nN][ \t]: This version also took 0.08s. But if I try to apply the same rule to any of the 'v', 'e', 'r' or 's' letters that are not changed, the time is back to the 3.60s mark, and not gradually, but immediately so! The test data didn't have any "vers" in it, other "mime" words in it or any "ion" that might be confusing the POSIX parser, so I'm at a loss.

Bottom line: always benchmark your PCRE / POSIX RE to find the fastest! Tests were performed with PHP 5.1.2 under Windows, from the command line.

Pedro Freire, cynergi.com
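The "always benchmark" advice can be followed with a few lines of timing code. The sketch below is not the commenter's harness: `time_regex()` is a hypothetical helper, the subject string is invented, and the two patterns are adapted from the date/time example in the comment. Since ereg is unavailable in current PHP, it compares two PCRE spellings of the same pattern. It assumes PHP 7.3 or later for hrtime().

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
// hrtime(true) is a monotonic nanosecond counter (PHP 7.3+).
function time_regex(callable $fn, int $iterations = 10000): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$subject = 'Date: 2006/02/15 at 13:45:10';   // made-up sample data

// Two spellings of the same date/time pattern: bracket classes vs \d and \D.
$patterns = [
    'bracket classes' => '#([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+([0-9]{2}):([0-9]{2}):([0-9]{2})#',
    'shorthand'       => '#(\d{4})/(\d{2})/(\d{2})\D+(\d{2}):(\d{2}):(\d{2})#',
];

$timings = [];
foreach ($patterns as $label => $pattern) {
    $timings[$label] = time_regex(function () use ($pattern, $subject) {
        preg_match($pattern, $subject);
    });
    printf("%-16s %.4fs\n", $label, $timings[$label]);
}
```

Timings from a loop like this vary with the PHP version, the PCRE build, and the input, so they are only meaningful when run against your own patterns and representative data.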

Although ereg is deprecated in PHP 5.3, the mb_ereg* functions are not. I believe the main reason is that PHP6 is rebuilding all of its MB/Unicode support, so mb_ereg will be new and better, which makes the old "normal" ereg functions redundant.

I know this doesn't answer the question about speed, but it does let you keep using both POSIX and PCRE.

Well, ereg and its derivative functions (ereg_replace, etc.) are deprecated in PHP5 and are being removed in PHP6, so you're probably best off going with the preg family instead.

ereg is standard POSIX regex, whereas preg is for Perl-style regular expressions.
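Replacement and splitting calls can usually be ported in the same way as matching calls. A minimal sketch, assuming the sample strings below (they are invented for illustration); the character class is the one from the ereg_replace() example quoted in the manual comment:

```php
<?php
// Legacy: ereg_replace("[^a-zA-Z0-9-]+", "", $m);
// PCRE:   same character class, wrapped in '/' delimiters.
$m    = 'Hello, World! 2009-03';                   // hypothetical input
$slug = preg_replace('/[^a-zA-Z0-9-]+/', '', $m);  // strips everything outside the class

// Legacy: split('[ \t]+', $line);
// PCRE:   preg_split() takes the delimited pattern first, then the subject.
$fields = preg_split('/[ \t]+/', "alpha \t beta  gamma");
```

Simple bracket expressions like these carry over unchanged. Patterns that contain the delimiter character need it escaped, or a different delimiter chosen.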