Question

What is the Python equivalent of PHP's strip_tags?

http://php.net/manual/en/function.strip-tags.php

Answer 1

There is no such thing in the Python standard library. That's because Python is a general-purpose language, while PHP started out as a Web-oriented language.

Nevertheless, you have 3 solutions:

You are in a hurry: just write your own. re.sub(r'<[^>]*?>', '', value) can be a quick and dirty solution.
Use a third-party library (recommended, because it is more bullet-proof): beautiful soup is a really good one and there is nothing to install, just copy the lib dir and import it. Full tuto with beautiful soup.
Use a framework. Most Web Python devs never code from scratch; they use a framework such as django that does this kind of thing for you automatically. Full tuto with django.
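
The quick and dirty option from the first item can be wrapped in a small helper. This is only a sketch: the names TAG_RE and strip_tags_quick are made up for illustration, and a regex like this can misbehave on real-world HTML (a '>' inside an attribute value, comments, script content), which is why the library route is recommended.

```python
import re

# Matches anything that looks like a tag: '<', a run of non-'>' characters, then '>'.
TAG_RE = re.compile(r'<[^>]*?>')


def strip_tags_quick(value):
    """Remove everything that looks like an HTML tag from value (hypothetical helper)."""
    return TAG_RE.sub('', value)


print(strip_tags_quick('<p>Hello <b>world</b>!</p>'))  # prints: Hello world!
```

Entities such as &amp; are left untouched by this helper; only the tags themselves are removed.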

Answer 2

Using BeautifulSoup

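# BeautifulSoup 3 on Python 2 (note the module name and the unicode type)
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(htmltext)
# Keep only the text nodes and join them, which drops every tag
''.join([e for e in soup.recursiveChildGenerator() if isinstance(e,unicode)])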

Answer 3

Because Python is more of a general-purpose scripting language than a web development language, you won't find built-in Python equivalents for many of PHP's built-in HTML functions. For HTML processing, BeautifulSoup is generally recommended.

Answer 4

Python has no built-in one, but there is an ungodly number of implementations.

Answer 5

I built one for Python 3 using the HTMLParser class. It is more verbose than PHP's. I called it the HTMLCleaner class; you can find the source here and examples here.
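
For readers who cannot follow the links, the general idea of building on HTMLParser in Python 3 can be sketched as below. This is not the HTMLCleaner class mentioned above: the names TagStripper and strip_tags are hypothetical, and the sketch simply keeps text nodes and drops every tag. Note that the text inside script and style elements is also delivered as data, so it would be kept.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text content of the input (hypothetical name)."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &amp; before
        # they reach handle_data, so the result is plain text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags
        self.chunks.append(data)

    def get_text(self):
        return ''.join(self.chunks)


def strip_tags(html):
    """Return html with all tags removed (a sketch, not a sanitizer)."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


print(strip_tags('<p>Tom &amp; <i>Jerry</i></p>'))  # prints: Tom & Jerry
```

Because the entities are decoded, the output is meant to be used as plain text; it should be escaped again before being placed back into an HTML page.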

Answer 6

There is an ActiveState recipe for this,

http://code.activestate.com/recipes/52281/

It's old code, so you have to replace the SGML parser with HTMLParser, as mentioned in the comments.

Here is the modified code,

import HTMLParser, string

class StrippingParser(HTMLParser.HTMLParser):

    # These are the HTML tags that we will leave intact
    valid_tags = ('b', 'a', 'i', 'br', 'p', 'img')

    from htmlentitydefs import entitydefs # replace entitydefs from sgmllib

    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.result = ""
        self.endTagList = []

    def handle_data(self, data):
        if data:
            self.result = self.result + data

    def handle_charref(self, name):
        self.result = "%s&#%s;" % (self.result, name)

    def handle_entityref(self, name):
        if name in self.entitydefs:
            x = ';'
        else:
            # this breaks unstandard entities that end with ';'
            x = ''
        self.result = "%s&%s%s" % (self.result, name, x)

    def handle_starttag(self, tag, attrs):
        """ Delete all tags except for legal ones """
        if tag in self.valid_tags:       
            self.result = self.result + '<' + tag
            for k, v in attrs:
                # v is None for attributes written without a value
                if k[0:2].lower() != 'on' and (v or '')[0:10].lower() != 'javascript':
                    self.result = '%s %s="%s"' % (self.result, k, v or '')
            endTag = '</%s>' % tag
            self.endTagList.insert(0,endTag)    
            self.result = self.result + '>'

    def handle_endtag(self, tag):
        if tag in self.valid_tags:
            self.result = "%s</%s>" % (self.result, tag)
            remTag = '</%s>' % tag
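            # A stray closing tag has no entry in the list; skip it instead of raising ValueError
            if remTag in self.endTagList:
                self.endTagList.remove(remTag)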

    def cleanup(self):
        """ Append missing closing tags """
        for endTag in self.endTagList:
            self.result = self.result + endTag


def strip(s):
    """ Strip illegal HTML tags from string s """
    parser = StrippingParser()
    parser.feed(s)
    parser.close()
    parser.cleanup()
    return parser.result
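
The recipe above targets Python 2 (the HTMLParser and htmlentitydefs modules). A rough Python 3 port might look like the sketch below. Several details are assumptions added here rather than part of the recipe: the VOID_TAGS set (so that no closing tag is appended for br and img), the parts and open_tags attribute names, dropping stray closing tags, and re-escaping attribute values with html.escape. Treat it as an illustration of the approach, not as a reviewed HTML sanitizer.

```python
from html import escape
from html.entities import entitydefs
from html.parser import HTMLParser

# Assumption: tags that never take a closing tag
VOID_TAGS = {'br', 'img'}


class StrippingParser(HTMLParser):
    # Same whitelist as the recipe above
    valid_tags = ('b', 'a', 'i', 'br', 'p', 'img')

    def __init__(self):
        # convert_charrefs=False keeps references such as &lt; encoded in the
        # output instead of decoding them back into markup characters.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.open_tags = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        self.parts.append('&#%s;' % name)

    def handle_entityref(self, name):
        # As in the recipe, only known entities get their ';' back
        self.parts.append('&%s%s' % (name, ';' if name in entitydefs else ''))

    def handle_starttag(self, tag, attrs):
        """Keep whitelisted tags, dropping on* handlers and javascript values."""
        if tag not in self.valid_tags:
            return
        kept = []
        for key, value in attrs:
            value = value or ''  # value is None for attributes without a value
            if key.startswith('on') or value.strip().lower().startswith('javascript'):
                continue
            kept.append(' %s="%s"' % (key, escape(value, quote=True)))
        self.parts.append('<%s%s>' % (tag, ''.join(kept)))
        if tag not in VOID_TAGS:
            self.open_tags.insert(0, tag)

    def handle_endtag(self, tag):
        # Only close tags that were actually opened; stray closers are dropped
        if tag in self.open_tags:
            self.parts.append('</%s>' % tag)
            self.open_tags.remove(tag)

    def cleanup(self):
        """Append missing closing tags, innermost first."""
        for tag in self.open_tags:
            self.parts.append('</%s>' % tag)
        self.open_tags = []


def strip(s):
    """Strip non-whitelisted HTML tags from string s."""
    parser = StrippingParser()
    parser.feed(s)
    parser.close()
    parser.cleanup()
    return ''.join(parser.parts)


print(strip('<p onclick="x()">Hi <span>you</span> <b>there'))
# prints: <p>Hi you <b>there</b></p>
```

Collecting the pieces in a list and joining once at the end avoids the repeated string concatenation of the original, but the behaviour is otherwise meant to follow the recipe.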
