Since Gmail doesn't provide any API for getting this information, it sounds like you want to do some web scraping.
Web scraping (also called Web
harvesting or Web data extraction) is
a computer software technique of
extracting information from websites
As the Wikipedia article linked above mentions, there are many ways to do this:
Human copy-and-paste: Sometimes even
the best Web-scraping technology can
not replace human’s manual examination
and copy-and-paste, and sometimes this
may be the only workable solution when
the websites for scraping explicitly
setup barriers to prevent machine
automation.
Text grepping and regular expression
matching: A simple yet powerful
approach to extract information from
Web pages can be based on the UNIX
grep command or regular expression
matching facilities of programming
languages (for instance Perl or
Python).
HTTP programming: Static and dynamic
Web pages can be retrieved by posting
HTTP requests to the remote Web server
using socket programming.
DOM parsing: By embedding a
full-fledged Web browser, such as the
Internet Explorer or the Mozilla Web
browser control, programs can retrieve
the dynamic contents generated by
client side scripts. These Web browser
controls also parse Web pages into a
DOM tree, based on which programs can
retrieve parts of the Web pages.
HTML parsers: Some semi-structured
data query languages, such as the XML
query language (XQL) and the
hyper-text query language (HTQL), can
be used to parse HTML pages and to
retrieve and transform Web content.
Web-scraping software: There are many
Web-scraping software available that
can be used to customize Web-scraping
solutions. These software may provide
a Web recording interface that removes
the necessity to manually write
Web-scraping codes, or some scripting
functions that can be used to extract
and transform Web content, and
database interfaces that can store the
scraped data in local databases.
Semantic annotation recognizing: The
Web pages may embrace metadata or
semantic markups/annotations which can
be made use of to locate specific data
snippets. If the annotations are
embedded in the pages, as Microformat
does, this technique can be viewed as
a special case of DOM parsing. In
another case, the annotations,
organized into a semantic layer,
are stored and managed separated to
the Web pages, so the Web scrapers can
retrieve data schema and instructions
from this layer before scraping the
pages.
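To make the "text grepping and regular expression matching" approach from the list above a bit more concrete, here is a minimal sketch in Python using only the standard library. The sample HTML, the `grep_emails` helper and the deliberately simplified e-mail pattern are all made up for illustration; a real scraper would get the page text from an HTTP response instead of a hard-coded string.

```python
import re

# Hypothetical page fragment standing in for a fetched page.
SAMPLE_HTML = """
<ul>
  <li>Alice: alice@example.com</li>
  <li>Bob: bob@example.org</li>
</ul>
"""

# Deliberately simplified pattern; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def grep_emails(html):
    """Return every substring of `html` that looks like an e-mail address."""
    return EMAIL_RE.findall(html)


if __name__ == "__main__":
    print(grep_emails(SAMPLE_HTML))
```

This kind of matching ignores the page structure entirely, which is why it tends to break as soon as the markup gets more complicated; that is where a real parser helps.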
Before you go ahead, please be aware of the legal implications of all this. I don't know whether this complies with Gmail's terms of service, and I recommend checking them before going any further. You may also end up being blacklisted or run into other problems like that.
All that being said, I'd say that in your case you need some kind of spider plus a DOM parser to log in to Gmail and find the data you want. The choice of tool depends on your technology stack.
As a Ruby dev, I like using Mechanize and Nokogiri. If you're using PHP, you could look at solutions like Sphider.
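For a rough idea of what the parsing half looks like, here is a small sketch in Python rather than Ruby, using only the standard library's `html.parser` instead of Nokogiri. It does not log in anywhere or touch Gmail; the sample table, the `sender`/`subject` class names and the `extract_by_class` helper are assumptions invented for the example, and the real markup would have to be inspected first.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical markup; the class names are made up for this example.
SAMPLE_HTML = """
<table>
  <tr><td class="sender">Alice</td><td class="subject">Lunch <b>today</b>?</td></tr>
  <tr><td class="sender">Bob</td><td class="subject">Build report</td></tr>
</table>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []    # text pieces of the current match
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested element inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # the matching element just closed
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_by_class(html, target_class):
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(extract_by_class(SAMPLE_HTML, "subject"))
```

The sketch assumes reasonably well-formed HTML. A dedicated library such as Nokogiri copes with messy markup much better and lets you query with CSS selectors or XPath instead of tracking depth by hand.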