Blog Of Sem: Extract all links from a site with Python 2.7 and BeautifulSoup

Extract all links from a site with Python 2.7 and BeautifulSoup

First, locate pip.exe from the command prompt (cmd), then install the two packages:
easy_install bs4
pip install lxml


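from bs4 import BeautifulSoup
import urllib2

# Fetch the page and parse it with the lxml parser
resp = urllib2.urlopen("http://www.gpsbasecamp.com/national-parks")
soup = BeautifulSoup(resp, "lxml")

# Print the href of every <a> tag that has an href attribute
for link in soup.find_all('a', href=True):
    print link['href']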
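
The loop above prints each href exactly as it is written in the page, so relative links such as /about come out without the site's address. Here is a minimal sketch of resolving them against the page URL with urljoin from the standard library. The helper name absolutize and the sample hrefs are made up for illustration and are not part of the original post.

```python
try:
    from urlparse import urljoin       # Python 2.7
except ImportError:
    from urllib.parse import urljoin   # Python 3

def absolutize(base_url, hrefs):
    """Resolve each href against base_url; absolute hrefs pass through unchanged."""
    return [urljoin(base_url, href) for href in hrefs]

base = "http://www.gpsbasecamp.com/national-parks"
# Hypothetical href values, standing in for link['href'] from the loop
sample = ["/about", "maps.html", "http://example.com/x"]

for url in absolutize(base, sample):
    print(url)
```

In the real loop, the same call would be urljoin(base, link['href']) for each link.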



Alternative:

from bs4 import BeautifulSoup
import urllib
import re

# urllib.urlopen exists only in Python 2
html_page = urllib.urlopen("http://arstechnica.com")
soup = BeautifulSoup(html_page, "lxml")

# Keep only <a> tags whose href starts with "http://"
for link in soup.find_all('a', attrs={'href': re.compile("^http://")}):
    print link.get('href')
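
The pattern ^http:// in the alternative keeps only hrefs that start with http://, so https:// links, relative links and mailto: links are all skipped. The sketch below shows that filtering on a few made-up hrefs using only the re module. The broader pattern ^https?:// and the helper name keep are assumptions added for comparison; they are not from the original post.

```python
import re

http_only = re.compile(r"^http://")        # pattern used in the snippet above
http_or_https = re.compile(r"^https?://")  # assumed variant that also accepts https

# Hypothetical href values for illustration
hrefs = [
    "http://a.example/",
    "https://b.example/",
    "/relative/path",
    "mailto:someone@example.com",
]

def keep(pattern, items):
    """Return the items in which the pattern is found."""
    return [item for item in items if pattern.search(item)]

print(keep(http_only, hrefs))
print(keep(http_or_https, hrefs))
```

If the https links are wanted too, passing the broader pattern in attrs={'href': ...} should have the same effect inside find_all.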