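First Look at Web Scraping

Today I wanted to find somewhere that shows how banks' time-deposit interest rates have been trending, but Googling didn't turn up what I was after. I remembered that in an earlier interview we had talked about how web scrapers can be used for all sorts of things, so I decided to try one for pulling together the information I want.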


Python

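Whenever scraping comes up, most people mention it in the same breath as Python. I've long heard how much Python can do, but I've never really had much hands-on contact with it. Below are some packages that come up when learning to scrape.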

Requests

Simple GET requests; a good fit for fetching JSON results.

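import requests
# The timeout stops the call from hanging forever on a slow server
res = requests.get('https://belloah.gitlab.io/', timeout=10)
res.raise_for_status()  # fail loudly on 4xx/5xx responses
print(res.text)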
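
Since Requests is a good fit for JSON, here is a minimal sketch of that pattern. The `fetch_json` helper and the URL in the comment are hypothetical placeholders of mine, not a real rates API; `timeout`, `raise_for_status()` and `.json()` are standard Requests calls.

```python
import requests


def fetch_json(url, params=None, session=None):
    """Fetch a URL and decode the response body as JSON (hypothetical helper)."""
    # A requests.Session exposes the same .get() as the module itself,
    # so either can be used here; passing a session also makes testing easier.
    http = session if session is not None else requests
    res = http.get(url, params=params, timeout=10)
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return res.json()       # raises an exception if the body is not valid JSON


# Usage (placeholder endpoint, swap in a real JSON API):
# data = fetch_json("https://example.com/api/rates.json", params={"term": "12m"})
```

Returning the decoded object rather than the raw text keeps the rest of the script working with plain dicts and lists.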

Selenium WebDriver

Simulates a browser; a good fit for getting the output of a fully loaded page, the so-called dynamic web scraping.

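from selenium import webdriver

# Run Chrome without opening a visible window
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
try:
    driver.get('https://belloah.gitlab.io/')
    # page_source is the HTML of the page as the browser currently has it
    page_source = driver.page_source
    print(page_source)
finally:
    driver.quit()  # always close the browser, even if something fails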

BeautifulSoup

Pulls data out of HTML and XML files; a package that shows up in most scraping work.

import requests
from bs4 import BeautifulSoup

res = requests.get('https://belloah.gitlab.io/', timeout=10)
soup = BeautifulSoup(res.text, 'html.parser')
print(soup.title.string)  # text of the page's <title> tag
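
To get a feel for pulling structured data out of a page, here is a small sketch that reads rows from an HTML table. The HTML, the `rates` id, the bank names and the rates are all made up for illustration, and `parse_rates` is a hypothetical helper; a real bank page will need its own selectors.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page; the values are illustrative only.
SAMPLE_HTML = """
<table id="rates">
  <tr><th>Bank</th><th>Rate</th></tr>
  <tr><td>Bank A</td><td>3.5%</td></tr>
  <tr><td>Bank B</td><td>3.8%</td></tr>
</table>
"""


def parse_rates(html):
    """Return (bank, rate) tuples from the rows of the table with id="rates"."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("#rates tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:  # the header row uses <th>, so it has no <td> cells
            rows.append((cells[0], cells[1]))
    return rows


print(parse_rates(SAMPLE_HTML))
```

The same function works on `res.text` from Requests or `driver.page_source` from Selenium, since both are just HTML strings.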

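Next I'll get familiar with these three packages, find some web pages to practise pulling data from, and then see whether there are other packages for processing the scraped data.

- End -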