Crawling ) requests + BeautifulSoup

EEYatHo 앱 깎는 이야기 (EEYatHo's app-crafting stories)

EEYatHo 2023. 3. 15. 22:22

requests + BeautifulSoup

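One way to crawl web pages.
The requests library makes it easy to fetch a page's HTML,
and the BeautifulSoup library parses that HTML into a form that is easy to work with.
Together they are used to find the desired tags and extract the data.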

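Limitations
- Crawling pages that require login is very difficult (you have to manage the session yourself).
- Dynamic pages cannot be crawled, because requests only returns the HTML the server sends and does not run JavaScript (use selenium for dynamic pages).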
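
Both limitations can be illustrated with a rough sketch. It is not taken from the original post: `LOGIN_URL`, `MYPAGE_URL`, the form field names `id` and `pw`, and the helper names are all hypothetical, and a real site will usually need more (CSRF tokens, extra headers, and so on). The first function shows the kind of session handling a login requires, using `requests.Session` so that cookies set by the login response are sent again on the next request. The second part parses the kind of HTML a JavaScript-rendered page returns and shows that the content is simply not there.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical endpoints and form field names; a real site will differ.
LOGIN_URL = "https://example.com/login"
MYPAGE_URL = "https://example.com/mypage"


def fetch_after_login(session, user_id, password):
    """Log in with a form POST, then fetch a members-only page with the same session."""
    login = session.post(LOGIN_URL, data={"id": user_id, "pw": password}, timeout=10)
    login.raise_for_status()
    # A requests.Session resends the cookies it received from the login response.
    page = session.get(MYPAGE_URL, timeout=10)
    page.raise_for_status()
    return BeautifulSoup(page.text, "html.parser")


# Made-up HTML for a dynamic page: the container is filled in by a script,
# which a browser would run but requests and BeautifulSoup never do.
DYNAMIC_HTML = """
<html><body>
  <div id="app"></div>
  <script>
    document.getElementById("app").innerHTML = "<p class='item'>Loaded by JS</p>";
  </script>
</body></html>
"""


def count_rendered_items(html):
    """Count .item tags inside #app as BeautifulSoup sees them."""
    return len(BeautifulSoup(html, "html.parser").select("#app .item"))


print(count_rendered_items(DYNAMIC_HTML))  # 0
```

A call such as `fetch_after_login(requests.Session(), "my_id", "my_password")` would perform the login against the placeholder URLs. The count of 0 is the reason a browser-driving tool like selenium is needed for dynamic pages.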

Usage

Loading a page and selecting a tag

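import requests
from bs4 import BeautifulSoup

# Fetch the HTML
response = requests.get("https://www.naver.com", timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
html = response.text

# Parse the HTML with BeautifulSoup
# so that it is easy to work with
soup = BeautifulSoup(html, 'html.parser')

# Find the tag whose id is NM_set_home_btn
word = soup.select_one("#NM_set_home_btn")

# Print the tag's text; select_one returns None when nothing matches
if word is not None:
    print(word.text)

# Output reported in the original post: "네이버를 시작 페이지로" ("Set Naver as your start page")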
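
The `"#NM_set_home_btn"` argument above is a CSS selector. As a small sketch of the selector forms that come up most often, the example below runs `select_one` and `select` against a made-up HTML snippet (the markup and the ids and classes in it are assumptions for illustration, not Naver's real page), so it can be tried without any network access.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<div id="header"><a class="menu" href="/home">Home</a></div>
<ul class="news">
  <li><a class="news_tit" href="/a">First article</a></li>
  <li><a class="news_tit" href="/b">Second article</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "#id" matches by id; select_one returns the first match.
header = soup.select_one("#header")

# ".class" matches by class; select returns a list of every match.
titles = soup.select(".news_tit")

# A space means "descendant": every <a> anywhere inside <ul class="news">.
nested = soup.select("ul.news a")

# select_one returns None when nothing matches, so check before using .text.
missing = soup.select_one("#does_not_exist")

print(header.a.text)             # Home
print([t.text for t in titles])  # ['First article', 'Second article']
print(len(nested), missing)      # 2 None
```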

Getting a tag's attribute values
- text = the tag's content
- attrs[key] = the value of the tag's attribute

links = soup.select(".news_tit")

for link in links:
    title = link.text         # get the text inside the tag
    url = link.attrs['href']  # get the href attribute (KeyError if the tag has none)
    print(title, url)
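
Scraped markup is often messy, so a slightly more defensive variant of the loop above may be useful. This is a sketch rather than the post's own method: it swaps in `get_text(strip=True)` and `Tag.get`, both part of BeautifulSoup, and runs on a made-up snippet in which one tag has no `href`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the second link has no href attribute.
SAMPLE_HTML = """
<a class="news_tit" href="https://example.com/1">
  First article
</a>
<a class="news_tit">No link here</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for link in soup.select(".news_tit"):
    title = link.get_text(strip=True)  # like .text, but with surrounding whitespace removed
    url = link.get("href")             # like attrs["href"], but None instead of KeyError
    rows.append((title, url))

print(rows)
# [('First article', 'https://example.com/1'), ('No link here', None)]
```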

