How to find tags with only certain attributes - BeautifulSoup


closeapi 2023. 7. 30. 17:49


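Using BeautifulSoup, how do I search for tags that contain ONLY the attributes I am searching for?

For example, I want to find all <td valign="top"> tags.

The following code:

raw_card_data = soup.fetch('td', {'valign':re.compile('top')})

gets all of the data I want, but it also grabs every <td> tag that has the attribute valign:top alongside other attributes.

I also tried:

raw_card_data = soup.findAll(re.compile('<td valign="top">'))

and this returns nothing (probably because of a bad regex).

I was wondering if there is a way in BeautifulSoup to say "find <td> tags whose only attribute is valign:top".

For example, if an HTML document contained the following <td> tags: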

<td valign="top">.....</td><br />
<td width="580" valign="top">.......</td><br />
<td>.....</td><br />

I would want only the first <td> tag (<td valign="top">) to be returned.

As explained in the BeautifulSoup documentation, you can use this:

soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : "top"})

Edit:

To return only the tags whose sole attribute is valign="top", you can check the length of the tag's attrs property:

from bs4 import BeautifulSoup

html = '<td valign="top">.....</td>\
        <td width="580" valign="top">.......</td>\
        <td>.....</td>'

soup = BeautifulSoup(html, "html.parser")
results = soup.findAll("td", {"valign": "top"})

# Keep only the tags whose single attribute is valign="top"
for result in results:
    if len(result.attrs) == 1:
        print(result)

This returns:

<td valign="top">.....</td>
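The length check above works because the search already pins down the one attribute that is allowed. A slightly more general sketch is to compare the whole attrs dictionary against the expected one. The helper name has_exactly is hypothetical and not part of BeautifulSoup; it only relies on find_all accepting a function as a filter.

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">.....</td>'
    '<td width="580" valign="top">.......</td>'
    '<td>.....</td>'
)


def has_exactly(name, attrs):
    """Build a filter (hypothetical helper) that accepts tags called `name`
    whose attributes are exactly `attrs`, no more and no less."""
    def matches(tag):
        return tag.name == name and tag.attrs == attrs
    return matches


soup = BeautifulSoup(html, "html.parser")
only_valign = soup.find_all(has_exactly("td", {"valign": "top"}))
print(only_valign)
```

Comparing dictionaries also scales to "exactly these two attributes" without changing the filter.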

You can pass a lambda function to findAll, as explained in the documentation. In your case, to search for td tags whose only attribute is valign = "top", use the following:

td_tag_list = soup.findAll(
    lambda tag: tag.name == "td"
    and len(tag.attrs) == 1
    # .get() avoids a KeyError when the single attribute is not valign
    and tag.get("valign") == "top")

If you want to search by attribute name only, whatever its value:

from bs4 import BeautifulSoup
import re

# html is assumed here to be an HTTP response object with a .text attribute
soup = BeautifulSoup(html.text, 'lxml')
results = soup.findAll("td", {"valign" : re.compile(r".*")})

According to Steve Lorimer, it is better to pass True instead of a regex:

results = soup.findAll("td", {"valign" : True})

The easiest way to do this is with the newer CSS-style select method:

soup = BeautifulSoup(html)
results = soup.select('td[valign="top"]')
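Note that the selector td[valign="top"] matches every td that has valign="top", including tags that carry other attributes as well. A minimal sketch of narrowing the result to tags with that attribute alone, reusing the sample markup from the question and assuming the built-in html.parser:

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">.....</td>'
    '<td width="580" valign="top">.......</td>'
    '<td>.....</td>'
)

soup = BeautifulSoup(html, "html.parser")

# The CSS selector matches both tags that have valign="top".
matched = soup.select('td[valign="top"]')

# Keep only the tags that have no attribute besides valign.
exclusive = [td for td in matched if len(td.attrs) == 1]

print(len(matched), len(exclusive))
```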

Just pass it as an argument to findAll:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("""
... <html>
... <head><title>My Title!</title></head>
... <body><table>
... <tr><td>First!</td>
... <td valign="top">Second!</td></tr>
... </table></body><html>
... """)
>>>
>>> soup.findAll('td')
[<td>First!</td>, <td valign="top">Second!</td>]
>>>
>>> soup.findAll('td', valign='top')
[<td valign="top">Second!</td>]

Combining Chris Redford's and Amr's answers, you can also use select to search for an attribute name with any value:

from bs4 import BeautifulSoup as Soup
html = '<td valign="top">.....</td>\
    <td width="580" valign="top">.......</td>\
    <td>.....</td>'
soup = Soup(html, 'lxml')
results = soup.select('td[valign]')

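If you want to pull every tag on which a particular attribute is present at all, you can use the same code as the accepted answer, but pass True instead of a specific attribute value.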

soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : True})

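This returns all td tags that have a valign attribute. It is useful when your project needs to pull information from a tag such as div that is used everywhere, and the elements you are looking for can be identified by a very specific attribute.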
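As an illustration of the div case, here is a small sketch. The markup and the data-price attribute are made up for the example; only the find_all call with True comes from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: many <div> tags, only some carry data-price.
html = (
    '<div class="row">header</div>'
    '<div class="row" data-price="9.99">apple</div>'
    '<div data-price="4.50">pear</div>'
    '<div id="footer">footer</div>'
)

soup = BeautifulSoup(html, "html.parser")

# True means "the attribute is present", whatever its value.
priced = soup.find_all("div", {"data-price": True})

for div in priced:
    print(div.get_text(), div["data-price"])
```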

Find by attribute on any tag:

<th class="team" data-sort="team">Team</th>    
soup.find_all(attrs={"class": "team"}) 

<th data-sort="team">Team</th>  
soup.find_all(attrs={"data-sort": "team"})
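Two details are worth keeping in mind with the attrs form. A name such as data-sort is not a valid Python keyword argument, so the dictionary is the way to pass it. Also, class is multi-valued: bs4 stores it as a list and a search matches if any one of the classes matches. The sketch below uses made-up markup to show both, together with the "only this attribute" filter from earlier in the thread.

```python
from bs4 import BeautifulSoup

# Hypothetical table header row for illustration.
html = (
    '<table><tr>'
    '<th class="team sortable" data-sort="team">Team</th>'
    '<th class="team">Name</th>'
    '<th data-sort="team">Code</th>'
    '</tr></table>'
)

soup = BeautifulSoup(html, "html.parser")

# Matches any tag that has "team" among its classes.
by_class = soup.find_all(attrs={"class": "team"})

# data-sort cannot be written as a keyword argument, so use attrs.
by_data = soup.find_all(attrs={"data-sort": "team"})

# Tags whose only attribute is data-sort="team".
only_data = [tag for tag in by_data if len(tag.attrs) == 1]

print([t.get_text() for t in by_class])
print([t.get_text() for t in by_data])
print([t.get_text() for t in only_data])
print(by_class[0]["class"])
```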

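If you want to print the names of all tags that have a particular attribute, one per line, here is an example that prints every tag with an id attribute regardless of its value: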

from bs4 import BeautifulSoup
from bs4 import element
html = '<!DOCTYPE html><html><head><title>Navigate Parse Tree</title></head>\
<body><h1>This is your Assignment</h1><a href = "https://www.google.com">This is a link that will take you to Google</a>\
<ul><li><p> This question is given to test your knowledge of <b>Web Scraping</b></p>\
<p>Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web.</p></li>\
<li id = "li2">This is an li tag given to you for scraping</li>\
<li>This li tag gives you the various ways to get data from a website\
<ol><li class = "list_or">Using API of the website</li><li>Scrape data using BeautifulSoup</li><li>Scrape data using Selenium</li>\
<li>Scrape data using Scrapy</li></ol></li>\
<li class = "list_or"><a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">\
Clicking on this takes you to the documentation of BeautifulSoup</a>\
<a href="https://selenium-python.readthedocs.io/" id="anchor">Clicking on this takes you to the documentation of Selenium</a>\
</li></ul></body></html>'

data = BeautifulSoup(html, 'html.parser')
for i in data.descendants:
    # Skip text nodes; only Tag objects carry attributes
    if isinstance(i, element.Tag) and 'id' in i.attrs:
        print(i.name)
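The same result can probably be reached more briefly with the True filter discussed earlier in the thread, since find_all already walks all descendants. A minimal sketch on a shortened, made-up version of the markup:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the document above.
html = (
    '<ul>'
    '<li id="li2">This is an li tag</li>'
    '<li>No id here</li>'
    '<li><a href="https://selenium-python.readthedocs.io/" id="anchor">Selenium docs</a></li>'
    '</ul>'
)

soup = BeautifulSoup(html, "html.parser")

# id=True selects every tag that has an id attribute, in document order.
names = [tag.name for tag in soup.find_all(id=True)]
print("\n".join(names))
```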

Source URL: https://stackoverflow.com/questions/8933863/how-to-find-tags-with-only-certain-attributes-beautifulsoup
