Python Newbie
Member
- Jul 17, 2017
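import requests
from bs4 import BeautifulSoup

url = "https://www.bloomberg.com/quote/SPX:IND"
biz_r = requests.get(url)  # default headers identify the client as python-requests

# Parse the returned HTML; find_all() returns a (possibly empty) list of matching tags.
biz_soup = BeautifulSoup(biz_r.text, 'html.parser')
print(biz_soup.find_all('span', attrs={'class': 'priceText__1853e8a5'}))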
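`find_all` returns an empty list `[]` (not `None`) when no tag matches, so an empty result only says that no `span` with that class was in the HTML that came back. A minimal standard-library sketch of the same lookup (using `html.parser` instead of Beautiful Soup, only so it runs without extra installs) makes that easy to check offline: if it finds nothing in the saved response, the page you received is not the page your browser shows, for example a bot check. The class name is copied from the post; suffixes that look auto-generated, like `1853e8a5`, may also change when the site is redeployed.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text of every <span> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching span (counts nested spans)
        self.current = []   # text fragments of the span currently being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a span nested inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:  # closed the outermost matching span
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def find_span_texts(html, target_class):
    """Return the texts of matching spans; [] means no match in this HTML."""
    finder = SpanTextFinder(target_class)
    finder.feed(html)
    finder.close()
    return finder.results
```

For example, passing `biz_r.text` from the code above and getting `[]` back means the price element simply was not in what Bloomberg returned.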
Most likely you're being detected as a 'bot'.
Check out Selenium; it mimics a 'web browser' from within the script.
Thanks... though that didn't work either...
Thanks...
Sure... with similar code I managed to scrape ZoomTanzania.
Look for tutorials on pinging with Selenium.
import requests
from bs4 import BeautifulSoup

url = "https://www.zoomtanzania.com/medicine-pharmaceutical-jobs"
zoom_r = requests.get(url)

zoom_soup = BeautifulSoup(zoom_r.text, 'html.parser')
# Collect the divs that hold the job-listing titles.
title_zoom = zoom_soup.find_all('div', attrs={'class': 'listing-card__header__title'})
print(title_zoom)
The code is fine; it's the CAPTCHA verification that blocks it. Later I'll look into how to decode it (I remember seeing a way somewhere), but from what I've seen the best solution is to use their API, although it requires build tools.
Why does this code return 'None'? I'm scraping it to get the price of the S&P 500 Index.
Thanks for the lesson... an easy way to detect a bot on your website is to use
onmousemove()
The idea is that when an ordinary human visits the site, the mouse is bound to move at least a little, but a scraping script has no function that moves the mouse, which is why you get detected as a 'bot'.
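The same idea can be sketched on the server side. Assume (hypothetically) that a small `onmousemove` handler on the page reports events back to your server, so each session ends up as a list of event names; a session that never reported a single `mousemove` is flagged as a likely bot. The event names, the list format and the `min_moves` threshold are all made up for this illustration, and on their own they would also flag touch-screen visitors, who never move a mouse.

```python
from collections import Counter


def looks_like_bot(events, min_moves=1):
    """Hypothetical heuristic: flag a session with fewer than `min_moves`
    'mousemove' events.

    `events` is an assumed list of event-name strings for one session,
    e.g. ["pageview", "mousemove", "click"].
    """
    counts = Counter(events)
    # Counter returns 0 for missing keys, so a session with no JS events at all
    # (such as a plain requests.get() that never runs the page's scripts) is flagged.
    return counts["mousemove"] < min_moves
```

With this sketch, `looks_like_bot(["pageview"])` is `True`, while a session such as `["pageview", "mousemove", "click"]` is not flagged.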
ZoomTanzania hasn't put any anti-scraping scripts on its pages.
Cool... when you find a solution, come back and teach us here.
Full disclosure: I'll try to find a way through the sitemap, and also a way through the CAPTCHA and the API (I think I have the build tools installed).
The sitemap, Selenium and the CAPTCHA all failed, but I've found an alternative for retrieving the data until I find another way: Alpha Vantage.
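For the Alpha Vantage route, here is a minimal standard-library sketch of a `GLOBAL_QUOTE` request. It assumes you have your own free API key (`YOUR_API_KEY` below is a placeholder) and that the JSON reply uses the `"Global Quote"` / `"05. price"` layout Alpha Vantage documents for this endpoint. Which symbol to ask for is also up to you: whether the S&P 500 index itself is covered, rather than a fund that tracks it, is something to check in their documentation. Parsing is kept separate from the network call so it can be tested offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.alphavantage.co/query"


def build_quote_url(symbol, api_key):
    """Build the GLOBAL_QUOTE request URL for one ticker symbol."""
    params = {"function": "GLOBAL_QUOTE", "symbol": symbol, "apikey": api_key}
    return API_URL + "?" + urlencode(params)


def parse_price(payload):
    """Pull the latest price out of a GLOBAL_QUOTE JSON response.

    Returns None when the expected keys are missing, e.g. when the API
    answers with an error or rate-limit message instead of a quote.
    """
    quote = payload.get("Global Quote") or {}
    price = quote.get("05. price")
    return float(price) if price is not None else None


def fetch_price(symbol, api_key, timeout=10):
    """Network call: download one quote and parse it."""
    with urlopen(build_quote_url(symbol, api_key), timeout=timeout) as resp:
        return parse_price(json.load(resp))
```

Usage: `fetch_price("IBM", "YOUR_API_KEY")` should return the latest price as a float, or `None` if the reply contains no quote.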
Thanks, chief...
Because of the lack of seriousness here, we've moved to Slack (check my signature).
Sure.
I think Bloomberg has deliberately blocked it to cut down on scraper traffic, because the kind of data they hold is of interest to many data scientists.
Thanks... but this link doesn't work.
This invite link is no longer active.
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36
origin: https://www.bloomberg.com
referer: https://www.bloomberg.com/quote/SPX:IND
cookie: <cookie from the Chrome browser>
You've nailed it, man!
You get that cookie from the Chrome browser: visit the site in the browser, right-click and choose Inspect, open the Network tab, refresh the page, then click the first request and look at the headers that were sent. There's a cookie there; include it in your request headers and from then on they'll detect that you're human. Job done, problem solved.
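A minimal sketch of attaching the four headers listed above to a request. The thread uses `requests`; `urllib.request` is swapped in here only so the example needs no extra installs. The cookie must come from your own browser session as described; `PASTE_COOKIE_HERE` is a placeholder, and whether these headers are enough to get past Bloomberg's checks is the poster's experience, not something this sketch guarantees.

```python
from urllib.request import Request, urlopen

URL = "https://www.bloomberg.com/quote/SPX:IND"

# Headers copied from a real Chrome session (values from the post).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
    ),
    "Origin": "https://www.bloomberg.com",
    "Referer": "https://www.bloomberg.com/quote/SPX:IND",
    "Cookie": "PASTE_COOKIE_HERE",  # placeholder: copy from DevTools > Network > Headers
}


def build_request(url=URL, cookie=None):
    """Return a Request carrying the browser-like headers (nothing is sent yet)."""
    headers = dict(BROWSER_HEADERS)
    if cookie is not None:
        headers["Cookie"] = cookie
    return Request(url, headers=headers)


def fetch_html(req, timeout=10):
    """Send the request and decode the response body."""
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `html = fetch_html(build_request(cookie="<your cookie>"))`, then look for the price `span` in `html` as in the original code.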
There must be modules that do this. I haven't tried this one, but the description suggests it could help.
You don't know when that cookie will expire, so once it does, scraping will probably fail again; look into getting a cookie jar and refreshing it, and find out how they generate the cookie. Or, the longer route: use libraries that scrape by pretending to be a browser, which come with all of that already implemented for you.
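The cookie-jar suggestion can be sketched with the standard library's `http.cookiejar`: an opener built with `HTTPCookieProcessor` stores every `Set-Cookie` the server sends and replays those cookies on later requests, so cookies the site issues or refreshes are picked up automatically instead of being pasted in by hand. This does not run any of the site's JavaScript, so cookies that only a script in the browser sets will still be missing; that is where the browser-like libraries mentioned above come in. The `User-Agent` string is reused from the thread.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
)


def make_session():
    """Return (opener, jar): requests made through `opener` share the cookies in `jar`."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def cookie_names(jar):
    """List the names of the cookies currently held, to see what the site has set."""
    return sorted(cookie.name for cookie in jar)
```

Usage: after `opener, jar = make_session()`, calling `opener.open("https://www.bloomberg.com/quote/SPX:IND", timeout=10)` and then `cookie_names(jar)` shows which cookies the server set on that visit; later `opener.open(...)` calls send them back automatically.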