Python Coders... Help

Python Newbie

Why does this code return 'None'? I am scraping to get the price of the S&P 500 Index.
Code:
import requests
from bs4 import BeautifulSoup

url = "https://www.bloomberg.com/quote/SPX:IND"
biz_r = requests.get(url)
biz_soup = BeautifulSoup(biz_r.text, 'html.parser')
# find_all returns a list of matching <span> tags (empty if nothing matches)
print(biz_soup.find_all('span', attrs={'class': 'priceText__1853e8a5'}))
 
This forum has no coders these days.. @MaxMase Stefano Mtangoo Graph Arduino Sentinel
Most likely you are being detected as a 'bot'.

Look for a tutorial on pinging the site with Selenium, or try this:

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent instead of the default one used by requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

url = "https://www.bloomberg.com/quote/SPX:IND"
biz_r = requests.get(url, headers=headers)
biz_soup = BeautifulSoup(biz_r.text, 'html.parser')
print(biz_soup.find_all('span', attrs={'class': 'priceText__1853e8a5'}))
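
Before changing headers blindly, it can help to check what the server actually sent back. Note that find_all gives an empty list ([]) rather than None when nothing matches, so an empty result only says the span was not in the HTML that was received. Below is a rough sketch of such a check. The function name diagnose_response and the marker strings it looks for are made up for illustration; a real challenge page may be worded differently.

```python
def diagnose_response(status_code: int, html: str, css_class: str) -> str:
    """Give a rough guess at why a scrape came back empty."""
    if status_code != 200:
        return f"blocked or failed: HTTP {status_code}"

    lowered = html.lower()
    # Assumed marker strings; adjust after looking at the real page source.
    if "captcha" in lowered or "are you a robot" in lowered:
        return "challenge page: the site probably thinks this is a bot"

    if css_class not in html:
        return ("class not in HTML: the content may be rendered by "
                "JavaScript, or the class name has changed")

    return "class present: the selector should match"


# Usage with the code above (not run here):
#   print(diagnose_response(biz_r.status_code, biz_r.text, "priceText__1853e8a5"))
```

If the result mentions a challenge page or a non-200 status, the problem is bot detection and not the BeautifulSoup selector.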
 
Sure... with similar code I was able to scrape zoomtanzania.
Code:
import requests
from bs4 import BeautifulSoup

url = "https://www.zoomtanzania.com/medicine-pharmaceutical-jobs"
zoom_r = requests.get(url)
zoom_soup = BeautifulSoup(zoom_r.text, 'html.parser')
# find_all is the current name; findAll is the older alias
title_zoom = zoom_soup.find_all('div', attrs={'class': 'listing-card__header__title'})
print(title_zoom)
 
The simplest way to detect a bot on your website is to use

onmousemove()

The idea is that when an ordinary human visits the site, the mouse is bound to move at least a little, whereas a plain scraping script has no function for moving the mouse. That is why you get detected as a 'bot'.

Zoom has not put anti-scraping scripts on their pages.
 
The code is fine; it is the CAPTCHA verification that is blocking it. Later I will look up how to decode it (I remember seeing it somewhere), but from what I have seen the best solution is to use their API, although it requires build tools.

Full disclosure: I will try to find a way through the sitemap, and also a way through the CAPTCHA and the API (I think I have the build tools installed).
 
Thanks for the lesson...
 
Cool... when you come up with a solution, come back and teach us here.
 
If you like scraping, it is worth having a tool called Postman; it makes sending HTTP requests much easier.

Now, about your question: I tried that site in Postman and the problem is that they suspect you are a bot, as the guy above said, so they push you to a CAPTCHA page. Setting only "User-Agent" in the headers still fails, because they want more information; the user-agent alone is not enough. The headers I set that were accepted are these:

Code:
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36
origin: https://www.bloomberg.com
referer: https://www.bloomberg.com/quote/SPX:IND
cookie: <cookie from Chrome browser>

You get the cookie from the Chrome browser: visit the site in the browser, right-click and choose Inspect, open the Network tab, refresh the page, then click the first request and look at the request headers that were sent. There is a cookie there; include it in your headers and they will treat you as human from then on. Job done, problem solved.

You do not know when that cookie will expire, so once it expires scraping will probably fail. Look into keeping a cookie jar and refreshing it, and work out how they generate the cookie. Alternatively, the longer route is to use libraries that scrape by acting as a browser, which come with all of this already implemented for you.
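
For anyone who wants the headers above in Python form, here is a minimal sketch with requests. The helper names build_headers and fetch_quote_page and the environment variable BLOOMBERG_COOKIE are made up for illustration; the cookie value itself still has to be copied from the browser as described, and nothing guarantees the site keeps accepting this set of headers.

```python
URL = "https://www.bloomberg.com/quote/SPX:IND"


def build_headers(cookie: str) -> dict:
    """Return the four headers listed in the post, with the cookie filled in."""
    return {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/73.0.3683.86 Safari/537.36"
        ),
        "Origin": "https://www.bloomberg.com",
        "Referer": URL,
        "Cookie": cookie,  # copied by hand from the browser's Network tab
    }


def fetch_quote_page(cookie: str, timeout: float = 10.0) -> str:
    """Fetch the quote page and return its HTML; raises on a 4xx/5xx status."""
    import requests  # imported here so build_headers works without requests

    response = requests.get(URL, headers=build_headers(cookie), timeout=timeout)
    response.raise_for_status()
    return response.text


# Usage (not run here); BLOOMBERG_COOKIE is a hypothetical variable name:
#   import os
#   html = fetch_quote_page(os.environ["BLOOMBERG_COOKIE"])
```

Reading the cookie from an environment variable keeps it out of the script, which makes it easier to swap in a fresh one when the old one expires.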
 
You have really nailed it!

There must be modules for doing that. I have not tried this one, but the description suggests it could help.

 
