I am trying to scrape players' names and their ratings from this website:
After scraping, I put the data in a CSV. But it does not scrape consistently: I usually have to run the script more than once (2-5 times) before it picks up the data. The same happens when I scrape other matches. For example, if I request data for 3 matches, it will often scrape only the first match and skip the remaining pages. Here is my code:
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

match_link = 'https://www.whoscored.com/Matches/1549539/Live/England-Premier-League-2021-2022-Brentford-Arsenal'
driver = webdriver.Chrome('C:\\Program Files (x86)\\chromedriver.exe')
driver.get(match_link)
soup = BeautifulSoup(driver.page_source, 'html.parser')

Players_list = []
Player_rating = []
try:
    player_name = soup.select('a.player-link span.iconize.iconize-icon-left')
    player_rating = soup.select('td.rating')
    # print('------------getting player name and ratings-----------')
    for nme in player_name:
        # print(nme.text)
        Players_list.append(nme.text)
    for rat in player_rating:
        # print(rat.text)
        Player_rating.append(rat.text)
except:
    print('NO ELEMENT')

Players_list = pd.DataFrame(Players_list)
Player_rating = pd.DataFrame(Player_rating)
df = pd.concat([Players_list, Player_rating], axis=1)
df.to_csv('brentford-arsenal.csv')
It doesn't raise an error. It just returns an empty DataFrame (meaning the data wasn't scraped):
Empty DataFrame
Columns: []
Index: []
You should add a wait so the page has time to render before you read its source.
driver = webdriver.Chrome('C:\\Program Files (x86)\\chromedriver.exe')
driver.get(match_link)
driver.implicitly_wait(3)
Also, if you always want a ChromeDriver that matches your installed Chrome, webdriver-manager detects when a new driver is needed, downloads it, and caches it. To install the manager:
pip install webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)

# web driver goes to page
driver.get(match_link)
# to give time for the page to load
driver.implicitly_wait(5)
...
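Separating the parsing from the browser automation also makes this kind of flakiness easier to diagnose, because you can test the extraction against saved HTML without launching a browser. A sketch reusing the question's selectors (the sample markup below is hypothetical, just enough to satisfy those selectors):

```python
from bs4 import BeautifulSoup
import pandas as pd

def parse_ratings(html):
    """Extract (player, rating) pairs from a match page's HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    names = [n.text.strip() for n in
             soup.select('a.player-link span.iconize.iconize-icon-left')]
    ratings = [r.text.strip() for r in soup.select('td.rating')]
    # zip() pairs each name with its rating and drops unmatched trailing
    # entries, so a partially rendered page cannot misalign the columns
    return pd.DataFrame(list(zip(names, ratings)),
                        columns=['player', 'rating'])

# Hypothetical markup mirroring the selectors used in the question
sample = """
<table><tr>
  <td><a class="player-link">
    <span class="iconize iconize-icon-left">Bukayo Saka</span></a></td>
  <td class="rating">7.9</td>
</tr></table>
"""
df = parse_ratings(sample)
```

With this split, an empty DataFrame immediately tells you the page had not rendered when you captured `page_source`, rather than leaving you guessing whether the selectors are wrong.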