-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathPython_basic.qmd
More file actions
84 lines (61 loc) · 1.99 KB
/
Python_basic.qmd
File metadata and controls
84 lines (61 loc) · 1.99 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
title: "Python_basic"
format: html
---
# Code form "Basic Webscraper" by John Watson Rooney
[Source YouTube](https://www.youtube.com/watch?v=APukTnnwQEY&t=301s)
```{r}
library(reticulate)
py_install("bs4", method="conda" )
py_install("pandas", method="conda" )
py_install("selenium", method="conda" )
```
```{python}
#| label: Scrape books from eshop
import requests
from bs4 import BeautifulSoup
import pandas as pd
book_list = []
for x in range (1, 3):
print(x)
url = f'http://books.toscrape.com/catalogue/page-{x}.html'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
article = soup.find_all('article', class_ = 'product_pod')
for book in article:
title = book.find_all('a')[1]['title']
price = book.find('p', class_ = 'price_color').text[2:]
instock = book.find('p', class_ = 'instock availability').text.strip()
book = {
'title': title,
'price': price,
'instock': instock
}
book_list.append(book)
print(len(book_list))
print(book_list[1:4])
df = pd.DataFrame(book_list)
print(df)
```
```{python}
#| label: Selenium installation with Chrome Webdriver
# Source: https://www.youtube.com/watch?v=pUUhvJvs-R4
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
drv_path = "C:\\Users\\I0172484\\Documents\WebDriver\\chromedriver.exe"
driver = webdriver.Chrome(drv_path)
url="http://the-internet.herokuapp.com/login"
driver.get(url)
driver.find_element("xpath", '//*[@id="username"]').send_keys('tomsmith')
driver.find_element("xpath", '//*[@id="password"]').send_keys('SuperSecretPassword!')
driver.find_element("xpath", '//*[@id="login"]/button').click()
# driver.quit()
# Wait for response part
url02="https://the-internet.herokuapp.com/dynamic_loading/2"
driver.get(url02)
driver.find_element("xpath", '//*[@id="start"]/button').click()
driver.implicitly_wait(10)
text = driver.find_element("xpath", '//*[@id="finish"]/h4').text
print(text)
```