Thursday, October 3, 2024

Web scrape Google News with R

This tutorial shows how to extract Google News results with the R programming language. It is useful when you want to show a news feed on a topic of interest in a dashboard. Google News lets you search for articles using keywords of your choice.

Make sure the rvest, dplyr and xml2 R packages are installed before running the following script. The script returns the following columns:

Title : Headline of the article
Link : URL of the article
Description : One- or two-line summary of the article
Source : Name of the original content creator
Time : When the article was published

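news <- function(term) {
  require(dplyr)
  require(xml2)
  require(rvest)

  # Assumption: the Google News search endpoint with the term as the query.
  # The exact query string (language/region parameters) is not given here.
  html_dat <- read_html(paste0("https://news.google.com/search?q=", term))

  # Article links are relative ("./articles/..."); turn them into absolute URLs
  dat <- data.frame(Link = html_dat %>%
                      html_nodes('.VDXfz') %>%
                      html_attr('href')) %>%
    mutate(Link = gsub("./articles/", "https://news.google.com/articles/", Link, fixed = TRUE))

  news_dat <- data.frame(
    Title = html_dat %>%
      html_nodes('.DY5T1d') %>%
      html_text(),
    Link = dat$Link,
    Description = html_dat %>%
      html_nodes('.Rai5ob') %>%
      html_text()
  )

  # Extract Source and Time (To avoid missing content)
  # Assumption: '.SVJrMe' is the per-article container, with the source in an
  # <a> child and the timestamp in a <time> child; adjust to the live markup.
  prod <- html_nodes(html_dat, ".SVJrMe")
  Source <- lapply(prod, function(x) {
    norm <- tryCatch(html_node(x, "a") %>% html_text(),
                     error = function(err) {NA})
  })

  time <- lapply(prod, function(x) {
    norm <- tryCatch(html_node(x, "time") %>% html_text(),
                     error = function(err) {NA})
  })

  # One row per container, then bind to the article data and drop duplicates
  mydf <- data.frame(Source = do.call(rbind, Source),
                     Time = do.call(rbind, time),
                     stringsAsFactors = FALSE)
  dff <- cbind(news_dat, mydf) %>% distinct(Time, .keep_all = TRUE)

  return(dff)
}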
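
The reason for extracting Source and Time one container at a time can be seen in a small offline sketch. The HTML string and the `.card` class below are made up for illustration and are not Google News markup. The point is only that `html_node()` applied to each container yields one value per container, with `NA` where a child element is missing, whereas `html_nodes()` applied to the whole page skips the gap and the columns stop lining up.

```r
library(xml2)
library(rvest)

# Made-up markup (an assumption for illustration): the second card has no <time>.
page <- read_html(paste0(
  "<html><body>",
  "<div class='card'><a>Source A</a><time>1 hour ago</time></div>",
  "<div class='card'><a>Source B</a></div>",
  "<div class='card'><a>Source C</a><time>Yesterday</time></div>",
  "</body></html>"
))

cards <- html_nodes(page, ".card")

# Page-wide selection: only two values come back for three cards.
times_flat <- page %>% html_nodes("time") %>% html_text()

# Per-card selection: one value per card, NA where <time> is absent.
times_aligned <- cards %>% html_node("time") %>% html_text()
sources <- cards %>% html_node("a") %>% html_text()

aligned <- data.frame(Source = sources, Time = times_aligned,
                      stringsAsFactors = FALSE)
print(aligned)
```

Because `times_aligned` always has the same length as `cards`, it can be safely bound to other per-card columns, which is what the Source and Time step in the function relies on.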

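newsdf <- news("electric%20vehicles")  # example search term (an assumption)

Pass the keywords you want to search for as the argument to the function. %20 refers to the space between the two words.

Read more: ListenData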
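
Rather than typing %20 by hand, the search term can be percent-encoded programmatically. The following is a minimal sketch using base R's `utils::URLencode()`; the helper name `encode_term` and the example terms are hypothetical and chosen only for illustration.

```r
# Base R only: utils::URLencode() percent-encodes characters that are not URL-safe.
# encode_term is a hypothetical helper name, not part of the original script.
encode_term <- function(term) {
  # reserved = TRUE also encodes characters such as "&" that would otherwise
  # be read as part of the query string.
  utils::URLencode(term, reserved = TRUE)
}

encode_term("electric vehicles")  # "electric%20vehicles"
encode_term("R&D budget")         # "R%26D%20budget"
```

The encoded string can then be passed on as the search term, for example `news(encode_term("electric vehicles"))`.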
