This tutorial shows how to extract Google News results with the R programming language. It is useful when you want a dashboard to display a news feed on a topic you follow. Google News lets you search for news using keywords of your interest.
Make sure the rvest, dplyr and xml2 R packages are installed (for example with install.packages(c("rvest", "dplyr", "xml2"))) before running the script below. The script returns the following columns:
Title : Headline of the article
Link : URL of the article
Description : One- or two-line summary of the article
Source : Name of the original content creator
Time : When the article was published
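The script relies on rvest CSS selectors. The sketch below shows the same pattern on a small, made-up HTML snippet so it can run offline: html_nodes() with html_text() and html_attr() pull the Title and Link values, gsub() turns the relative "./articles/" links into absolute ones, and a per-article html_node() lookup returns NA for a missing time instead of shifting the rows. The class names article, title and src, and the sample links, are hypothetical stand-ins for Google's generated classes.

```r
library(xml2)
library(rvest)

# Hypothetical HTML standing in for a Google News results page;
# the second article has no <time> element.
page <- read_html('
<div class="article">
  <a class="title" href="./articles/abc">First headline</a>
  <div class="src"><a>Source A</a><time>2 hours ago</time></div>
</div>
<div class="article">
  <a class="title" href="./articles/def">Second headline</a>
  <div class="src"><a>Source B</a></div>
</div>')

# Title and Link: one value per matching node
titles <- page %>% html_nodes(".title") %>% html_text()
links  <- page %>% html_nodes(".title") %>% html_attr("href")

# Make the relative article links absolute
links <- gsub("./articles/", "https://news.google.com/articles/", links, fixed = TRUE)

# Time: search inside each article container; html_node() returns a
# missing node for the second article, so html_text() gives NA there
articles <- page %>% html_nodes(".article")
times <- vapply(articles, function(x) html_text(html_node(x, "time")), character(1))

# Collecting <time> across the whole page instead loses the position:
# it finds only one node for two articles
length(page %>% html_nodes("time"))

news_demo <- data.frame(Title = titles, Link = links, Time = times,
                        stringsAsFactors = FALSE)
print(news_demo)
```

Looking up values inside each container keeps every column the same length, which is why the script extracts Source and Time this way.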
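news <- function(term) {
  require(dplyr)
  require(xml2)
  require(rvest)

  # Download the Google News search results page for the search term
  html_dat <- read_html(paste0("https://news.google.com/search?q=", term))

  # The CSS classes below come from Google's generated markup and may change
  # Article links are relative ("./articles/..."), so make them absolute
  dat <- data.frame(Link = html_dat %>%
                      html_nodes('.VDXfz') %>%
                      html_attr('href')) %>%
    mutate(Link = gsub("./articles/", "https://news.google.com/articles/", Link, fixed = TRUE))

  news_dat <- data.frame(
    Title = html_dat %>%
      html_nodes('.DY5T1d') %>%
      html_text(),
    Link = dat$Link,
    Description = html_dat %>%
      html_nodes('.Rai5ob') %>%
      html_text(),
    stringsAsFactors = FALSE
  )

  # Extract Source and Time inside each article container (to avoid missing content):
  # a missing value becomes NA instead of shifting the remaining rows
  prod <- html_nodes(html_dat, ".SVJrMe")
  Source <- lapply(prod, function(x) {
    tryCatch(html_node(x, "a") %>% html_text(),
             error = function(err) {NA})
  })
  time <- lapply(prod, function(x) {
    tryCatch(html_node(x, "time") %>% html_text(),
             error = function(err) {NA})
  })

  mydf <- data.frame(Source = unlist(Source), Time = unlist(time),
                     stringsAsFactors = FALSE)
  # Combine the columns and keep the first article for each Time value
  dff <- cbind(news_dat, mydf) %>% distinct(Time, .keep_all = TRUE)
  return(dff)
}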
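# Illustrative query: two keywords joined by %20
newsdf <- news("stock%20market")

In the search term, %20 is the URL encoding of a space, so it separates the two keywords.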
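Instead of typing %20 by hand, the search term can be built from a plain phrase with base R's URLencode(), which replaces each space with %20. Setting reserved = TRUE also encodes characters such as "&", which would otherwise cut the q= parameter short. This is a minimal sketch; the phrases are only illustrations, and the call to news() is commented out because it needs the function above and network access.

```r
# Encode a multi-word search phrase for the Google News URL
query <- utils::URLencode("renewable energy")
print(query)  # "renewable%20energy"

# Encode reserved characters too, so "&" becomes %26
query2 <- utils::URLencode("R & Python", reserved = TRUE)
print(query2)  # "R%20%26%20Python"

# Pass the encoded term to the scraper defined above:
# newsdf <- news(query)
```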