Category Archives: r

A text mining function for websites

For one of my projects I needed to download text from multiple websites. For this, I used the R packages rvest and dplyr. Accessing the information you want is relatively easy if all sources come from the same website, but it gets tedious when the websites are heterogeneous. The reason lies in how each website stores its content in its HTML (disclaimer: I am not an expert at all on HTML or anything website-related). Assume that you want to extract the title, the author information, the publication date, and, of course, the main article text. You can identify where that information sits on a page with Cascading Style Sheets (CSS) selectors or XML Path Language (XPath) expressions. Once you have the CSS or XPath locations, you can access the content in R. The following text walks you through an example and provides the relevant code.
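A minimal sketch of this idea with rvest (version 1.0 or later, which provides `html_element()` and `html_text2()`) and dplyr is shown below. It is not necessarily the function developed in the full post. To keep it runnable offline, the article page is an inline HTML string. The class names (`headline`, `byline`, `date`, `article-body`) and the helper `extract_article()` are hypothetical. For a real site you would call `read_html(url)` and look up that site's own selectors, for example with your browser's developer tools.

```r
library(rvest)
library(dplyr)

# Hypothetical article page; for a live site use read_html("https://...")
page <- read_html('
<html><body>
  <h1 class="headline">Example title</h1>
  <span class="byline">Jane Doe</span>
  <span class="date">1 May 2018</span>
  <div class="article-body">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>')

# Hypothetical helper: extract the four fields using site-specific CSS selectors
extract_article <- function(page, sel) {
  tibble(
    title  = page %>% html_element(sel$title)  %>% html_text2(),
    author = page %>% html_element(sel$author) %>% html_text2(),
    date   = page %>% html_element(sel$date)   %>% html_text2(),
    # The body is usually split over several <p> elements: collapse them
    text   = page %>% html_elements(sel$text)  %>% html_text2() %>%
      paste(collapse = "\n")
  )
}

# Selectors for this (hypothetical) website; another site needs its own list
site_a <- list(title = "h1.headline", author = "span.byline",
               date  = "span.date",   text   = "div.article-body p")

article <- extract_article(page, site_a)

# The same author field located with an XPath expression instead of CSS
author_xp <- page %>%
  html_element(xpath = "//span[@class='byline']") %>%
  html_text2()
```

With the selectors kept in one list per website, a heterogeneous set of sites only requires a new selector list rather than a new extraction function.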

Using RStudio and LaTeX

This post explains how to integrate RStudio and LaTeX, especially how to include well-formatted tables and nice-looking graphs and figures produced in RStudio in a LaTeX document. To follow along, you will need RStudio, MS Excel, and LaTeX.
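The workflow in the full post involves MS Excel. As an alternative sketch, which is not the author's method, the knitr package can write a LaTeX table directly from R, and base R's `pdf()` device saves a figure as vector graphics. The file names, the `mtcars` summary and the booktabs styling are illustrative assumptions.

```r
library(knitr)

# Summarise a built-in dataset: mean miles per gallon by number of cylinders
tab <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Render it as a LaTeX table; booktabs rules need \usepackage{booktabs}
latex_tab <- kable(tab, format = "latex", booktabs = TRUE, digits = 1,
                   col.names = c("Cylinders", "Mean mpg"),
                   caption = "Mean fuel economy by cylinder count")
writeLines(latex_tab, "table_mpg.tex")

# Save a scatter plot as a PDF (vector graphics) for \includegraphics
pdf("fig_mpg.pdf", width = 5, height = 4)
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
invisible(dev.off())
```

In the LaTeX document, `\input{table_mpg.tex}` pulls in the table and `\includegraphics{fig_mpg.pdf}` (from the graphicx package) includes the figure.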

Using RStudio and Git version control

It is fairly easy to link GitHub or Bitbucket with RStudio, either to enable version control or to work collaboratively on a data project, scientific article, or book. It can also be used to make your data or project publicly accessible (however, there is no guarantee that it will stay accessible forever, and the repository does not get a DOI, so a service such as OSF might be a better alternative).

GitHub and Bitbucket are web-based hosting services for the version control system Git. Git allows you to track changes to files, to revert files to earlier versions, and to work on files in groups. This makes it especially useful for programmers, data analysts, and researchers. GitHub and Bitbucket store the full history of your project on their servers, so that collaborators can see exactly what others on the same project worked on or changed.

This post explains how to set up GitHub and Bitbucket with RStudio to enable version control and storage in an external repository. In nerd-speak, it explains how to “push your commits to an external repo”. At the time of writing, the main difference between GitHub and Bitbucket relevant to this post was that the former let you create public repos free of charge, while the latter let you create private repos free of charge. Choose whichever platform (or both) suits your needs.

I am not going to explain how to download, install, or set up Git on your computer. I assume that you have already done all that and now want to link it to RStudio.
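The post itself works through RStudio. The sketch below shows what "pushing your commits to an external repo" amounts to underneath, using R to call the git command line via `system2()`. It assumes Git is installed and on your PATH. To keep the example self-contained, a local bare repository stands in for the GitHub or Bitbucket repo. With a real account you would pass the repository URL, for example a hypothetical `https://github.com/<user>/<repo>.git`, to `git remote add` instead. The helper `run_git()` and all paths are illustrative.

```r
# Hypothetical helper: run a git command inside `dir`, stopping on failure
run_git <- function(dir, ...) {
  args <- c("-C", shQuote(dir), ...)
  status <- system2("git", args)
  if (status != 0) stop("git failed: git ", paste(args, collapse = " "))
  invisible(status)
}

project <- file.path(tempdir(), "my-project")
remote  <- file.path(tempdir(), "remote.git")  # stands in for GitHub/Bitbucket
dir.create(project)

# A bare repository holds only the history, like the hosted copy
system2("git", c("init", "--bare", shQuote(remote)))

# Local project with one commit (identity set only for this command)
run_git(project, "init")
writeLines("x <- 1", file.path(project, "analysis.R"))
run_git(project, "add", "analysis.R")
run_git(project, "-c", "user.name=Example", "-c", "user.email=me@example.com",
        "commit", "-m", shQuote("First commit"))

# Link the external repo as "origin"; for GitHub/Bitbucket use its URL here
run_git(project, "remote", "add", "origin", shQuote(remote))

# Push the current branch and set it as upstream, so later pushes
# (e.g. from RStudio's Git pane) should need no extra arguments
run_git(project, "push", "-u", "origin", "HEAD")
```

The code pushes `HEAD` instead of a named branch. This avoids guessing whether your default branch is called `master` or `main`.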