Writing scripts
Include basic information on a script: give it a name, say when you created it, note where it ran last, etc. This will help you trace different versions, cope with problems that arise from software updates, and attribute your work to you.
###################################################################################################
# File-Name: advR_workFlow.r
# Date: 20/03/27
# Author: DD
# Machine: DD's MacBook Air
###################################################################################################
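The header above is typed by hand. As a complement, R can report some of this information itself. The following is a minimal sketch using only base R; the variable name run_info and its field names are assumptions chosen for illustration.

```r
# Collect basic facts about the current run (all names are illustrative).
run_info <- c(
  r_version = R.version.string,                 # installed R release
  machine   = Sys.info()[["nodename"]],         # host name of this computer
  run_date  = format(Sys.Date(), "%Y-%m-%d")    # today's date, ISO style
)
print(run_info)
```

Calling sessionInfo() gives a fuller report, including the versions of attached packages, which can help when a problem appears after a software update.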
Many commands in R require you to install and load additional packages, which provide more tools than those offered by Base R (and beyond the packages RStudio loads automatically). Some packages you will need almost every time, some only occasionally. Keeping a running list at the beginning of your script keeps them handy and also reminds you when you extended your tool kit with a new package. This is the way to make your analysis shareable and replicable! When you share a file, avoid commands that change settings on others’ computers (e.g. install.packages() or setwd()).
First, define a running list of potentially required packages, then install any that are missing, for example by calling install.packages(pkg) in a loop over pkgs, and load the packages. Note that character.only = TRUE is an argument of library(), not of install.packages(). If you do not want to load a package because you only need to reference it once, type dplyr::mutate(data, newVariable = 1), which makes a function from an installed package, here dplyr, available for just that one call.
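The "install if necessary" step can be sketched as follows. This is one possible approach, not the only one: the helper missing_pkgs() is a hypothetical name introduced here, and the install call is left commented out so that a shared script does not change anything on another person's computer.

```r
# Return the entries of `pkgs` that are not installed on this machine.
# system.file() returns "" when a package cannot be found.
missing_pkgs <- function(pkgs) {
  is_installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!is_installed]
}

pkgs <- c("tidyverse", "dplyr", "ggplot2", "conflicted", "rmarkdown")
to_install <- missing_pkgs(pkgs)

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Run by hand when needed, rather than from a shared script:
  # install.packages(to_install)
}
```

Checking first and installing by hand keeps the script safe to share while still telling the reader which packages are needed.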
Occasionally you will get error messages related to outdated versions of packages; it helps to run remove.packages('name_package') and re-install. Occasionally you will also load packages that use the same name for commands that do different things. With the conflicted package, you can tell R which command from which package to prefer.
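# Running list of packages used in this script
pkgs <- c("tidyverse", "dplyr", "ggplot2", "conflicted", "rmarkdown")
# Load each package; character.only = TRUE lets library() take the name as a string
for (pkg in pkgs) library(pkg, character.only = TRUE)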
## ── Attaching packages ─────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
conflict_prefer("filter", "dplyr")
## [conflicted] Will prefer dplyr::filter over any other package
conflict_prefer("select", "dplyr")
## [conflicted] Will prefer dplyr::select over any other package
conflict_prefer("geom_errorbarh", "ggplot2")
## [conflicted] Will prefer ggplot2::geom_errorbarh over any other package
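Besides conflict_prefer(), the package::function form mentioned earlier also resolves a name clash for a single call. The sketch below uses only packages that ship with R. The local function filter defined here is a hypothetical stand-in for a package that reuses the name, not dplyr's actual implementation.

```r
# Hypothetical stand-in for a package function that reuses the name `filter`.
filter <- function(x, keep) x[keep]

values <- c(2, 5, 8, 11)

# The unqualified name finds the stand-in defined above.
kept <- filter(values, values > 3)

# The qualified name always reaches the stats version, which applies a
# linear filter (here a two-point moving average) and returns a time series.
smoothed <- stats::filter(values, rep(1 / 2, 2))

print(kept)
print(class(smoothed))

# Remove the stand-in so that later calls to filter() are unaffected.
rm(filter)
```

The same name gives two very different results depending on which definition R finds first, which is why stating a preference once, or qualifying the call, avoids surprises.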
data <- read.csv('fakeData.csv')
#ggplot(dota=data) + geom_point(mapping=aes(x=varInd,y=var))
ggplot(data=data) + geom_point(mapping=aes(x=varInd,y=var))
#fliter(data,var>5)
filter(data,var>3)
## var cat varCorr varWeakCorr varInd varNonMon varBiRaw varOutlier
## 1 5 0 6.802379 8.679899 5.445352 12.75 0.6922094 -1.666667
## 2 15 0 12.806620 18.433080 9.987334 2.75 0.1257472 50.000000
## 3 12 1 11.591310 8.500076 7.071755 11.00 0.6499108 -4.000000
## 4 6 1 1.205441 1.811344 3.001570 14.00 0.6948789 -2.000000
## 5 14 1 18.475360 17.343190 11.915390 6.00 0.3620667 -4.666667
## 6 9 0 9.927052 18.557400 5.372198 14.75 0.1283483 -3.000000
## 7 11 0 14.020340 14.084650 8.714008 12.75 0.4842256 -3.666667
## 8 15 1 15.447200 14.447050 11.565670 2.75 0.1523629 50.000000
## 9 13 1 14.138830 15.123470 10.162380 8.75 0.1706715 -4.333333
## 10 13 0 8.874312 13.891430 12.218340 8.75 0.6864638 -4.333333
## varOutlierNoise varBi varExp
## 1 15.591280 1 2.718282
## 2 68.350900 0 20.085540
## 3 -0.429595 1 11.023180
## 4 13.922400 1 3.320117
## 5 -4.481500 0 16.444650
## 6 3.763536 0 6.049647
## 7 9.527660 0 9.025014
## 8 61.923960 0 20.085540
## 9 9.865591 0 13.463740
## 10 3.943735 1 13.463740