Brought to you by molecularsciences.org.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.
This publication may not be redistributed without this notice.

R Quickstart

R can be downloaded from http://cran.r-project.org. On Ubuntu Linux, type sudo apt-get install r-recommended to install R. Then type R on the command line to start an R session.

> 2+2
[1] 4
> 3-5
[1] -2
> 3*2
[1] 6
> 7/3
[1] 2.333333
> 8^2
[1] 64
> pi
[1] 3.141593
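Every result above is printed with [1] because R treats even a single number as a vector of length 1; the [1] is the index of the first element shown on that line. The short sketch below illustrates this. The names x and y are only for illustration, and how a long vector wraps depends on the width of your console.

```r
# Even a single computed number is a vector of length 1
x <- 7 / 3
print(length(x))

# A longer vector: when the output wraps, each printed line starts
# with the index of its first element in square brackets
y <- 1:30
print(y)
```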

Storing values in variables

> radius <- 24
> circumference <- 2*pi*radius
> circumference
[1] 150.7964
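The formula 2*pi*r gives a circle's circumference (its perimeter), while its area is pi*r^2. A minimal sketch that stores both quantities for the same radius, using the variable names circumference and area purely for illustration:

```r
radius <- 24
circumference <- 2 * pi * radius  # perimeter of the circle: 2*pi*r
area <- pi * radius^2             # enclosed area: pi*r^2
print(circumference)              # 150.7964
print(area)                       # 1809.557
```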

Vectors

> math <- c(60,90,34)
> science <- c(56,98,76)
> english <- c(34,98,22)
> avg_grades <- (math + science + english) / 3
> avg_grades
[1] 50.00000 95.33333 44.00000
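The arithmetic above works element by element: the first elements of the three vectors are averaged together, then the second elements, and so on. As a sketch of the same calculation, the vectors can be stacked into a matrix with rbind() (one row per subject, one column per student) and averaged column-wise with colMeans(); the name grades is only illustrative.

```r
math <- c(60, 90, 34)
science <- c(56, 98, 76)
english <- c(34, 98, 22)

# Rows are subjects, columns are students
grades <- rbind(math, science, english)

# Average each column, i.e. each student's three grades
avg_grades <- colMeans(grades)
print(avg_grades)  # 50.00000 95.33333 44.00000
```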

Graphical summaries
- For a single categorical variable, we use bar plots and dot plots
- For a single numerical variable, we use histograms and boxplots
- For two numerical variables, we use scatterplots
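The summaries listed above map onto base R plotting functions. The sketch below draws one of each except the histogram; the data (eye_color, height, weight) are invented purely for illustration.

```r
# Hypothetical example data, made up for this sketch
eye_color <- factor(c("brown", "blue", "brown", "green", "brown", "blue"))
height <- c(160, 172, 181, 168, 175, 158)
weight <- c(55, 70, 82, 64, 74, 52)

par(mfrow = c(2, 2))            # 2 x 2 grid of plots

counts <- table(eye_color)      # frequency of each category
barplot(counts, main = "Bar plot (categorical)")
dotchart(as.numeric(counts), labels = names(counts), main = "Dot plot (categorical)")
boxplot(height, main = "Boxplot (numerical)")
plot(height, weight, main = "Scatterplot (two numerical)")
```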

Histogram
A histogram is a special kind of bar plot used to visualize the distribution of a numerical variable. When it is drawn on a density scale, the area of each bar equals the proportion of observations that fall in that interval. The bar height is then the density (proportion per unit of the variable), and the areas of all the bars add up to 1, i.e. 100%.
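The density-scale property can be checked numerically. In this sketch the sample x is simulated with rnorm() purely for illustration; freq = FALSE asks hist() to draw on the density scale, and the returned object holds the bin edges ($breaks), counts ($counts) and heights ($density).

```r
set.seed(1)                    # make the simulated sample reproducible
x <- rnorm(200)                # hypothetical sample of 200 values

h <- hist(x, freq = FALSE)     # draw bars on the density scale

# Area of each bar = height (density) x width (bin size)
bar_areas <- h$density * diff(h$breaks)

# Each area is the proportion of observations in that bin,
# so the areas together sum to 1 (100%)
print(sum(bar_areas))
```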

Type the following for help on the hist() function:

> ?hist

Example

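> par(mfrow=c(1,2))          # draw the next two plots side by side
> simdata <- rchisq(100,8)   # 100 random values from a chi-squared distribution with 8 degrees of freedom
> hist(simdata)              # number of bins chosen automatically (Sturges' rule by default)
> hist(simdata,breaks=2)     # ask for about 2 bins; a single number is only a suggestion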
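Because a single number for breaks is only a suggestion, R may use a different number of bins than requested. Passing a vector of cut points instead fixes the bins exactly, as long as the points span the data. A sketch, where the seed and the bin width of 5 are arbitrary choices for illustration:

```r
set.seed(42)                         # reproducible simulation
simdata <- rchisq(100, 8)            # 100 draws, chi-squared with 8 degrees of freedom

# Suggested number of bins: R chooses "pretty" cut points near the request
h_suggested <- hist(simdata, breaks = 2, plot = FALSE)
print(length(h_suggested$counts))    # number of bins actually used

# Explicit cut points from 0 upward in steps of 5, covering the largest value
edges <- seq(0, ceiling(max(simdata)) + 5, by = 5)
h_exact <- hist(simdata, breaks = edges)
print(h_exact$breaks)                # the same cut points that were supplied
```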

The mean is an appropriate summary for distributions that are fairly symmetric.

> mean(math)
[1] 61.33333
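The mean is the sum of the values divided by the number of values. A short sketch confirming that mean() agrees with that definition for the math grades (the name manual_mean is only illustrative):

```r
math <- c(60, 90, 34)

# Sum of the values divided by how many values there are: 184 / 3
manual_mean <- sum(math) / length(math)
print(manual_mean)   # 61.33333
print(mean(math))    # same value from the built-in function
```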

The median is the middle value of the sorted data: half of the values are greater than the median and half are smaller. The median is usually a more appropriate summary for skewed distributions, because unlike the mean it is not pulled toward a few extreme values.
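A small sketch of why the median suits skewed data: one large value drags the mean far from the bulk of the data, while the median stays in the middle. The values in incomes are invented for illustration.

```r
incomes <- c(20, 25, 30, 35, 400)   # hypothetical right-skewed values

print(mean(incomes))     # 102: pulled up by the single large value
print(median(incomes))   # 30: the middle of the sorted values

# With an even number of values, median() averages the two middle ones
print(median(c(20, 25, 30, 35)))   # 27.5
```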