R can be downloaded from http://cran.r-project.org. On Ubuntu Linux, type sudo apt-get install r-recommended to install R. Then type R on the command line to start R.
> 2+2
[1] 4
> 3-5
[1] -2
> 3*2
[1] 6
> 7/3
[1] 2.333333
> 8^2
[1] 64
> pi
[1] 3.141593
Storing in variables
> radius <- 24
> area <- pi*radius^2
> area
[1] 1809.557
Vectors
> math <- c(60,90,34)
> science <- c(56,98,76)
> english <- c(34,98,22)
> avg_grades <- (math + science + english) / 3
> avg_grades
[1] 50.00000 95.33333 44.00000
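The arithmetic above works element by element: the first entries of the three vectors are added and divided by 3, then the second entries, and so on. As a minimal sketch, the same per-student averages can be computed by binding the vectors into a matrix and taking row means. The name avg_alt is introduced here only for illustration.

```r
math <- c(60, 90, 34)
science <- c(56, 98, 76)
english <- c(34, 98, 22)

# Element-wise arithmetic: one average per student (position in the vectors).
avg_grades <- (math + science + english) / 3

# Alternative: one column per subject, one row per student, then average each row.
grades <- cbind(math, science, english)
avg_alt <- rowMeans(grades)

print(avg_grades)
print(avg_alt)
```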
Graphical summaries
- For a single categorical variable, we use bar plots and dot plots
- For a single numerical variable, we use histograms and boxplots
- For two numerical variables, we use scatterplots
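The plot types listed above each correspond to a base R function. The following sketch shows one call per plot type. The letter_grade vector is hypothetical data made up for illustration, and the math and science vectors repeat the values from the Vectors section.

```r
# Hypothetical categorical data, for illustration only.
letter_grade <- c("A", "B", "A", "C", "B", "A")
counts <- table(letter_grade)   # frequency of each category

# Numerical data from the Vectors section.
math <- c(60, 90, 34)
science <- c(56, 98, 76)

par(mfrow = c(2, 3))            # arrange the plots in a 2 x 3 grid

# Single categorical variable
barplot(counts, main = "Bar plot")
dotchart(as.numeric(counts), labels = names(counts), main = "Dot plot")

# Single numerical variable
hist(math, main = "Histogram")
boxplot(math, main = "Boxplot")

# Two numerical variables
plot(math, science, main = "Scatterplot")
```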
Histogram
A histogram is a special kind of bar plot. It is used for visualizing the distribution of values of a numerical variable. When drawn with a density scale, the area of each bar is the proportion of observations in the interval. The height of each bar is the density (proportion per unit of the variable), so the total area of all bars is 100%.
Type the following for help on histograms:
> ?hist
Example
> par(mfrow=c(2,2))
> simdata <- rchisq(100,8)
> hist(simdata)
> hist(simdata,breaks=2)
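The density-scale property described earlier (bar area equals the proportion of observations, total area equals 100%) can be checked numerically. This is a minimal sketch. The call to set.seed is an assumption added only so the simulated data are reproducible, and bar_areas is a name introduced for illustration.

```r
set.seed(1)                        # assumed seed, only for reproducibility
simdata <- rchisq(100, 8)

# freq = FALSE draws the histogram on a density scale instead of counts.
h <- hist(simdata, freq = FALSE)

# Area of each bar = height (density) * width of the interval.
bar_areas <- h$density * diff(h$breaks)

# Each area equals the proportion of observations falling in that interval.
proportions <- h$counts / sum(h$counts)

print(sum(bar_areas))              # total area of all bars
```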
The mean is an appropriate summary for distributions that are fairly symmetrical.
> mean(math)
The median is the middlemost number. Half of the values are greater than the median and the other half are smaller. The median is usually a more appropriate summary for skewed distributions.
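To illustrate why the median suits skewed data, the sketch below compares the two summaries. The incomes vector is hypothetical data chosen so that a single large value skews the distribution to the right, and math repeats the values from the Vectors section.

```r
math <- c(60, 90, 34)
median(math)                 # middle value of 34, 60, 90

# Hypothetical right-skewed data: one value is far larger than the rest.
incomes <- c(20, 22, 25, 27, 30, 250)

mean_income <- mean(incomes)       # pulled upward by the value 250
median_income <- median(incomes)   # average of the two middle values, 25 and 27

print(mean_income)
print(median_income)
```

Here the mean is larger than five of the six observations, while the median stays near the bulk of the data.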