At Elson S. Floyd College of Medicine, I had the opportunity to teach a brief course on R for analysts in the Evaluation and Assessment units. I wanted to give them a working knowledge that would both hit all the scripting basics and highlight important parametric and non-parametric tests. To make the lessons more interesting, I provided some background on the history of computing, the Internet and the cloud.
On one clear, concise slide, I simplified and summarized the functions of the CPU and defined looping, stacking, LIFO, pointers and functions.
A quick, easy-to-comprehend explanation of the magic of TCP/IP, and how and why it revolutionized networking (and, eventually, our lives).
Differentiating IaaS, PaaS and SaaS and providing meaningful, relatable examples.
Quickly summarizing in two slides the commonalities of programming languages and how they can be built into data structures & algorithms.
Summarizing what kind of language R is, where it excels, where it is growing and who provides and maintains it.
If - Then - Else, For, While and the Next and Break Statements, in a simple, engaging and easy-to-understand way.
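A minimal sketch of these control-flow statements in base R. The function name `grade` and the cut-offs are made up for illustration and are not taken from the slides.

```r
# if / else if / else: classify a score (thresholds are hypothetical)
grade <- function(score) {
  if (score >= 90) {
    "high"
  } else if (score >= 70) {
    "middle"
  } else {
    "low"
  }
}

# for loop with next (skip this iteration) and break (leave the loop)
evens <- c()
for (i in 1:10) {
  if (i %% 2 == 1) next   # skip odd numbers
  if (i > 8) break        # stop once we pass 8
  evens <- c(evens, i)
}

# while loop: count how many halvings bring 100 below 1
n <- 100
halvings <- 0
while (n >= 1) {
  n <- n / 2
  halvings <- halvings + 1
}
```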
Defining functions and signatures
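As a small illustration of a function and its signature, here is a hypothetical `standardize()` helper with one required argument and two arguments that have defaults. It is a sketch, not the example used in the lesson.

```r
# x is required; center and scale have default values
standardize <- function(x, center = TRUE, scale = TRUE) {
  if (center) x <- x - mean(x)
  if (scale) x <- x / sd(x)
  x
}

z <- standardize(c(2, 4, 6))          # uses both defaults
centered <- standardize(c(2, 4, 6), scale = FALSE)  # override one default by name

# formals() lists the arguments that make up the signature
arg_names <- names(formals(standardize))
```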
Defining scope and providing an example (the output at the bottom doesn't appear until the spacebar is pressed)
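A short sketch of scope in R, using made-up variable names: assignment inside a function creates a local variable, while the `<<-` operator reaches out to a variable in an enclosing environment.

```r
x <- 10                 # defined outside any function

f <- function() {
  x <- 20               # local x; the outer x is untouched
  x
}

g <- function() {
  x <<- 30              # superassignment: modifies the outer x
  invisible(NULL)
}

inside <- f()           # value of the local x
after_f <- x            # outer x after calling f()
g()
after_g <- x            # outer x after calling g()
```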
And, finally, data frames - the most common R data structure
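For readers new to data frames, a minimal base R sketch follows. The `responses` data are invented for illustration.

```r
# Each column is a vector; all columns have the same length
responses <- data.frame(
  student = c("A", "B", "C"),
  cohort  = c(2023, 2023, 2024),
  score   = c(88, 92, 79),
  stringsAsFactors = FALSE
)

scores     <- responses$score                    # one column, by name
high       <- responses[responses$score > 80, ]  # rows that meet a condition
dimensions <- dim(responses)                     # rows, then columns
```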
Physics, CS / Stats and Mathematical definitions
Summarizing the basics of vector mathematics with some great visuals from 3Blue1Brown.
Two and three-dimension vector representation and how to add vectors.
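Vector addition works element by element, which a few lines of base R can show. The vectors here are arbitrary examples.

```r
# Two-dimensional vectors
v <- c(1, 2)
w <- c(3, -1)
sum_2d <- v + w          # add matching components

# Three-dimensional vectors
a <- c(1, 0, 2)
b <- c(0, 3, 1)
sum_3d <- a + b

# Length (magnitude) of a vector
len_34 <- sqrt(sum(c(3, 4)^2))
```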
Linear transformation and matrix multiplication.
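A sketch of a linear transformation as a matrix product, assuming a 90-degree counterclockwise rotation and a uniform scaling as the two example transformations (my choice, not necessarily the ones on the slide).

```r
# R fills matrices column by column: columns are (0, 1) and (-1, 0)
rotate90 <- matrix(c(0, 1, -1, 0), nrow = 2)
double   <- diag(c(2, 2))            # scale both axes by 2

v <- c(1, 0)
rotated <- as.vector(rotate90 %*% v)  # %*% is matrix multiplication

# Composing transformations is multiplying their matrices:
# rotate first, then scale
both <- double %*% rotate90
rotated_scaled <- as.vector(both %*% v)
```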
Determinant and Eigenvector definitions.
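Both definitions can be checked numerically with base R's `det()` and `eigen()`. The matrix below is an arbitrary symmetric example.

```r
A <- matrix(c(2, 1, 1, 2), nrow = 2)

d <- det(A)              # area scaling factor of the transformation
e <- eigen(A)            # eigenvalues and eigenvectors

lambda1 <- e$values[1]   # largest eigenvalue
v1 <- e$vectors[, 1]     # its eigenvector (unit length)

# Defining property: A stretches v1 by lambda1 without changing its direction
lhs <- as.vector(A %*% v1)
rhs <- lambda1 * v1
```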
Dimension reduction definition using eigenvectors and eigenvalues.
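One way to make this concrete is a hand-rolled principal component analysis: take the eigenvectors of the covariance matrix of standardized data and project onto the first few. This sketch uses R's built-in `iris` data, which is my choice for illustration.

```r
X <- scale(as.matrix(iris[, 1:4]))   # center and scale the four measurements

e <- eigen(cov(X))                   # eigen decomposition of the covariance matrix

# Share of total variance carried by each eigenvector
prop_var <- e$values / sum(e$values)

# Reduce four dimensions to two by projecting onto the top two eigenvectors
scores <- X %*% e$vectors[, 1:2]

# The built-in prcomp() should report the same variances
pc <- prcomp(iris[, 1:4], scale. = TRUE)
```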
I learned regular expressions back in the late 1990s. However, many younger people lack the patience for such esoteric syntax, as well they might. So, in this lesson, I introduced the analysts to Hadley Wickham's tidyverse with some slides acquainting them with stringr and rebus - a great alternative to raw regex syntax that accomplishes the same ends in a more readable way.
Including differences between indexing in R and Python!
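A brief sketch of the indexing differences, with the Python behavior noted in comments for comparison. The example values are made up.

```r
library(stringr)

x <- c("a", "b", "c", "d")

first  <- x[1]      # R starts at 1 (Python: x[0])
middle <- x[2:3]    # both ends included (Python: x[1:3])
rest   <- x[-1]     # negative index DROPS an element (Python: x[-1] is the last one)

# stringr follows the same 1-based, inclusive convention for substrings
start <- str_sub("medicine", 1, 3)
end   <- str_sub("medicine", -3, -1)  # in str_sub, negatives count from the end
```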
A fun little exercise using str_ functions to collect the longer than average words!
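One possible solution to an exercise like this, assuming the input is a single sentence with words separated by spaces. The sentence is an arbitrary example, not the one from the lesson.

```r
library(stringr)

sentence <- "The quick brown fox jumps over the lazy dog"

words    <- str_split(sentence, " ")[[1]]  # split into a character vector
word_len <- str_length(words)              # number of characters per word
avg_len  <- mean(word_len)

long_words <- words[word_len > avg_len]    # keep the longer-than-average words
```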
Just a couple of examples from the Rebus lesson.
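To give a flavor of rebus, here is a small sketch that builds a "digits only" pattern from named pieces and uses it with stringr. The `ids` vector is invented, and the pattern is my own example rather than one from the slides.

```r
library(stringr)
library(rebus)

# Start of string, one or more digits, end of string;
# roughly what one would write by hand as "^\\d+$"
pattern <- START %R% one_or_more(DGT) %R% END

ids <- c("12345", "12a45", "987")
is_numeric_id <- str_detect(ids, pattern)
```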
Some Examples of Using the tidyverse for Data Preparation
dplyr's mutate() function
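A minimal sketch of `mutate()` adding derived columns. The `scores` table and the 70 percent pass mark are hypothetical.

```r
library(dplyr)

scores <- tibble(
  student = c("A", "B", "C"),
  correct = c(18, 25, 21),
  total   = c(30, 30, 30)
)

# mutate() adds columns; later columns can use earlier ones
scores <- scores %>%
  mutate(
    pct    = 100 * correct / total,
    passed = pct >= 70
  )
```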
the summarize() function for summary statistics
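A sketch of `summarize()` collapsing a table to one row of summary statistics, using made-up scores.

```r
library(dplyr)

scores <- tibble(score = c(70, 80, 90))

stats <- scores %>%
  summarize(
    n            = n(),
    mean_score   = mean(score),
    sd_score     = sd(score),
    median_score = median(score)
  )
```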
Writing to csv or Excel files
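A minimal base R sketch of writing a CSV and reading it back, using a temporary file and invented data. Writing Excel files needs an add-on package; `writexl::write_xlsx()` is one option, shown only as a comment because I am assuming it is installed.

```r
results <- data.frame(student = c("A", "B"), score = c(88, 92))

path <- file.path(tempdir(), "results.csv")
write.csv(results, path, row.names = FALSE)  # row.names = FALSE avoids an extra index column

round_trip <- read.csv(path)                 # read it back to confirm the contents

# Excel (assumes the writexl package is installed):
# writexl::write_xlsx(results, file.path(tempdir(), "results.xlsx"))
```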
Making data tidy with gather()
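A sketch of `gather()` turning one column per survey question into one row per answer. The `wide` table is hypothetical. Note that tidyr now recommends `pivot_longer()` for new code, although `gather()` still works.

```r
library(tidyr)

# Wide format: one column per question
wide <- data.frame(
  id = 1:2,
  q1 = c(3, 4),
  q2 = c(5, 2)
)

# Long (tidy) format: one row per respondent per question
long <- gather(wide, key = "question", value = "score", q1, q2)
```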
A little bit on teaching joins
Quick review of joining rules
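The joining rules are easiest to see by counting rows. This sketch uses two small made-up tables that share an `id` key.

```r
library(dplyr)

a <- tibble(id = 1:3, x = c("a", "b", "c"))
b <- tibble(id = c(2L, 3L, 4L), y = c(TRUE, FALSE, TRUE))

inner <- inner_join(a, b, by = "id")  # only ids present in both: 2, 3
left  <- left_join(a, b, by = "id")   # every row of a; y is NA for id 1
right <- right_join(a, b, by = "id")  # every row of b; x is NA for id 4
full  <- full_join(a, b, by = "id")   # every id from either table: 1 to 4
```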
Filtering joins, one of the three classes of joins in dplyr
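Filtering joins keep or drop rows of the first table without adding columns from the second. A sketch with hypothetical `students` and `submissions` tables:

```r
library(dplyr)

students    <- tibble(id = 1:4, name = c("Ana", "Ben", "Cai", "Dee"))
submissions <- tibble(id = c(2L, 4L, 4L), assignment = c("hw1", "hw1", "hw2"))

# semi_join: students with at least one submission (no duplicated rows)
submitted <- semi_join(students, submissions, by = "id")

# anti_join: students with no submission at all
missing <- anti_join(students, submissions, by = "id")
```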
Summarizing All the Basics in Six Slides
Continuous, discrete, categorical, ordinal, nominal, skewed, normal
mean, variance, standard deviation, standard error, median, mode, min, max, range, quantile, IQR, skew and kurtosis
estimate, hypothesis testing, p-value, robustness, sensitivity testing, type 1 error, type 2 error and power
correlation, T-testing, Chi-squared, ANOVA, CMH, Fisher's exact and linear and logistic regression
Mann-Whitney U / Wilcoxon Rank Sum, Wilcoxon Signed Rank, Kruskal-Wallis
R-Squared, RMSE, ROC / AUC, AIC, cross validation, confusion matrix and A/B testing
Using R for Statistical Summaries, Correlation Charts, Distribution Testing, and Power Calculation
Using group_by with the summarize() function
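A sketch of grouped summaries: `group_by()` splits the table and `summarize()` returns one row per group. The cohorts and scores are invented.

```r
library(dplyr)

scores <- tibble(
  cohort = c("A", "A", "B", "B"),
  score  = c(70, 80, 90, 100)
)

by_cohort <- scores %>%
  group_by(cohort) %>%
  summarize(
    n          = n(),
    mean_score = mean(score)
  )
```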
Using PerformanceAnalytics' chart.Correlation() function
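A sketch of the call, assuming the PerformanceAnalytics package is installed and using three columns of the built-in `mtcars` data as a stand-in for the course data.

```r
library(PerformanceAnalytics)

vars <- mtcars[, c("mpg", "hp", "wt")]

# Numeric correlation matrix for reference
cor_matrix <- cor(vars, method = "pearson")

# Draws scatterplots, histograms and correlation coefficients in one grid
chart.Correlation(vars, histogram = TRUE, method = "pearson")
```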
Are my data normally distributed?
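One common way to check is the Shapiro-Wilk test, sketched here on simulated data. Whether the slide used this test or another is an assumption on my part.

```r
set.seed(42)
normal_sample <- rnorm(100, mean = 50, sd = 10)  # simulated, roughly bell-shaped
skewed_sample <- rexp(100, rate = 1)             # simulated, strongly right-skewed

# Null hypothesis: the data come from a normal distribution.
# A small p-value is evidence against normality.
normal_test <- shapiro.test(normal_sample)
skewed_test <- shapiro.test(skewed_sample)

# A Q-Q plot is a useful visual companion
qqnorm(skewed_sample)
qqline(skewed_sample)
```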
Significance level, type I and II error and Effect Size
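These quantities come together in a power calculation. A sketch with base R's `power.t.test()`, assuming a two-sample t-test, a medium effect size of 0.5 standard deviations, a 5 percent significance level and 80 percent power (all illustrative choices):

```r
# Leave n unspecified and power.t.test() solves for it
calc <- power.t.test(
  delta     = 0.5,   # difference in means to detect
  sd        = 1,     # so delta / sd = 0.5 is the effect size
  sig.level = 0.05,  # type I error rate (alpha)
  power     = 0.80   # 1 minus the type II error rate (beta)
)

n_per_group <- ceiling(calc$n)  # round up to whole participants
```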
Comparing the means of groups (where the data are normally distributed)
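A sketch of the two usual tools, a t-test for two groups and one-way ANOVA for more than two, using R's built-in `ToothGrowth` data as a stand-in for the course data.

```r
# Two groups: Welch two-sample t-test of tooth length by supplement type
tt <- t.test(len ~ supp, data = ToothGrowth)

# Three groups: one-way ANOVA of tooth length by dose
fit <- aov(len ~ factor(dose), data = ToothGrowth)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]  # p-value for the dose effect

# If the ANOVA is significant, Tukey's HSD shows which pairs differ
pairs <- TukeyHSD(fit)
```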
Due to the focus on categorical, qualitative, and survey analysis in the Evaluation and Assessment groups, this course focused primarily on introducing analysts to tests that are performed on categorical and non-normally-distributed data.
Mann-Whitney U, aka the Wilcoxon Rank Sum Test: tests whether mean ranks differ between groups - a nonparametric alternative to the two-sample t-test
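In base R this test is `wilcox.test()` with two independent samples. The two groups below are made-up values with no ties, so R can compute an exact p-value.

```r
group_a <- c(1.1, 2.3, 3.8, 4.2, 5.5)
group_b <- c(6.1, 7.4, 8.0, 9.3, 10.6)

# Two independent samples: Wilcoxon rank sum / Mann-Whitney U
res <- wilcox.test(group_a, group_b)

w_stat <- unname(res$statistic)  # W: 0 here, since every value in group_a is below group_b
p_val  <- res$p.value
```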
Kruskal-Wallis: a nonparametric alternative to one-way ANOVA
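A sketch using base R's `kruskal.test()` on the built-in `PlantGrowth` data, which has three groups. The data set is my choice for illustration.

```r
# Does plant weight differ across the three treatment groups?
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

kw_df <- unname(kw$parameter)  # degrees of freedom: number of groups minus 1
kw_p  <- kw$p.value
```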
Wilcoxon Signed Rank: a nonparametric alternative to the paired t-test; are these paired distributions statistically different?
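The paired version uses the same `wilcox.test()` function with `paired = TRUE`. The before and after scores are invented, one pair per student.

```r
before <- c(60, 72, 55, 68, 80, 63)
after  <- c(66, 75, 64, 70, 91, 67)

# Paired samples: Wilcoxon signed rank test on the differences
res <- wilcox.test(before, after, paired = TRUE)

v_stat <- unname(res$statistic)  # V: sum of ranks of positive (before - after) differences
p_val  <- res$p.value
```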
The Chi-Squared Test of Independence compares observed and expected frequencies among groups
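A sketch with base R's `chisq.test()` on a made-up 2 x 2 table of counts. The group and response labels are hypothetical.

```r
observed <- matrix(
  c(30, 20,
    10, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), response = c("yes", "no"))
)

# For 2 x 2 tables R applies Yates' continuity correction by default
res <- chisq.test(observed)

expected <- res$expected  # counts expected if group and response were independent
p_val    <- res$p.value
```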
Fisher's Exact Test: an alternative to the Chi-Squared Test for very small frequencies
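When expected counts are small, `fisher.test()` computes an exact p-value instead of relying on the Chi-Squared approximation. The 2 x 2 table below is a small made-up example.

```r
small_counts <- matrix(
  c(3, 1,
    1, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(guess = c("yes", "no"), truth = c("yes", "no"))
)

res   <- fisher.test(small_counts)
p_val <- res$p.value  # two-sided exact p-value
```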
Cochran-Mantel-Haenszel (CMH): a Chi-Squared Test that examines the significance of the association between two categorical variables when stratified by a third variable (often a confounder)
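Base R provides this as `mantelhaen.test()`, which takes a three-dimensional table whose last dimension is the stratifying variable. The sketch uses the built-in `UCBAdmissions` data (admission by gender, stratified by department) as an illustration.

```r
# Dimensions: Admit x Gender x Dept, with Dept as the stratifying variable
strata <- dim(UCBAdmissions)

res   <- mantelhaen.test(UCBAdmissions)
p_val <- res$p.value

# Ignoring the strata gives a very different picture, which is
# why the stratified test matters when a confounder is present
pooled   <- apply(UCBAdmissions, c(1, 2), sum)
pooled_p <- chisq.test(pooled)$p.value
```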
To view assignment answers, see the RStudio notebooks and toy data sets available on GitHub.
References used in the development of these lessons include:
Crutcher, P. D., Singh, N. K., & Tiegs, P. (2021). Essential Computer Science: A Programmer’s Guide to Foundational Concepts. Apress / Springer Nature. https://doi.org/10.1007/978-1-4842-7107-0
Grolemund, G. (2014). Hands-On Programming with R. O'Reilly Media. http://bit.ly/HandsOnR
Mahoney, M. (2019). Introduction to Data Exploration and Analysis with R. Bookdown. Retrieved from https://bookdown.org/mikemahoney218/IDEAR/
Soetewey, A. (2022). Stats and R (Blog). https://statsandr.com/
White, H. Beginning Computer Science with R. Georgetown College. https://homerhanumat.github.io/r-notes/index.html
Wickham, C., Jeon, T., & Cotton, R. (2021). String Manipulation with stringr in R. DataCamp. https://app.datacamp.com/learn/courses/string-manipulation-with-stringr-in-r
Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media. Freely accessible at https://r4ds.had.co.nz/index.html