Rising to Data Science YouTuber Ken Jee's challenge and enjoying every nerdy minute! 🤓
Click on the thumbnails to explore the code on my GitHub, or to expand Google Drive documents.
The kick-off moment
Charting a course to explore more medical-specific biostats applications, healthcare data analyses, and graph algorithms for relationship analysis!
The plan was to focus on healthcare-specific applications for 4 hours per day over 66 consecutive days. However, I quickly discovered that while it was a good plan, full of juicy projects and reviews that kept me learning and growing in health data science, finishing all of it in 66 days wasn't feasible. Rather than throw the plan away, I gave myself permission to keep working at it per diem over time.
While the project continues, I'm using this space to document some of the highlights of the wonderful journey!
Kicking things off with a simple polyglot (SAS, Python & R) analysis of something in my wheelhouse: the Heart Attacks data set.
Using SAS, I computed measures of central tendency, performed a power calculation, and conducted hypothesis testing on:
Pearson's r correlation of Age vs. Chol with outliers removed
Student's t-test of males vs. females, mean BP readings
Code Link: Day Two SAS
Using Python to complete:
Data type definition
Measures of central tendency
Hypothesis testing
Power calculation
Code Link: Day Two Python
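For a flavor of the hypothesis testing and power calculation steps listed above, here is a minimal sketch using only the Python standard library. It is not the code behind the link: the BP readings are synthetic values invented for illustration, the helper names (pooled_t_statistic, approx_power) are hypothetical, and the power figure uses a normal approximation rather than the exact noncentral t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, median, variance

# Synthetic systolic BP readings (mmHg), invented for illustration only.
bp_male = [128, 135, 142, 120, 138, 131, 145, 126]
bp_female = [122, 130, 118, 127, 133, 121, 125, 129]


def pooled_t_statistic(x, y):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate two-sided power of a two-sample test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Measures of central tendency for one group.
print("mean / median (male):", mean(bp_male), median(bp_male))

# Hypothesis test: do the group means differ?
t_stat, df = pooled_t_statistic(bp_male, bp_female)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")

# Power for a medium effect (Cohen's d = 0.5) with 64 subjects per group.
print(f"approximate power: {approx_power(0.5, 64):.3f}")
```

In practice a library routine (for example, a t-test function that also returns the p-value) would replace the hand-rolled statistic; writing it out just makes the pooled-variance step visible.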
Using chi-squared and Cochran-Mantel-Haenszel (CMH) tests in R to calculate the odds of Coronary Artery Disease (CAD), and the odds of CAD given Exercise-Induced Angina, in the sample.
Code Link: Day Three R
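The arithmetic behind odds, odds ratios, and the chi-squared statistic can be sketched in a few lines. This is a Python illustration rather than the R code behind the link, and the 2x2 counts are hypothetical numbers invented for the example, not values from the data set. The chi-squared here has no continuity correction, so it may differ from a routine that applies Yates' correction to 2x2 tables.

```python
from math import erfc, sqrt

# Hypothetical 2x2 counts, invented for illustration (not from the data set).
# Rows: exercise-induced angina yes / no; columns: CAD yes / no.
a, b = 30, 10   # angina:    CAD, no CAD
c, d = 20, 40   # no angina: CAD, no CAD


def odds(events, non_events):
    """Odds = events divided by non-events."""
    return events / non_events


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table."""
    return (a * d) / (b * c)


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) and p-value on 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2))  # exact upper tail for 1 degree of freedom
    return stat, p_value


def mantel_haenszel_or(strata):
    """Common odds ratio pooled across strata of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


print("odds of CAD overall:", odds(a + c, b + d))
print("odds of CAD given angina:", odds(a, b))
print("odds ratio:", odds_ratio(a, b, c, d))
stat, p_value = chi_squared_2x2(a, b, c, d)
print(f"chi-squared = {stat:.2f}, p = {p_value:.5f}")
```

The Mantel-Haenszel helper takes one table per stratum (say, one per sex); with a single stratum it reduces to the ordinary odds ratio.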
Defining the types of validity, calculating Cohen's kappa and Cronbach's alpha, and examining the types of healthcare data, plus some musings on using HCPCS or Place of Service data in an analysis.
Link: Validity
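Both reliability statistics are short enough to compute by hand. The sketch below uses only the Python standard library; the rater labels and questionnaire scores are made up for illustration, and the function names are hypothetical. Cohen's kappa compares observed agreement with the agreement expected by chance, while Cronbach's alpha compares the summed item variances with the variance of the total score.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


def cronbach_alpha(items):
    """Internal consistency; items holds one list of scores per question."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Made-up chart-review labels from two hypothetical raters.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")

# Made-up 5-point responses: three questions, five respondents.
items = [[4, 3, 5, 2, 4], [5, 3, 4, 2, 4], [4, 2, 5, 3, 5]]
print(f"alpha = {cronbach_alpha(items):.2f}")
```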
Networks, graph types, the Power Law, trees, and graph algorithms: pathfinding, centrality and community detection.
Link: Day Six: Graph Algorithms
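To make two of those ideas concrete, here is a small standard-library Python sketch of degree centrality and connected components (one of the simplest ways to split a graph into groups). The graph is a hypothetical toy network with made-up node labels, and the code is an illustration rather than the Spark or Neo4j approach covered in the linked notes.

```python
from collections import deque

# Hypothetical undirected network (made-up labels), stored as adjacency sets.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": {"F"},
    "F": {"E"},
}


def degree_centrality(g):
    """Share of the other nodes that each node is directly connected to."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}


def connected_components(g):
    """Group nodes that can reach one another, using breadth-first search."""
    seen, components = set(), []
    for start in g:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in g[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


centrality = degree_centrality(graph)
components = connected_components(graph)
print("most central node:", max(centrality, key=centrality.get))
print("number of components:", len(components))
```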
Jupyter Notebook: Introduction to PyTorch
Tokenizing, modeling, predicting, rules-based matching, operators & quantifiers
Jupyter Notebook: Learning Advanced NLP with spaCy
Shortest Path, All Pairs Shortest Path and Minimum Spanning Tree
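All three can be sketched in plain Python. The example below assumes a small hypothetical weighted graph with made-up edges, uses Dijkstra's algorithm for single-source shortest paths, repeats it from every node for all pairs, and builds a minimum spanning tree with Kruskal's algorithm. These are standard textbook choices picked for illustration, not necessarily the implementations used in the original work.

```python
import heapq

# Hypothetical weighted, undirected graph: (node, node, weight).
edges = [("A", "B", 4), ("A", "C", 1), ("C", "B", 2), ("B", "D", 5), ("C", "D", 8)]


def build_adjacency(edge_list):
    """Turn an edge list into {node: [(neighbor, weight), ...]}."""
    adj = {}
    for u, v, w in edge_list:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def dijkstra(adj, source):
    """Shortest distance from source to each reachable node (weights >= 0)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry, a shorter route was already found
        for nbr, w in adj[node]:
            new_d = d + w
            if new_d < dist.get(nbr, float("inf")):
                dist[nbr] = new_d
                heapq.heappush(heap, (new_d, nbr))
    return dist


def kruskal(edge_list):
    """Minimum spanning tree via Kruskal's algorithm with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edge_list, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate trees, so keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


adj = build_adjacency(edges)
from_a = dijkstra(adj, "A")                               # shortest path
all_pairs = {node: dijkstra(adj, node) for node in adj}   # all pairs
mst = kruskal(edges)                                      # minimum spanning tree
print("distances from A:", from_a)
print("MST weight:", sum(w for _, _, w in mst))
```

Running Dijkstra once per node is fine for a toy graph; for dense graphs a dedicated all-pairs method such as Floyd-Warshall is the more usual choice.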
Resources:
Datacamp https://www.datacamp.com/
Kaggle https://www.kaggle.com/
Khan Academy https://www.khanacademy.org/
Needham, M. & Hodler, A. E. (2019). Graph Algorithms, Practical Examples in Apache Spark & Neo4j. Sebastopol: O'Reilly.
Starmer, J. (2022). The StatQuest Illustrated Guide to Machine Learning!!! Chapel Hill: Joshua Starmer.
Vinod, H.D. (2011). Hands-On Matrix Algebra Using R, Active and Motivated Learning with Applications. Hackensack: World Scientific.
White, S. (2021). A Practical Approach to Analyzing Healthcare Data, Fourth Edition. Chicago: American Health Information Management Association (AHIMA).