Most of this work is done explicitly in the notes on the exercise of comparing blood pressure samples to a normal distribution.
In this Try-It activity, you will review the theory about the central limit theorem and experiment with Python code to simulate and compute the mean of a sample distribution. Throughout the Jupyter Notebook, you will verify the hypothesis of the central limit theorem by working on two different examples: one with uniformly distributed samples and one with exponentially distributed samples.
To start the Try-It activity, check out my GitHub gist containing the notebook and open the items locally or in a Colaboratory session.
Open the Jupyter Notebook file using your own local instance of Jupyter Notebook. You can also use a service like Colaboratory, which works with gists. There are also additional image files that you will need to visualize the Try-It activity correctly.
Throughout the Jupyter Notebook, detailed instructions will guide you through the activity and suggest numerical values. Run the code and visualize the results. You may also change the numerical values to visualize different outcomes and results.
Now that you have experimented with Try-It Activity 7.2, discuss the results with your peers. Describe the steps in the code. What did you notice visually as you ran your code and implemented changes? How did the plots change as you satisfied the hypothesis of the central limit theorem?
TODO: Link to this gist from Colaboratory.
IMPORTANT INSTRUCTIONS: This activity is designed for you to experiment with Python code about sampling, variance, and mean. Feel free to change any numerical value throughout the code in the activity to visualize different outcomes and results.
Again, the workbook Jupyter notebook is here.
This activity involves running various Python-based experiments with the Central Limit Theorem, including the following.
Visually, when working on the uniform distribution based on dice rolls, it was obvious that the more I increased both the number of rolls in a trial and the number of trials I was taking means from, the more the distribution of sample means converged towards the Gaussian. Of note, I noticed the change most quickly once the typical sample-size heuristic of 30 was crossed; that's where you start seeing the smallest mean error across all the distributions.
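For reference, here is a minimal sketch of that kind of simulation — not the notebook's exact code — assuming NumPy and Matplotlib, with illustrative trial counts:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

n_trials = 1000  # how many trials we take a mean from
n_rolls = 30     # rolls per trial; crossing ~30 is where convergence became obvious

# Each row is one trial of n_rolls fair-die rolls; take the mean of each trial.
rolls = rng.integers(1, 7, size=(n_trials, n_rolls))
sample_means = rolls.mean(axis=1)

# The histogram of trial means looks increasingly Gaussian as n_rolls grows.
plt.hist(sample_means, bins=30, density=True)
plt.title(f"Means of {n_trials} trials of {n_rolls} dice rolls")
plt.xlabel("sample mean")
plt.show()
```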
Moving on to the exponential distribution demonstrates that the Central Limit Theorem applies to samples from any distribution. We see the same thing: the sample means converge towards the normal distribution as more samples are taken. You're given some example code to fill values into, and with every increase in sample size the distribution of the means converges further towards the Gaussian.
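The exponential case can be sketched the same way; again, this is my own illustrative version rather than the notebook's code:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Overlay the distribution of sample means for increasing sample sizes.
for n in (2, 10, 50):
    means = rng.exponential(scale=1.0, size=(1000, n)).mean(axis=1)
    plt.hist(means, bins=30, density=True, alpha=0.5, label=f"n={n}")

plt.legend()
plt.title("Sample means of exponential draws converge to a Gaussian")
plt.show()
```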
In this Try-It activity, you will experiment with Python code that computes the correlation of given data using both the NumPy and pandas libraries. To start the activity, download the correlation workbook Jupyter Notebook gist and open the items.
Open the Jupyter Notebook file using your own local instance of Jupyter Notebook. Alternatively, use a notebook service like Colaboratory, which works natively with gists and is convenient due to its ephemeral nature. There are also additional image files that you will need to visualize the Try-It activity correctly.
Now that you have experimented with Try-It Activity 7.3, discuss the results with your peers. Describe the steps in the code. What did you notice visually as you ran your code and implemented changes?
Suggested Time: 45 minutes
This is a self-study activity and does not count toward your final grade in this course.
This notebook wasn't too hard to follow. It was a nice way to walk through the concepts of correlation, especially how we analyze them in Python. What was nice was seeing how to visualize correlation using both pandas and seaborn by creating heatmaps to represent it.
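Roughly what that looks like, using hypothetical data in place of the notebook's dataset:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: y is correlated with x, z is independent noise.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(size=200),
    "z": rng.normal(size=200),
})

corr = df.corr()  # pandas: pairwise Pearson correlation matrix
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)  # seaborn heatmap
plt.show()
```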
What was new to me, however, is that there are other common ways to compute correlation coefficients. Looking at the pandas documentation for the `DataFrame` method `corr()`, the default `pearson` method is merely the most common one. You can also tell it to use `method="kendall"` or `method="spearman"`. Trying those in this notebook, the only thing I really noticed is that the `spearman` and `pearson` methods give very similar results, while the `kendall` method seems to produce smaller correlations.
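For reference, switching methods is just a keyword argument (using the hypothetical `df` from the sketch above):

```python
# Compare the three correlation methods pandas supports.
for method in ("pearson", "spearman", "kendall"):
    print(method)
    print(df.corr(method=method), end="\n\n")
```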
All of the content here has corresponding notes in my markdown notes.
In this Try-It activity, you will experiment with Markdown syntax to create your own Markdown cheat sheet that you will refer to when you need to look up Markdown examples. To start the activity, open your own local instance of Jupyter Notebook and create a new file.
To complete your personal cheat sheet, please include the following elements in your Jupyter Notebook:
Now that you have experimented with Try-It Activity 7.4, share your experience with your fellow learners:
How was your experience of writing your own Markdown syntax? Did you find any step of creating your own cheat sheet particularly challenging? What additional tags did your peers include that you found interesting or that were new to you? Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.
Suggested Time: 60 minutes
Suggested Length: 150-200 words
This is a required activity and will count toward course completion.
Personally, I'm already quite deep into the markdown rabbit hole. About a year ago I discovered a knowledge management system called Zettelkasten. Basically, you take notes in any manner you like (ideally, in modern times, digital notes in an easily digitized markup format like markdown) and make heavy use of linking syntax to connect each note to other notes on related subject matter. Then a site generator builds a site from the notes, complete with all the links and tracked back-links, creating what is essentially a personal Wikipedia of everything you've taken notes on, ready to review. Since you can easily create links in markdown to other notes, connecting ideas and topics together into a network of knowledge is fairly easy.
I have my own static site generator that handles all of this in a git repository containing my notes. Whenever a note change is pushed to the remote, the static site generator renders a new webpage that I can review on any device with a browser. There are more complete turn-key solutions, like Obsidian, that will take care of all of this for you. I highly recommend adopting some kind of system like this for the course.
So, in answer to the discussion prompts:
I love it; it's a core part of my work and personal knowledge management. Writing markup in a human-readable syntax makes producing rich text genuinely useful in many ways, especially once you start considering the relationship between markdown syntax and HTML.
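One quick way to see that relationship is to convert markdown to HTML in Python; this assumes the third-party `markdown` package (`pip install markdown`) is installed:

```python
import markdown  # third-party "Markdown" package, not part of the standard library

html = markdown.markdown("# Title\n\nSome *emphasis* and a [link](https://example.com).")
print(html)
# Prints roughly:
# <h1>Title</h1>
# <p>Some <em>emphasis</em> and a <a href="https://example.com">link</a>.</p>
```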
Not really; I had already used markdown a lot before taking this course. But I can link to a helpful cheat sheet that other people can use if they're interested.
I noticed some people brought up that LaTeX syntax is possible in markdown. I knew this already but haven't used it much, so the provided cheat sheets are helpful. Basically, you wrap mathematical expressions in dollar signs and use LaTeX syntax to describe the math.
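For example, a Markdown cell like this (a hypothetical cheat-sheet entry) renders both inline and display math:

```latex
Inline math: the identity $e^{i\pi} + 1 = 0$ renders in the middle of a sentence.

Display math on its own line:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
```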
Now that you've learned a lot of tools for data analysis, you are ready to create a model and start predicting outcomes.
Here is the CSV file of the housing data that will be used in the project.
The notes on predicting house prices using linear regression and Python cover the rest of this.
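As a teaser, the basic shape of that model in Python looks something like this; the file name and column names are placeholders, since the actual ones live in the project notes:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Placeholder file and column names; the real ones are in the project notes.
housing = pd.read_csv("housing.csv")
X = housing[["sqft", "bedrooms"]]  # hypothetical feature columns
y = housing["price"]               # hypothetical target column

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
```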