PCDE Course Overview
Introduction
In this course, MIT's xPro program prepares you for a
certification in the professional skills required in Data Engineering.
The course covers topics including Python programming
and the basics of database design.
First Some Lessons About Good Note Taking
Interacting with Lectures
It's important to take notes on the content released each week.
Good note taking can even save time on activities and assignments,
because the information some of them require will often already be in your notes.
Effective learning includes what happens before, during, and after lectures.
The University of British Columbia
has put together some recommendations on those three phases.
There is also a separate note on effective note taking,
which includes the information presented in the University of British Columbia's article.
Chat with your Learning Facilitators
There are times we're simply stumped,
regardless of our best efforts.
It's best to go through the modules as early as possible,
with ample time to reach out to learning facilitators and peers.
Attend office hours frequently,
when possible,
both for information reinforcement and to ask your questions.
Submit support tickets to ask those questions and get extra guidance.
You'll find instructions on how to submit support tickets in your
Orientation Week Module.
Connect with your Fellow Learners
Just because you are viewing course material on your own
doesn't mean that you're the only one pursuing this certificate.
Use Slack to connect with others,
ask questions, and get some help from your peers.
They may have the same difficulties as you or
might have some great tips that will make the topics click.
The takeaway is that
the more ways you interact with course content,
through effective note taking, review, and connecting with others,
the better you'll grasp the concepts and
the more you'll get out of the course.
Some Time Management Recommendations
If you take the course in a well-structured way, it should take about 15 hours a week.
This is a minimum recommendation, and
you may find yourself spending more than 20 hours per week some weeks.
Integrating this time into your schedule will require disciplined time management.
Here are some more in-depth tips on managing time while in the program.
Remember, though, that with knowledge carried over from a previous, now-deferred cohort,
you can shoot for 10 hours a week.
If a week needs more time than that, it's important to seek help early.
Course Outline
Here is a copy of the course outline.
Here are the due dates for each module in the outline.
Module 0: Course Orientation
Notes Links
Key Activities
- Course Introduction
- Learning Platform Overview
- Introduce Yourself
- Course Agreement
- Install Tools Needed for Modules 1-3
Module 1: Introduction to Python
Notes Links
Starts: 2022-12-07
Due: 2022-12-14
Learning Outcomes
- Compare Python basic data types and operators.
- Create basic Python data types in a coding environment.
- Identify lists, tuples, sets, and dictionaries in Python.
- Create Python lists, tuples, sets, and dictionaries in a coding environment.
- Use indexing and slicing in Python.
- Interpret memory allocation for Python objects.
- Define loops and conditionals in a Python coding environment.
- Integrate loops and conditionals in a Python coding environment.
- Define Python functions and variable scope.
- Use Python functions in a coding environment.
- Interpret Python classes.
- Read and write files in Python.
Key Activities
- Discussions
- Activities
- Knowledge Checks
- Coding Assignment
Module 2: Introduction to NumPy
Notes on Topic
Learning Outcomes
- Create NumPy arrays, functions, and multidimensional arrays.
- Define NumPy arrays, functions, and multidimensional arrays.
- Interpret NumPy memory allocation.
- Describe basic probability concepts.
- Explain the connection between histograms and probability densities.
- Differentiate between discrete and continuous distributions.
- Define probability density functions and probability distribution functions.
- Create discrete and continuous distributions.
- Define Matplotlib graphs.
- Visualize data using Matplotlib graphs.
- Interpret data using Matplotlib graphs.
Module 3: Introduction to Pandas
Learning Outcomes
- Define pandas Series and DataFrames.
- Implement pandas Series and DataFrames.
- Perform data cleaning in pandas.
- Prepare data using one-hot encoding in pandas.
- Explain time and date functionality in pandas.
- Analyze data in pandas.
- Design DataFrames in pandas.
Note Links
Module 4: Databases & Intro to SQL
Module 5: Databases with SQL Statements
Notes on Topic
Key Activities
- Discussions
- Activities
- Knowledge Checks
- Coding Assignment
Outcomes
- Outline big data and database systems.
- Design databases conceptually and formally.
- Interpret database components.
- Correlate databases.
- Interpret cardinality and normalization of tables.
- Design physical components of databases.
- Define a database in a coding environment.
- Manipulate a database in a coding environment.
- Explain database data types and indexing.
Module 6: Database Analysis and the Client-Server Interface
Notes on Topic
Key Activities
- Discussions: 2
- Activities: 5
- Self Study Drag & Drop: 2
- Knowledge Checks: 7
- Coding Assignment: 1
- Video Lectures: 25
- Mini Lessons: 5
- Estimated 17.5hrs to complete
Time Log
Outcomes
- Write functional queries to explore a database.
- Analyze the structure of a database.
- Create visualizations of data using histograms in SQL.
- Clean a dataset in SQL.
- Handle date and time in SQL.
- Define the client-server interface.
- Read and write tables using a driver.
- Discriminate between RDBMS and in-memory databases.
Module 7: A Model to Predict Housing Prices
Due Date: 4:29 PM UTC February 8, 2023
Available for late submission till: February 22, 2023
Notes on Topic
Key Activities
- Discussions: 4
- Activities: 0
- Self Study Drag & Drop: 0
- Knowledge Checks: 3
- Coding Assignment: 1 (PROJECT)
- Video Lectures: 6 LONG LECTURES
- Mini Lessons: 0
- Estimated 18hrs to complete
- Spread over 7 days with a 40% overshoot = about 4hrs/day
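The daily figure above can be checked with a few lines of Python. This is only a sketch: the function name `daily_hours` and its parameters are hypothetical, and it assumes the 40% overshoot is padding added on top of the estimate before the total is split evenly across the week.

```python
import math


def daily_hours(estimated_hours: float, days: int = 7, overshoot: float = 0.40) -> float:
    """Pad an estimate by a fractional overshoot, then split it evenly over `days`.

    Hypothetical helper for illustration; the padding rule is an assumption.
    """
    return estimated_hours * (1 + overshoot) / days


# Module 7 is estimated at 18 hours.
per_day = daily_hours(18)
print(f"{per_day:.1f} hrs/day, rounded up to {math.ceil(per_day)} hrs/day")
```

Rounding up rather than to the nearest hour keeps the plan on the conservative side, which fits the idea of budgeting for overshoot.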
Outcomes
- Describe how descriptive statistics are used in Python.
- Explain central limit theorem and correlation.
- Describe how to calculate a linear regression.
- Write Markdown syntax.
- Build a prediction model using linear regression.
Module 8: ETL, Analysis, Visualization
Due Date: 4:29 PM UTC February 15, 2023
Available for late submission till: February 22, 2023
Notes on Topic
Key Activities
- Discussions: 4
- Activities: 0
- Self Study Drag & Drop: 0
- Knowledge Checks: 3
- Coding Assignment: 1 (PROJECT)
- Video Lectures: 6 LONG LECTURES
- Mini Lessons: 0
- Estimated 18hrs to complete
- Spread over 7 days with a 40% overshoot = about 4hrs/day
Module 9: GitHub & Advanced Python
Notes on Topic
Key Activities
- Discussions: 1
- Activities: 6
- Self Study: 2
- Knowledge Checks: 4
- Coding Assignment: 1
- Video Lectures: 90 minutes
- Mini Lessons: 0
Outcomes
- Debug Python code.
- Use GitHub for version control.
- Create a portfolio using GitHub Pages.
- Implement Python classes.
- Write code using advanced Python functions.
- Utilize Python decorators and wrappers.
References
Notes Links