What I’m Learning in ATU’s Master’s Course in Data Science
The course doesn’t officially begin until September; however, it also included a set of online preparatory classes that ran from the beginning of June to the start of July. Subjects taught included Python, Database Systems and Applied Statistics.
Along with attending daily lectures, we also had to complete assignments, some written and some code-based.
Database Systems
The first module we did was Database Systems, where we got a crash course in all things SQL, data modelling, Oracle SQL Developer, etc. It was one of the harder subjects for me, given that my background was primarily in NoSQL, which made reaching the finish line and submitting the final project all the more rewarding.
Some of the assignments we were given included:
- Designing a Relational Schema
- Implementing and Querying a Relational Database
Unfortunately, we won’t receive our final marks until September, when an external examiner grades the work, so my most up-to-date results are as follows:
Programming
In the Programming classes, we learned about programming concepts ranging from the basics (variables, functions and iterable objects) to ideas that were harder to grasp, such as iterators and generators. We also took a deep dive into software development life cycles and the methodologies that can be used within them, such as RAD (Rapid Application Development), the Waterfall model and Agile. Towards the end of the course, we got to experiment with Python libraries that are more data-science specific, though they can of course be used more generally, such as NumPy and Plotly.
Some of the assignments we were given included:
- Lab Reflection Logs, where we had to reflect on code we wrote in class, explain why we made certain choices, and compare it with code written by classmates to find ways to write better code.
- Writing a program that would read data from a CSV file, clean up the data and perform a couple of operations listed in the assignment brief, such as finding the weighted mean of the student’s grades.
- Recording a video presentation of my code, explaining why I wrote the program the way I did.
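The assignment brief isn’t reproduced here, so what follows is only a minimal sketch of that kind of program, not the code I submitted. It assumes a hypothetical CSV with `module`, `credits` and `grade` columns, where each grade is weighted by its module’s credits, and it uses Python’s standard `csv` module to read and clean the rows before computing the weighted mean, sum(w × x) / sum(w).

```python
import csv
import io

# Hypothetical sample data: the column names and values are illustrative
# assumptions, not the assignment's real dataset.
SAMPLE_CSV = """module,credits,grade
Database Systems, 5, 72
Programming,5,81
Applied Statistics,10,68
Broken Row,,
"""


def load_clean_rows(text):
    """Parse CSV text, trimming whitespace and dropping rows with missing or non-numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            credits = float(row["credits"].strip())
            grade = float(row["grade"].strip())
        except (ValueError, AttributeError):
            # Empty strings raise ValueError; short rows yield None, which raises AttributeError
            continue
        rows.append({"module": row["module"].strip(), "credits": credits, "grade": grade})
    return rows


def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w), the weighted arithmetic mean."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


rows = load_clean_rows(SAMPLE_CSV)
result = weighted_mean([r["grade"] for r in rows], [r["credits"] for r in rows])
print(f"Weighted mean grade: {result:.2f}")  # prints "Weighted mean grade: 72.25"
```

With the sample data, the malformed last row is dropped during cleaning, and the result is (5 × 72 + 5 × 81 + 10 × 68) / 20 = 72.25. Keeping the cleaning step separate from the calculation makes each part easier to test on its own.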
My most up-to-date results are as follows:
Applied Statistics
This module was one of the most intense I’ve ever done. There were about ten videos a day (10 to 20 minutes each), lots of new material to understand, and assignments or quizzes due almost every day. But once again, because it was so intense, completing it was extremely rewarding, and it taught me how to manage my time better and be more resilient as a learner.
Topics covered in this module included descriptive statistics, inferential statistics, and how to make much fuller use of Excel’s tools and functions.
Some of the assignments we were given included:
- Writing a hypothetical report to the manager of a grocery store presenting my findings in detail and offering suggestions.
My most up-to-date results are as follows:
For semester 1 of the main part of the course, we’re taking three modules: Computational Mathematics, Predictive Analytics, and Descriptive Analytics and Visualization.
Computational Mathematics
In this module, we’re learning the maths under the hood of many machine learning algorithms. Topics include linear systems, matrices, eigenvectors and eigenvalues, gradient descent, iterative methods for approximating solutions, random variables, applied calculus and integration, etc.
The lectures are fairly conversational, which makes for a good learning experience and leaves plenty of room for deeper understanding and growth.
The module grounds you in the fundamental concepts that machine learning is built on, giving you a deeper understanding of what you’re doing when you work with ML models.
[Results will go here when released]
Descriptive Analytics and Visualization
In this module, we’re building on what we learned in the Database Systems crash course over the summer. We revisited those concepts and then quickly moved on to much more advanced ones, such as partitioning, subqueries, the use of materialised views and dimension objects to optimise data warehouse performance, explain plans and costs, GROUP BY extensions, buckets, Oracle Cloud, Talend, etc.
It’s helped me understand a little more about the complexity behind database systems, and how much work goes into not just storing data, but also optimising data stores for querying and analysis.
[Results will go here when released]
Predictive Analytics
In this module, we learned about the analytic pipeline and the different stages involved. The lessons culminated in building our own pipelines from start to finish.
Topics covered include data transformation, data visualisation in Python, training machine learning models, feature engineering, principal component analysis (PCA), t-SNE, etc.
This module helped tie together everything from our Computational Mathematics class, showing us how what we learned there applies in the real world when it comes to implementing a model.
[Results will go here when released]