What I’m Learning In
ATU’s Master’s Course in Data Science
The course doesn’t officially begin until September. However, it also included a set of online preparatory classes, which ran from the beginning of June to the start of July. Subjects taught included Python, Database Systems and Applied Statistics.
Along with the lectures we had to attend on a daily basis, we also had to complete assignments, some written and some code-based.
Database Systems
The first module we did was Database Systems, where we got a crash course in all things SQL, data modelling, Oracle SQL Developer, etc. It was one of the harder subjects given that my background was primarily in NoSQL, which made getting to the finish line and submitting the final project all the more rewarding.
Some of the assignments we were given included:
- Designing a Relational Schema
- Implementing and Querying a Relational Database
Unfortunately, we will not be getting our final marks until September, when an external examiner marks the work, so my most up-to-date results are as follows:
Programming
In the Programming classes we learned all about programming concepts, ranging from the basics (variables, functions and iterable objects) to concepts that were a bit more complex to grasp, such as iterators and generators. We also took a deep dive into software development cycles and looked at the different methodologies that could be used, such as RAD (Rapid Application Development), the Waterfall method, and Agile. At the tail end of the course, we got to mess around with Python libraries that are a bit more data-science specific, but can of course be used generally too, such as NumPy and Plotly.
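To give a flavour of the iterator and generator idea, here is a minimal sketch of my own (not code from the course). The names `Countdown` and `countdown` are made up for illustration. Both versions produce the same values, but the generator lets `yield` handle the bookkeeping that the class has to do by hand.

```python
class Countdown:
    """Iterator written by hand: implements __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal the end of iteration once we reach zero.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator version: each yield hands back one value and pauses."""
    while start > 0:
        yield start
        start -= 1


print(list(Countdown(3)))  # [3, 2, 1]
print(list(countdown(3)))  # [3, 2, 1]
```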
Some of the assignments we were given included:
- Lab Reflection Logs — where we had to reflect on code we wrote in class, talk about why we made certain choices, and finally compare it to code written by classmates to figure out ways in which we could write better code.
- Writing a program that would take in data from a CSV file, clean up the data and perform a couple of operations listed in the assignment brief, such as finding the weighted mean of the student’s grades.
- Recording a video presentation of my code, talking through the reason why I coded the program the way I did.
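The CSV assignment above could be sketched roughly as follows. This is not my submitted solution: the column names (`module`, `grade`, `credits`), the sample rows and the cleaning rule (skip any row with a missing or non-numeric value) are all assumptions for the sake of the example.

```python
import csv
import io

# Hypothetical data standing in for the assignment's CSV file.
SAMPLE_CSV = """module,grade,credits
Maths,70,10
Programming,80,5
Statistics,,5
Databases,n/a,5
"""


def weighted_mean(rows, value_key="grade", weight_key="credits"):
    """Return sum(value * weight) / sum(weight), skipping unusable rows."""
    total = 0.0
    weight_sum = 0.0
    for row in rows:
        try:
            value = float(row[value_key])
            weight = float(row[weight_key])
        except (TypeError, ValueError):
            # Cleaning step: ignore rows with missing or non-numeric fields.
            continue
        total += value * weight
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no valid rows to average")
    return total / weight_sum


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = weighted_mean(rows)
print(round(result, 2))  # 73.33
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be swapped for an opened file handle.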
My most up-to-date results are as follows:
Applied Statistics
This course was one of the most intense modules I’ve ever done. There were about ten videos a day (10 to 20 minutes per video), lots of new material to try and understand, and assignments and/or quizzes due almost every day. But once again, because it was so intense, it was extremely rewarding to complete, and it definitely taught me how to manage my time better and be more resilient as a learner.
Topics covered in this module included: descriptive statistics, inferential statistics, and how to use the tools and functions available in Excel in a much greater capacity.
Some of the assignments we were given included:
- Writing a hypothetical report to the manager of a grocery store presenting my findings in detail and offering suggestions.
My most up-to-date results are as follows:
For semester 1 of the main part of the course, we’re taking three modules — Computational Mathematics, Predictive Analytics and Descriptive Analysis and Visualization.
Computational Mathematics
In this module, we’re learning a lot about the maths that’s under the hood of many machine learning algorithms. Topics include linear systems, matrices, eigenvectors and eigenvalues, gradient descent, iterative methods for approximating solutions, random variables, applied calculus and integration, etc.
The lectures are pretty conversational which makes for a good learning experience and offers a great breeding ground for better understanding and growth.
The module grounds you in the fundamental concepts which machine learning is built upon, allowing you to have a deeper understanding of what you’re doing when working with ML models.
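Gradient descent is a good example of one of these fundamentals. Below is a minimal sketch of my own (not course material) that minimises the toy function f(x) = (x - 3)^2. The function, learning rate and step count are arbitrary choices for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def grad_f(x):
    # Derivative of the toy function f(x) = (x - 3)**2.
    return 2 * (x - 3)


minimum = gradient_descent(grad_f, x0=0.0)
print(round(minimum, 4))  # 3.0
```

Each step shrinks the distance to the minimum by a constant factor here, which is why the estimate settles on x = 3. With a learning rate that is too large, the same loop would overshoot and diverge.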
[Results will go here when released]
Descriptive Analytics and Visualization
In this module, we’re building upon what we learned in the Database Systems crash course we were given during the summer. We went over the concepts we covered then and quickly moved on to much more advanced ones, such as partitioning, subqueries, the use of materialised views and dimensional objects in data warehouse performance optimisation, explain plans and costs, group by extensions, buckets, Oracle Cloud, Talend, etc.
It’s helped me understand the complexity behind database systems a little more, and how much work goes into not just storing data, but also optimising our data stores for querying and analysis.
[Results will go here when released]
Predictive Analysis
In this module, we learned a lot about the Analytic Pipeline, and the different stages involved. The lessons culminated in us being able to create our own pipelines from start to finish.
Topics covered include: Data transformation, Data Visualisation in Python, Training Machine Learning Models, Feature Engineering, Principal Component Analysis, t-SNE, etc.
This module helped put all the pieces together from our Computational Maths class, showing us how the things we learned there applied to the real world when it came to implementing a model.
Results for Certificate in Computing (Preliminary summer course)
Semester 1
Semester 1 has finally come to an end, and it definitely didn’t pass lightly. A lot of new topics were explored and a lot of growth had to happen, but overall it left all of the students who took part better off.
Descriptive Analysis and Visualisation
In Descriptive Analysis, the entire data pipeline was explored, from collection to storage to analysis. This module focused more on giving its students the ability to provide statistical solutions to analysis problems because, although AI and Machine Learning are great, they’re just tools, and they’re not always the best tool for every situation.
The practical side of this course introduced software like Talend OS and Tableau. It also reintroduced SQL, Oracle and data modelling, covering some familiar concepts along with more advanced ones.
Two reports that were written for this module can be found here:
Predictive Analysis
This module filled in the gaps for the previously mentioned Descriptive Analysis and Visualisation module. It provided insight into ML, mainly focusing on giving a high-level overview of the topic, key terminology, and supplying its students with the knowledge necessary to produce predictive models using algorithms like Linear Regression, KNN, SVM, Bagging Estimators, etc., utilising tools like one-hot encoding, data visualisation, model evaluation and Principal Component Analysis.
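To show what one of these algorithms looks like without a library hiding the details, here is a bare-bones KNN classifier sketch. It is my own illustration with made-up points and labels, not code from the module, and in practice a library implementation would be the sensible choice.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # a
print(knn_predict(points, labels, (5.0, 5.2)))  # b
```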
A report for this module can also be found at the GitHub repo linked earlier: https://github.com/RichardOgujawa/academic-writeups
Computational Maths
This module focused heavily on the mathematical side of ML, providing the foundational understanding needed not just to use black-box machine learning algorithms or blindly type formulas into Excel, but to understand much more deeply what is going on under the hood. Topics covered included matrices, ANOVA tests, different distributions (chi-squared, Poisson, Bernoulli), calculus and integration.
Some of the assignments pertaining to this module can be found here: