Machine Learning Foundations: Linear Algebra, Calculus, Statistics & Computer Science
🔹 Introductory Lectures:
These are great courses to get started in machine learning and AI. No prior experience in ML and AI is needed. You should have some knowledge of linear algebra, introductory calculus and probability. Some programming experience is also recommended.
This modern classic of machine learning courses is a great starting point for understanding the concepts and techniques of machine learning. The course covers many widely used techniques, and the lecture notes are detailed and review the necessary mathematical concepts.
This course aims to provide an iterative framework for designing real-world machine learning systems. The goal of this framework is to build a system that is deployable, reliable, and scalable.
(:octocat: repo on GitHub) – an excellent ten-week course on neural networks and computer vision.
A great way to start with deep learning. The course focuses on convolutional neural networks and computer vision, but also gives an overview of recurrent networks and reinforcement learning.
Alternative to Stanford CS229. As the name implies, this course takes a more applied perspective than Andrew Ng's machine learning lecture at Stanford. You will see more code than mathematics. Concepts and algorithms are implemented using the popular Python libraries scikit-learn and Keras.
Modern NLP techniques from recurrent neural networks and word embeddings to transformers and self-attention. Covers applied topics like question answering and text generation.
This course aims to introduce students to the modern state of Machine Learning and Artificial Intelligence. It is designed to take one year (two terms at MIPT): approximately 2 × 15 lectures and seminars.
🔸 Advanced Lectures:
Advanced courses that require prior knowledge in machine learning and AI.
Machine Learning by Andrew Ng (Stanford University) – the most popular machine learning course (be warned: instead of the standard Python or R, it uses Matlab/Octave)
A complete guide to getting started and improving in machine learning (ML) and artificial intelligence (AI) in 2021 without ANY background in the field, and to staying up to date with the latest news and state-of-the-art techniques!
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn and TensorFlow.
This is a solid path for those of you who want to complete a Data Science course on your own time, for free, with courses from the best universities in the world.
A guide covering Machine Learning including the applications, libraries and tools that will make you better and more efficient with Machine Learning development.
Awesome AI Bookmarks - Curated list of reads, implementations and core concepts of Artificial Intelligence, Deep Learning, Machine Learning by the best folks in the world
Awesome AI on Kubernetes - awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Awesome Big Data - A curated list of awesome big data frameworks, resources and other awesomeness
Awesome Business Machine Learning - A curated list of practical business machine learning (BML) and business data science (BDS) applications for Accounting, Customer, Employee, Legal, Management and Operations
Awesome CS Courses - List of awesome university courses for learning Computer Science
Awesome ML Courses - Awesome free machine learning and AI courses with video lectures
Neural Networks for Machine Learning. Quote: "I have already used the phrase 'living legend' and now find myself at a loss, since it is hard to describe Geoffrey Hinton (the man at the origins of modern approaches to training neural networks with the backpropagation algorithm) in any other way. His course turned out to be excellent."
dennybritz/reinforcement-learning, Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Basic Statistics – good lectures and quizzes, ideal for beginners. The downside is the half-finished practical assignments (Labs) on DataCamp, which distract from studying statistics rather than helping you consolidate the material
Основы статистики (Fundamentals of Statistics) on stepic.org – a high-quality introduction to statistics, in Russian
Основы статистики: часть 2 – the course continues to introduce listeners to the basic concepts and methods of statistics. It covers topics such as the analysis of nominal data, nonparametric tests, and dimensionality reduction methods
Intro to Statistics — covers visualization, probability, regression and other topics that will help you learn the basic methods of understanding data with statistics
Intro to Descriptive Statistics – a detailed course for beginners. This course will teach you the basic terms and concepts in statistics as well as guide you through introductory probability
Intro to Inferential Statistics – the course introduces the basic concepts of inferential statistics (t-test, ANOVA, correlation, regression, etc.)
Intermediate Python – a short online guide (translated into Russian!) to the subtleties of the language that beginners often pass by (author – Yasoob Khalid);
Scipy lecture notes — tutorials on the scientific Python ecosystem: a quick introduction to central tools and techniques
A very large list of interesting Python notebooks (from three-minute tutorials to entire books (!) in this format)
Data Science IPython Notebooks on Deep learning (TensorFlow, Theano, Caffe), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
100 Numpy exercises. The goal is both to offer a quick reference for new and old users and also to provide a set of exercises for those who teach
Jupyter – open source, interactive data science and scientific computing across over 40 programming languages. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text
nbviewer – renders notebooks available on other websites
Sublime Text 3 – the VIM of the 21st century; works great for Python when used together with plugins:
Package Control – for quick and convenient management of add-ons
JS Charts – a JavaScript-based chart generator that requires little or no coding. With JS Charts drawing charts is a simple and easy task, since you only have to use client-side scripting
Advanced R by Hadley Wickham – an online book for those who want to improve their R programming skills and understand the language better (including programmers coming from other languages);
Machine Learning in R (:octocat: github repo) — Interface to a large number of classification and regression techniques, including machine-readable parameter descriptions;
The Academic Torrents site has a growing number of datasets, including a few text collections that might be of interest (Wikipedia, email, twitter, academic, etc.) for current or future projects.
What is the difference between the train, validation and test sets in neural networks?
Training Set: this data set is used to adjust the weights of the neural network.
Validation Set: this data set is used to detect overfitting. You are not adjusting the weights of the network with this data; you are verifying that any increase in accuracy over the training set actually yields an increase in accuracy over a data set the network has not been trained on. If accuracy over the training set increases but accuracy over the validation set stays the same or decreases, you are overfitting your neural network and should stop training.
The validation data set is a set of examples of the function you want to learn that you do not use directly to train the network. If you train with a gradient-based algorithm, the error surface and the gradient at any point depend entirely on the training set, so the training set is what directly adjusts the weights. To make sure you do not overfit the network, you feed the validation set to it and check that the error stays within some range. Because the validation set is never used directly to adjust the weights, a good error on the validation set (and on the test set) indicates that the network not only predicts the training examples well but can also be expected to perform well on new examples it has never seen during training.
Testing Set: this data set is used only for testing the final model, in order to confirm the actual predictive power of the network.
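To make the three-way split concrete, here is a minimal sketch using scikit-learn's `train_test_split`; the 60/20/20 ratio and the toy data are assumptions for illustration, not a prescription:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 1000 samples with 20 features (purely illustrative).
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# First carve out the test set (20%); it is never touched during training.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then split the remainder into train (60% overall) and validation (20% overall);
# 0.25 of the remaining 80% gives 20% of the full data.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```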
Also, if you do not have enough data for a separate validation set, you can use cross-validation to tune the parameters as well as to estimate the test error.
The cross-validation set is used for model selection, for example, to select the polynomial model with the fewest errors for a given parameter set. The test set is then used to report the generalization error of the selected model.
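As a hedged illustration of model selection by cross-validation, the sketch below picks a polynomial degree with scikit-learn; the candidate degrees, the synthetic data, and the 5-fold setup are all assumptions made for the example:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Illustrative 1-D regression data: a noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Pick the polynomial degree with the best 5-fold cross-validation score.
best_degree, best_score = None, -np.inf
for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()
    if score > best_score:
        best_degree, best_score = degree, score

print(f"Selected degree: {best_degree}")
# A held-out test set (not shown) would then report the generalization error
# of the model retrained with best_degree.
```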
Early stopping is a way to stop training. There are different variations, but the main outline is as follows: the errors on both the training set and the validation set are monitored. The training error decreases at each iteration (backpropagation and friends), and at first the validation error decreases as well. Training is stopped at the moment the validation error starts to rise. The weight configuration at this point yields a model that predicts the training data well, as well as data the network has not seen. However, because the validation data indirectly affects which weight configuration is selected, a separate test set is still needed. The test set is never used in the training process. Once a model has been selected based on the validation set, the test set is applied to it, and the error found there is representative of the error we can expect on completely new data for the same problem.
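Keras, for instance, ships an `EarlyStopping` callback that implements exactly this monitoring loop. A minimal sketch follows; the toy data, network size, and patience value are assumptions for illustration:

```python
import numpy as np
import tensorflow as tf

# Toy binary-classification data (purely illustrative).
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Stop when the validation loss stops improving for 5 epochs in a row,
# and restore the best weight configuration found so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```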
Fully connected neural network that recognizes handwritten digits from the MNIST database (Modified National Institute of Standards and Technology database)
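A minimal sketch of a fully connected MNIST classifier of this kind, written with Keras; the layer sizes and epoch count are assumptions, not necessarily the project's actual architecture:

```python
import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A fully connected (dense) network: flatten the 28x28 images, one hidden
# layer, then a softmax over the 10 digit classes.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
model.evaluate(x_test, y_test)
```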
Question Answering System using BiDAF Model on SQuAD
Implemented a Bidirectional Attention Flow neural network as a baseline on SQuAD, improving Chris Chute's model implementation, adding word and character inputs as described in the original paper, and improving GauthierDmns' code.