Ml Things

This is where I put things I find useful that speed up my work with machine learning. Ever looked through your old projects to reuse those cool functions you created before? Well, this repo is designed to be a Python library of functions I created in my previous projects that can be reused. I also share some notebook tutorials and Python code snippets.

0.0.1

2 years ago

This is my first release of the Machine Learning Things package.

From this version onwards I will keep track of the updates.

Installation

This repo is tested with Python 3.6+.

It's always good practice to install ml_things in a virtual environment. If you need guidance on using Python's virtual environments, you can check out the user guide here.

You can install ml_things with pip from GitHub:

pip install git+https://github.com/gmihaila/ml_things
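To confirm the install worked inside the active environment, a quick check like the one below can help. This is only a sketch: `is_installed` is a hypothetical helper name, and the only thing taken from this README is the module name `ml_things`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `ml_things` is the module name used throughout this README.
    print("ml_things installed:", is_installed("ml_things"))
```

If this prints `False`, the package was most likely installed into a different environment than the one running the script.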

Current features:

Functions

All functions are implemented in the ml_things module.

  • Array Functions: Array manipulation functions that can be useful when working with machine learning.

    • pad_array: Pad variable length array to a fixed numpy array. It can handle single arrays [1,2,3] or nested arrays [[1,2],[3]].
    • batch_array: Split a list into batches/chunks. The last batch holds whatever values remain. Note: This is also called chunking. I call it batching since I use it more in ML.
  • Plot Functions: Plotting functions that can be useful when working with machine learning.

    • plot_array: Create plot from a single array of values.
    • plot_dict: Create plot from a single array of values.
    • plot_confusion_matrix: This function prints and plots the confusion matrix.
  • Text Functions: Text-related functions that can be useful when working with machine learning.

    • clean_text: Clean text using various techniques.
  • Web Related: Web-related functions that can be useful when working with machine learning.

    • download_from: Download a file from a URL. It returns the path of the downloaded file.
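To make the padding and batching ideas from the Array Functions entries concrete, here is a minimal pure-Python sketch. It is not the ml_things implementation: the names `pad_to_fixed` and `split_into_batches`, their parameters, and the truncation of over-long rows are all assumptions made for illustration. Check the library source for the actual signatures of pad_array and batch_array.

```python
def pad_to_fixed(values, length, pad_value=0):
    """Pad (or truncate) one sequence, or each sequence in a nested list, to `length`."""
    # A flat list such as [1, 2, 3] is treated as a single row.
    is_flat = not any(isinstance(v, (list, tuple)) for v in values)
    rows = [values] if is_flat else values
    padded = [
        # Truncate rows that are too long, then fill the rest with pad_value.
        list(row)[:length] + [pad_value] * max(0, length - len(row))
        for row in rows
    ]
    return padded[0] if is_flat else padded


def split_into_batches(items, batch_size):
    """Split a list into consecutive chunks; the last chunk holds the remainder."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    print(pad_to_fixed([1, 2, 3], 5))               # [1, 2, 3, 0, 0]
    print(pad_to_fixed([[1, 2], [3]], 3))           # [[1, 2, 0], [3, 0, 0]]
    print(split_into_batches([1, 2, 3, 4, 5], 2))   # [[1, 2], [3, 4], [5]]
```

Because every row ends up with the same length, the padded result can be passed straight to `numpy.array` to get a fixed-shape array.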

Snippets

This is a large variety of Python snippets without a particular theme. I list the most frequently used ones first while keeping a logical order. I like to keep them as simple and as efficient as possible.

| Name | Description |
|---|---|
| Read File | One-liner to read any file. |
| Write File | One-liner to write a string to a file. |
| Debug | Start debugging after this line. |
| Pip Install GitHub | Install a library directly from GitHub using pip. |
| Parse Argument | Parse arguments given when running a .py file. |
| Doctest | How to run a simple unit test using function documentation. Useful when you need to run unit tests inside a notebook. |
| Fix Text | Since text data is always messy, I always use it. It is great at fixing any bad Unicode. |
| Current Date | How to get the current date in Python. I use this when I need to name log files. |
| Current Time | Get the current time in Python. |
| Remove Punctuation | The fastest way to remove punctuation in Python 3. |
| PyTorch-Dataset | Code sample on how to create a PyTorch Dataset. |
| PyTorch-Device | How to set up the device in PyTorch to detect if a GPU is available. |
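To give a feel for what these snippets look like, here is a small standard-library sketch of four of them: read file, write file, current date, and remove punctuation. These are my own illustrative versions, not copies of the snippets in the repo. The helper names (`write_text`, `read_text`, `remove_punctuation`, `log_file_name`) and the log file naming pattern are assumptions.

```python
import string
import tempfile
from datetime import datetime
from pathlib import Path


def write_text(path, text):
    """Write a string to a file in one line."""
    Path(path).write_text(text, encoding="utf-8")


def read_text(path):
    """Read a whole text file in one line."""
    return Path(path).read_text(encoding="utf-8")


def remove_punctuation(text):
    """Strip ASCII punctuation with str.translate, avoiding a Python-level loop."""
    return text.translate(str.maketrans("", "", string.punctuation))


def log_file_name(prefix="run", now=None):
    """Build a log file name from a date (today by default)."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y-%m-%d}.log"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.txt"
        write_text(path, "Hello, world!")
        print(remove_punctuation(read_text(path)))  # Hello world
    print(log_file_name())
```

`str.translate` with a deletion table built by `str.maketrans` is a common choice for punctuation removal because the work happens in a single pass inside the interpreter. It only covers the ASCII characters in `string.punctuation`.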

Notebooks Tutorials

This is where I keep notebooks from previous projects that I turned into small tutorials. I often use them as a basis for starting a new project.

All of the notebooks are in Google Colab. Never heard of Google Colab? 🙀 You have to check out the Overview of Colaboratory, Introduction to Colab and Python, and what I think is a great Medium article on how to configure Google Colab Like a Pro.

If you check /ml_things/notebooks/, you will find many notebooks that are not listed here because they are not in a 'polished' form yet. These are the notebooks that are good enough to share with everyone:

| Name | Description | Links |
|---|---|---|
| 🍇 Better Batches with PyTorchText BucketIterator | How to use PyTorchText BucketIterator to sort text data for better batching. | Open In Colab |
| 🐶 Pretrain Transformers Models in PyTorch using Hugging Face Transformers | Pretrain 67 transformers models on your custom dataset. | Open In Colab |
| 🎻 Fine-tune Transformers in PyTorch using Hugging Face Transformers | Complete tutorial on how to fine-tune 73 transformer models for text classification, no code changes necessary! | Open In Colab |
| ⚙️ Bert Inner Workings in PyTorch using Hugging Face Transformers | Complete tutorial on how an input flows through Bert. | Open In Colab |
| 🎱 GPT2 For Text Classification using Hugging Face 🤗 Transformers | Complete tutorial on how to use GPT2 for text classification. | Open In Colab |