Springboard-Data-Science-Immersive

This repository houses all code, data, and files related to my work in the Springboard Data Science Immersive program. The following serves as a table of contents for the whole repository, with links to the respective work.

Capstone 1

Facilitation of Cryptocurrency Price Prediction by Sentiment Analysis

Key Skills

Web Scraping
NLP - Natural Language Processing
Time Series Analysis
Deep Neural Networks

A custom sentiment analysis library created to facilitate overall sentiment analysis of cryptocurrency news articles scraped from the web. Used in conjunction with historical price data, the sentiment analysis feeds a deep neural network that predicts future pricing for a crypto coin of interest.

Capstone 2

Exploring Computational Efficiency in Object Detection with Convolutional Neural Networks

Key Skills

Image Processing
Video Processing
H5 Storage
Object Oriented Programming
Tensorflow
Tensorboard
Convolutional Neural Networks
Object Detection

Exploring different image preprocessing techniques and methods in order to speed up CNN training. As a positive side effect, transforming the original full-scale data reduces memory use, both on the hard drive and in RAM.
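The README does not name the specific preprocessing steps, so the following is only a minimal sketch under assumed choices: grayscale conversion followed by block-average downsampling. The function names `to_grayscale` and `downsample` are hypothetical, and images are plain nested lists rather than the project's actual data format.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples into rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

def downsample(gray, factor=2):
    """Shrink a grayscale image by averaging factor x factor pixel blocks."""
    height = len(gray) // factor * factor
    width = len(gray[0]) // factor * factor
    small = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [gray[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

# Toy 4x4 all-white RGB image (illustrative only, not project data).
image = [[(255, 255, 255)] * 4 for _ in range(4)]
small = downsample(to_grayscale(image), factor=2)

values_before = 4 * 4 * 3                # three channels per pixel
values_after = len(small) * len(small[0])  # one channel, quarter resolution
print(values_before, values_after)
```

Dropping two of the three channels and halving each dimension cuts the number of stored values by a factor of twelve in this toy case, which is the kind of saving that shortens training and shrinks storage.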

Clustering Methods

K-Means and PCA

Key Skills

K-Means
PCA - Principal Component Analysis
Elbow Sum of Squares Method

Mini project on customer segmentation: identifying different types of customers and then figuring out ways to find more of those individuals in order to gain more customers. The data comes from John Foreman's book Data Smart. The dataset contains both information on marketing newsletters/e-mail campaigns (e-mail offers sent) and transaction-level data from customers (which offers customers responded to and what they bought).
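As a rough illustration of the elbow sum-of-squares method listed under Key Skills, here is a minimal pure-Python K-Means sketch. It is not the notebook's code: the helper names and the toy 2-D points are hypothetical stand-ins for the Data Smart customer data.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Toy data with three visible groups (illustrative only).
points = [(1, 1), (1.5, 2), (1, 0.5),
          (8, 8), (9, 9), (8.5, 9.5),
          (1, 9), (2, 8.5), (1.5, 9.5)]

sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Plotting the sum of squares against k and looking for the point where the curve flattens (the "elbow") gives a heuristic choice for the number of customer segments.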

Exploratory Data Analysis (EDA)

Hospital Readmittance Data

Human Temperature Data

Racial Discrimination Data

Key Skills

Central Limit Theorem
Statistical Analysis
Data Visualization
z-test
t-test
Margin of Error (MOE)
Chi-Squared Test
Bootstrap Statistics

Several EDAs performed on varying data categories. Hospital Readmittance performs a statistical analysis of a previously done analysis to critique its validity. Human Temperature EDA uses bootstrap statistics to estimate the true average human body temperature in both males and females. Racial Discrimination performs a statistical analysis of whether race has a meaningful impact on the callback rate of candidates who have submitted resumes to jobs of interest.
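The bootstrap approach mentioned for the Human Temperature EDA can be sketched as a percentile confidence interval for the mean. This is an assumed, simplified version rather than the notebook's implementation; `bootstrap_mean_ci` is a hypothetical helper and the temperature values are made up for illustration.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=5000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Made-up body temperatures in degrees Fahrenheit (not the real dataset).
temperatures = [98.2, 98.6, 97.9, 98.4, 98.0, 98.8, 98.3, 97.8, 98.5, 98.1]

lower, upper = bootstrap_mean_ci(temperatures)
print(round(lower, 2), round(upper, 2))
```

Running the same function separately on male and female subsets would give one interval per group, which is one way to compare the two averages.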

Machine Learning Algorithms

Linear Regression

Logistic Regression

Naive Bayes

Key Skills

Logistic Regression
Linear Regression
Naive Bayes

Applying several machine learning algorithms in mini projects, such as labeling an observation as either male or female based on height and weight data (Logistic Regression), estimating prices on the Boston Housing data (Linear Regression), and predicting movie reviews with Naive Bayes models.

PySpark

MapReduce with PySpark

Performing several exercises utilizing MapReduce in PySpark (RDD) with a touch of MLlib

Key Skills

PySpark
RDD
Spark Dataframes
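To show the shape of the MapReduce pattern used in the exercises above without requiring a Spark installation, here is a pure-Python word count. It is only an analogue: the comments point to the PySpark RDD methods each step corresponds to, and `map_phase` and `reduce_by_key` are hypothetical helper names.

```python
from itertools import chain

def map_phase(lines):
    """Emit (word, 1) pairs.

    Roughly what rdd.flatMap(lambda l: l.split()).map(lambda w: (w, 1))
    does on an RDD of text lines.
    """
    return chain.from_iterable(
        ((word.lower(), 1) for word in line.split()) for line in lines
    )

def reduce_by_key(pairs, func):
    """Combine values that share a key, like rdd.reduceByKey(func)."""
    combined = {}
    for key, value in pairs:
        combined[key] = func(combined[key], value) if key in combined else value
    return combined

lines = ["spark makes map reduce simple", "map then reduce"]
counts = reduce_by_key(map_phase(lines), lambda a, b: a + b)
print(counts)
```

In Spark the same two phases run in parallel across partitions, and the reduce function must be associative and commutative so that partial results can be merged in any order.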

SQL

Yammer SQL Case Study

Key Skills

SQL
Time Series Analysis
Applied Plotting and Charting

This is a SQL case study proposed by Mode Analytics at https://modeanalytics.com/. The Jupyter notebook in this repository is a cleaned-up version of the original case study, which contains all original SQL queries and can be found here: https://modeanalytics.com/mooseburger/reports/14cbbb5670b8

JSON

Data Wrangling with JSON

Key Skills

JSON Manipulation and Extraction
Applied Plotting and Charting

An exercise of data extraction and exploration utilizing a JSON data source
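A typical extraction step on a nested JSON source looks something like the sketch below. The records, field names (`id`, `region`, `tags`), and values are all hypothetical and do not come from the exercise's actual data source.

```python
import json
from collections import Counter

# Hypothetical nested records (illustrative only).
raw = """
[
  {"id": 1, "region": "north", "tags": [{"name": "energy"}, {"name": "transport"}]},
  {"id": 2, "region": "south", "tags": [{"name": "energy"}]},
  {"id": 3, "region": "south", "tags": []}
]
"""

records = json.loads(raw)

# Flatten the nested "tags" list into one row per (record, tag) pair.
rows = [
    {"id": rec["id"], "region": rec["region"], "tag": tag["name"]}
    for rec in records
    for tag in rec.get("tags", [])
]

# Count how often each tag appears across all records.
tag_counts = Counter(row["tag"] for row in rows)
print(tag_counts.most_common())
```

Once the nested structure is flattened into rows, the result can be loaded into a table or plotted directly.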

Take Home Data Challenges

Relax Challenge

Ultimate Challenge Parts 1 & 2

Ultimate Challenge Part 3

Key Skills

Full Stack Data Scientist

Relax Challenge - Defining an "adopted user" as a user who has logged into a product on three separate days in at least one seven-day period, identify which factors predict future user adoption. You are given two datasets:

A user table ("takehome_users") with data on 12,000 users who signed up for the product in the last two years.
A usage summary table ("takehome_user_engagement") that has a row for each day that a user logged into the product.
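The "adopted user" definition above can be expressed as a small function. This is a sketch under two assumptions: each login has already been reduced to a calendar date, and a "seven-day period" means the first and third login days are at most six days apart. The name `is_adopted` is hypothetical.

```python
from datetime import date, timedelta

def is_adopted(login_days, window_days=7, min_days=3):
    """Return True if min_days distinct login dates fall in one window."""
    days = sorted(set(login_days))  # drop repeat logins on the same day
    for i in range(len(days) - min_days + 1):
        # Compare the first and last of min_days consecutive distinct dates.
        if days[i + min_days - 1] - days[i] < timedelta(days=window_days):
            return True
    return False

# Three distinct days within seven days: adopted.
print(is_adopted([date(2014, 1, 1), date(2014, 1, 3), date(2014, 1, 7)]))

# Three logins a week apart each: not adopted.
print(is_adopted([date(2014, 1, 1), date(2014, 1, 8), date(2014, 1, 15)]))
```

Applying this per user to the engagement table yields a binary label that can then serve as the target when looking for factors that predict adoption.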

Ultimate Challenge

Part 1 - Exploratory data analysis
Part 2 - Experiment and metrics design
Part 3 - Predictive Modelling

Open Source Agenda is not affiliated with "Springboard Data Science Immersive" Project. README Source: Mooseburger1/Springboard-Data-Science-Immersive
