Weekly Report 1 - Feb 05-09 2018

10 Feb 2018

Amlaan

What I did

Explored datasets online, especially on kaggle.com.
Curated three Chicago-related datasets:
- Crimes in Chicago (~1.4M samples)
- Chicago Restaurant Inspections (~154k samples)
- Chicago - Citywide Payroll Data (~32k samples)
Wrote scripts to clean, handle missing data in, and store the Crimes and Restaurant Inspections datasets (both now fully usable for ML models)
Created a Wunderground API key to fetch weather data for records where a ZIP code is available

What I plan to do next week

Explore and clean more datasets
Start on a strategy for integrating the available, usable datasets using entity resolution techniques
Try out Decision Trees and Random Forests from the scikit-learn library
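Entity resolution across these datasets could start with something as simple as fuzzy matching of business names. Below is a minimal sketch, assuming names are the join key. `normalize` and `best_match` are hypothetical helpers, and the 0.85 threshold is an arbitrary assumption. The comparison uses the standard-library `difflib.SequenceMatcher` rather than any dedicated entity-resolution library.

```python
import difflib
import re


def normalize(name):
    """Hypothetical helper: lowercase, drop apostrophes, collapse punctuation/whitespace."""
    name = name.lower().replace("'", "")
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return " ".join(name.split())


def best_match(name, candidates, threshold=0.85):
    """Hypothetical helper: return the most similar candidate, or None if below threshold.

    The threshold of 0.85 is an assumed value, not a tuned one.
    """
    target = normalize(name)
    best, best_score = None, 0.0
    for cand in candidates:
        # ratio() is 2*M/T, where M = matched characters and T = total length of both strings
        score = difflib.SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None
```

For example, `best_match("Lou Malnati's Pizzeria", ["LOU MALNATIS PIZZERIA", "Giordano's"])` returns `"LOU MALNATIS PIZZERIA"`. Comparing every pair across ~154k inspection records would be slow, so a blocking step, such as comparing only records within the same ZIP code, would likely be needed.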

Blocks or impediments to the plan for next week

None for now. The biggest challenge will be data integration: handling the conflicting and missing values that arise when columns from different datasets are merged.
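One way to handle conflicting and empty values during integration is a simple precedence rule: take the value from a primary source, fill its gaps from a secondary source, and log any disagreements for review. Below is a minimal sketch, assuming records are plain dicts keyed by column name. `merge_records` is hypothetical, and the "primary source wins" policy is an assumption rather than a decided rule.

```python
def merge_records(primary, secondary):
    """Hypothetical helper: merge two records that describe the same entity.

    Assumed policy: non-empty values in `primary` win; empty values (None or "")
    are filled from `secondary`; disagreements between two non-empty values
    are returned separately so they can be reviewed.
    """
    merged, conflicts = {}, {}
    for key in primary.keys() | secondary.keys():
        a, b = primary.get(key), secondary.get(key)
        a_empty = a is None or a == ""
        b_empty = b is None or b == ""
        if a_empty:
            # Fall back to the secondary source; keep None if both are empty
            merged[key] = None if b_empty else b
        else:
            merged[key] = a
            if not b_empty and a != b:
                conflicts[key] = (a, b)
    return merged, conflicts
```

Returning the conflicts instead of silently discarding them lets the team decide column by column which source to trust.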

Somshubra

What I did

Set up the project structure and the core modules that will be built out later
Set up scrapers to collect restaurants and bars in Chicago from yelp.com for ZIP codes 60601-60607. Scraped roughly 550 restaurants and bars, along with some useful data about them.
Set up scrapers to collect reviews from yelp.com/bi based on the names of the restaurants in the above dataset. Scraped roughly 50,000 reviews of nearly 280 restaurants and bars.
Obtained the Yelp Dataset of nearly 5.2 million raw reviews of restaurants from all over the U.S. We plan to use it to supplement the final classifier so that it generalizes better and learns to predict sentiment from the review text alone.
Set up a script that calls the Wunderground API to obtain nearly 4 years' worth of weather data for ZIP codes 60601 and 60605.
Wrote scripts to clean and preprocess the scraped datasets above, plus utilities for manipulating the data.
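Pulling several years of daily weather history means generating one request per day per ZIP code. The sketch below only builds the request URLs. The URL pattern is an assumption modeled on the legacy Wunderground "history" endpoint and should be checked against the API documentation. `history_urls` and the 2014-2017 date range in the usage example are hypothetical.

```python
from datetime import date, timedelta

# ASSUMPTION: URL pattern modeled on the legacy Wunderground "history" feature;
# verify against the API documentation before relying on it.
BASE_URL = "http://api.wunderground.com/api/{key}/history_{ymd}/q/{zipcode}.json"


def history_urls(api_key, zipcode, start, end):
    """Hypothetical helper: yield one request URL per day from start to end (inclusive)."""
    day = start
    while day <= end:
        yield BASE_URL.format(key=api_key, ymd=day.strftime("%Y%m%d"), zipcode=zipcode)
        day += timedelta(days=1)


# Example: four years of daily history for one ZIP code
urls = list(history_urls("YOUR_KEY", "60601", date(2014, 1, 1), date(2017, 12, 31)))
```

Four years of history comes to 1,461 requests per ZIP code, so the script would need to respect whatever rate limits apply to the key, for example by sleeping between calls.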

What I plan to do next week

Finish building the weather dataset by scraping data for earlier years.
Set up basic preprocessing to clean the review text
Set up basic ML baselines for sentiment analysis of the reviews, if time permits
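A minimal, standard-library-only sketch of the review-cleaning step follows. The tiny stop-word list and the specific regular expressions are assumptions for illustration, not the project's agreed preprocessing. `preprocess` is a hypothetical helper.

```python
import re

# ASSUMPTION: deliberately small, hypothetical stop-word list; a real pipeline
# would use a fuller list (but should keep negations such as "not").
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "of", "to", "i"}


def preprocess(review):
    """Hypothetical helper: lowercase, strip URLs and non-letters, tokenize, drop stop words."""
    text = review.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z']+", " ", text)      # keep only letters and apostrophes
    tokens = (tok.strip("'") for tok in text.split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]
```

Apostrophes are kept inside words so that contractions like "didn't" survive. Negations matter for sentiment, so they should not be treated as stop words.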

Blocks or impediments to the plan for next week

None for the foreseeable future. However, next week's focus will be data integration, so all of the other datasets need to be prepared as soon as possible.

Debojit

What I did

Imported and ran the data collection scripts, and familiarized myself with the project structure and workflow.
Searched online and decided on a few datasets.
Designed an initial pipeline for the project and shared it with teammates

What I plan to do next week

Scrape data from a few sites and download the dataset I have chosen.
Clean the scraped data into the agreed-upon common format.
Discuss duties and responsibilities with teammates.

Blocks or impediments to the plan for next week

None so far, apart from a few conditions that still need to be decided while cleaning the data and integrating it with the other three.

Christopher

What I did

Imported the project structure and the core modules that will be built out later
As part of data discovery, obtained the public bike data for now
Spent time understanding the work that needs to be done and what the rest of the team is working on.

What I plan to do next week

Plan to obtain health-sector data and relate it to the existing requirements
Must finalize the data and draft the first report.
Learn more about visualization, since that is where I hope to contribute most.

Blocks or impediments to the plan for next week

Nothing unexpected; as noted above, data integration may require some extra effort.

Project Alethea

