Predict Future Sales: Kaggle Competition

Dataset description

ID - an Id that represents a (Shop, Item) tuple within the test set
shop_id - unique identifier of a shop
item_id - unique identifier of a product
item_category_id - unique identifier of item category
item_cnt_day - number of products sold. We are predicting a monthly amount of this measure
item_price - current price of an item
date - date in format dd/mm/yyyy
date_block_num - a consecutive month number, used for convenience. January 2013 is 0, February 2013 is 1,..., October 2015 is 33
item_name - name of item
shop_name - name of shop
item_category_name - name of item category
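
Since item_cnt_day is recorded per day while the target is monthly, the daily rows have to be summed per month, shop and item at some point. A minimal sketch of that aggregation, assuming a pandas DataFrame with the columns listed above (the toy rows are made up for illustration, and item_cnt_month is the target name used later in this write-up):

```python
import pandas as pd

# Toy daily sales rows (made-up values) using the column names listed above.
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1],
    "shop_id":        [5, 5, 5, 5],
    "item_id":        [10, 10, 11, 10],
    "item_cnt_day":   [1.0, 2.0, 1.0, 4.0],
})

# Sum the daily counts within each (month, shop, item) group.
monthly = (
    sales.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
         .sum()
         .rename(columns={"item_cnt_day": "item_cnt_month"})
)
print(monthly)
```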

Cleaning Data

There is a little noise in the shop city names and categories extracted from the shop names, but since each shop name is unique, this should not significantly affect our predictions.
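
One possible way to pull a city out of a shop name is sketched below. It assumes the city is the first whitespace-separated token of shop_name and that the noise is stray punctuation around it; the sample names and the shop_city_code column are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical shop names; the real ones may look different.
shops = pd.DataFrame({
    "shop_id": [0, 1, 2],
    "shop_name": ["!Yakutsk Ordzhonikidze 56", "Yakutsk TC Central", "Moscow TC Mega"],
})

# Assumption: the first token is the city. Strip stray punctuation around it.
shops["shop_city"] = shops["shop_name"].str.split().str[0].str.strip("!.,")

# Encode each distinct city as an integer code (codes follow first appearance).
shops["shop_city_code"] = pd.factorize(shops["shop_city"])[0]
print(shops)
```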

Preprocessing

Each row of the test set is a combination of a shop_id and an item_id in that shop.
This figure shows that we need to predict item_cnt_month for every item in each shop.
Exploring the details and dimensions of the test and training sets.
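
A rough sketch of how the test set could be inspected, assuming it is loaded as a pandas DataFrame with the ID, shop_id and item_id columns described earlier (the rows here are made up). Tagging the rows with date_block_num 34 follows from the numbering above, where October 2015 is 33, so the month to predict is the next block.

```python
import pandas as pd

# Toy test set (made-up rows): one row per (shop, item) pair.
test = pd.DataFrame({
    "ID":      [0, 1, 2, 3],
    "shop_id": [5, 5, 6, 6],
    "item_id": [10, 11, 10, 11],
})

print(test.shape)  # number of rows and columns

# Count distinct (shop, item) pairs and check whether they form a full grid.
n_pairs = len(test[["shop_id", "item_id"]].drop_duplicates())
is_full_grid = n_pairs == test["shop_id"].nunique() * test["item_id"].nunique()

# Tag the test rows with the month that follows the last training month.
test_month = test.assign(date_block_num=34)
```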

Regressor Modeling

Split the original test set out from the training set.
Fit a regression model with XGBRegressor, using root mean squared error (RMSE) as the evaluation metric.
Result after regression modeling.
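
The steps above might look roughly like the sketch below. It assumes the test rows were appended to the training matrix as date_block_num 34 and that month 33 is held out for validation; both choices, the helper names and the hyperparameter values are illustrative assumptions, not the settings used in this project. Passing eval_metric to the constructor assumes a reasonably recent xgboost release.

```python
import numpy as np
import pandas as pd

def split_by_month(data, target="item_cnt_month", valid_block=33, test_block=34):
    """Split a combined matrix into train / validation / test by month number."""
    features = data.drop(columns=[target])
    month = data["date_block_num"]
    X_train, y_train = features[month < valid_block], data.loc[month < valid_block, target]
    X_valid, y_valid = features[month == valid_block], data.loc[month == valid_block, target]
    X_test = features[month == test_block]
    return X_train, y_train, X_valid, y_valid, X_test

def rmse(y_true, y_pred):
    """Root mean squared error between two sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def fit_xgb(X_train, y_train, X_valid, y_valid):
    """Fit an XGBRegressor and monitor RMSE on the validation month."""
    from xgboost import XGBRegressor  # imported here so the helpers above work without xgboost
    model = XGBRegressor(n_estimators=200, max_depth=8, learning_rate=0.1,
                         eval_metric="rmse")  # illustrative values only
    model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
    return model

# Toy combined matrix (made-up values); month 34 stands in for the test rows.
data = pd.DataFrame({
    "date_block_num": [32, 32, 33, 34],
    "shop_id":        [5, 6, 5, 5],
    "item_id":        [10, 10, 10, 10],
    "item_cnt_month": [1.0, 2.0, 3.0, 0.0],
})
X_train, y_train, X_valid, y_valid, X_test = split_by_month(data)
```

With xgboost installed, `model = fit_xgb(X_train, y_train, X_valid, y_valid)` followed by `rmse(y_valid, model.predict(X_valid))` would give the validation score.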

Feature Engineering

Implement a lag feature function.
Apply the lag feature function to each date block’s aggregated features:
• date_block_num, item_category_type_code -> date_cat_avg_item_cnt
• date_block_num, shop_id, item_category_id -> date_shop_cat_avg_item_cnt
• date_block_num, shop_id, item_category_type_code -> date_shop_type_avg_item_cnt
• date_block_num, shop_id, item_category_sub_type_code -> date_shop_subtype_avg_item_cnt
• date_block_num, item_category_sub_type_code -> date_subtype_avg_item_cnt
• date_block_num, shop_city -> date_city_avg_item_cnt
• date_block_num, item_id, shop_city -> date_item_city_avg_item_cnt
• date_item_city_avg_item_cnt -> date_type_avg_item_cnt
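
A minimal sketch of how one of these features could be built and then lagged. It assumes a matrix with one row per (date_block_num, shop_id, item_id); the helper names add_group_mean and lag_feature, the `_lag_N` column suffix and the toy values are assumptions for illustration. The lag matters because a same-month average is computed from the target itself, so only past months' values are safe to use as inputs.

```python
import pandas as pd

def add_group_mean(matrix, keys, name, target="item_cnt_month"):
    """Add the mean of `target` within each `keys` group as a new column `name`."""
    means = (matrix.groupby(keys, as_index=False)[target].mean()
                   .rename(columns={target: name}))
    return matrix.merge(means, on=keys, how="left")

def lag_feature(matrix, lags, col):
    """For each lag, add `col` as it was `lag` months earlier for the same shop and item."""
    keys = ["date_block_num", "shop_id", "item_id"]
    base = matrix[keys + [col]]
    for lag in lags:
        shifted = base.copy()
        shifted["date_block_num"] += lag  # month t's value becomes visible at month t + lag
        shifted = shifted.rename(columns={col: f"{col}_lag_{lag}"})
        matrix = matrix.merge(shifted, on=keys, how="left")
    return matrix

# Toy matrix (made-up values): two shops in one city over two months.
matrix = pd.DataFrame({
    "date_block_num": [0, 0, 1, 1],
    "shop_id":        [5, 6, 5, 6],
    "item_id":        [10, 10, 10, 10],
    "shop_city":      ["A", "A", "A", "A"],
    "item_cnt_month": [2.0, 4.0, 6.0, 8.0],
})

matrix = add_group_mean(matrix, ["date_block_num", "shop_city"], "date_city_avg_item_cnt")
lagged = lag_feature(matrix, [1], "date_city_avg_item_cnt")

# Keep only the lagged column as a model input; the same-month mean leaks the target.
lagged = lagged.drop(columns=["date_city_avg_item_cnt"])
print(lagged)
```

Rows from the first month have no earlier value to look up, so their lagged column is missing and would need filling or dropping before training.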
