A collection of the different models written for the New York City Taxi Fare Prediction Kaggle competition.


DA project repository for teamAPP

Links to the various Kaggle kernels:

Data feature engineering: https://www.kaggle.com/pradyu99914/data-feature-engineering
Version 29 contains the supporting visualizations, while version 32 adds more features to the existing dataset. Note that the code was executed over a number of commits, since each feature takes roughly 7 hours to compute.
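For a rough idea of the kind of features involved, here is a minimal sketch, assuming the competition's raw column names (pickup_datetime plus pickup/dropoff latitude and longitude); the actual kernel computes more features than this:

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two sets of coordinates."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * np.arcsin(np.sqrt(a))

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a trip-distance column and calendar columns derived from the pickup time."""
    df = df.copy()
    df["distance_km"] = haversine_km(
        df["pickup_latitude"], df["pickup_longitude"],
        df["dropoff_latitude"], df["dropoff_longitude"],
    )
    ts = pd.to_datetime(df["pickup_datetime"])
    df["hour"] = ts.dt.hour
    df["day"] = ts.dt.day
    df["month"] = ts.dt.month
    df["year"] = ts.dt.year
    return df
```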

Linear, Ridge, and Lasso regression, plus a Random forest model with 100 estimators: https://www.kaggle.com/anushkini/nyc-taxi-fare-models
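A minimal sketch of how these baselines can be trained with scikit-learn; the hyperparameters below are illustrative defaults, not the tuned values from the kernel:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

def fit_baselines(X_train, y_train, X_val, y_val):
    """Fit the four baseline regressors and return their validation RMSE."""
    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.1),
        "random_forest": RandomForestRegressor(n_estimators=100, n_jobs=-1),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
    return scores
```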

Data visualizations (the plots that gave us insight into which new columns to engineer): https://www.kaggle.com/anushkini/nyc-taxi-fare-graphs

XGBoost: https://www.kaggle.com/anushkini/taxi-xgboost
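A hedged sketch of this kind of XGBoost training run; the parameter values shown are placeholders, not the values found by Bayesian optimization in the kernel:

```python
import xgboost as xgb

def train_xgboost(X_train, y_train, X_val, y_val):
    """Train a gradient-boosted tree regressor with early stopping on a validation split."""
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    params = {
        "objective": "reg:squarederror",
        "eval_metric": "rmse",
        "eta": 0.05,
        "max_depth": 8,
        "subsample": 0.8,
        "colsample_bytree": 0.8,
    }
    return xgb.train(
        params, dtrain,
        num_boost_round=1000,
        evals=[(dval, "validation")],
        early_stopping_rounds=50,
    )
```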

kNN regression, with a visualization to pick the best k value: https://www.kaggle.com/pradyu99914/nyc-taxi-fare-models
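A minimal sketch of the k sweep, assuming a held-out validation split; the kernel also plots the resulting RMSE curve to justify the chosen k:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

def sweep_k(X_train, y_train, X_val, y_val, k_values=range(1, 31)):
    """Fit a kNN regressor for each candidate k and return the best k plus the RMSE curve."""
    rmse = []
    for k in k_values:
        knn = KNeighborsRegressor(n_neighbors=k, n_jobs=-1)
        knn.fit(X_train, y_train)
        rmse.append(np.sqrt(mean_squared_error(y_val, knn.predict(X_val))))
    best_k = list(k_values)[int(np.argmin(rmse))]
    return best_k, rmse
```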

LGBM: https://www.kaggle.com/anushkini/taxi-lightgbm
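A minimal LightGBM training sketch with early stopping; the hyperparameters are illustrative, and the early_stopping callback assumes LightGBM 3.3 or newer:

```python
import lightgbm as lgb

def train_lgbm(X_train, y_train, X_val, y_val):
    """Train a LightGBM regressor with early stopping and save the booster to a text file."""
    train_set = lgb.Dataset(X_train, label=y_train)
    val_set = lgb.Dataset(X_val, label=y_val, reference=train_set)
    params = {
        "objective": "regression",
        "metric": "rmse",
        "learning_rate": 0.05,
        "num_leaves": 63,
        "feature_fraction": 0.8,
        "bagging_fraction": 0.8,
        "bagging_freq": 1,
    }
    booster = lgb.train(
        params, train_set,
        num_boost_round=2000,
        valid_sets=[val_set],
        callbacks=[lgb.early_stopping(stopping_rounds=100)],
    )
    booster.save_model("model.txt")  # text format, as used for model.txt in this repository
    return booster
```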

Final Pipeline, with the links to all relevant kernels: https://www.kaggle.com/pradyu99914/

A brief description of the files and folders:
-- TeamAPP_FinalReport.pdf - The final report of our project.
-- demo.py - A demo script that shows our fare prediction system in action.
-- feature_engineering.py - A script that performs the feature engineering on the data.
-- final_pipeline.py - The final pipeline code for our project.
-- model.txt - The final LGBM model, with an RMSE of 2.93 (see the loading sketch after this list).
-- test_df.feather - A feather file containing the test dataset in compressed form.
-- visualization.py - A Python script with all the visualizations performed on the dataset.
-- Models:
---- kNN Model/knn.py - Script to train the k Nearest Neighbours model.
---- ANN/ann.py - Script to train the neural network model.
---- LGBM/lgbm.py - Script to train the LGBM model.
---- XGBoost/XGBoost.py - Script to train the XGBoost model.
---- Lasso regression.py - Script to train the Lasso regression model.
---- LRRF.py - Script to train the Random forest model.
---- LR.py - Script to train the Linear regression model.
---- RidgreRegression.py - Script to train the Ridge regression model.
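For reference, a hedged sketch of how the shipped model.txt and test_df.feather could be combined into a submission file; the "key" column name follows the Kaggle test-set schema, and the feature columns are assumed to match the names stored inside model.txt:

```python
import lightgbm as lgb
import pandas as pd

# Load the saved booster and the compressed test set.
booster = lgb.Booster(model_file="model.txt")
test_df = pd.read_feather("test_df.feather")

# Predict fares using the feature names recorded in the model file.
feature_cols = booster.feature_name()
predictions = booster.predict(test_df[feature_cols])

# Write a Kaggle-style submission ("key" is the test-set id column).
submission = pd.DataFrame({"key": test_df["key"], "fare_amount": predictions})
submission.to_csv("submission.csv", index=False)
```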

Results:
We obtained an RMSE of about 2.93 in the Kaggle competition.
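RMSE here is the root mean squared error of the predicted fare against the true fare, which is the competition's scoring metric; a minimal reference implementation:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, as used on the Kaggle leaderboard."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Example: rmse([7.5, 12.0], [8.0, 11.0]) ≈ 0.79
```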

All Kaggle submissions made to date:

| Model | Model details | RMSE |
| --- | --- | --- |
| XGBoost | Trained on 1 million data points | 4.46939 |
| XGBoost (bagging) | Trained on 8 million data points | 4.19958 |
| XGBoost (bagging) | Trained on 54 million data points | 4.12760 |
| XGBoost | Bayesian optimization for hyperparameter tuning | 4.18783 |
| XGBoost (bagging) | Improved dataset with feature engineering, 8 million data points | 3.91798 |
| XGBoost | Improved dataset and Bayesian optimization | 3.17963 |
| XGBoost | Coordinates converted from decimal degrees to radians, plus Bayesian optimization | 3.17697 |
| XGBoost | Feature-engineered day and month columns added to the dataset | 3.11282 |
| LGBM | Improved dataset | 3.13226 |
| LGBM | Changed boosting hyperparameters | 3.08951 |
| LGBM | Coordinates converted from decimal degrees to radians | 3.08830 |
| LGBM | Feature-engineered day and month columns added to the dataset | 3.02238 |
| LGBM | Reworked dataset with more feature engineering | 2.99228 |
| LGBM | Added new distance features | 2.99095 |
| LGBM | Trained on 15 million data points | 2.93553 |
| Linear regression | Baseline model | 5.39 |
| Linear regression | Final model trained on engineered data | 5.18 |
| Ridge regression | Ridge regression with grid search | 5.18 |
| Lasso regression | Lasso regression | 9.409 |
| Lasso regression | Lasso regression with grid search | 5.05 |
| Random forest regressor | Random forest regressor | 4.43 |
| kNN regressor | With bagging, using feature-engineered data | 3.54 |
| Artificial neural network | ANN with normalization of the data | 3.39 |
