Data Science Portfolio


A data science portfolio of IPython notebooks implementing several machine learning algorithms. Each notebook follows the same structured methodology:

[Data acquisition -> Data cleaning -> Data analysis -> Algorithm implementation -> Algorithm applied to dataset -> Further optimization and advanced topics]
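
The stages above can be sketched as a chain of small functions. This is only an illustration of the workflow, not code taken from the notebooks: the in-memory rows, the field names (`distance`, `delay`) and the choice of a least-squares line as the "algorithm" are all assumptions made for the example.

```python
from statistics import mean


def acquire():
    """Data acquisition: hypothetical in-memory rows standing in for a real source."""
    return [
        {"distance": 100, "delay": 5},
        {"distance": None, "delay": 7},   # incomplete record, dropped by clean()
        {"distance": 300, "delay": 15},
        {"distance": 200, "delay": 10},
    ]


def clean(rows):
    """Data cleaning: keep only rows in which every field is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def analyze(rows):
    """Data analysis: a couple of summary statistics."""
    return {"n": len(rows), "mean_delay": mean(r["delay"] for r in rows)}


def fit(rows):
    """Algorithm implementation: least-squares line delay = a * distance + b."""
    xs = [r["distance"] for r in rows]
    ys = [r["delay"] for r in rows]
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b


def predict(model, distance):
    """Algorithm applied to dataset: evaluate the fitted line."""
    a, b = model
    return a * distance + b


rows = clean(acquire())
summary = analyze(rows)
model = fit(rows)
print(summary, model, predict(model, 400))
```

Keeping each stage as its own function makes it easy to swap the toy data source or the toy model for a real one without touching the rest of the chain.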

The notebooks cover a variety of topics and algorithms:

| Algorithm   | Model                      | Topic                                 |
|-------------|----------------------------|---------------------------------------|
| Recommender | Matrix Factorization - ALS | LastFM music-user-artist data         |
| Regression  | Random Forests             | Airplane Delay                        |
| Simulation  | Monte Carlo in TimeSeries  | Financial Risk                        |
| Clustering  | KMeans                     | Network Traffic and Anomaly Detection |
| Clustering  | KMeans in TimeSeries       | Timeseries of NeuroImages             |
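
As an illustration of the KMeans anomaly-detection idea listed above, here is a minimal pure-Python sketch. It is not the notebooks' implementation (which runs on Spark): the two-dimensional toy points, the seeding from the first `k` points, and the distance threshold are all assumptions chosen to keep the example small and deterministic. The idea is to fit centroids on normal observations and flag any new point that lies far from every centroid.

```python
import math


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    i = min(range(len(centroids)), key=dists.__getitem__)
    return i, dists[i]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def anomalies(points, centroids, threshold):
    """Flag points whose distance to the nearest centroid exceeds threshold."""
    return [p for p in points if nearest(p, centroids)[1] > threshold]


# Hypothetical "normal" observations forming two tight groups.
normal = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids = kmeans(normal, k=2)

# New observations to score; the threshold of 3.0 is an arbitrary choice.
incoming = [(0.5, 0.5), (10, 10.5), (5, 30)]
print(anomalies(incoming, centroids, threshold=3.0))
```

In practice the threshold would be derived from the data, for example from the distribution of distances observed on the training set, rather than fixed by hand.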

The last couple of notebooks come from a challenge set by SAFRAN: two three-hour sessions that were part of its recruitment process. They served as the ultimate test of everything learnt beforehand, since no work was allowed outside the sessions.

Details

  • Language: Python in Jupyter Notebooks.
  • Execution: run on a remote Spark cluster at EURECOM, managed by Zoe.
  • Libraries: numpy, pandas, matplotlib, pyspark, thunder.

Authors

  • Ole Andreas Hansen @oleaha
  • Alberto Ibarrondo Luis @ibarrond

Sources and acknowledgments

The rough sketches of all the notebooks are the main focus of the course Algorithmic Machine Learning at EURECOM, with particular thanks to Pietro Michiardi.

Most of the notebooks are based on use cases illustrated in the book Advanced Analytics with Spark, by Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills.

The Notebooks are based on publicly available data.

License

MIT License (free software)