Initial setup and EDA project steps #41

Closed
wants to merge 4 commits into from

Conversation

dhedderich

The following changes have been made:

  1. Added markdown cells that explain the purpose of the notebook and make each step easier to follow.
  2. Updated the main.py script to obtain a sample of the data. The pipeline also uploads the data to Weights & Biases.
  3. Ran the EDA step with the command mlflow run src/eda, which installs Jupyter and the dependencies for ydata-profiling (formerly pandas-profiling) and opens a Jupyter notebook instance.
  4. In the Jupyter notebook, added code to fetch the artifact (sample.csv) from Weights & Biases and read it using pandas.
  5. Utilized ydata-profiling to create a profile of the dataset and display it using interactive widgets.
  6. Provided guidance on what to observe during the EDA process, such as identifying missing values, data format issues, and outliers in the price column.
  7. Included code to drop outliers based on a specified price range and convert the last_review column to datetime format.
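
As a rough illustration of steps 4 and 7, the notebook logic might look something like the sketch below. It is not the code from this PR. The function names, the `project` argument, the `:latest` artifact tag, and the price bounds in the demo are all assumptions made for the example. Only the artifact name (sample.csv) and the price and last_review columns come from the description above.

```python
import pandas as pd


def fetch_sample(project: str, artifact: str = "sample.csv:latest") -> pd.DataFrame:
    """Download the sample artifact from Weights & Biases and load it with pandas.

    `project` and the ":latest" tag are hypothetical; use whatever the pipeline logs.
    """
    import wandb  # imported lazily so clean_sample works without wandb installed

    run = wandb.init(project=project, group="eda", save_code=True)
    local_path = run.use_artifact(artifact).file()
    return pd.read_csv(local_path)


def clean_sample(df: pd.DataFrame, min_price: float, max_price: float) -> pd.DataFrame:
    """Keep rows whose price lies in [min_price, max_price] and parse last_review."""
    in_range = df["price"].between(min_price, max_price)  # inclusive on both ends
    cleaned = df[in_range].copy()
    # Missing review dates become NaT instead of raising an error.
    cleaned["last_review"] = pd.to_datetime(cleaned["last_review"])
    return cleaned


if __name__ == "__main__":
    # Tiny in-memory stand-in for sample.csv; the bounds are arbitrary.
    demo = pd.DataFrame(
        {
            "price": [5, 80, 120, 9000],
            "last_review": ["2019-05-21", None, "2018-11-03", "2019-01-01"],
        }
    )
    print(clean_sample(demo, min_price=10, max_price=1000))
```

Passing the bounds as arguments rather than hard-coding them keeps the filter easy to adjust once the profile report shows where the outliers in the price column actually sit.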

@dhedderich closed this on Aug 20, 2023