The goal of this project is to create a data pipeline that extracts posts from the r/dataengineering subreddit.
The data will be ingested daily into Google Cloud Storage. The raw data will then be loaded into BigQuery, where the transformations will take place. Finally, the enriched data will be loaded into QlikView to create a simple visualization.
- The workflow will be orchestrated using Apache Airflow.
- The project will be containerized using Docker.
- The infrastructure will be managed using Terraform.
- Continuous Integration/Continuous Deployment (CI/CD) will be implemented using GitHub Actions to automate the deployment process.
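
As a rough illustration of the daily ingestion step, the sketch below prepares one day's extract for upload: it builds a date-partitioned object name and serializes posts as newline-delimited JSON, a format BigQuery can load from Cloud Storage. The `raw/reddit` prefix, the `dt=` partition layout, the `FIELDS` tuple, and both function names are assumptions made for illustration, not part of the project as described. The actual Reddit extraction and Cloud Storage upload are left out.

```python
import json
from datetime import date

# Hypothetical subset of post fields to keep; adjust to the real schema.
FIELDS = ("id", "title", "score", "num_comments", "created_utc")


def object_name(run_date: date, prefix: str = "raw/reddit") -> str:
    """Build a date-partitioned object name for one daily extract."""
    return f"{prefix}/dt={run_date.isoformat()}/posts.ndjson"


def to_ndjson(posts: list[dict]) -> str:
    """Serialize posts as newline-delimited JSON, one object per line.

    Only the keys listed in FIELDS are kept; missing keys become null.
    """
    lines = []
    for post in posts:
        row = {field: post.get(field) for field in FIELDS}
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample post, only to show the shape of the output.
    sample = [{"id": "abc123", "title": "Example post", "score": 10,
               "num_comments": 2, "created_utc": 1700000000, "extra": "dropped"}]
    print(object_name(date(2024, 1, 15)))
    print(to_ndjson(sample))
```

Keeping one object per day under a dated prefix would let an Airflow task rerun a single day without touching other extracts, which is one reason to prefer this layout; other layouts would work equally well.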