Add support for downloading from Azure cloud storage #382

Open
FlorisCalkoen opened this issue Feb 16, 2024 · 2 comments
Labels
enhancement Idea or request for a new feature

Comments

@FlorisCalkoen

FlorisCalkoen commented Feb 16, 2024

Edit by @leouieda on 2024-02-19

Add an AzureDownloader that can fetch data from Azure cloud storage. It should support an authentication token, ideally with the option to read it from an environment variable.


Original issue 👇🏾

Description of the desired feature:

Would it be possible to add support for fetching data from private cloud containers?

import os

import dotenv
import pooch
import pandas as pd

dotenv.load_dotenv(override=True)
sas_token = os.getenv("AZURE_STORAGE_SAS_TOKEN")

storage_options = {"account_name": "storage_account_name", "sas_token": sas_token}

href = "az://some/private/container/file.parquet"
fp = pooch.retrieve(href, known_hash=None, storage_options=storage_options)
pd.read_parquet(fp)

# This currently works for Azure, but I'm not sure if it's the best approach
href = "az://some/private/container/file.parquet" + sas_token
fp = pooch.retrieve(href, known_hash=None)
pd.read_parquet(fp)

Are you willing to help implement and maintain this feature?
Maybe, yes!

@FlorisCalkoen FlorisCalkoen added the enhancement Idea or request for a new feature label Feb 16, 2024
@remrama

remrama commented Feb 16, 2024

@FlorisCalkoen I just made a custom Downloader like this, but for Google Cloud Storage. If it's useful to you, I linked it in a comment on a similar Issue thread (#363).

@leouieda
Member

Hi @remrama @FlorisCalkoen @WesleyTheGeolien, I have 0 experience with cloud containers, but since multiple people have requested this, we can look into it.

As @remrama said, this would be best implemented as a downloader. It could take the token as input, but it could also take the name of an environment variable and read the token for you.
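A minimal sketch of that token handling, not an implementation that exists in Pooch. The class name follows the issue title, but the constructor parameters (`account_name`, `sas_token`, `token_env`) are hypothetical, and the default environment variable name is borrowed from the snippet in the original issue. It assumes that the first component of an `az://` URL is the container and that a SAS token appended as a query string to the standard `https://<account>.blob.core.windows.net/<container>/<blob>` endpoint is enough to authorize the request. It also assumes `output_file` is a path rather than an open file object.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


class AzureDownloader:
    """Hypothetical downloader sketch; this class is not part of Pooch."""

    def __init__(self, account_name, sas_token=None,
                 token_env="AZURE_STORAGE_SAS_TOKEN"):
        self.account_name = account_name
        self.sas_token = sas_token
        self.token_env = token_env

    def resolve_token(self):
        """Return the explicit token, or read it from the environment."""
        token = self.sas_token or os.environ.get(self.token_env)
        if not token:
            raise ValueError(
                f"No SAS token given and '{self.token_env}' is not set."
            )
        # Tokens are sometimes copied with a leading "?"; normalize that.
        return token.lstrip("?")

    def build_https_url(self, url):
        """Translate az://container/path/to/blob into an HTTPS blob URL."""
        parts = urlsplit(url)
        if parts.scheme != "az":
            raise ValueError(f"Expected an 'az://' URL, got '{url}'.")
        container = parts.netloc
        blob = parts.path.lstrip("/")
        return (
            f"https://{self.account_name}.blob.core.windows.net/"
            f"{container}/{blob}?{self.resolve_token()}"
        )

    def __call__(self, url, output_file, pooch):
        """Download using the (url, output_file, pooch) downloader signature."""
        with urllib.request.urlopen(self.build_https_url(url)) as response:
            with open(output_file, "wb") as output:
                shutil.copyfileobj(response, output)
```

In principle an instance could be passed through the existing `downloader` argument of `pooch.retrieve`, so the token never has to be concatenated onto the URL by hand.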

From what I gather, each cloud provider has its own API for fetching data, so each would need a separate implementation. Since Pooch is supposed to be a very lightweight dependency for other projects, any downloader that requires a new dependency would have to make that dependency optional. We already do this for SFTP, for example.
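One way to keep such a dependency optional is to import it only when a download is actually requested. The sketch below illustrates the idea and is not Pooch's own code: `import_optional` and `download_blob` are hypothetical helper names, and choosing the `azure-storage-blob` package (module `azure.storage.blob`) as the backend is an assumption.

```python
import importlib


def import_optional(module_name, purpose):
    """Import a module on demand, with a clear error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"Module '{module_name}' is required for {purpose} "
            "but was not found."
        ) from error


def download_blob(blob_url, output_file, credential=None):
    """Hypothetical helper: stream one blob to a local file path."""
    # Imported here so the package is only needed when this function runs.
    blob_module = import_optional("azure.storage.blob", "Azure downloads")
    client = blob_module.BlobClient.from_blob_url(
        blob_url, credential=credential
    )
    with open(output_file, "wb") as output:
        client.download_blob().readinto(output)
```

Users who never touch Azure URLs would not need the extra package installed, and those who do get an error message that names the missing module.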

I'll edit this issue and #363 to make them explicitly about AWS and Azure. @remrama would you mind opening a new one for Google Cloud Storage and including the link to your code?

If either of you would like to implement this, then it would be great! We'd need:

  1. A new downloader (GCSDownloader, AWSDownloader, AzureDownloader) in pooch/downloaders.py (see https://www.fatiando.org/pooch/latest/downloaders.html and the existing downloaders). Make sure to add it to the choose_downloader function so that Pooch can automatically find it based on the prefix (az: etc).
  2. The test data in our data folder uploaded to the storage so we can test that it works.
  3. Tests in pooch/tests/test_downloaders.py that check if the download works and that any errors that should be raised are actually raised.
  4. Example documentation, probably in https://www.fatiando.org/pooch/latest/protocols.html
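The prefix-based lookup from item 1 and the error checks from item 3 can be sketched as follows. This is a standalone illustration of dispatching on the URL scheme and does not reproduce Pooch's actual `choose_downloader`. The stand-in classes, the `KNOWN_DOWNLOADERS` mapping, and the function name are all hypothetical.

```python
from urllib.parse import urlsplit


class HTTPDownloaderStandIn:
    """Placeholder for an existing HTTP(S) downloader."""


class AzureDownloaderStandIn:
    """Placeholder for the proposed Azure downloader."""


# Map each URL scheme (the prefix before "://") to a downloader class.
KNOWN_DOWNLOADERS = {
    "http": HTTPDownloaderStandIn,
    "https": HTTPDownloaderStandIn,
    "az": AzureDownloaderStandIn,
}


def choose_downloader_sketch(url):
    """Return a downloader instance based on the URL scheme."""
    scheme = urlsplit(url).scheme
    if scheme not in KNOWN_DOWNLOADERS:
        raise ValueError(
            f"Unrecognized URL protocol '{scheme}' in '{url}'. "
            f"Must be one of {sorted(KNOWN_DOWNLOADERS)}."
        )
    return KNOWN_DOWNLOADERS[scheme]()


def check_dispatch():
    """Checks in the spirit of item 3: the right class is chosen, errors raise."""
    chosen = choose_downloader_sketch("az://container/file.parquet")
    assert isinstance(chosen, AzureDownloaderStandIn)
    try:
        choose_downloader_sketch("s3://bucket/file.parquet")
    except ValueError:
        return True
    return False
```

Tests of this kind need no network access. The tests that perform a real download are the ones that depend on the test data being uploaded to the storage service, as in item 2.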

I'm not sure what the pricing model is for these providers (which is why I never bothered with them), but if it's not possible to host our test data on them so that we can verify the functionality, then I think it's best to leave the downloader outside of Pooch itself.

@leouieda leouieda changed the title Fetching data from a private cloud container Add support for downloading from Azure cloud storage Feb 19, 2024