Trying to register a dataset with the python sdk #1091
Comments
I'm pretty sure you can do this from the Azure ML UI (Create Datasets in the Studio). Another alternative would be to use a different OS, or an Azure ML Compute Instance (Notebook VM).
I've managed to make some progress. It appears the missing dependency is lttngust. This isn't a pip / conda package, but an optional dependency of .NET Core, and it is installed at the OS level. I'm running Arch Linux, and installing it at the OS level allowed me to progress beyond the point where it was checking for dependencies.

Looking through the codebase, it appears that azureml puts dotnetcore2 in site-packages/bin of your Python environment, then checks for OS-level dependencies. If it can't find them, it copies them from Azure blobs for the "supported" OSes, and when everything is OK, it writes a deps/success file. To be fair, this seems quite strange, as a library is effectively downloading binaries at runtime, but ah well...

@swanderz the UI is not an option, as I'm automating this. Another OS may be needed in the build pipeline unless I can get lttngust on our existing images. Either way, the automation will be running outside the Azure ML environment.

I'm still not sure whether this pulls the data down locally or not, but at the very least, dataset registration is working.

For those facing similar issues (i.e. "NotImplementedError: Unsupported Linux distribution"), running the following in a Python terminal with the same Python environment will help you see what the missing dependencies are. If you then install them at the OS level, it should work:
The third instruction will have a debug line with the missing dependencies. (A similar issue: #713)
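As a rough cross-check that does not depend on the SDK at all, here is a minimal sketch using only the Python standard library to see whether an OS-level shared library is discoverable. The library name `lttng-ust` is an assumption based on the dependency named above; the helper `missing_libraries` is hypothetical, and this is not how azureml or dotnetcore2 performs its own check.

```python
from ctypes.util import find_library

# Assumed shared-library name for the lttngust dependency mentioned above;
# adjust to whatever your distribution actually ships.
CANDIDATE_LIBS = ["lttng-ust"]


def missing_libraries(names):
    """Return the library names that ctypes cannot locate on this system."""
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    missing = missing_libraries(CANDIDATE_LIBS)
    if missing:
        print("Not found at the OS level:", ", ".join(missing))
    else:
        print("All candidate libraries were found.")
```

A library reported as missing here may still be present under a different name, so treat the result as a hint rather than a definitive answer.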
I'm trying to programmatically register a dataset with the Python SDK. The dataset will come from a datastore backed by a storage container, and point to a CSV file. I'm trying the following:
Now this fails when I run it from Arch Linux, stating that Arch Linux is not supported. Going through the stack trace, it looks like the ML SDK is asking the .NET Core 2 Python package to install additional dependencies at runtime. I've got .NET Core 2 and 3 installed on the machine. Is there some way to install whatever's needed up front, so that binaries don't have to be downloaded at runtime? I'll need to run this script as part of a CI pipeline, and it's not really sensible to download binaries each and every time.
Also, is there a way to register a dataset without the data being accessed or used locally, via .NET or otherwise? I just want a dataset in the Azure ML workspace. I don't really care about using the data locally (or in the CI pipeline).
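To make the question concrete, here is a minimal sketch of the kind of registration being described, assuming the azureml-core (v1) SDK and a workspace `config.json` that can be found locally. It is not the exact script from this issue; the function name and its parameters are hypothetical. Passing `validate=False` and `infer_column_types=False` asks the SDK not to load the file while creating the dataset, though whether that fully avoids the .NET runtime dependency is not confirmed here.

```python
def register_csv_dataset(datastore_name, csv_path, dataset_name):
    """Register a tabular dataset that points at a CSV file in a datastore."""
    # Imported lazily so this module still loads where azureml-core is absent.
    from azureml.core import Dataset, Datastore, Workspace

    # Assumes a workspace config.json is discoverable from the working directory.
    workspace = Workspace.from_config()
    datastore = Datastore.get(workspace, datastore_name)

    # validate=False skips the check that data can be loaded from the path;
    # infer_column_types=False avoids reading the file to guess column types.
    dataset = Dataset.Tabular.from_delimited_files(
        path=[(datastore, csv_path)],
        validate=False,
        infer_column_types=False,
    )

    # create_new_version=True lets repeated CI runs register under the same name.
    return dataset.register(
        workspace=workspace,
        name=dataset_name,
        create_new_version=True,
    )
```

A call might look like `register_csv_dataset("my_datastore", "data/file.csv", "my_dataset")`, where all three values are placeholders.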
Thanks.