missing chunks to be filled with fill values (when server returns HTTP error 403) #131

Closed
Alexander-Barth opened this issue Dec 4, 2023 · 2 comments

Comments

@Alexander-Barth

When I try to load the following dataset with Zarr.jl, I unfortunately get an error:

using Zarr

ds = Zarr.zopen("https://s3.waw3-1.cloudferro.com/mdl-arco-time/arco/MEDSEA_MULTIYEAR_PHY_006_004/med-cmcc-cur-rean-d_202012/timeChunked.zarr")

ds["uo"][:,:,1,1]
# full error below

Yet the data can be read with Python zarr:

import zarr

z = zarr.open("https://s3.waw3-1.cloudferro.com/mdl-arco-time/arco/MEDSEA_MULTIYEAR_PHY_006_004/med-cmcc-cur-rean-d_202012/timeChunked.zarr");

gz = z["uo"]
data = gz[0,0,:,:];

Note that for this chunk all the data equals the fill value (1e20). According to the OGC spec, it seems to be OK that not all chunks are present:

There is no need for all chunks to be present within an array store. If a chunk is not present then
it is considered to be in an uninitialized state. An unitialized chunk MUST be treated as if it was
uniformly filled with the value of the “fill_value” field in the array metadata. If the “fill_value” field
is null then the contents of the chunk are undefined.
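
The rule quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the code of Zarr.jl or python-zarr: `read_chunk`, the dict-based store, and the uncompressed little-endian float32 layout are assumptions made for the example, and raising on a null fill value is only one possible way to handle "undefined" contents.

```python
import struct
from typing import List, Mapping, Optional


def read_chunk(store: Mapping[str, bytes], key: str, n_items: int,
               fill_value: Optional[float]) -> List[float]:
    """Decode one chunk, or synthesize it from fill_value if it is absent.

    Hypothetical helper: assumes uncompressed little-endian float32 chunks.
    """
    raw = store.get(key)
    if raw is None:
        # Chunk is uninitialized: treat it as uniformly filled with fill_value.
        if fill_value is None:
            # The spec leaves the contents undefined here; this sketch raises.
            raise ValueError(f"chunk {key!r} is missing and fill_value is null")
        return [fill_value] * n_items
    return list(struct.unpack(f"<{n_items}f", raw))


# Only chunk "0.0" exists in this toy store; "0.1" is missing.
store = {"0.0": struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)}
present = read_chunk(store, "0.0", 4, 1e20)
missing = read_chunk(store, "0.1", 4, 1e20)
```

Here `present` holds the stored values, while `missing` consists entirely of the fill value, which matches what python-zarr returns for the chunk in question.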

Can Zarr.jl handle this case too? Would you accept a PR for this issue?

I am using Zarr v0.9.1.

Thanks for this great package, by the way :-)

Full error from Zarr.jl:

ERROR: TaskFailedException
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:920
  [2] wait()
    @ Base ./task.jl:984
  [3] wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
    @ Base ./condition.jl:130
  [4] wait
    @ ./condition.jl:125 [inlined]
  [5] take_buffered(c::Channel{Pair{CartesianIndex{4}, Union{Nothing, Vector{UInt8}}}})
    @ Base ./channels.jl:456
  [6] take!
    @ ./channels.jl:450 [inlined]
  [7] readblock!(aout::Array{Float32, 4}, z::ZArray{Float32, 4, Zarr.BloscCompressor, Zarr.ConsolidatedStore{Zarr.HTTPStore}}, r::CartesianIndices{4, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, UnitRange{Int64}, UnitRange{Int64}}})   
    @ Zarr ~/.julia/dev/Zarr/src/ZArray.jl:172
  [8] readblock!(::ZArray{Float32, 4, Zarr.BloscCompressor, Zarr.ConsolidatedStore{Zarr.HTTPStore}}, ::Array{Float32, 4}, ::Base.OneTo{Int64}, ::Vararg{AbstractUnitRange})
    @ Zarr ~/.julia/dev/Zarr/src/ZArray.jl:247
  [9] getindex_disk(::ZArray{Float32, 4, Zarr.BloscCompressor, Zarr.ConsolidatedStore{Zarr.HTTPStore}}, ::Function, ::Vararg{Any})
    @ DiskArrays ~/.julia/dev/DiskArrays/src/diskarray.jl:44
 [10] getindex(::ZArray{Float32, 4, Zarr.BloscCompressor, Zarr.ConsolidatedStore{Zarr.HTTPStore}}, ::Function, ::Function, ::Int64, ::Int64)
    @ DiskArrays ~/.julia/dev/DiskArrays/src/diskarray.jl:215
 [11] top-level scope
    @ REPL[229]:1
 [12] top-level scope
    @ ~/.julia/packages/CUDA/35NC6/src/initialization.jl:190

    nested task error: Error connecting to https://s3.waw3-1.cloudferro.com/mdl-arco-time/arco/MEDSEA_MULTIYEAR_PHY_006_004/med-cmcc-cur-rean-d_202012/timeChunked.zarr :<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><BucketName>mdl-arco-time</BucketName><RequestId>tx0000000000000003c6076-00656d923c-9f064368-default</RequestId><HostId>9f064368-default-waw3-1</HostId></Error>
    Stacktrace:
     [1] error(::String, ::String)
       @ Base ./error.jl:44
     [2] getindex(s::Zarr.HTTPStore, k::String)
       @ Zarr ~/.julia/dev/Zarr/src/Storage/http.jl:24
     [3] getindex
       @ ~/.julia/dev/Zarr/src/Storage/consolidated.jl:27 [inlined]
     [4] getindex
       @ ~/.julia/dev/Zarr/src/Storage/Storage.jl:55 [inlined]
     [5] getindex
       @ ~/.julia/dev/Zarr/src/Storage/Storage.jl:54 [inlined]
     [6] (::Zarr.var"#10#11"{Zarr.ConsolidatedStore{Zarr.HTTPStore}, Channel{Pair{CartesianIndex{4}, Union{Nothing, Vector{UInt8}}}}, String})(ii::CartesianIndex{4})
       @ Zarr ~/.julia/dev/Zarr/src/Storage/Storage.jl:121
     [7] (::Base.var"#978#983"{Zarr.var"#10#11"{Zarr.ConsolidatedStore{Zarr.HTTPStore}, Channel{Pair{CartesianIndex{4}, Union{Nothing, Vector{UInt8}}}}, String}})(r::Base.RefValue{Any}, args::Tuple{CartesianIndex{4}})                                     
       @ Base ./asyncmap.jl:100
     [8] macro expansion
       @ ./asyncmap.jl:234 [inlined]
     [9] (::Base.var"#994#995"{Base.var"#978#983"{Zarr.var"#10#11"{Zarr.ConsolidatedStore{Zarr.HTTPStore}, Channel{Pair{CartesianIndex{4}, Union{Nothing, Vector{UInt8}}}}, String}}, Channel{Any}, Nothing})()
       @ Base ./task.jl:514
    Stacktrace:
      [1] (::Base.var"#988#990")(x::Task)
        @ Base ./asyncmap.jl:177
      [2] foreach(f::Base.var"#988#990", itr::Vector{Any})
        @ Base ./abstractarray.jl:3073
      [3] maptwice(wrapped_f::Function, chnl::Channel{Any}, worker_tasks::Vector{Any}, c::CartesianIndices{4, NTuple{4, UnitRange{Int64}}})
        @ Base ./asyncmap.jl:177
      [4] wrap_n_exec_twice
        @ ./asyncmap.jl:153 [inlined]
      [5] async_usemap(f::Zarr.var"#10#11"{Zarr.ConsolidatedStore{Zarr.HTTPStore}, Channel{Pair{CartesianIndex{4}, Union{Nothing, Vector{UInt8}}}}, String}, c::CartesianIndices{4, NTuple{4, UnitRange{Int64}}}; ntasks::Int64, batch_size::Nothing)
        @ Base ./asyncmap.jl:103
      [6] async_usemap
        @ ./asyncmap.jl:84 [inlined]
      [7] #asyncmap#972
        @ ./asyncmap.jl:81 [inlined]
      [8] asyncmap
        @ ./asyncmap.jl:80 [inlined]
      [9] read_items!
        @ ~/.julia/dev/Zarr/src/Storage/Storage.jl:119 [inlined]
     [10] read_items!
        @ ~/.julia/dev/Zarr/src/Storage/Storage.jl:109 [inlined]
     [11] macro expansion
        @ ~/.julia/dev/Zarr/src/ZArray.jl:165 [inlined]
     [12] (::Zarr.var"#63#66"{Channel{Pair{CartesianIndex{4}, Union{Nothing, Vector{UInt8}}}}, ZArray{Float32, 4, Zarr.BloscCompressor, Zarr.ConsolidatedStore{Zarr.HTTPStore}}, ZArray{Float32, 4, Zarr.BloscCompressor, Zarr.ConsolidatedStore{Zarr.HTTPStore}}, CartesianIndices{4, NTuple{4, UnitRange{Int64}}}})()
        @ Zarr ./task.jl:514
@Alexander-Barth Alexander-Barth changed the title missing chunks to be filled with fill values missing chunks to be filled with fill values (' Feb 7, 2024
@Alexander-Barth Alexander-Barth changed the title missing chunks to be filled with fill values (' missing chunks to be filled with fill values (when server returns HTTP error 403) Feb 7, 2024
@Alexander-Barth

It turns out that the server returns the error 403 for missing chunks, while Zarr.jl only looks for 404:

if r.status == 404

In python-zarr, any error is ignored and leads to a chunk filled with fill values:

https://github.com/zarr-developers/zarr-python/blob/a81db0782535ba04c32c277102a6457d118a73e8/zarr/storage.py#L1417

Maybe we should do the same, at least for all HTTP errors between 400 and 499 (excluding internal server errors 500, ...).
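
As a rough sketch of that proposal (in Python for illustration; this is not the change made in Zarr.jl, and `chunk_or_none` is a hypothetical name), the store would map any 4xx response to "chunk missing" and keep raising on everything else:

```python
from typing import Optional


def chunk_or_none(status: int, body: bytes) -> Optional[bytes]:
    """Map an HTTP response for a chunk request to chunk bytes or None.

    Hypothetical helper: None means "chunk missing, use the fill value".
    """
    if 200 <= status < 300:
        return body
    if 400 <= status < 500:
        # Client errors such as 403 or 404 are treated as a missing chunk.
        return None
    # Server errors (5xx) and anything unexpected are still reported.
    raise RuntimeError(f"unexpected HTTP status {status} while reading chunk")


ok = chunk_or_none(200, b"\x01\x02")
forbidden = chunk_or_none(403, b"<Error>AccessDenied</Error>")
not_found = chunk_or_none(404, b"")
```

One trade-off of this design is that a real permission problem (for example a 401 or 403 on a protected bucket) would also show up as fill values rather than as an error.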

@meggart

meggart commented Feb 8, 2024

Fixed by #134
