TypeError when writing zarr file #626

Open
Magic-Ludo opened this issue Jul 29, 2024 · 2 comments
Labels
bug: The code is not performing according to the design or a design flaw is seriously impacting users.
redesign: It may be flawed, but the code was working as designed.
zarr: zarr format related.

Comments

@Magic-Ludo

Hi,
I use the following code to create precomputed files from a list of .tif images:

import os
import numpy as np
import tifffile
import imageio.v3 as iio
from PIL import Image
Image.MAX_IMAGE_PIXELS = None
from cloudvolume import CloudVolume
from cloudvolume.lib import mkdir, touch

input_dir = "data_in"
output_dir = "file:///data_out/"

#encoding = "blosc"
chunk = [64, 64, 1]
base_resolution = [1800, 1800, 4000]  # X,Y,Z values in nanometers

mkdir(output_dir.replace("file://", ""))

image_files = sorted(
    [
        os.path.join(input_dir, f)
        for f in os.listdir(input_dir)
        if f.endswith(".tif")
    ]
)

first_image = iio.imread(image_files[0])
img_shape = first_image.shape
volume_size = [img_shape[0], img_shape[1], len(image_files)]

scales = [
    {
        "chunk_sizes": [chunk],
        # "encoding": encoding,
        "key": "1800_1800_4000",
        "resolution": base_resolution,
        "size": volume_size,
        "voxel_offset": [0, 0, 0],
    },
]

info = {
    "num_channels": 1,
    "layer_type": "image",
    "data_type": "uint8",
    "scales": scales,
    "type": "image",
}

vol = CloudVolume(
    "zarr://" + output_dir,
    info=info,
    mip=0,
    fill_missing=True,
    cache=False,
    parallel=False
)
vol.provenance.description = "tdTomatoPrecomputed"
vol.provenance.owners = ["[email protected]"]

vol.commit_info()
vol.commit_provenance()

progress_dir = mkdir(
    output_dir.replace("file://", "") + "progress/"
)

to_upload = list(range(0, len(image_files)))

for z, file_name in enumerate(image_files):
    print("\n Processing ", file_name, " z: ", z)
    image = tifffile.imread(file_name).astype(np.uint8)
    print(image.shape)
    image = image[..., np.newaxis]
    vol[:, :, z] = image
    touch(os.path.join(progress_dir, str(z)))

vol.commit_info()

I recently wanted to test the new feature that allows us to write zarr files (I replaced precomputed:// with zarr:// in the code above), hoping to avoid generating a huge number of files. However, I'm encountering this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 9
      7     # image = iio.imread(file_name).astype(np.uint8)
      8     image = image[..., np.newaxis]
----> 9     vol[:, :, z] = image
     10     touch(os.path.join(progress_dir, str(z)))
     12 # def process(z):
     13 #     print("\n Processing ", image_files[z], " z: ", z)
     14 #     image = iio.imread(image_files[z]).astype(np.uint8)
   (...)
     24 # with ProcessPoolExecutor(max_workers=8) as executor:
     25 #     executor.map(process, to_upload)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/frontends/precomputed.py:1013, in CloudVolumePrecomputed.__setitem__(self, slices, img)
   1010 if bbox.subvoxel():
   1011   return
-> 1013 self.image.upload(img, bbox.minpt, self.mip, parallel=self.parallel)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/__init__.py:53, in readonlyguard.<locals>.guardfn(self, *args, **kwargs)
     51 if self.readonly:
     52   raise exceptions.ReadOnlyException(self.meta.cloudpath)
---> 53 return fn(self, *args, **kwargs)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:194, in ZarrImageSource.upload(self, image, offset, mip, parallel, t)
    191     for c in range(self.meta.num_channels):
    192       yield image[ ispt.x:iept.x, ispt.y:iept.y, ispt.z:iept.z, c ]
--> 194 for filename, imgchunk in zip(all_chunknames, all_chunks_by_channel(all_chunks)):
    195   zarr_imgchunk = np.transpose(imgchunk[..., np.newaxis], axes=axis_mapping)
    196   binary = zarr_imgchunk.tobytes(order)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:229, in ZarrImageSource._chunknames.<locals>.ZarrChunkNamesIterator.__iter__(self)
    227 for x,y,z in xyzrange(bbox_grid.minpt, bbox_grid.maxpt):
    228   for c in range(num_channels):
--> 229     filename = sep.join([
    230       tchunk, str(c), str(z), str(y), str(x)
    231     ])
    232     yield cf.join(str(mip), filename)

TypeError: sequence item 0: expected str instance, int found
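
For reference, the same TypeError can be reproduced outside CloudVolume, because `str.join` only accepts string items. The sketch below is a minimal illustration, not the library's code: the separator value and the integer `tchunk = 0` are assumptions chosen to match the error message.

```python
sep = "."   # assumed separator; the real value is not shown in the traceback
tchunk = 0  # assumed: an int time-chunk index, as "sequence item 0 ... int found" suggests
c, z, y, x = 0, 0, 0, 0

# Joining a list whose first item is an int fails like the call at image.py:229.
try:
    sep.join([tchunk, str(c), str(z), str(y), str(x)])
except TypeError as err:
    message = str(err)

# Converting every component to str avoids the error.
filename = sep.join(str(part) for part in (tchunk, c, z, y, x))
print(message)
print(filename)
```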

I tried to change line 230 in the cloudvolume/datasource/zarr/image.py file:

        for x,y,z in xyzrange(bbox_grid.minpt, bbox_grid.maxpt):
          for c in range(num_channels):
            filename = sep.join([
              str(tchunk), str(c), str(z), str(y), str(x)  # changed here: tchunk -> str(tchunk)
            ])
            yield cf.join(str(mip), filename)

But now another error occurs:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[4], line 9
      7     # image = iio.imread(file_name).astype(np.uint8)
      8     image = image[..., np.newaxis]
----> 9     vol[:, :, z] = image
     10     touch(os.path.join(progress_dir, str(z)))
     12 # def process(z):
     13 #     print("\n Processing ", image_files[z], " z: ", z)
     14 #     image = iio.imread(image_files[z]).astype(np.uint8)
   (...)
     24 # with ProcessPoolExecutor(max_workers=8) as executor:
     25 #     executor.map(process, to_upload)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/frontends/precomputed.py:1013, in CloudVolumePrecomputed.__setitem__(self, slices, img)
   1010 if bbox.subvoxel():
   1011   return
-> 1013 self.image.upload(img, bbox.minpt, self.mip, parallel=self.parallel)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/__init__.py:53, in readonlyguard.<locals>.guardfn(self, *args, **kwargs)
     51 if self.readonly:
     52   raise exceptions.ReadOnlyException(self.meta.cloudpath)
---> 53 return fn(self, *args, **kwargs)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:194, in ZarrImageSource.upload(self, image, offset, mip, parallel, t)
    191     for c in range(self.meta.num_channels):
    192       yield image[ ispt.x:iept.x, ispt.y:iept.y, ispt.z:iept.z, c ]
--> 194 for filename, imgchunk in zip(all_chunknames, all_chunks_by_channel(all_chunks)):
    195   zarr_imgchunk = np.transpose(imgchunk[..., np.newaxis], axes=axis_mapping)
    196   binary = zarr_imgchunk.tobytes(order)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:192, in ZarrImageSource.upload.<locals>.all_chunks_by_channel(all_chunks)
    190 for ispt, iept, vol_spt, vol_ept in all_chunks:
    191   for c in range(self.meta.num_channels):
--> 192     yield image[ ispt.x:iept.x, ispt.y:iept.y, ispt.z:iept.z, c ]

IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
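
The traceback suggests the zarr upload path indexes the image with four indices (x, y, z, channel), while `image[..., np.newaxis]` on a 2D slice yields only three dimensions. The NumPy-only sketch below reproduces that mismatch with a stand-in array. The array size and slice bounds are made up for illustration, and whether passing a 4D array to `vol[:, :, z] = ...` resolves the issue in CloudVolume is not confirmed in this thread.

```python
import numpy as np

image = np.zeros((128, 128), dtype=np.uint8)  # stand-in for one 2D TIFF slice

# One new axis gives (x, y, z) only; a fourth (channel) index then fails.
slab3d = image[..., np.newaxis]               # shape (128, 128, 1)
try:
    slab3d[0:64, 0:64, 0:1, 0]                # same indexing pattern as image.py:192
except IndexError as err:
    message = str(err)

# Two new axes give (x, y, z, channel), so the four-index pattern works.
slab4d = image[..., np.newaxis, np.newaxis]   # shape (128, 128, 1, 1)
chunk = slab4d[0:64, 0:64, 0:1, 0]            # shape (64, 64, 1)
print(message)
print(slab4d.shape, chunk.shape)
```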

I don't know if it's the way I inject my data into the volume, but since this feature is new, I have the impression that the problem goes deeper than that.

Thanks for your help!

william-silversmith added the bug label Jul 29, 2024
@william-silversmith
Contributor

william-silversmith commented Jul 29, 2024

Hi!

Thank you for pointing out the issues. I pushed a fix to master for the first bug. I'll look into the second, but here are some things to consider:

  1. zarr does not generate fewer files than precomputed unless we are talking about > 4D files. Precomputed has a sharded format that will generate far fewer files than zarr. See: https://github.com/seung-lab/cloud-volume/wiki/Creating-a-Sharded-Image-from-Scratch (note: these operations can now be performed without Igneous; I should update the wiki article).
  2. The implementation of zarr in CV is custom and was written specifically for 5D timeseries, color 3D spatial volumes. I should probably update it to be a bit more general.
  3. You could consider using a larger chunk size to reduce the number of files.
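
To put point 3 in numbers, here is a rough sketch of how the chunk count scales with chunk size, assuming an unsharded layout with one file per chunk and a single channel. The volume size is hypothetical, since the thread does not state the real dimensions, and `chunk_count` is an illustrative helper, not a CloudVolume function.

```python
import math

def chunk_count(volume_size, chunk_size):
    """Number of chunks needed to tile the volume (per-axis ceiling division)."""
    return math.prod(math.ceil(s / c) for s, c in zip(volume_size, chunk_size))

volume_size = [4096, 4096, 100]  # hypothetical X, Y, Z extent in voxels

small = chunk_count(volume_size, [64, 64, 1])    # 64 * 64 * 100 chunks
large = chunk_count(volume_size, [128, 128, 1])  # 32 * 32 * 100 chunks
print(small, large, small // large)
```

Doubling the chunk edge in X and Y cuts the number of chunks by a factor of four, at the cost of larger individual reads.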

william-silversmith added the zarr and redesign labels Jul 29, 2024
@Magic-Ludo
Author

I agree, thank you for your reply.
In fact, the data I'm handling is just in 3D color.

I use a chunk size of 64 because when I go to 128, the loading time is much longer.

I'll stick with the precomputed format then, thanks!
