TypeError when writing zarr file #626

Magic-Ludo · 2024-07-29T22:03:26Z

Hi,
I use the following code to create precomputed files from a list of .tif images:

import os
import numpy as np
import tifffile
import imageio.v3 as iio
from PIL import Image
Image.MAX_IMAGE_PIXELS = None
from cloudvolume import CloudVolume
from cloudvolume.lib import mkdir, touch

input_dir = "data_in"
output_dir = "file:///data_out/"

#encoding = "blosc"
chunk = [64, 64, 1]
base_resolution = [1800, 1800, 4000]  # X,Y,Z values in nanometers

mkdir(output_dir.replace("file://", ""))

image_files = sorted(
    [
        os.path.join(input_dir, f)
        for f in os.listdir(input_dir)
        if f.endswith(".tif")
    ]
)

first_image = iio.imread(image_files[0])
img_shape = first_image.shape
volume_size = [img_shape[0], img_shape[1], len(image_files)]

scales = [
    {
        "chunk_sizes": [chunk],
        # "encoding": encoding,
        "key": "1800_1800_4000",
        "resolution": base_resolution,
        "size": volume_size,
        "voxel_offset": [0, 0, 0],
    },
]

info = {
    "num_channels": 1,
    "layer_type": "image",
    "data_type": "uint8",
    "scales": scales,
    "type": "image",
}

vol = CloudVolume(
    "zarr://" + output_dir,
    info=info,
    mip=0,
    fill_missing=True,
    cache=False,
    parallel=False
)
vol.provenance.description = "tdTomatoPrecomputed"
vol.provenance.owners = ["[email protected]"]

vol.commit_info()
vol.commit_provenance()

progress_dir = mkdir(
    output_dir.replace("file://", "") + "progress/"
)

to_upload = list(range(0, len(image_files)))

for z, file_name in enumerate(image_files):
    print("\n Processing ", file_name, " z: ", z)
    image = tifffile.imread(file_name).astype(np.uint8)
    print(image.shape)
    image = image[..., np.newaxis]
    vol[:, :, z] = image
    touch(os.path.join(progress_dir, str(z)))

vol.commit_info()

I recently wanted to test the new feature for writing zarr files (I replaced precomputed:// with zarr:// in the code above), hoping to avoid generating a huge number of files. However, I'm encountering this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 9
      7     # image = iio.imread(file_name).astype(np.uint8)
      8     image = image[..., np.newaxis]
----> 9     vol[:, :, z] = image
     10     touch(os.path.join(progress_dir, str(z)))
     12 # def process(z):
     13 #     print("\n Processing ", image_files[z], " z: ", z)
     14 #     image = iio.imread(image_files[z]).astype(np.uint8)
   (...)
     24 # with ProcessPoolExecutor(max_workers=8) as executor:
     25 #     executor.map(process, to_upload)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/frontends/precomputed.py:1013, in CloudVolumePrecomputed.__setitem__(self, slices, img)
   1010 if bbox.subvoxel():
   1011   return
-> 1013 self.image.upload(img, bbox.minpt, self.mip, parallel=self.parallel)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/__init__.py:53, in readonlyguard.<locals>.guardfn(self, *args, **kwargs)
     51 if self.readonly:
     52   raise exceptions.ReadOnlyException(self.meta.cloudpath)
---> 53 return fn(self, *args, **kwargs)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:194, in ZarrImageSource.upload(self, image, offset, mip, parallel, t)
    191     for c in range(self.meta.num_channels):
    192       yield image[ ispt.x:iept.x, ispt.y:iept.y, ispt.z:iept.z, c ]
--> 194 for filename, imgchunk in zip(all_chunknames, all_chunks_by_channel(all_chunks)):
    195   zarr_imgchunk = np.transpose(imgchunk[..., np.newaxis], axes=axis_mapping)
    196   binary = zarr_imgchunk.tobytes(order)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:229, in ZarrImageSource._chunknames.<locals>.ZarrChunkNamesIterator.__iter__(self)
    227 for x,y,z in xyzrange(bbox_grid.minpt, bbox_grid.maxpt):
    228   for c in range(num_channels):
--> 229     filename = sep.join([
    230       tchunk, str(c), str(z), str(y), str(x)
    231     ])
    232     yield cf.join(str(mip), filename)

TypeError: sequence item 0: expected str instance, int found

I tried to change line 230 in the cloudvolume/datasource/zarr/image.py file:

        for x,y,z in xyzrange(bbox_grid.minpt, bbox_grid.maxpt):
          for c in range(num_channels):
            filename = sep.join([
              str(tchunk), str(c), str(z), str(y), str(x)  # changed: tchunk wrapped in str()
            ])
            yield cf.join(str(mip), filename)
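
The TypeError can be reproduced outside CloudVolume, because `str.join` rejects any item that is not a string. The sketch below is illustrative only: `sep = "."` and the `chunk_name` helper are assumptions made for this example, not CloudVolume's actual values or API.

```python
# Assumed separator; the value CloudVolume actually uses is not shown in this thread.
sep = "."

def chunk_name(tchunk, c, z, y, x):
    # Hypothetical helper: convert every component to str before joining,
    # because str.join raises TypeError on non-string items.
    return sep.join(str(part) for part in (tchunk, c, z, y, x))

# Reproduce the failure: the first item is an int, as tchunk was in the traceback.
try:
    sep.join([0, "0", "0", "0", "0"])
except TypeError as err:
    message = str(err)

print(message)
print(chunk_name(0, 0, 0, 0, 0))
```

Converting all components in one place keeps the join safe regardless of which of them arrives as an integer.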

But now another error occurs:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[4], line 9
      7     # image = iio.imread(file_name).astype(np.uint8)
      8     image = image[..., np.newaxis]
----> 9     vol[:, :, z] = image
     10     touch(os.path.join(progress_dir, str(z)))
     12 # def process(z):
     13 #     print("\n Processing ", image_files[z], " z: ", z)
     14 #     image = iio.imread(image_files[z]).astype(np.uint8)
   (...)
     24 # with ProcessPoolExecutor(max_workers=8) as executor:
     25 #     executor.map(process, to_upload)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/frontends/precomputed.py:1013, in CloudVolumePrecomputed.__setitem__(self, slices, img)
   1010 if bbox.subvoxel():
   1011   return
-> 1013 self.image.upload(img, bbox.minpt, self.mip, parallel=self.parallel)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/__init__.py:53, in readonlyguard.<locals>.guardfn(self, *args, **kwargs)
     51 if self.readonly:
     52   raise exceptions.ReadOnlyException(self.meta.cloudpath)
---> 53 return fn(self, *args, **kwargs)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:194, in ZarrImageSource.upload(self, image, offset, mip, parallel, t)
    191     for c in range(self.meta.num_channels):
    192       yield image[ ispt.x:iept.x, ispt.y:iept.y, ispt.z:iept.z, c ]
--> 194 for filename, imgchunk in zip(all_chunknames, all_chunks_by_channel(all_chunks)):
    195   zarr_imgchunk = np.transpose(imgchunk[..., np.newaxis], axes=axis_mapping)
    196   binary = zarr_imgchunk.tobytes(order)

File ~/.conda/envs/neurocvt/lib/python3.10/site-packages/cloudvolume/datasource/zarr/image.py:192, in ZarrImageSource.upload.<locals>.all_chunks_by_channel(all_chunks)
    190 for ispt, iept, vol_spt, vol_ept in all_chunks:
    191   for c in range(self.meta.num_channels):
--> 192     yield image[ ispt.x:iept.x, ispt.y:iept.y, ispt.z:iept.z, c ]

IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed

I don't know if it's the way I inject my data into the volume, but since this feature is new, I have the impression that the problem goes deeper than that.
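
The IndexError is consistent with the shapes involved: `image[..., np.newaxis]` turns a 2D slice into a 3D array of shape (X, Y, 1), while line 192 of `image.py` indexes four axes (x, y, z, channel). A minimal NumPy-only sketch of that mismatch follows; the 128×96 `plane` is a made-up stand-in for one .tif slice, and whether the zarr writer accepts a 4D array built this way is an assumption that this thread does not confirm.

```python
import numpy as np

# Hypothetical 2D image standing in for one .tif slice.
plane = np.zeros((128, 96), dtype=np.uint8)

# What the loop in the post produces: shape (X, Y, 1), i.e. three dimensions.
image3d = plane[..., np.newaxis]

# Indexing four axes on a 3D array fails, as in the traceback.
try:
    image3d[0:64, 0:64, 0:1, 0]
except IndexError as err:
    message = str(err)

# Adding a second trailing axis (a channel axis) gives shape (X, Y, 1, 1),
# which the same four-axis index can slice.
image4d = plane[..., np.newaxis, np.newaxis]
chunk = image4d[0:64, 0:64, 0:1, 0]

print(message)
print(image3d.shape, image4d.shape, chunk.shape)
```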

Thanks for your help!

william-silversmith · 2024-07-29T23:22:46Z

Hi!

Thank you for pointing out the issues. I pushed a fix to master for the first bug. I'll look into the second, but here are some things to consider:

- Zarr does not generate fewer files than precomputed unless we are talking about files with more than four dimensions. Precomputed has a sharded format that generates far fewer files than zarr. See: https://github.com/seung-lab/cloud-volume/wiki/Creating-a-Sharded-Image-from-Scratch (note: these operations can now be performed without Igneous; I should update the wiki article).
- The zarr implementation in CV is custom and was written specifically for 5D volumes (time series, color channels, 3D spatial). I should probably update it to be a bit more general.
- You could consider using a larger chunk size to reduce the number of files.
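
To make the chunk-size point concrete, here is a rough file-count estimate. It assumes one file per chunk (unsharded, single channel) and uses a made-up volume size, since the actual image dimensions are not given in this thread; `chunk_count` is a hypothetical helper, not part of CloudVolume.

```python
import math

def chunk_count(volume_size, chunk_size):
    """Number of chunks needed to tile the volume; partial edge chunks count as whole chunks."""
    return math.prod(math.ceil(v / c) for v, c in zip(volume_size, chunk_size))

# Hypothetical volume: 4096 x 4096 pixels per slice, 100 slices.
volume = (4096, 4096, 100)

small = chunk_count(volume, (64, 64, 1))    # 64 * 64 * 100 chunks
large = chunk_count(volume, (128, 128, 1))  # 32 * 32 * 100 chunks

print(small, large, small // large)
```

Doubling the chunk edge in X and Y cuts the chunk count by a factor of four, at the cost of reading more data per chunk.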

Magic-Ludo · 2024-07-31T15:21:18Z

I agree, thank you for your reply.
In fact, the data I'm handling is just in 3D color.

I use a chunk size of 64 because when I go to 128, the loading time is much longer.

I'll stick with the precomputed format then, thanks!

william-silversmith added the "bug" label (The code is not performing according to the design or a design flaw is seriously impacting users.) · Jul 29, 2024

william-silversmith added the "zarr" (zarr format related.) and "redesign" (It may be flawed, but the code was working as designed.) labels · Jul 29, 2024
