Adding support for Spacewalk files in tadbit model #393

iagomaceda · 2024-07-09T08:39:25Z

Hi,

It would be nice to add support for the Spacewalk binary files as an output for 'tadbit model'.

I already coded the function needed using the ensembl of models as a base:

def write_spacewalk(model_ensemble : StructuralModels, sw_filename : str):
    """
    This function writes a spacewalk file from a model ensemble.

    Parameters
    ----------
    model_ensemble : StructuralModels
        A model ensemble object.
    sw_filename : str
        The name of the spacewalk file to be written.
    """

    region_chr = model_ensemble.description["chromosome"]
    region_start = model_ensemble.description["start"]
    region_reso = model_ensemble.description["resolution"]

    metadata = {"format" : "sw1",
                'version' : '1.0.0', 
                'author' : 'tadbit',
                'date' : str(datetime.now()),
                "name" : f"{model_ensemble.description['project']}_{model_ensemble.description['identifier']}",
                "genome" : str(model_ensemble.description['assembly'])}

    with h5py.File(sw_filename, 'w') as swbf:

        header_group = swbf.create_group('Header')
        header_group.attrs.update(metadata)
        header_group.attrs['point_type'] = 'single_point'

        root = swbf.create_group(metadata['name'])
        genomic_position_group = root.create_group('genomic_position')
        spatial_position_group = root.create_group('spatial_position')

        region_list = [[region_chr, str(region_start + (i - 1) * region_reso), str(region_start + i * region_reso)] for i in range(0, model_ensemble.nloci + 1)]

        genomic_position_group.create_dataset('regions', data=region_list)

        live_contact_map_vertices = np.empty((0, 3), dtype=float)
        
        for model_index, model in enumerate(model_ensemble, start=0):
            
            xyz = np.array([model['x'], model['y'], model['z']], dtype=float).T
            live_contact_map_vertices = np.vstack((live_contact_map_vertices, xyz))

            dataset_name = f't_{model_index}'
            spatial_position_group.create_dataset(dataset_name, data=xyz)

        root.create_dataset('live_contact_map_vertices', data=live_contact_map_vertices)

If you could add it (and the argument to get it) it would be very nice. Feel free to change variable names to fit the rest of the code.

Thanks,

Iago

The text was updated successfully, but these errors were encountered:

zozo123 · 2024-07-18T12:30:05Z

Great idea!

Here's the binary format specification: The Spacewalk Binary File Format

iagomaceda added enhancement Modelling labels Jul 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding support for Spacewalk files in tadbit model #393

Adding support for Spacewalk files in tadbit model #393

iagomaceda commented Jul 9, 2024

zozo123 commented Jul 18, 2024

Adding support for Spacewalk files in tadbit model #393

Adding support for Spacewalk files in tadbit model #393

Comments

iagomaceda commented Jul 9, 2024

zozo123 commented Jul 18, 2024