Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for Spacewalk files in tadbit model #393

Open
iagomaceda opened this issue Jul 9, 2024 · 1 comment
Open

Adding support for Spacewalk files in tadbit model #393

iagomaceda opened this issue Jul 9, 2024 · 1 comment

Comments

@iagomaceda
Copy link
Contributor

Hi,

It would be nice to add support for the Spacewalk binary files as an output for 'tadbit model'.

I already coded the function needed using the ensembl of models as a base:

def write_spacewalk(model_ensemble : StructuralModels, sw_filename : str):
    """
    This function writes a spacewalk file from a model ensemble.

    Parameters
    ----------
    model_ensemble : StructuralModels
        A model ensemble object.
    sw_filename : str
        The name of the spacewalk file to be written.
    """

    region_chr = model_ensemble.description["chromosome"]
    region_start = model_ensemble.description["start"]
    region_reso = model_ensemble.description["resolution"]

    metadata = {"format" : "sw1",
                'version' : '1.0.0', 
                'author' : 'tadbit',
                'date' : str(datetime.now()),
                "name" : f"{model_ensemble.description['project']}_{model_ensemble.description['identifier']}",
                "genome" : str(model_ensemble.description['assembly'])}

    with h5py.File(sw_filename, 'w') as swbf:

        header_group = swbf.create_group('Header')
        header_group.attrs.update(metadata)
        header_group.attrs['point_type'] = 'single_point'

        root = swbf.create_group(metadata['name'])
        genomic_position_group = root.create_group('genomic_position')
        spatial_position_group = root.create_group('spatial_position')

        region_list = [[region_chr, str(region_start + (i - 1) * region_reso), str(region_start + i * region_reso)] for i in range(0, model_ensemble.nloci + 1)]

        genomic_position_group.create_dataset('regions', data=region_list)

        live_contact_map_vertices = np.empty((0, 3), dtype=float)
        
        for model_index, model in enumerate(model_ensemble, start=0):
            
            xyz = np.array([model['x'], model['y'], model['z']], dtype=float).T
            live_contact_map_vertices = np.vstack((live_contact_map_vertices, xyz))

            dataset_name = f't_{model_index}'
            spatial_position_group.create_dataset(dataset_name, data=xyz)

        root.create_dataset('live_contact_map_vertices', data=live_contact_map_vertices)

If you could add it (and the argument to get it) it would be very nice. Feel free to change variable names to fit the rest of the code.

Thanks,

Iago

@zozo123
Copy link

zozo123 commented Jul 18, 2024

Great idea!

Here's the binary format specification: The Spacewalk Binary File Format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants