emquardokus/asct-b-generator

ASCT+B Generator

This program converts a simple TSV file into a HuBMAP ASCT+B table.

The included file "demo-input.txt" was generated from the "demo-input.xlsx" file using Excel (Save As "Tab delimited Text"). The program's output is a TSV file, although the "demo-output.xlsx" file included in this repository is an Excel file.

This program has only been tested on macOS using Python 3, although it should also work on Linux.

Usage

To process the demo input file and generate a TSV file that can be opened by Excel:

process.py "<name of top level entity>" <number of anatomical structure levels> <input TSV file> <output TSV file>
process.py "organ" 3 demo-input.txt demo-output.xls

Input file (TSV)

The tab-delimited file should contain the following twelve columns:

NAME (REF DOI) LABEL (REF DETAILS) ID (REF NOTES) TYPE CHILDREN GENES PROTEINS PROTEOFORMS LIPIDS METABOLITES FTUs REFERENCES

The Type value must be "AS" for anatomical structures and "CT" for cell types. It doesn't matter what Type values are used for the other items, as long as they are neither AS nor CT.

Children is a comma-separated list of child objects. These children must be either anatomical structures (AS) or cell types (CT). The Genes, Proteins, Proteoforms, etc. fields should be comma-separated lists of the appropriate objects (e.g., Genes should be a comma-separated list of relevant genes). In all cases, an object is referred to by its Name or Ref DOI (the value in the first column).
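As an illustration of the comma-separated fields described above, the following is a minimal sketch of how such a field could be split into individual object names. The helper name `split_list` is hypothetical, and trimming whitespace and dropping empty items are assumptions; this is not necessarily how process.py parses these fields.

```python
def split_list(field: str) -> list[str]:
    """Split a comma-separated field into trimmed, non-empty object names."""
    # Assumption: surrounding whitespace is not significant and empty
    # items (e.g. from a trailing comma) can be ignored.
    return [item.strip() for item in field.split(",") if item.strip()]


children = split_list("central ovary, lateral ovary, medial ovary")
no_children = split_list("")
```

Each resulting name would then be looked up against the first column (Name or Ref DOI) of another row.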

If an anatomical structure contains child structures or cell types, then it cannot be assigned biomarkers (e.g., genes, proteins, etc.). Biomarkers and references can only be applied to the lowest level of anatomical structures and to cell types.
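The rule above can be checked before running the program. The sketch below assumes each input row has already been loaded into a dictionary; the key names (`name`, `type`, `children`, and so on) and the function `find_violations` are hypothetical and chosen only for this example.

```python
# Hypothetical key names for the biomarker and reference columns.
RESTRICTED_FIELDS = ("genes", "proteins", "proteoforms", "lipids",
                     "metabolites", "references")


def find_violations(rows):
    """Return names of AS rows that have children and also biomarkers or references."""
    violations = []
    for row in rows:
        is_parent_as = row.get("type") == "AS" and row.get("children", "").strip()
        if is_parent_as and any(row.get(f, "").strip() for f in RESTRICTED_FIELDS):
            violations.append(row["name"])
    return violations


rows = [
    {"name": "ovary", "type": "AS", "children": "central ovary", "genes": "X"},
    {"name": "mesovarium", "type": "AS", "children": "", "genes": "X"},
    {"name": "hilar cell", "type": "CT", "children": "", "proteins": "inhibin"},
]
violations = find_violations(rows)
```

Here only "ovary" is reported, because it is an anatomical structure that has both children and a biomarker.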

The first line in the input file is assumed to contain a header and is ignored.
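For readers who want to inspect an input file themselves, here is a minimal sketch of reading the twelve columns while skipping the header line. The function `read_rows`, the lowercase key names, and the padding of short rows with empty strings are assumptions for illustration, not a description of process.py.

```python
import csv
import io

# Hypothetical short keys for the twelve columns, in file order.
COLUMNS = ["name", "label", "id", "type", "children", "genes", "proteins",
           "proteoforms", "lipids", "metabolites", "ftus", "references"]


def read_rows(handle):
    """Read tab-delimited rows, skipping the header and padding short rows."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # the first line is a header and is ignored
    rows = []
    for raw in reader:
        if not any(cell.strip() for cell in raw):
            continue  # skip blank lines
        padded = (raw + [""] * len(COLUMNS))[:len(COLUMNS)]
        rows.append(dict(zip(COLUMNS, (cell.strip() for cell in padded))))
    return rows


sample = io.StringIO(
    "NAME\tLABEL\tID\tTYPE\tCHILDREN\n"
    "ovary\t\tUBERON:0000992\tAS\tcentral ovary, lateral ovary\n"
    "hilar cell\t\tCL:0002095\tCT\n"
)
rows = read_rows(sample)
```

For a real file, the same function could be called with `open("demo-input.txt", newline="")` in place of the in-memory sample.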

The following example is incomplete and is included only to illustrate the field values and their usage:

NAME (REF DOI)	LABEL (REF DETAILS)	ID (REF NOTES)	TYPE	CHILDREN	GENES	PROTEINS	PROTEOFORMS	LIPIDS	METABOLITES	FTU	REFERENCES (NAME/DOI)
ovary		UBERON:0000992	AS	central ovary, lateral ovary, medial ovary, mesovarium, ovarian ligament, hilum of ovary
central ovary			AS	central inferior ovary, central superior ovary
lateral ovary			AS	lateral inferior ovary, lateral superior ovary
medial ovary			AS	medial inferior ovary, medial superior ovary
mesovarium		UBERON:0001342	AS	
ovarian ligament		UBERON:0008847	AS	
hilum of ovary			AS	ovarian artery, ovarian vein, pampiniform plexus, rete ovarii, hilar cell
corona radiata		CL:0000713	CT								doi:10.1093/oxfordjournals.humrep.a136365
hilar cell		CL:0002095	CT			alkaline phosphatase, acid phosphatase, non-specific esterase, inhibin, calretinin, melan-A, cholesterol esters					McKay et al 1961, Boss et al 1965, Mills et al 2020, Jungbluth et al 1998, Pelkey et al 1998
mural granulosa cell			CT								doi:10.1093/oxfordjournals.humrep.a136365
primary oocyte		CL:0000654	CT								doi:10.1093/oxfordjournals.humrep.a136365
secondary oocyte		CL:0000655	CT								doi:10.1093/oxfordjournals.humrep.a136365
columnar ovarian surface epithelial columnar cell			CT			calretinin, mesothelin					Mills et al 2020, Reeves et al 1971, Hummitzsch et al 2013, Blaustein et al 1979, McKay et al 1961
flattened cuboidal ovarian surface epithelial cell			CT			oviduct-specific glycoprotein-1, E-cadherin					Mills et al 2020, Reeves et al 1971, Hummitzsch et al 2013, Blaustein et al 1979, McKay et al 1961
oviduct-specific glycoprotein-1			Protein								
mesothelin			Protein								
E-cadherin			Protein								
doi:10.1093/oxfordjournals.humrep.a136365	PMID: 3558758		Reference								
McKay et al 1961	McKay, D., Pinkerton, J., Hertig, A. & Danziger, S. (1961). The Adult Human Ovary: A Histochemical Study. Obstetrics & Gynecology, 18(1), 13-39. 		Reference								

Known problems and limitations

  1. The user needs to know the number of anatomical structure levels, or at least an overestimate of that number.
  2. The program doesn't insert a header line in the output file.
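
Regarding the first limitation, the number of levels can be estimated from the input itself by measuring the depth of the anatomical structure hierarchy. The sketch below is one way to do that under stated assumptions: the function `count_as_levels` and the two dictionaries are hypothetical, the sample hierarchy is made up for the example, and counting the top-level entity as one level is an assumption about how process.py interprets its argument. Since an overestimate is acceptable, adding one to the result is a safe margin.

```python
def count_as_levels(children_of, types, top):
    """Return the depth of the AS hierarchy rooted at `top` (top counts as 1)."""
    def depth(name, seen):
        # Only anatomical structures contribute a level; guard against cycles.
        if types.get(name) != "AS" or name in seen:
            return 0
        seen = seen | {name}
        child_depths = (depth(child, seen) for child in children_of.get(name, []))
        return 1 + max(child_depths, default=0)

    return depth(top, frozenset())


# Made-up hierarchy for illustration only.
children_of = {
    "ovary": ["central ovary", "mesovarium"],
    "central ovary": ["central inferior ovary"],
    "central inferior ovary": ["hilar cell"],
}
types = {
    "ovary": "AS",
    "central ovary": "AS",
    "mesovarium": "AS",
    "central inferior ovary": "AS",
    "hilar cell": "CT",
}
levels = count_as_levels(children_of, types, "ovary")
```

In this made-up hierarchy the longest chain of anatomical structures is ovary, central ovary, central inferior ovary, so the estimate is three levels.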
