gapped peak column 13 through 15 #592

Open
ketakibhide opened this issue Aug 25, 2023 · 1 comment

@ketakibhide

Hello,

I am using MACS3 in hmmratac mode for ATAC-seq samples without control samples. I am getting a value of 0 for columns 13 through 15 in the accessible-peaks output. Is this related to the lack of control samples? Also, could you please point me to the column headers for all the columns in the gappedPeak output file?

Regards
Ketaki Bhide

@mrendleman

mrendleman commented Mar 6, 2024

Since MACS3 HMMRATAC doesn't take controls (see macs3 hmmratac --help or the online docs), it's likely an intentional (temporary?) choice of the developers to leave those columns as zeros.

As for the column headers as a whole, they're mostly the same as in the original HMMRATAC tool, which roughly follows ENCODE's gappedPeak specification but with a few differences. Some of the differences are minor:

  • Columns 5 (score), 13 (signalValue), 14 (pValue), & 15 (qValue) are always set to zero
  • Column 6 (+/- strand) is always set to "."
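For reference, here is a minimal Python sketch that names the 15 columns as the ENCODE gappedPeak format defines them and checks whether a line carries the all-zero placeholder scores described above. The function names and the example line are made up for illustration; the example is not real MACS3 output.

```python
# The 15 gappedPeak columns, named as in the ENCODE gappedPeak format.
GAPPED_PEAK_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes",
    "blockStarts", "signalValue", "pValue", "qValue",
]

def parse_gapped_peak_line(line: str) -> dict:
    """Split one tab-separated gappedPeak line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAPPED_PEAK_COLUMNS):
        raise ValueError(f"expected 15 columns, got {len(fields)}")
    return dict(zip(GAPPED_PEAK_COLUMNS, fields))

def has_placeholder_scores(record: dict) -> bool:
    """True if columns 5, 13, 14 and 15 (score, signalValue, pValue, qValue) are all zero."""
    return all(float(record[k]) == 0 for k in ("score", "signalValue", "pValue", "qValue"))

# Hypothetical single-block entry (values invented for illustration).
example = "chr1\t1000\t1600\tpeak_1\t0\t.\t1150\t1450\t0\t1\t300\t150\t0\t0\t0"
record = parse_gapped_peak_line(example)
print(record["thickStart"], record["thickEnd"])  # 1150 1450
print(has_placeholder_scores(record))  # True
```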

Some differences are less obvious. The MACS3 HMMRATAC gappedPeak entries contain not only the open regions but also the surrounding nucleosomes. This is easier to see if you set the --save-states flag and compare the peaks with the corresponding entries in _states.bed.


Additionally, some gappedPeak entries contain more than one open region, e.g. where there's a nuc-open-nuc-open-nuc structure. Columns 7 & 8 mark the start of the first block and the end of the last block within the gappedPeak entry; visualization tools use these two columns when drawing the entry.
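To find the entries with more than one open region in your own output, a minimal sketch like this one filters on column 10 (blockCount in the ENCODE spec). It assumes a plain tab-separated file whose only non-data lines are optional `#`, `track` or `browser` lines; the function name and the two example records are hypothetical.

```python
from typing import Iterable, List

def multi_region_entries(lines: Iterable[str]) -> List[str]:
    """Return the names (column 4) of gappedPeak entries whose blockCount
    (column 10) is greater than 1, i.e. entries with several open regions."""
    names = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[9]) > 1:  # column 10 is index 9 (0-based)
            names.append(fields[3])
    return names

# Hypothetical records: peak_1 has one open region, peak_2 has two.
sample = [
    "chr1\t1000\t1600\tpeak_1\t0\t.\t1150\t1450\t0\t1\t300\t150\t0\t0\t0\n",
    "chr1\t5000\t6000\tpeak_2\t0\t.\t5180\t5750\t0\t2\t200,150\t180,600\t0\t0\t0\n",
]
print(multi_region_entries(sample))  # ['peak_2']
```

On a real file you would pass an open file handle instead of the list, since iterating over it yields one line at a time.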


As in the ENCODE gappedPeak spec, column 10 (blockCount) tells you how many open regions are in the entry, column 11 (blockSizes) is a comma-separated list of the block sizes, and column 12 (blockStarts) is a comma-separated list of the block start positions relative to column 2 (chromStart). In the original HMMRATAC, columns 10-12 were used somewhat differently (something to do with standard- vs. high-coverage peaks and visualization), but I don't fully understand how they were used before.
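Since blockStarts are offsets from chromStart, each open region's absolute coordinates are chromStart + blockStart to chromStart + blockStart + blockSize. The sketch below does that conversion and then derives the first-block start and last-block end, which is what the comment above says columns 7 & 8 hold; that correspondence is taken from the comment and not verified against MACS3's source. The function names and the numbers in the example are hypothetical.

```python
from typing import List, Tuple

def _int_list(field: str) -> List[int]:
    """Parse a comma-separated integer list, tolerating a trailing comma."""
    return [int(x) for x in field.strip().rstrip(",").split(",") if x]

def open_regions(chrom_start: int, block_count: int,
                 block_sizes: str, block_starts: str) -> List[Tuple[int, int]]:
    """Convert blockSizes/blockStarts (relative to chromStart) into absolute
    (start, end) intervals, one per open region."""
    sizes = _int_list(block_sizes)
    starts = _int_list(block_starts)
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    return [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]

# Hypothetical nuc-open-nuc-open-nuc entry starting at chromStart = 5000.
regions = open_regions(5000, 2, "200,150", "180,600")
print(regions)  # [(5180, 5380), (5600, 5750)]

# Start of the first block and end of the last block (cf. columns 7 & 8).
thick_start, thick_end = regions[0][0], regions[-1][1]
print(thick_start, thick_end)  # 5180 5750
```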

I do hope they re-implement the signal scoring values rather than leaving placeholders in those columns.
