Inclusion rules - how to interpret and use them #616

Open
gowthamrao opened this issue Oct 12, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@gowthamrao
Member

gowthamrao commented Oct 12, 2021

From @chrisknoll

There are three inclusion rules tables

cohort_inclusion = "#cohort_inclusion"

Has rule names and descriptions by cohort. Useful for knowing which rule we are looking at.

cohort_inclusion_stats = "#cohort_inc_stats"

Has person_count, gain_count, person_total

select *
from cohort_inclusion_stats
where cohort_definition_id = 2511

would give something like this:

[image: screenshot of the query result]

Mode 0 = all events, mode 1 = best event. The best event is the single event per person that matched the most inclusion criteria. Because the person total for mode_id = 0 is 37.6k, but for mode_id = 1 it is 12.8k, we can tell that this cohort has multiple events per person.

cohort_inclusion_result = "#cohort_inc_result"

select *
from results_optum_extended_dod_v1707.cohort_inclusion_result
where cohort_definition_id = 2511

[image: screenshot of the query result]

In this table, inclusion_rule_mask is a bitmask of the inclusion rules (0-based bit index) that were met, and person_count is the count for that combination. So the first row says 26 entry events met mask = 7, which is 111 in binary, which is inclusion rules 1, 2 and 3. 29072 people met no criteria (mask = 0), and 57 people had mask = 5, which is 101 in binary, which is inclusion rules 1 and 3 (the bits at index 0 and 2 are set).

You can use bit operators to find people with, for example, inclusion rule 3 (which would be 2^2 = 4), so it’s something like:

WHERE inclusion_rule_mask & 4 = 4

This returns all the rows where that bit is set, and then you can take SUM(person_count) over those rows (with a GROUP BY on mode_id, for example) to get the number of people who had that inclusion rule.

If you wanted rule 3 and rule 1, that would be 4 + 1 = 5, so WHERE inclusion_rule_mask & 5 = 5.

If you wanted to check for people that had any of those, then it would just be (inclusion_rule_mask & 5) > 0, because if any of the bits in ‘5’ are set, you get a > 0 result.
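
As a rough illustration of the three filters above ("rule 3", "rule 1 and rule 3", "rule 1 or rule 3"), here is a minimal base R sketch using bitwAnd(). The inclusionResult data frame and its counts are made up for the example and only stand in for cohort_inclusion_result; the variable names are hypothetical.

```r
# Toy stand-in for cohort_inclusion_result; the counts are invented for illustration.
inclusionResult <- data.frame(
  inclusion_rule_mask = c(0L, 1L, 4L, 5L, 7L),
  person_count = c(10, 20, 30, 40, 50)
)

mask <- inclusionResult$inclusion_rule_mask

# Rule 3 set (bit index 2, value 4): matches masks 4, 5 and 7
withRule3 <- sum(inclusionResult$person_count[bitwAnd(mask, 4L) == 4L])

# Rule 1 AND rule 3 set (1 + 4 = 5): matches masks 5 and 7
withRule1And3 <- sum(inclusionResult$person_count[bitwAnd(mask, 5L) == 5L])

# Rule 1 OR rule 3 set: matches masks 1, 4, 5 and 7
withRule1Or3 <- sum(inclusionResult$person_count[bitwAnd(mask, 5L) > 0L])

print(c(withRule3, withRule1And3, withRule1Or3))
```

With these toy counts the three sums are 120, 90 and 140. On real data you would probably also split the sums by mode_id.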

cohort_summary_stats = "#cohort_summary_stats"

This is a fourth table, a derived table that is only present in Cohort Diagnostics.

All four are in the Cohort Diagnostics version 3 results data model here

@gowthamrao
Member Author

as.integer(intToBits(5))

[1] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

as.integer(intToBits(7))

[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
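
Building on the intToBits() output above, a small sketch of how a mask could be decoded back into 1-based inclusion rule numbers. decodeRuleMask is a hypothetical helper name, not something from CohortDiagnostics or WebAPI, and it only works for masks that fit in R's 32-bit integer range.

```r
# Decode an inclusion rule mask into the 1-based inclusion rule numbers it encodes.
# decodeRuleMask is a hypothetical helper; it uses base R only.
decodeRuleMask <- function(mask) {
  # intToBits() returns 32 bits, least significant bit first, so
  # position 1 corresponds to bit index 0, i.e. inclusion rule 1.
  which(as.integer(intToBits(mask)) == 1L)
}

decodeRuleMask(7) # rules 1, 2 and 3
decodeRuleMask(5) # rules 1 and 3
decodeRuleMask(0) # empty: no rule matched
```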

@gowthamrao
Member Author

gowthamrao commented Apr 4, 2024

This function would let you get a cohort attrition view:

getCohortAttritionViewResults <- function(inclusionResultTable,
                                          maxRuleId) {
  numberToBitString <- function(numbers) {
    vapply(numbers, function(number) {
      if (number == 0) {
        return("0")
      }
      
      bitString <- character()
      while (number > 0) {
        bitString <- c(as.character(number %% 2), bitString)
        number <- number %/% 2
      }
      
      paste(bitString, collapse = "")
    }, character(1))
  }
  
  # Helpers for the attrition view: build the mask in which the first
  # `ruleId` rules (bit indexes 0 to ruleId - 1) are all set
  bitsToMask <- function(bits) {
    positions <- seq_along(bits) - 1
    number <- sum(bits * 2 ^ positions)
    return(number)
  }
  
  ruleToMask <- function(ruleId) {
    bits <- rep(1, ruleId)
    mask <- bitsToMask(bits)
    return(mask)
  }
  
  inclusionResultTable <- inclusionResultTable |>
    dplyr::mutate(inclusionRuleMaskBitString = numberToBitString(inclusionRuleMask))
  
  # Pre-allocate one list slot per inclusion rule
  output <- vector("list", maxRuleId)
  
  # Step i keeps the rows whose mask has rules 1..i all set,
  # i.e. whose bit string ends in i ones
  for (i in seq_len(maxRuleId)) {
    suffixString <- numberToBitString(ruleToMask(i))
    output[[i]] <- inclusionResultTable |>
      dplyr::filter(endsWith(x = inclusionRuleMaskBitString,
                             suffix = suffixString)) |>
      dplyr::group_by(cohortDefinitionId,
                      modeId) |>
      dplyr::summarise(personCount = sum(personCount), .groups = "drop") |>
      dplyr::mutate(id = i)
  }
  
  output <- dplyr::bind_rows(output)
  
  return(output)
}

@gowthamrao
Member Author

gowthamrao commented Apr 4, 2024

@chrisknoll and I worked on this problem for many hours today. The key lesson was how to handle large numbers. We used the same strategy that WebAPI currently uses to process inclusionRuleMask in the inclusionResultTable, i.e. convert the mask to a string and match on strings instead of matching on bits.

A simpler way to solve it would be the code below, but it fails in base R when the value goes beyond the integer range, because bitwAnd() only supports values in the 32-bit integer range. This is relevant when we have a lot of inclusion rules, e.g. more than 31 (R integers are 32-bit signed).

ruleToMask <- function(ruleId) {
  bits <- rep(1, ruleId)
  
  bitsToMask <- function(bits) {
    positions <- seq_along(bits) - 1
    number <- sum(bits * 2 ^ positions)
    return(number)
  }
  
  mask <- bitsToMask(bits)
  return(mask)
}

a <- dplyr::tibble(inclusionRuleMask = c(15, 11, 7, 1),
                   personCount = c(20, 20, 20, 20))

ruleId <- 3
maskId <- ruleToMask(ruleId = ruleId)
a |>
  dplyr::filter(bitwAnd(inclusionRuleMask, maskId) == maskId) |>
  dplyr::summarise(personCount = sum(personCount))
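
For comparison, here is a minimal sketch of the string-matching strategy described above, applied to the same toy masks plus one mask above the 32-bit integer range. toBitString and passesFirstRules are hypothetical names for this example, and this is only an approximation of the idea, not the WebAPI implementation. It assumes masks are stored as doubles, so it is exact only for values below 2^53.

```r
# Convert a non-negative whole number (stored as a double) to a bit string,
# most significant bit first. Exact only while the value is below 2^53.
toBitString <- function(number) {
  if (number == 0) {
    return("0")
  }
  bits <- character()
  while (number > 0) {
    bits <- c(as.character(number %% 2), bits)
    number <- number %/% 2
  }
  paste(bits, collapse = "")
}

# TRUE where the mask has rules 1..ruleId all set, decided by checking that
# the bit string ends in `ruleId` ones (no integer bit arithmetic involved).
passesFirstRules <- function(masks, ruleId) {
  bitStrings <- vapply(masks, toBitString, character(1))
  endsWith(bitStrings, strrep("1", ruleId))
}

# Same toy masks as above, plus 2^40 + 7, which is above the 32-bit integer range.
masks <- c(15, 11, 7, 1, 2^40 + 7)
personCount <- c(20, 20, 20, 20, 20)

total <- sum(personCount[passesFirstRules(masks, ruleId = 3)])
print(total)
```

Here masks 15, 7 and 2^40 + 7 end in three ones, so the total is 60. Building the suffix with strrep() avoids computing the mask as a number at all.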
