
Conditional layer #1448

Closed · wants to merge 8 commits

Conversation

mtamburrano
Contributor

This layer, developed by @bhack and me, makes it possible to select the subset of the batch whose items pass a conditional test. It includes test cases and Doxygen documentation in common_layers.hpp.

The layer consists of:

3 bottom blobs:

  • bottom[0]: this is the "bottom_IF". For each item, an argmax is computed
    and the index of the max value is compared with the layer parameter
    conditional_index. If index_argmax == conditional_index, the index
    of the item in the blob is stored in a vector; let's call this vector
    indices_to_forward.
  • bottom[1]: this is the "bottom_THEN". Only the items whose indices are contained
    in the indices_to_forward vector will be forwarded.
  • bottom[2]: this is the bottom that contains the labels. If the layer parameter
    output_type is equal to FILTERED_LABELS, then only the labels whose indices
    are contained in the indices_to_forward vector will be forwarded;
    otherwise this bottom is ignored.

2 top blobs:

  • top[0]: If the layer parameter output_type == FILTERED_LABELS, this blob
    will contain the filtered labels as explained above. If output_type == FILTERED_INDICES,
    this blob will contain the indices of the items that passed the conditional test
    (the same indices stored in the indices_to_forward vector), which is especially
    useful for prediction.
  • top[1]: This blob will contain the filtered items from bottom[1].

Params:

  • int conditional_index:
    The index which will be compared with the argmax index of each item in bottom[0].
  • int output_type:
    Two different output types for top[0]: FILTERED_LABELS, useful for training/testing,
    and FILTERED_INDICES, useful for prediction.
  • float threshold_value (optional):
    If threshold_value is set, the conditional test is considered passed only if
    argmax_value >= threshold_value (see the sketch after this list).
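
To make the test concrete, here is a minimal sketch of the per-item check described above. It is illustrative only: the function name, the flat (num x channels) layout assumed for bottom[0], and the has_threshold flag are assumptions, not the PR's actual code.

#include <vector>
#include "caffe/blob.hpp"

// Sketch of the per-item conditional test: argmax over each item's scores,
// compared against conditional_index and (optionally) threshold_value.
template <typename Dtype>
void ConditionalTestSketch(const caffe::Blob<Dtype>& bottom_if,
                           int conditional_index,
                           bool has_threshold, Dtype threshold_value,
                           std::vector<int>* indices_to_forward) {
  const Dtype* data = bottom_if.cpu_data();
  const int num = bottom_if.num();
  const int dim = bottom_if.count() / num;  // scores per item
  for (int item = 0; item < num; ++item) {
    int argmax_index = 0;
    Dtype argmax_value = data[item * dim];
    for (int c = 1; c < dim; ++c) {
      if (data[item * dim + c] > argmax_value) {
        argmax_value = data[item * dim + c];
        argmax_index = c;
      }
    }
    // The item passes if its argmax matches conditional_index and, when a
    // threshold is given, its max score is at least threshold_value.
    if (argmax_index == conditional_index &&
        (!has_threshold || argmax_value >= threshold_value)) {
      indices_to_forward->push_back(item);
    }
  }
}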

Example:
Let's say we want a net that distinguishes between "cat" and "NOT-cat", and tries to recognize
the breed.
If the net recognizes an input as "NOT-cat", it is useless to perform the remaining forward passes that try
to discriminate among the cat breeds.
So we could have something like this:

                  INPUT_DATA
                      |
                  CONV_LAYER1
                      |
                    POOL1
                    /    \
                  /        \
          INNER_PRODUCT1    CONV_LAYER2
          /                           |
      SOFTMAX1                      POOL2
  //outputs 0 or 1 if                 |
  // is "NON-cat" or "cat"        INNER_PRODUCT2
                                      |
                                    SOFTMAX2
                              //outputs the cat's breed 

In this net, no matter what SOFTMAX1 produces, all the input data will always be forwarded
to the other branch of the net. So with the conditional layer we could build the following net:

       INPUT_DATA
          |
      CONV_LAYER1
          |
        POOL1
          |
      INNER_PRODUCT1
          |
        SOFTMAX1 
//outputs 0 or 1 if is "NON-cat" or "cat"
          |
      CONDITIONAL:
      {
        bottom[0]: SOFTMAX1
        bottom[1]: POOL1
        bottom[2]: INPUT_DATA_labels
        top[0]: FILTERED_INPUT_DATA_labels
        top[1]: CONDITIONAL_POOL1
        conditional_parameters
        {
          conditional_index: 1
          output_type: FILTERED_LABELS
        }
      }
          |
      CONV_LAYER2
          |
      INNER_PRODUCT2
          |
        POOL2
          |
        SOFTMAX2 //outputs the cat's breed

Now, when the net reaches the conditional layer, the output of SOFTMAX1 is evaluated
to find the items for which SOFTMAX[1] >= SOFTMAX[0] (because conditional_index == 1; if
SOFTMAX[1] > SOFTMAX[0] we can presume that the item is a cat). The indices of these items
are stored, and only the items with those indices (that is, the cats!) are forwarded
from POOL1 to CONV_LAYER2.

In the training phase this makes it possible to train a specific branch of the net on only
part of the input data, while keeping the possibility of having shared layers. In the prediction
phase a conditional layer can save a lot of time: in the previous example the most computationally
expensive part of the net is probably the one that discriminates among the breeds, so all the
forward passes for the NOT-cat items are spared.
For prediction it is useful to set output_type to FILTERED_INDICES, so you can distinguish which
of the input items were forwarded to the end of the net and which stopped earlier.

In addition, the conditional_layer can be used to build logic gates. For example:

       GENERIC_LAYER1
      /             \
CONDITIONAL_L1    CONDITIONAL_L2
      \             /
       GENERIC_LAYER2

is equivalent to an OR; by contrast:

   GENERIC_LAYER1
          |
    CONDITIONAL_L1
          |
    CONDITIONAL_L2
          |
   GENERIC_LAYER2

is equivalent to an AND.

NOTES:

When used for prediction the conditional layer works fine; when used in the train/test phase, however, it currently can't work on dev. This is because each loss layer has the following check:

if (propagate_down[1]) {
  LOG(FATAL) << this->type_name()
             << " Layer cannot backpropagate to label inputs.";
}

in its Backward function. The problem is that my layer needs to backpropagate through top[1] but not top[0] (which contains the labels), and currently this seems impossible because a layer either needs backpropagation on ALL its blobs or on none of them. To test my layer I commented out that check in the loss layer and everything seems to work fine; the net converges normally. So how can I fix this? Would it be possible to remove that check from the loss layers? I presume it is there just to make sure no one has wired the net up wrong by attaching a non-label blob to a loss layer, right?


mtamburrano and others added 6 commits November 17, 2014 14:59
@mtamburrano
Contributor Author

I've modified a little how the conditional layer works when it receives a batch in which none of the items pass the conditional test. Before the last commit, because we can't have a blob with num == 0, in this case the conditional layer allowed the forward pass to the conditional branch for all the items of the batch (in practice, it was as if the conditional layer did not exist). Although this happens very rarely, I applied a patch to disable forward and backward when none of the items pass the conditional test.

Now when this occurs the conditional_layer effectively reshapes the conditional top blob with num == 0, and in net.cpp, in the ForwardFromTo loop, there is a check named ForwardIsAllowed that checks whether any of the bottom blobs of Layer[i] has num == 0; if so, it reshapes a top of that layer with num == 0 and doesn't forward anything.
In this way the forward passes for all the layers on this branch of the net are blocked, because each of these layers will have at least one bottom blob with num == 0.
The same thing is applied in the BackwardFromTo function, this time checking the top blobs of the layers for those with num == 0 and denying the backward passes they don't need.

So, to block all the forward passes in a specific branch of the net, it is enough to reshape a blob with num == 0 to be sure nothing from there will be forwarded, without breaking the rest of the net.
I found this to be the cleanest way to block unwanted subsequent forward/backward calls without big changes to the net/layer structure, but if you have a better idea, suggestions are welcome :)
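
For clarity, here is a minimal sketch of the kind of check described above, as it could look in net.cpp. Only the name ForwardIsAllowed comes from this PR; the body and its exact placement are assumed from the description, not copied from the diff.

// A layer's Forward is skipped as soon as any of its bottom blobs is empty
// (num == 0). The caller then reshapes the layer's tops with num == 0, so
// every downstream layer in the branch gets skipped in the same way.
template <typename Dtype>
bool Net<Dtype>::ForwardIsAllowed(int layer_id) {
  for (int j = 0; j < static_cast<int>(bottom_vecs_[layer_id].size()); ++j) {
    if (bottom_vecs_[layer_id][j]->num() == 0) {
      return false;
    }
  }
  return true;
}

BackwardIsAllowed would do the symmetric check on the top blobs.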

@mtamburrano
Contributor Author

Added an optional threshold_value param. Now if threshold_value is set, the conditional test is considered passed only if argmax_value >= threshold_value (in addition to argmax_index == conditional_index).

@sguada
Contributor

sguada commented Nov 20, 2014

@mtamburrano I think this layer is getting very convoluted. A Filter layer could be useful, but it should be simpler: a bottom blob containing the data to be filtered and a bottom blob with 0/1 values for skipping or including the data in the top blob.
How the selector is computed should be outside this layer, so whether the condition is that the softmax is equal to the label, or that it passes some threshold, would be handled by other layers.

@sguada
Contributor

sguada commented Nov 20, 2014

There is one problem I can foresee, about setting the dimensions of the top blob, since it will depend on how many data points are selected.

@bhack
Contributor

bhack commented Nov 20, 2014

@sguada OK, it could make sense to create a simpler filter layer and let users contribute different conditional layers. Regarding your second comment, what could be the problem when we dynamically resize the blob (we already do it)?

@sguada
Contributor

sguada commented Nov 20, 2014

@bhack What I meant is that there will be some extra cases to consider within Reshape, like what the shape should be the first time the layer is created, when the selector values are undefined or all zeros: should it be (0, channels, height, width) or (num, channels, height, width)?

@sguada
Contributor

sguada commented Nov 20, 2014

Also, I'm not sure whether we should allow filtering more than one bottom at a time.
Or whether we should allow the filter to output two top blobs: one with the items that fail the condition and another with those that pass it.

@mtamburrano
Contributor Author

Hi Sguada,
I agree that it would be better to allow different selectors, but am I wrong, or do they not exist yet? I mean, say we build a filter layer like you suggested and then we want to make a net like the one I described in my first comment (cats/NOT-cats and their breed)... how would you provide the filter layer with a 0/1 blob indexing the rejected/accepted items? A selector layer would have to be created, right?

About the issues you mentioned with Reshape, I think all the cases you've listed are covered:
if all the items pass the conditional test, the new size would be (num, channels, height, width); on the contrary, if all the items fail the test, the new top shape would be (0, channels, height, width). With the two little changes in net.cpp, a blob with num == 0 will now not be forwarded, and all subsequent layers will skip forward and backward too.
I can't think of other special cases right now; I hope I'm not missing anything.

Finally, your last suggestion would be easy to implement; if you think it would be useful, I'll apply the needed changes.

@sguada
Contributor

sguada commented Nov 20, 2014

@mtamburrano, for each example how to build the selector will be different, and that's what we want. We want to isolate responsibilities: this layer assumes somebody else has already computed a selector and just applies it.

For your specific case of "cats vs no-cats" it is not clear what you want to do during training and during testing. For instance, during training, what I would do is use a binary classifier: for example, you can use an inner_product with one output and then sigmoid_cross_entropy with binary labels; then you can use a sigmoid layer followed by a threshold layer (threshold 0.5) to get the selector.
Then you use the selector to filter the data and the labels, and keep processing.

It's not clear if you want to learn to distinguish "cats vs no-cats" at the same time that you learn to distinguish different breeds of cats.

Maybe there are other layers needed, like "and", "or", "not", ... or "equal".

@mtamburrano
Contributor Author

@sguada, I get the point. I still think that a simple selector like the one currently inside conditional_layer (a built-in argmax and threshold check) could be useful and flexible enough to simplify a good number of different nets... Maybe I could leave conditional_index as an optional parameter and at the same time accept a 0/1 blob, to allow anyone to build their own selector?
By the way, if you think that would be too intricate, I'll transform the layer from "conditional" into "filter".

Also, what do you think about the way a reshape with num == 0 is handled by the net? I hope the ForwardIsAllowed and BackwardIsAllowed functions are good enough.

@sguada
Contributor

sguada commented Nov 21, 2014

@mtamburrano I think having your "simple" selector would be difficult to maintain and too specific, but you are welcome to create your own layers in your own branch.

Agreed about changing the name and focus of the layer to a "filter" layer. It has a clearer definition and implementation.

We are working on adding modules that can encapsulate multiple layers, like argmax, thresholds, equality, ... into one module and reuse it. See #1169.

I'm still not sure about reshaping when num == 0. It is not that simple: one blob can go to multiple layers, and one layer can read from multiple blobs, so it is not easy for the net to decide when it needs to compute forward or backward. Maybe it should be up to the layer to avoid computations when bottom.num() == 0.
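
As a very rough sketch of that alternative (the layer name ExampleLayer and the exact Forward signature are illustrative, and this assumes zero-sized blobs were allowed in the first place):

template <typename Dtype>
void ExampleLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                                      const vector<Blob<Dtype>*>& top) {
  if (bottom[0]->num() == 0) {
    // Empty batch: give the top zero items and do no work, so the empty
    // shape simply propagates through the layer.
    top[0]->Reshape(0, bottom[0]->channels(), bottom[0]->height(),
                    bottom[0]->width());
    return;
  }
  // ... normal forward computation ...
}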

@mtamburrano
Contributor Author

@sguada, I would have preferred to delegate to the layers the task of choosing whether to compute backward or forward, but that would mean adding a num == 0 check to every layer, and I didn't think that would be accepted ;)
It is true that a blob can go to multiple layers and a layer can accept multiple blobs, but if I understand the current code correctly, num == 0 would cause almost all layers to crash, for one reason or another. So I think that not forwarding anything from a blob with num == 0, and reshaping all subsequent layers with num == 0, is actually safe... As I said, I tested it with a pretty complex net and it converges fine...

@sguada
Contributor

sguada commented Nov 21, 2014

@mtamburrano let's see what @shelhamer @longjon @jeffdonahue @Yangqing think about this. I'm not sure what the best way to handle this case is.

@bhack
Contributor

bhack commented Nov 21, 2014

@sguada Is your last comment for another issue/PR?
If I understand correctly, the core team's opinion is to isolate a simpler filter layer that can handle a binary masking bottom blob and split num into two tops. Then conditional-layer logic would produce the binary masking top blob.

There will probably be checks and other things that prevent handling num == 0 blobs in the layers, so I think a filtering forward and backward check is still needed in net.cpp when num == 0.

When we open a PR we generally want to invest work hours (not spare time) to get a feature integrated, so we are more interested in finding a way to integrate the feature than in a solution where we need to maintain our own branch.

@bhack
Contributor

bhack commented Nov 22, 2014

In the future there is also some interesting impact on graph flow parallelization; this is a feature detector example.
See also #876

@sguada
Contributor

sguada commented Nov 22, 2014

@bhack my comment was meant to bring this PR to the attention of other core members regarding how to handle the num == 0 case.

Regarding the layer itself, I think a layer that can filter (or route) the data will be a welcome addition. Isolating the logic of how to generate the filter or selector from the layer itself will improve its usability.

Given a bottom data blob and a bottom selector (containing 0s and 1s), it would copy the corresponding data to top[0] or top[1]. But I would suggest generalizing the layer a bit, allowing more tops and letting the selector take values s = {-1, 0, ..., n}, which would copy the data to the corresponding top[s] if s is in [0, n-1] and skip the data if s = -1 or s = n (that is, if s falls within the range of tops it copies the data there, otherwise it skips the data).

@bhack @mtamburrano we appreciate PRs that are interesting and have broad applicability, but some people prefer to modify their own copy of Caffe.

I'm developing a merging layer that takes 2 or more bottoms and produces 1 top by choosing which one to copy based on a selector.
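
A tiny sketch of the routing behaviour described above (illustrative only; this is not an existing Caffe layer, and it uses plain std::vector items instead of blobs):

#include <vector>

// Each item i is copied to tops[s_i] when 0 <= s_i < tops->size(), and is
// skipped when s_i is -1 (or any other value outside the range of tops).
template <typename Item>
void RouteBySelectorSketch(const std::vector<int>& selector,
                           const std::vector<Item>& data,
                           std::vector<std::vector<Item> >* tops) {
  const int num_tops = static_cast<int>(tops->size());
  for (size_t i = 0; i < data.size(); ++i) {
    const int s = selector[i];
    if (s >= 0 && s < num_tops) {
      (*tops)[s].push_back(data[i]);  // route item i to the selected top
    }
    // s == -1 or s == num_tops: the item is dropped
  }
}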

@bhack
Contributor

bhack commented Nov 22, 2014

OK, you've selected the wrong nick suggestion with mohomran. We are on hold so that other core members can give an opinion on how to go ahead. We want to follow a direction that lets this work get merged.

@longjon
Contributor

longjon commented Nov 22, 2014

I'll try to have a look at this on Sunday.

@bhack
Contributor

bhack commented Nov 24, 2014

There was also some idea from the 90's that could be revived in a deep 10's fashion.

@longjon
Contributor

longjon commented Nov 25, 2014

Okay, I think I mostly agree with what @sguada has said here.

This PR does or attempts a bunch of different things, which I'll attempt to list:

  1. Compute an argmax. We already have a layer for that, so unless there's a compelling performance reason to embed it in this layer, better to stick with modularity.

  2. Filter. This is a simple, well-defined function with obvious use cases, so we'd like to have a layer that does this. There seem to be two different variations which have been mentioned:

    1. Treat bottom[0] as boolean, and filter bottom[1] along the batch dimension. This has the advantages that

      1. argmax input by itself has a reasonable meaning when cast to boolean
      2. this could be extended to additional bottom blobs with corresponding additional top blobs
    2. Treat bottom[0] as an index, and deal out bottom[1] to the corresponding tops (but, maybe optionally, discard index -1). This has the advantage that

      1. there is an obvious use case for this where one selects among many processing streams

      but the disadvantage that

      1. in order to discard certain output, one has to ignore a top, or rely on some magic number (-1), and all these magic numbers are making me uneasy.

    Either of these seem like reasonable layers to have.

  3. Filter labels. There's no need for this to be explicit, just use an extra filter layer with the same selection as input, or use the implementation 2(i)(b) above.

  4. Compute indices where some condition holds (the FILTERED_INDICES option). I don't get the use case for this, but this could be implemented by an optional extra top blob (the way, e.g., max pooling works, but note that this is incompatible with 2(i)(b)), or it could be another layer.

  5. Allow layers to produce zero-size blobs. Blobs with zero dimensions are currently disallowed, but there are cases (e.g., this one) where zero dimensions make sense. One would have to make sure existing layers behave reasonably with zero dimensions where that should be valid.

  6. Prevent forward propagation of zero-size blobs. It's not clear that this is necessary, since forward computation with empty batches should be trivial.

  7. Acknowledge that computation on labels should be allowed, while currently it throws a superfluous error.

At a first pass, it seems like: 1 and 3 are not needed, 2 and 7 are compelling, 4 is a reasonable thing to compute but I don't see the use case yet, 5 will probably need to happen to make 2 possible, and 6 is probably unnecessary given a proper implementation of 5.

So, I hope you see why it's difficult to review and agree to all of these changes at once! My suggestion, to reinforce what @sguada was saying, is to split up this PR into smaller ones, each of which is the smallest diff needed to do one clearly useful thing. Review effort is superlinear in PR size, so smaller PRs are more efficient for all of us. If you don't know them already, git rebase -i and git cherry-pick are helpful tools for this. (You can also make PRs that depend on each other rather than waiting; just note it in the summary, and be prepared to rebase.)

Concretely, I suggest: PR item 5 (perhaps just allowing zero-sized batches, and checking that existing layers are fine with that), and (separately) item 2, a filter layer, depending on it. Meanwhile PR or make an issue of item 7. If there's a use case for item 4 that I'm not seeing, add that in another PR.

@sguada
Contributor

sguada commented Nov 25, 2014

@longjon Thanks for your thoughtful comment.

Maybe we should keep filter_layer with the common logic of using a boolean selector that indicates which elements have to be kept. So let's do 2(i). We could later add a different layer for routing that can take an index selector.

One thing that wasn't clear to me about item 5: are you proposing that, in case some of the current layers don't work with zero-sized batches, they should be changed to handle that case?

@bhack
Contributor

bhack commented Nov 25, 2014

Just a reference to pylearn's CompositeLayer with composite space, if we want to get some other routing ideas.

@mtamburrano
Contributor Author

Thank you @longjon for your review.
About 4, I don't think we'll need it anymore, because the selector can work exactly like the FILTERED_INDICES case. A use case could be the prediction phase: let's say we feed the net a bunch of images of cats and non-cats... the filter layer doesn't allow the non-cats to reach the longest branch, so at the end we need to know which input indices were filtered out to correctly interpret the output of the net. But like I said, with a proper selector that can be split, with one split ending in the filter layer and the other left "unplugged", the result is the same.

I'll open a PR for 2(i) and for 5, but about 5 I still have some doubts (pretty much the same as @sguada's).
My current solution was to slightly change how the ForwardFromTo and BackwardFromTo functions work, looking for zero-sized batches and consequently disabling further forward and backward passes.
I see it as an advantage that, in this way, we don't need to worry about layer implementations correctly handling the zero-sized-batch special case, because the net itself deals with it.
The disadvantage could be that any layer with a blob with num == 0 can't propagate anything. I don't actually see this as an issue, because blobs with zero dimensions are currently disallowed and I don't see any use for them other than disabling computation...

Regarding 7, with the changes we are going to make, this may or may not be an issue.
If we allow the filter_layer to output more tops that end in a loss layer, and among these tops there is a label top, an error will be thrown.
If instead, when we build the net, we make 2 filter_layers, one for normal computation and another for labels, the loss layer should work fine because it will not try to backpropagate on labels.
Anyway, for ease of use I still think we should remove that superfluous check in the loss layers, so I think I'll open a PR for that as well.

@mtamburrano
Contributor Author

@sguada, @longjon, @bhack and I just opened 3 new PRs to address 7 (#1483), 5 (#1484) and 2(i) (#1482).

@longjon
Contributor

longjon commented Nov 26, 2014

@mtamburrano okay, thanks, will look at these as I have time over the next days.

@longjon
Contributor

longjon commented Dec 1, 2014

Closing this as it seems to have successfully evolved into other PRs.
