Conditional layer #1448
Conflicts: include/caffe/common_layers.hpp src/caffe/layers/conditional_layer.cpp src/caffe/proto/caffe.proto
Force-pushed from a45db40 to 50b305c
Added an optional threshold_value param. Now if threshold_value is set, the conditional test is considered passed only if argmax_value >= threshold_value (in addition to argmax_index == conditional_index).
@mtamburrano I think this layer is getting very convoluted. A Filter layer could be useful, but it should be simpler: one bottom blob containing the data to be filtered and one bottom blob with 0/1 values for skipping or including each datum in the top blob.
One problem I can foresee is setting the dimensions of the top blob, since they will depend on how many data points are selected.
@sguada Ok, it could make sense to create a simpler filter layer and let users contribute different conditional layers. Regarding your second comment, what could be the problem when we dynamically resize the blob (we already do it)?
@bhack What I meant is that there will be some extra cases to consider within Reshape, like what the shape should be the first time the layer is created, when the selector values are undefined or all zeros: should it be (0, channels, height, width) or (num, channels, height, width)?
Also, I'm not sure whether we should allow filtering more than one bottom at a time.
Hi @sguada, about the issues you mentioned with Reshape, I think all the cases you've listed are covered. Finally, your last suggestion would be easy to implement; if you think it would be useful I'll apply the needed changes.
@mtamburrano, how to build the selector will be different for each use case, and that's what we want. We want to isolate responsibilities: this layer assumes somebody else has already computed a selector and then applies it. For your specific case of "cats vs no-cats" it is not clear what you want to do during training and during testing. For instance, during training what I would do is use a binary classifier: for example an inner_product layer with one output, then use sigmoid_cross_entropy with binary labels; then you can use a sigmoid layer followed by a threshold layer (threshold 0.5) to get the selector. It's not clear if you want to learn to distinguish "cats vs no-cats" at the same time that you learn to distinguish different breeds of cats. Maybe there are other layers needed, like "and", "or", "not", ... or "equal".
@sguada, I get the point. I still think that a simple selector like the one currently inside conditional_layer (a built-in argmax and threshold check) could be useful and flexible enough to simplify a good number of different nets... Maybe I could leave conditional_index as an optional parameter and at the same time accept a "0,1" blob, to allow anyone to build their own selector? Also, what do you think about the way a reshape with num == 0 is handled by the net? I hope the functions "ForwardIsAllowed" and "BackwardIsAllowed" are good enough.
@mtamburrano I think having your "simple" selector will be difficult to maintain and too specific, but you are welcome to create your own layers in your own branch. Agreed about changing the name and focus of the layer to a "filter" layer; that has a clearer definition and implementation. We are working on adding modules that can encapsulate multiple layers, like argmax, thresholds, equality, ... into one module and reuse it; see #1169. I'm still not sure about reshaping when num == 0. It is not that simple: one blob can go to multiple layers, and one layer can read from multiple blobs, so it is not easy for the net to decide when it needs to compute forward or backward. Maybe it should be up to the layer to avoid computations when bottom.num() == 0.
@sguada, I would have preferred to delegate to the layers the task of choosing whether to compute backward or forward, but that would mean adding a num == 0 check to every layer, and I didn't think that would be accepted ;)
@mtamburrano let's see what @shelhamer, @longjon, @jeffdonahue and @Yangqing think about this. I'm not sure what the best way to handle this case will be.
@sguada Is your last comment about another issue/PR? There will probably be checks and other things that prevent layers from handling num == 0 blobs, so I think a filtering forward/backward check in net.cpp is still needed when num == 0. When we open a PR we generally want to invest working hours (not just spare time) to get a feature integrated, so we are more interested in finding a way to integrate the feature than in a solution where we need to maintain our own branch.
In the future there could also be some interesting impact on graph-flow parallelization; see this feature detector example.
@bhack my comment was to bring this PR to the attention of other core members regarding how to handle the num == 0 case.

Regarding the layer itself, I think a layer that can filter (or route) the data would be a welcome addition. Isolating the layer from the logic of how to generate the filter or selector will improve its usability: given a bottom data blob and a bottom selector (containing 0s and 1s), it will copy the corresponding data to top[0] or top[1]. But I would suggest generalizing the layer a bit, allowing more tops and letting the selector take vector values s = {-1, 0, ..., n}, which would copy data to the corresponding top[s] if s is in [0, n-1] and skip the data if s = -1 or s = n (that is, if s falls within the range of tops it copies the data there, otherwise it skips the data).

@bhack @mtamburrano we appreciate PRs that are interesting and have broad applicability, but some people prefer to modify their own copy of Caffe. I'm developing a merging layer that takes 2 or more bottoms and produces 1 top by choosing which one to copy based on a selector.
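The routing rule described above (selector value s routes each item to top[s], out-of-range values drop it) could be sketched roughly as follows. This is an illustration of the proposed semantics, not Caffe code; all names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not Caffe's API): route each item of `data` to one of
// `num_tops` output buckets according to its selector value s.
// s in [0, num_tops-1] -> copy the item to tops[s];
// s == -1 or s == num_tops -> skip the item entirely.
std::vector<std::vector<float>> RouteBySelector(
    const std::vector<float>& data,
    const std::vector<int>& selector,
    int num_tops) {
  std::vector<std::vector<float>> tops(num_tops);
  for (std::size_t i = 0; i < data.size(); ++i) {
    const int s = selector[i];
    if (s >= 0 && s < num_tops) {
      tops[s].push_back(data[i]);  // item i is routed to top[s]
    }
    // otherwise (s == -1 or s == num_tops): item is skipped
  }
  return tops;
}
```

With two tops this degenerates to the 0/1 filter case; the filter layer is the special case num_tops == 1 with selector values in {-1, 0}.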
Ok, you've selected the wrong nick suggestion with mohomran. We are on hold so that other core members can give an opinion on how to go ahead. We want to follow a direction that lets this work be merged.
I'll try to have a look at this on Sunday. |
There was also some idea from the 90's that could be revived in a deep 10's fashion. |
Okay, I think I mostly agree with what @sguada has said here. This PR does or attempts a bunch of different things, which I'll attempt to list:
At a first pass, it seems like: 1 and 3 are not needed; 2 and 7 are compelling; 4 is a reasonable thing to compute but I don't see the use case yet; 5 will probably need to happen to make 2 possible; and 6 is probably unnecessary given a proper implementation of 5. So I hope you see why it's difficult to review and agree to all of these changes at once! My suggestion, to reinforce what @sguada was saying, is to split this PR into smaller ones, each of which is the smallest diff needed to do one clearly useful thing. Review effort is superlinear in PR size, so smaller PRs are more efficient for all of us. Concretely, I suggest: PR item 5 (perhaps just allowing zero-sized batches, and checking that existing layers are fine with that), and (separately) item 2, a filter layer, depending on it. Meanwhile, PR or make an issue of item 7. If there's a use case for item 4 that I'm not seeing, add that in another PR.
@longjon Thanks for your thoughtful comment. Maybe we should keep filter_layer with the common logic of using a boolean selector that indicates which elements have to be kept, so let's do 2(i). We could later add a different layer for routing that takes an index selector. One thing that wasn't clear to me: for item 5, are you proposing that if some of the current layers don't work with zero-sized batches, they should be changed to handle that case?
Just a reference to pylearn2's CompositeLayer with CompositeSpace, if we want to get some other routing ideas.
Thank you @longjon for your review. I'll open a PR for 2(i) and for 5, but about 5 I still have some doubts (pretty much the same as @sguada's). Regarding 7, with the changes we are going to make, this may or may not be an issue.
@mtamburrano okay, thanks, will look at these as I have time over the next days. |
Closing this as it seems to have successfully evolved into other PRs. |
This layer, developed by me and @bhack, adds the possibility to select the subset of a batch whose items pass a conditional test. It includes test cases and doxygen documentation in common_layers.hpp.
The layer consists of:
3 bottom blobs:
- bottom[0]: for each item, the index of its max value (the argmax) is computed and compared with the layer parameter conditional_index. If index_argmax == conditional_index, the index of the item in the blob is stored in a vector; let's call this vector indices_to_forward.
- bottom[1]: only the items whose indices are contained in the indices_to_forward vector will be forwarded.
- bottom[2]: if output_type is equal to FILTERED_LABELS, then only the labels whose indices are contained in the indices_to_forward vector will be forwarded; otherwise this bottom is ignored.
2 top blobs:
- top[0]: if output_type == FILTERED_LABELS, it will contain the filtered labels as explained above; if output_type == FILTERED_INDICES, it will contain the indices of the items that passed the conditional test (the same indices contained in the indices_to_forward vector), which is especially useful for predicting.
Params:
- conditional_index: the index which will be compared with the argmax index of each item in bottom[0].
- output_type: selects between two different output types for top[0]: FILTERED_LABELS, useful for training/testing, and FILTERED_INDICES, useful for predicting.
- threshold_value (optional): if set, the conditional test is considered passed only if argmax_value >= threshold_value.
Example:
Let's say we want a net that distinguishes between "cat" and "NOT-cat" and tries to recognize the breed. If the net recognizes an input as "NOT-cat", it is useless to perform the remaining forward passes that try to discriminate among the cat breeds.
So we could have something like this:
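The net definition originally shown here was not preserved. A minimal hypothetical sketch of such a net, in the prototxt style of the time (layer names chosen to match the discussion below; the exact definitions are illustrative, not from the PR):

```protobuf
# Hypothetical sketch: both branches always run, regardless of SOFTMAX1.
layers { name: "conv1"    type: CONVOLUTION   bottom: "data"  top: "conv1" }
layers { name: "pool1"    type: POOLING       bottom: "conv1" top: "pool1" }
# branch 1: cat vs NOT-cat classifier
layers { name: "ip1"      type: INNER_PRODUCT bottom: "pool1" top: "ip1" }
layers { name: "softmax1" type: SOFTMAX       bottom: "ip1"   top: "softmax1" }
# branch 2: breed classifier, fed directly from pool1, so always forwarded
layers { name: "conv_layer2" type: CONVOLUTION bottom: "pool1" top: "conv_layer2" }
```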
In this net, no matter what SOFTMAX1 produces, all the input data will always be forwarded to the other branch of the net. With the conditional layer we could instead build the following net:
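The conditional net definition was also lost from this page. A hypothetical sketch, inserting a CONDITIONAL layer between POOL1 and the breed branch (the layer type, parameter names, and top ordering are illustrative, following the description above, not the PR's actual prototxt):

```protobuf
# Hypothetical sketch: the CONDITIONAL layer gates the breed branch
# on SOFTMAX1's per-item argmax.
layers {
  name: "conditional"
  type: CONDITIONAL
  bottom: "softmax1"   # bottom[0]: evaluated per item (argmax test)
  bottom: "pool1"      # bottom[1]: data to filter
  bottom: "label"      # bottom[2]: labels to filter (FILTERED_LABELS mode)
  top: "label_cats"    # filtered labels (or indices, per output_type)
  top: "pool1_cats"    # filtered data
  conditional_param { conditional_index: 1 }  # index 1 == "cat"
}
layers { name: "conv_layer2" type: CONVOLUTION bottom: "pool1_cats" top: "conv_layer2" }
```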
Now, when the net reaches the conditional layer, the output of SOFTMAX1 is evaluated to find the items that have SOFTMAX[1] >= SOFTMAX[0] (because conditional_index == 1; if SOFTMAX[1] > SOFTMAX[0] we can presume the item is a cat). Then we store the indices of these items and forward from POOL1 to CONV_LAYER2 only the items with those indices (in other words, the cats!).
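The per-item test just described (argmax over bottom[0], compared against conditional_index, optionally with the threshold) can be sketched as follows. This is an illustration of the described logic, not the layer's actual code; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch: for each item (row) of `scores`, the test passes iff
// argmax(row) == conditional_index and, when a threshold is given,
// scores[argmax] >= threshold_value. Returns the indices of passing items
// (the "indices_to_forward" vector from the description).
std::vector<std::size_t> IndicesToForward(
    const std::vector<std::vector<float>>& scores,
    std::size_t conditional_index,
    float threshold_value = 0.0f) {
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    std::size_t argmax = 0;
    for (std::size_t j = 1; j < scores[i].size(); ++j) {
      if (scores[i][j] > scores[i][argmax]) argmax = j;
    }
    if (argmax == conditional_index && scores[i][argmax] >= threshold_value) {
      indices.push_back(i);  // item i passes and will be forwarded
    }
  }
  return indices;
}
```

For the cat example, conditional_index == 1 selects exactly the items whose SOFTMAX[1] is the per-item maximum.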
In the train phase this allows training only part of the input data in a specific branch of the net, while keeping the possibility of having shared layers. In the predict phase a conditional layer could save a lot of time: in the previous example, probably the most computationally expensive part of the net would be the one that discriminates among the breeds, so all the forward passes for the NON-cat items are spared.
For predicting it is useful to set output_type to FILTERED_INDICES, so you can distinguish which of the input data were forwarded until the end of the net and which stopped earlier.
In addition, the conditional_layer can be used to build logic gates. For example:
is equivalent to an OR; on the contrary:
is equivalent to an AND.
NOTES:
If used for predicting, the conditional layer works fine; if used in the train/test phase, it currently can't work in dev. This is because each LOSS layer has the following check:
in its backward function. The problem is that my layer needs to backpropagate Top[1] but not Top[0] (which contains labels), but it seems this is not possible, because a layer either needs backpropagation on ALL its blobs or on none. To test my layer I commented out that check in the LOSS layer and everything seems to work fine; the net converges normally. So how can I fix this? Is it possible to remove that check from the LOSS layers? I presume it is there just to be sure someone has not wired the net up wrong, attaching a non-label blob to a loss layer, right?
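The quoted check was lost from this page. In Caffe's loss layers the guard is roughly of this form, sketched here as a self-contained function using a plain exception instead of glog's CHECK (the function name is hypothetical):

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Rough sketch of the guard in Caffe loss layers' Backward functions:
// backpropagating to the label bottom (index 1) is rejected outright.
// In Caffe this is expressed as something like:
//   CHECK(!propagate_down[1]) << "... cannot backpropagate to label inputs.";
void CheckLossBackward(const std::vector<bool>& propagate_down) {
  if (propagate_down.size() > 1 && propagate_down[1]) {
    throw std::runtime_error("Layer cannot backpropagate to label inputs.");
  }
}
```

This is why requesting backpropagation through one top of the conditional layer but not the other trips the check: the gradient request reaches the loss layer's label input.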
EDIT:
I've modified a bit how the conditional layer works when it receives a batch in which none of the items pass the conditional test. Before the last commit, because we can't have a Blob with num == 0, in this case the conditional layer allowed the forward pass to the conditional branch for all the items of the batch (in practice, it was as if the conditional layer did not exist). Although this happens very rarely, I applied a patch to disable forward and backward when none of the items pass the conditional test.
Now when this occurs, the conditional_layer reshapes the conditional top Blob with num == 0, and in net.cpp, in the ForwardFromTo loop, there is a check named ForwardIsAllowed that checks whether any of the bottom blobs of Layer[i] has num == 0; if so, it reshapes a top of that layer with num == 0 and doesn't forward anything.
In this way the forward passes for all the layers on this branch of the net are blocked, because each of these layers will have at least one bottom blob with num == 0.
The same thing is applied in the BackwardFromTo function, this time checking the top blobs of the layers for those with num == 0, and denying the backward passes they don't need.
So, to block all the forward passes in a specific branch of the net, it is enough to reshape one blob with num == 0: nothing downstream of it will be forwarded, without breaking the rest of the net.
I found this the cleanest way to block unwanted subsequent forward/backward calls without big changes to the net/layer structure, but if you have a better idea, suggestions are welcome :)
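The mechanism described above can be sketched in a few lines. The name ForwardIsAllowed follows this PR's description; the Blob and Layer stand-ins below are reduced to the bare minimum and are not Caffe's real classes.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-ins for Blob and Layer, just enough to show the gating logic.
struct Blob { int num; };
struct Layer {
  std::vector<Blob*> bottoms;
  std::vector<Blob*> tops;
};

// Sketch of the check in Net::ForwardFromTo described above: if any bottom
// blob has num == 0, reshape the layer's tops to num == 0 and report that
// the forward pass must be skipped, so the empty batch propagates down
// the branch without the layers ever running.
bool ForwardIsAllowed(Layer& layer) {
  for (Blob* b : layer.bottoms) {
    if (b->num == 0) {
      for (Blob* t : layer.tops) t->num = 0;  // propagate the empty batch
      return false;  // forward is skipped for this layer
    }
  }
  return true;
}
```

BackwardIsAllowed would mirror this, inspecting the top blobs instead of the bottoms.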