Support tensorflow debugger #214

Open
theedge456 opened this issue Sep 5, 2018 · 7 comments

Comments

@theedge456

To debug my model, I thought I could connect my program to TensorBoard to decipher this cryptic message:

TensorFlowException TF_INVALID_ARGUMENT "In[0] is not a matrix\n\t [[Node: MatMul_70 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device=\"/job:localhost/replica:0/task:0/device:CPU:0\"](Const_41, Mean_69)]]"

I could not find an equivalent of the Python function:

tf_debug.TensorBoardDebugWrapperSession("machine:7000")

Is it implemented? If not, is it in the pipeline?
Fabien

@fkm3
Contributor

fkm3 commented Sep 5, 2018

There isn't any support for the tensorflow debugger right now. I'm not sure what work is required to support it.

A short-term workaround might be to use asGraphDef to get the graph as a proto, then write it to a file and load it into tensorboard so that you can more easily inspect the graph to figure out what part of your code that MatMul is coming from.
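
A minimal sketch of that workaround, under stated assumptions: `dumpGraphText` is a hypothetical helper name, `showMessage` is taken from proto-lens's `Data.ProtoLens.TextFormat`, and `asGraphDef` is assumed to be exported by `TensorFlow.Core` in the installed version. It writes the graph in protobuf text format, which is the form TensorBoard's graph upload is expected to accept.

```haskell
module DumpGraph (dumpGraphText) where

import Data.ProtoLens.TextFormat (showMessage)
import qualified TensorFlow.Core as TF

-- | Hypothetical helper: turn a graph-building action into a GraphDef
-- proto and write it to a file in protobuf text format (.pbtxt).
-- Nothing is executed here, so runtime shape errors cannot fire yet.
dumpGraphText :: FilePath -> TF.Build a -> IO ()
dumpGraphText path build =
  writeFile path (showMessage (TF.asGraphDef build))
```

Calling something like `dumpGraphText "graph.pbtxt" myModel` (where `myModel` stands for your own `Build` action) before running the session should leave a file in which node names such as `MatMul_70` can be searched for.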

For the cryptic error messages: We should prioritize #24 so that these look like nice compiler errors that point to the line of code causing an issue.

@fkm3 fkm3 changed the title from "connection to tensorboard" to "Support tensorflow debugger" Sep 5, 2018
@fkm3
Contributor

fkm3 commented Sep 5, 2018

Actually, instead of asGraphDef, you can use logGraph to write to a tensorboard log file directly:
https://tensorflow.github.io/haskell/haddock/tensorflow-logging-0.2.0.0/TensorFlow-Logging.html#v:logGraph
Just make sure to do that before you try to build the graph, otherwise you'll get the tensorflow runtime exception first.

@theedge456
Author

With logGraph I can start TensorBoard. Unfortunately, the graph loading process hangs at about 30% with the message:
Data: Parsing graph.pbtxt

I made a little progress but I don't understand the following message:
TensorFlowException TF_INVALID_ARGUMENT "Incompatible shapes: [784,500] vs. [500,784]\n\t [[Node: Mul_43 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_40, Transpose_42)]]"

Does it mean that the dimensions in the node Mul_43 are incorrect?
Thanks for the effort anyway.

@fkm3
Contributor

fkm3 commented Sep 6, 2018

Hmm. You may need to make sure the withEventWriter call exits before the error happens, otherwise it may not have flushed the file write yet and so the graph.pbtxt will be incomplete.

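-- Assumes: import qualified TensorFlow.Logging as TF
-- Log the graph first; the write is flushed once withEventWriter returns.
TF.withEventWriter "/path/to/logs" $ \eventWriter -> TF.logGraph eventWriter graph

-- Other code that actually runs the graph.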

> Does it mean that the dimensions in the node Mul_43 are incorrect?

That does seem to be what it is saying, but the dimensions look compatible to me... If you have any code you can share, I can take a look.
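
For context on that error, here is a small illustrative sketch (not taken from the code in this thread). The failing node is a `Mul`, which in TensorFlow is an element-wise op and needs broadcast-compatible shapes. The same pair of shapes would be fine for a matrix product. The function names `broadcastable` and `matMulCompatible` are hypothetical, and the check is a simplified model of NumPy-style broadcasting on static shapes, not the library's own validation.

```haskell
module Main where

-- | Simplified NumPy-style broadcasting check for an element-wise op:
-- compare dimensions from the right; each pair must be equal or contain a 1.
-- Missing leading dimensions are treated as compatible.
broadcastable :: [Int] -> [Int] -> Bool
broadcastable xs ys = and (zipWith ok (reverse xs) (reverse ys))
  where
    ok a b = a == b || a == 1 || b == 1

-- | Rank-2 matrix product check without transposes:
-- the inner dimensions must agree.
matMulCompatible :: [Int] -> [Int] -> Bool
matMulCompatible [_, k] [k', _] = k == k'
matMulCompatible _ _ = False

main :: IO ()
main = do
  print (broadcastable [784, 500] [500, 784])    -- False: element-wise Mul is rejected
  print (matMulCompatible [784, 500] [500, 784]) -- True: a MatMul would be accepted
  print (broadcastable [784, 500] [784, 500])    -- True: identical shapes
```

If a matrix product was intended, this suggests checking whether an element-wise multiply was used where `matMul` was meant, or whether one operand carries an extra transpose.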

@theedge456
Author

theedge456 commented Sep 7, 2018

code.tar.gz
I tried to remove all the unnecessary code from the file.
The cabal project is built in a sandbox.
The error is:
TensorFlowException TF_INVALID_ARGUMENT "Incompatible shapes: [500,784] vs. [784,500]\n\t [[Node: Mul_7 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_inputToto/XTiti_0_0, ReadVariableOp_6)]]"

@fkm3
Contributor

fkm3 commented Sep 11, 2018

I had to make a few edits to get the code to compile, e.g. I got this error

.../src/RBM.hs:117:45: error:
    Variable not in scope: h0 :: TFT.Tensor v0 t0
    |
117 |         TFL.scalarSummary (pack "update_w") h0 -- update_w
    |                                             ^^

After renaming h_sampleProbArg to h0 and adding a Main module, I was able to build. I couldn't reproduce the error, though; it ran fine for me.

@theedge456
Author

I switched to the Python version of the code, as it runs flawlessly.
Thanks for your support anyway.
