Why does this happen? I used a mesh shape of model:8, batch:1 with a batch size of 8 and sequence lengths {input: 1024, output: 512}, and I still get an OOM error.

However, I have seen that PyTorch (with DeepSpeed) can train T5-3B on 8× V100 32GB GPUs. Is this because Mesh-TensorFlow is less memory-efficient than DeepSpeed?

I want to figure out the cause. I want to use the T5 Mesh-TensorFlow implementation because TensorFlow is easy to deploy. Why can DeepSpeed train larger models than Mesh-TensorFlow?
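For reference, the model:8, batch:1 setup described above would typically be expressed through gin parameters to the `t5_mesh_transformer` CLI. The sketch below is an assumption about the invocation, not a confirmed reproduction of my run; the exact gin parameter names (`utils.run.mesh_shape`, `utils.run.mesh_devices`, `utils.run.sequence_length`) may differ across versions of the t5 repository, and the model directory path is a placeholder:

```shell
# Hypothetical sketch of a model-parallel run over 8 GPUs:
# the mesh is split 8 ways along the model dimension, 1 way along batch.
t5_mesh_transformer \
  --model_dir="/tmp/t5_model" \
  --gin_param="utils.run.mesh_shape = 'model:8,batch:1'" \
  --gin_param="utils.run.mesh_devices = ['gpu:0','gpu:1','gpu:2','gpu:3','gpu:4','gpu:5','gpu:6','gpu:7']" \
  --gin_param="utils.run.sequence_length = {'inputs': 1024, 'targets': 512}"
```

With this layout each GPU holds 1/8 of each weight matrix, but activations and optimizer state can still dominate memory at these sequence lengths, which may be part of why the run OOMs where DeepSpeed (which also partitions optimizer state via ZeRO) does not.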