[Feature] TensorDictCatView #1037

Open: wants to merge 2 commits into base: gh/vmoens/28/base

vmoens (Contributor) commented Oct 10, 2024

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 10, 2024
ghstack-source-id: eda35393bab9de459fc01c6e33a872ffb1b1672a
Pull Request resolved: #1037
facebook-github-bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Oct 10, 2024
github-actions bot commented Oct 10, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: 14. Worsened: 16.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 60.6330μs 24.9939μs 40.0098 KOps/s 39.9619 KOps/s $\color{#35bf28}+0.12\%$
test_plain_set_stack_nested 58.7490μs 25.2740μs 39.5664 KOps/s 39.2135 KOps/s $\color{#35bf28}+0.90\%$
test_plain_set_nested_inplace 70.0000μs 27.4718μs 36.4009 KOps/s 36.5925 KOps/s $\color{#d91a1a}-0.52\%$
test_plain_set_stack_nested_inplace 65.6420μs 27.4093μs 36.4839 KOps/s 36.7818 KOps/s $\color{#d91a1a}-0.81\%$
test_items 24.4660μs 4.1839μs 239.0108 KOps/s 236.7527 KOps/s $\color{#35bf28}+0.95\%$
test_items_nested 0.4745ms 0.3917ms 2.5533 KOps/s 2.5763 KOps/s $\color{#d91a1a}-0.89\%$
test_items_nested_locked 0.6300ms 0.3917ms 2.5531 KOps/s 2.5883 KOps/s $\color{#d91a1a}-1.36\%$
test_items_nested_leaf 0.1451ms 81.9545μs 12.2019 KOps/s 12.4064 KOps/s $\color{#d91a1a}-1.65\%$
test_items_stack_nested 0.8350ms 0.3922ms 2.5496 KOps/s 2.5787 KOps/s $\color{#d91a1a}-1.13\%$
test_items_stack_nested_leaf 0.1521ms 83.7155μs 11.9452 KOps/s 12.1643 KOps/s $\color{#d91a1a}-1.80\%$
test_items_stack_nested_locked 0.6407ms 0.3986ms 2.5091 KOps/s 2.5877 KOps/s $\color{#d91a1a}-3.04\%$
test_keys 32.1900μs 3.6189μs 276.3243 KOps/s 286.6842 KOps/s $\color{#d91a1a}-3.61\%$
test_keys_nested 0.2282ms 0.1369ms 7.3067 KOps/s 7.4664 KOps/s $\color{#d91a1a}-2.14\%$
test_keys_nested_locked 0.6492ms 0.1431ms 6.9875 KOps/s 7.2253 KOps/s $\color{#d91a1a}-3.29\%$
test_keys_nested_leaf 0.2127ms 0.1203ms 8.3123 KOps/s 8.3585 KOps/s $\color{#d91a1a}-0.55\%$
test_keys_stack_nested 0.5361ms 0.1420ms 7.0446 KOps/s 7.4483 KOps/s $\textbf{\color{#d91a1a}-5.42\%}$
test_keys_stack_nested_leaf 0.1843ms 0.1197ms 8.3542 KOps/s 8.5344 KOps/s $\color{#d91a1a}-2.11\%$
test_keys_stack_nested_locked 0.2396ms 0.1414ms 7.0724 KOps/s 7.1881 KOps/s $\color{#d91a1a}-1.61\%$
test_values 7.0372μs 1.0522μs 950.4051 KOps/s 957.0757 KOps/s $\color{#d91a1a}-0.70\%$
test_values_nested 0.1557ms 95.2079μs 10.5033 KOps/s 10.4245 KOps/s $\color{#35bf28}+0.76\%$
test_values_nested_locked 0.3051ms 95.3930μs 10.4830 KOps/s 10.5562 KOps/s $\color{#d91a1a}-0.69\%$
test_values_nested_leaf 0.1485ms 81.5929μs 12.2560 KOps/s 12.5215 KOps/s $\color{#d91a1a}-2.12\%$
test_values_stack_nested 0.2140ms 93.8951μs 10.6502 KOps/s 10.7123 KOps/s $\color{#d91a1a}-0.58\%$
test_values_stack_nested_leaf 0.1341ms 79.2534μs 12.6178 KOps/s 12.6195 KOps/s $\color{#d91a1a}-0.01\%$
test_values_stack_nested_locked 0.1708ms 95.2544μs 10.4982 KOps/s 10.6905 KOps/s $\color{#d91a1a}-1.80\%$
test_membership 24.6860μs 0.9139μs 1.0943 MOps/s 1.1588 MOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_membership_nested 20.8190μs 2.8447μs 351.5292 KOps/s 366.9674 KOps/s $\color{#d91a1a}-4.21\%$
test_membership_nested_leaf 36.7190μs 2.8554μs 350.2127 KOps/s 350.1097 KOps/s $\color{#35bf28}+0.03\%$
test_membership_stacked_nested 20.5580μs 2.8365μs 352.5476 KOps/s 365.3951 KOps/s $\color{#d91a1a}-3.52\%$
test_membership_stacked_nested_leaf 42.2080μs 2.8556μs 350.1899 KOps/s 368.7424 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_membership_nested_last 37.0290μs 4.2981μs 232.6625 KOps/s 238.8191 KOps/s $\color{#d91a1a}-2.58\%$
test_membership_nested_leaf_last 32.3800μs 4.3654μs 229.0731 KOps/s 240.8519 KOps/s $\color{#d91a1a}-4.89\%$
test_membership_stacked_nested_last 25.8580μs 4.3296μs 230.9673 KOps/s 238.4344 KOps/s $\color{#d91a1a}-3.13\%$
test_membership_stacked_nested_leaf_last 34.3240μs 4.3044μs 232.3188 KOps/s 243.3476 KOps/s $\color{#d91a1a}-4.53\%$
test_nested_getleaf 43.4500μs 10.7183μs 93.2980 KOps/s 95.1960 KOps/s $\color{#d91a1a}-1.99\%$
test_nested_get 37.7810μs 10.3022μs 97.0663 KOps/s 99.1101 KOps/s $\color{#d91a1a}-2.06\%$
test_stacked_getleaf 44.5030μs 10.6371μs 94.0103 KOps/s 94.5479 KOps/s $\color{#d91a1a}-0.57\%$
test_stacked_get 39.8840μs 10.2416μs 97.6407 KOps/s 99.0488 KOps/s $\color{#d91a1a}-1.42\%$
test_nested_getitemleaf 33.3220μs 11.4035μs 87.6922 KOps/s 90.9459 KOps/s $\color{#d91a1a}-3.58\%$
test_nested_getitem 44.6430μs 10.5524μs 94.7655 KOps/s 99.5729 KOps/s $\color{#d91a1a}-4.83\%$
test_stacked_getitemleaf 38.9130μs 10.9906μs 90.9866 KOps/s 91.3768 KOps/s $\color{#d91a1a}-0.43\%$
test_stacked_getitem 0.5722ms 10.2911μs 97.1709 KOps/s 98.7044 KOps/s $\color{#d91a1a}-1.55\%$
test_lock_nested 4.9771ms 0.5379ms 1.8591 KOps/s 2.0005 KOps/s $\textbf{\color{#d91a1a}-7.07\%}$
test_lock_stack_nested 0.9117ms 0.4944ms 2.0227 KOps/s 2.1145 KOps/s $\color{#d91a1a}-4.34\%$
test_unlock_nested 0.1022s 0.5506ms 1.8163 KOps/s 2.4094 KOps/s $\textbf{\color{#d91a1a}-24.62\%}$
test_unlock_stack_nested 1.3462ms 0.4052ms 2.4682 KOps/s 2.5635 KOps/s $\color{#d91a1a}-3.72\%$
test_flatten_speed 0.2287ms 0.1015ms 9.8547 KOps/s 10.0965 KOps/s $\color{#d91a1a}-2.39\%$
test_unflatten_speed 1.0516ms 0.5252ms 1.9040 KOps/s 2.0007 KOps/s $\color{#d91a1a}-4.83\%$
test_common_ops 3.4345ms 1.1564ms 864.7750 Ops/s 846.6582 Ops/s $\color{#35bf28}+2.14\%$
test_creation 0.1198ms 2.2482μs 444.7945 KOps/s 486.5028 KOps/s $\textbf{\color{#d91a1a}-8.57\%}$
test_creation_empty 55.3940μs 19.1510μs 52.2165 KOps/s 49.1066 KOps/s $\textbf{\color{#35bf28}+6.33\%}$
test_creation_nested_1 76.1720μs 23.0830μs 43.3220 KOps/s 42.3875 KOps/s $\color{#35bf28}+2.20\%$
test_creation_nested_2 76.1320μs 27.6818μs 36.1248 KOps/s 36.2274 KOps/s $\color{#d91a1a}-0.28\%$
test_clone 93.1330μs 16.9640μs 58.9485 KOps/s 57.1630 KOps/s $\color{#35bf28}+3.12\%$
test_getitem[int] 0.9389ms 16.9662μs 58.9408 KOps/s 60.3052 KOps/s $\color{#d91a1a}-2.26\%$
test_getitem[slice_int] 0.1801ms 31.6195μs 31.6260 KOps/s 32.0329 KOps/s $\color{#d91a1a}-1.27\%$
test_getitem[range] 0.5499ms 59.3622μs 16.8457 KOps/s 17.2583 KOps/s $\color{#d91a1a}-2.39\%$
test_getitem[tuple] 0.1338ms 25.6139μs 39.0413 KOps/s 39.1521 KOps/s $\color{#d91a1a}-0.28\%$
test_getitem[list] 0.4006ms 53.9559μs 18.5337 KOps/s 18.9235 KOps/s $\color{#d91a1a}-2.06\%$
test_setitem_dim[int] 64.6210μs 32.5527μs 30.7194 KOps/s 30.6024 KOps/s $\color{#35bf28}+0.38\%$
test_setitem_dim[slice_int] 0.1168ms 60.6923μs 16.4766 KOps/s 16.2198 KOps/s $\color{#35bf28}+1.58\%$
test_setitem_dim[range] 0.1618ms 85.1056μs 11.7501 KOps/s 11.9766 KOps/s $\color{#d91a1a}-1.89\%$
test_setitem_dim[tuple] 0.1041ms 50.0711μs 19.9716 KOps/s 20.3466 KOps/s $\color{#d91a1a}-1.84\%$
test_setitem 0.2107ms 31.3189μs 31.9296 KOps/s 30.9466 KOps/s $\color{#35bf28}+3.18\%$
test_set 0.1324ms 30.3797μs 32.9167 KOps/s 31.6886 KOps/s $\color{#35bf28}+3.88\%$
test_set_shared 3.7634ms 0.2197ms 4.5513 KOps/s 4.5338 KOps/s $\color{#35bf28}+0.39\%$
test_update 0.1930ms 39.3902μs 25.3870 KOps/s 24.5088 KOps/s $\color{#35bf28}+3.58\%$
test_update_nested 0.2557ms 50.1345μs 19.9463 KOps/s 19.6060 KOps/s $\color{#35bf28}+1.74\%$
test_update__nested 0.8167ms 44.6026μs 22.4202 KOps/s 22.1690 KOps/s $\color{#35bf28}+1.13\%$
test_set_nested 0.2182ms 33.9591μs 29.4471 KOps/s 28.6354 KOps/s $\color{#35bf28}+2.83\%$
test_set_nested_new 0.2071ms 39.0745μs 25.5921 KOps/s 25.3087 KOps/s $\color{#35bf28}+1.12\%$
test_select 0.1249ms 57.0224μs 17.5370 KOps/s 17.5466 KOps/s $\color{#d91a1a}-0.05\%$
test_select_nested 0.1330ms 61.1437μs 16.3549 KOps/s 16.9287 KOps/s $\color{#d91a1a}-3.39\%$
test_exclude_nested 0.1612ms 76.2054μs 13.1224 KOps/s 13.3820 KOps/s $\color{#d91a1a}-1.94\%$
test_empty[True] 0.9334ms 0.3650ms 2.7394 KOps/s 2.8539 KOps/s $\color{#d91a1a}-4.01\%$
test_empty[False] 12.6835μs 1.3459μs 742.9843 KOps/s 828.8382 KOps/s $\textbf{\color{#d91a1a}-10.36\%}$
test_unbind_speed 0.5927ms 0.3205ms 3.1202 KOps/s 3.1872 KOps/s $\color{#d91a1a}-2.10\%$
test_unbind_speed_stack0 0.5217ms 0.3200ms 3.1253 KOps/s 3.4434 KOps/s $\textbf{\color{#d91a1a}-9.24\%}$
test_unbind_speed_stack1 0.1066s 0.8939ms 1.1187 KOps/s 1.3239 KOps/s $\textbf{\color{#d91a1a}-15.50\%}$
test_split 0.1105s 2.2062ms 453.2651 Ops/s 446.6058 Ops/s $\color{#35bf28}+1.49\%$
test_chunk 2.2593ms 1.9894ms 502.6708 Ops/s 450.9001 Ops/s $\textbf{\color{#35bf28}+11.48\%}$
test_creation[device0] 0.2488ms 0.1175ms 8.5084 KOps/s 8.2680 KOps/s $\color{#35bf28}+2.91\%$
test_creation_from_tensor 2.4658ms 0.1176ms 8.5040 KOps/s 8.5309 KOps/s $\color{#d91a1a}-0.32\%$
test_add_one[memmap_tensor0] 0.3475ms 7.3433μs 136.1779 KOps/s 132.1683 KOps/s $\color{#35bf28}+3.03\%$
test_contiguous[memmap_tensor0] 25.5470μs 1.8578μs 538.2644 KOps/s 514.5438 KOps/s $\color{#35bf28}+4.61\%$
test_stack[memmap_tensor0] 38.5920μs 5.7597μs 173.6202 KOps/s 166.0962 KOps/s $\color{#35bf28}+4.53\%$
test_memmaptd_index 1.1201ms 0.4156ms 2.4062 KOps/s 2.4456 KOps/s $\color{#d91a1a}-1.61\%$
test_memmaptd_index_astensor 0.8266ms 0.5182ms 1.9299 KOps/s 1.9521 KOps/s $\color{#d91a1a}-1.14\%$
test_memmaptd_index_op 1.9167ms 1.0631ms 940.6159 Ops/s 906.8116 Ops/s $\color{#35bf28}+3.73\%$
test_serialize_model 0.1274s 0.1184s 8.4428 Ops/s 8.3237 Ops/s $\color{#35bf28}+1.43\%$
test_serialize_model_pickle 0.4391s 0.4026s 2.4837 Ops/s 2.5012 Ops/s $\color{#d91a1a}-0.70\%$
test_serialize_weights 0.1264s 0.1168s 8.5641 Ops/s 7.4171 Ops/s $\textbf{\color{#35bf28}+15.46\%}$
test_serialize_weights_returnearly 0.1716s 0.1608s 6.2200 Ops/s 6.2190 Ops/s $\color{#35bf28}+0.02\%$
test_serialize_weights_pickle 1.2491s 0.7543s 1.3257 Ops/s 2.4801 Ops/s $\textbf{\color{#d91a1a}-46.55\%}$
test_serialize_weights_filesystem 0.1518s 0.1468s 6.8116 Ops/s 6.8018 Ops/s $\color{#35bf28}+0.14\%$
test_serialize_model_filesystem 0.1626s 0.1488s 6.7202 Ops/s 5.9997 Ops/s $\textbf{\color{#35bf28}+12.01\%}$
test_reshape_pytree 83.2550μs 38.5140μs 25.9646 KOps/s 26.3922 KOps/s $\color{#d91a1a}-1.62\%$
test_reshape_td 0.1458ms 47.4982μs 21.0534 KOps/s 21.3533 KOps/s $\color{#d91a1a}-1.40\%$
test_view_pytree 0.1137ms 38.4895μs 25.9811 KOps/s 26.4645 KOps/s $\color{#d91a1a}-1.83\%$
test_view_td 0.1495ms 53.5453μs 18.6758 KOps/s 19.6637 KOps/s $\textbf{\color{#d91a1a}-5.02\%}$
test_unbind_pytree 81.6720μs 35.5487μs 28.1304 KOps/s 28.0865 KOps/s $\color{#35bf28}+0.16\%$
test_unbind_td 0.3669ms 46.6757μs 21.4244 KOps/s 22.8422 KOps/s $\textbf{\color{#d91a1a}-6.21\%}$
test_split_pytree 0.1022ms 37.9528μs 26.3485 KOps/s 26.9855 KOps/s $\color{#d91a1a}-2.36\%$
test_split_td 0.6620ms 58.8184μs 17.0015 KOps/s 17.4697 KOps/s $\color{#d91a1a}-2.68\%$
test_add_pytree 0.1057ms 44.6144μs 22.4143 KOps/s 21.2840 KOps/s $\textbf{\color{#35bf28}+5.31\%}$
test_add_td 0.3086ms 87.2721μs 11.4584 KOps/s 10.6622 KOps/s $\textbf{\color{#35bf28}+7.47\%}$
test_compile_add_one_nested[tensordict-compile] 0.1292ms 58.8733μs 16.9856 KOps/s 17.0892 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_add_one_nested[tensordict-eager] 1.2885ms 0.2058ms 4.8587 KOps/s 5.0873 KOps/s $\color{#d91a1a}-4.49\%$
test_compile_add_one_nested[pytree-compile] 0.1460ms 57.8573μs 17.2839 KOps/s 17.5063 KOps/s $\color{#d91a1a}-1.27\%$
test_compile_add_one_nested[pytree-eager] 0.2633ms 0.1400ms 7.1410 KOps/s 7.0326 KOps/s $\color{#35bf28}+1.54\%$
test_compile_copy_nested[tensordict-compile] 60.2820μs 23.5428μs 42.4758 KOps/s 42.7893 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_copy_nested[tensordict-eager] 0.1870ms 73.7357μs 13.5619 KOps/s 13.7223 KOps/s $\color{#d91a1a}-1.17\%$
test_compile_copy_nested[pytree-compile] 0.3019ms 76.7849μs 13.0234 KOps/s 13.2391 KOps/s $\color{#d91a1a}-1.63\%$
test_compile_copy_nested[pytree-eager] 0.1237ms 69.2665μs 14.4370 KOps/s 14.7260 KOps/s $\color{#d91a1a}-1.96\%$
test_compile_add_one_flat[tensordict-compile] 0.2954ms 0.1839ms 5.4391 KOps/s 5.4539 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_add_one_flat[tensordict-eager] 0.4052ms 0.2403ms 4.1622 KOps/s 4.1529 KOps/s $\color{#35bf28}+0.22\%$
test_compile_add_one_flat[tensorclass-compile] 0.1083ms 48.7732μs 20.5031 KOps/s 20.4375 KOps/s $\color{#35bf28}+0.32\%$
test_compile_add_one_flat[tensorclass-eager] 0.5342ms 77.8095μs 12.8519 KOps/s 12.4605 KOps/s $\color{#35bf28}+3.14\%$
test_compile_add_one_flat[pytree-compile] 0.2885ms 0.1766ms 5.6616 KOps/s 5.7683 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_add_one_flat[pytree-eager] 0.5164ms 0.2868ms 3.4869 KOps/s 3.4779 KOps/s $\color{#35bf28}+0.26\%$
test_compile_add_self_flat[tensordict-eager] 0.5106ms 0.2759ms 3.6250 KOps/s 3.5920 KOps/s $\color{#35bf28}+0.92\%$
test_compile_add_self_flat[tensordict-compile] 0.3356ms 0.1874ms 5.3365 KOps/s 5.5811 KOps/s $\color{#d91a1a}-4.38\%$
test_compile_add_self_flat[tensorclass-eager] 0.2515ms 75.7161μs 13.2072 KOps/s 13.3171 KOps/s $\color{#d91a1a}-0.83\%$
test_compile_add_self_flat[tensorclass-compile] 0.1947ms 49.1591μs 20.3421 KOps/s 20.7412 KOps/s $\color{#d91a1a}-1.92\%$
test_compile_add_self_flat[pytree-eager] 0.5415ms 0.2346ms 4.2633 KOps/s 4.2622 KOps/s $\color{#35bf28}+0.02\%$
test_compile_add_self_flat[pytree-compile] 0.2861ms 0.1761ms 5.6784 KOps/s 5.5835 KOps/s $\color{#35bf28}+1.70\%$
test_compile_copy_flat[tensordict-compile] 0.2247ms 0.1118ms 8.9447 KOps/s 8.9869 KOps/s $\color{#d91a1a}-0.47\%$
test_compile_copy_flat[tensordict-eager] 0.1474ms 79.2196μs 12.6231 KOps/s 12.8831 KOps/s $\color{#d91a1a}-2.02\%$
test_compile_copy_flat[pytree-compile] 0.1441ms 79.1601μs 12.6326 KOps/s 13.2677 KOps/s $\color{#d91a1a}-4.79\%$
test_compile_copy_flat[pytree-eager] 0.1242ms 70.5874μs 14.1668 KOps/s 14.9946 KOps/s $\textbf{\color{#d91a1a}-5.52\%}$
test_compile_assign_and_add[tensordict-compile] 0.2867ms 0.1969ms 5.0800 KOps/s 5.2132 KOps/s $\color{#d91a1a}-2.56\%$
test_compile_assign_and_add[tensordict-eager] 2.7342ms 1.7540ms 570.1185 Ops/s 584.3041 Ops/s $\color{#d91a1a}-2.43\%$
test_compile_assign_and_add[pytree-compile] 0.2793ms 0.1950ms 5.1270 KOps/s 5.1753 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_assign_and_add[pytree-eager] 1.2529ms 1.1068ms 903.4927 Ops/s 912.7660 Ops/s $\color{#d91a1a}-1.02\%$
test_compile_assign_and_add_stack[compile] 0.5628ms 0.4257ms 2.3492 KOps/s 2.3969 KOps/s $\color{#d91a1a}-1.99\%$
test_compile_assign_and_add_stack[eager] 6.3882ms 4.2024ms 237.9618 Ops/s 231.0427 Ops/s $\color{#35bf28}+2.99\%$
test_compile_indexing[tensor-tensordict-compile] 92.0810μs 35.0407μs 28.5382 KOps/s 28.8084 KOps/s $\color{#d91a1a}-0.94\%$
test_compile_indexing[tensor-tensordict-eager] 0.6303ms 48.7879μs 20.4969 KOps/s 20.5725 KOps/s $\color{#d91a1a}-0.37\%$
test_compile_indexing[tensor-tensorclass-compile] 76.2820μs 31.5544μs 31.6913 KOps/s 32.4633 KOps/s $\color{#d91a1a}-2.38\%$
test_compile_indexing[tensor-tensorclass-eager] 98.4630μs 29.7289μs 33.6373 KOps/s 35.1193 KOps/s $\color{#d91a1a}-4.22\%$
test_compile_indexing[tensor-pytree-compile] 87.1520μs 31.5923μs 31.6532 KOps/s 32.5847 KOps/s $\color{#d91a1a}-2.86\%$
test_compile_indexing[tensor-pytree-eager] 89.2960μs 29.7613μs 33.6007 KOps/s 35.4145 KOps/s $\textbf{\color{#d91a1a}-5.12\%}$
test_compile_indexing[slice-tensordict-compile] 0.1400ms 74.1429μs 13.4875 KOps/s 13.5755 KOps/s $\color{#d91a1a}-0.65\%$
test_compile_indexing[slice-tensordict-eager] 0.5566ms 27.8144μs 35.9525 KOps/s 36.1683 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_indexing[slice-tensorclass-compile] 0.1747ms 69.8109μs 14.3244 KOps/s 14.4711 KOps/s $\color{#d91a1a}-1.01\%$
test_compile_indexing[slice-tensorclass-eager] 68.1460μs 23.4632μs 42.6200 KOps/s 43.9078 KOps/s $\color{#d91a1a}-2.93\%$
test_compile_indexing[slice-pytree-compile] 0.1482ms 70.0763μs 14.2702 KOps/s 14.4778 KOps/s $\color{#d91a1a}-1.43\%$
test_compile_indexing[slice-pytree-eager] 80.2300μs 23.1115μs 43.2685 KOps/s 44.3936 KOps/s $\color{#d91a1a}-2.53\%$
test_compile_indexing[int-tensordict-compile] 0.1491ms 73.3826μs 13.6272 KOps/s 13.7837 KOps/s $\color{#d91a1a}-1.14\%$
test_compile_indexing[int-tensordict-eager] 0.9074ms 27.3117μs 36.6143 KOps/s 36.1108 KOps/s $\color{#35bf28}+1.39\%$
test_compile_indexing[int-tensorclass-compile] 0.1463ms 69.1847μs 14.4541 KOps/s 14.4714 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_indexing[int-tensorclass-eager] 65.0810μs 23.3005μs 42.9175 KOps/s 44.3649 KOps/s $\color{#d91a1a}-3.26\%$
test_compile_indexing[int-pytree-compile] 0.1511ms 69.3337μs 14.4230 KOps/s 14.6782 KOps/s $\color{#d91a1a}-1.74\%$
test_compile_indexing[int-pytree-eager] 86.9820μs 23.3123μs 42.8959 KOps/s 44.9228 KOps/s $\color{#d91a1a}-4.51\%$
test_mod_add[eager] 73.9080μs 26.9842μs 37.0587 KOps/s 36.4473 KOps/s $\color{#35bf28}+1.68\%$
test_mod_add[compile] 95.2870μs 39.6057μs 25.2489 KOps/s 25.8747 KOps/s $\color{#d91a1a}-2.42\%$
test_mod_add[compile-overhead] 0.1114ms 40.6875μs 24.5776 KOps/s 25.7934 KOps/s $\color{#d91a1a}-4.71\%$
test_mod_wrap[eager] 0.3912ms 0.2102ms 4.7565 KOps/s 4.7771 KOps/s $\color{#d91a1a}-0.43\%$
test_mod_wrap[compile] 0.4998ms 0.2381ms 4.1993 KOps/s 4.2884 KOps/s $\color{#d91a1a}-2.08\%$
test_mod_wrap[compile-overhead] 0.4787ms 0.2376ms 4.2084 KOps/s 4.4131 KOps/s $\color{#d91a1a}-4.64\%$
test_mod_wrap_and_backward[eager] 13.0462ms 11.0922ms 90.1531 Ops/s 76.9059 Ops/s $\textbf{\color{#35bf28}+17.23\%}$
test_mod_wrap_and_backward[compile] 12.5540ms 11.1391ms 89.7737 Ops/s 79.8768 Ops/s $\textbf{\color{#35bf28}+12.39\%}$
test_mod_wrap_and_backward[compile-overhead] 14.7358ms 11.3310ms 88.2537 Ops/s 79.4496 Ops/s $\textbf{\color{#35bf28}+11.08\%}$
test_seq_add[eager] 0.2062ms 96.8401μs 10.3263 KOps/s 10.4430 KOps/s $\color{#d91a1a}-1.12\%$
test_seq_add[compile] 0.1333ms 66.5799μs 15.0196 KOps/s 14.9757 KOps/s $\color{#35bf28}+0.29\%$
test_seq_add[compile-overhead] 0.1355ms 65.4874μs 15.2701 KOps/s 15.6930 KOps/s $\color{#d91a1a}-2.69\%$
test_seq_wrap[eager] 1.0723ms 0.3959ms 2.5261 KOps/s 2.5205 KOps/s $\color{#35bf28}+0.22\%$
test_seq_wrap[compile] 0.5143ms 0.2764ms 3.6175 KOps/s 3.6772 KOps/s $\color{#d91a1a}-1.62\%$
test_seq_wrap[compile-overhead] 0.6283ms 0.2772ms 3.6075 KOps/s 3.6767 KOps/s $\color{#d91a1a}-1.88\%$
test_func_call_runtime[False-eager] 0.8074ms 0.5281ms 1.8936 KOps/s 1.8606 KOps/s $\color{#35bf28}+1.77\%$
test_func_call_runtime[False-compile] 0.6316ms 0.5078ms 1.9693 KOps/s 1.9879 KOps/s $\color{#d91a1a}-0.93\%$
test_func_call_runtime[False-compile-overhead] 0.6264ms 0.5052ms 1.9793 KOps/s 2.0119 KOps/s $\color{#d91a1a}-1.62\%$
test_func_call_runtime[True-eager] 1.3818ms 0.7564ms 1.3220 KOps/s 1.3137 KOps/s $\color{#35bf28}+0.63\%$
test_func_call_runtime[True-compile] 0.7303ms 0.5201ms 1.9227 KOps/s 1.9199 KOps/s $\color{#35bf28}+0.15\%$
test_func_call_runtime[True-compile-overhead] 0.8476ms 0.5270ms 1.8976 KOps/s 1.9029 KOps/s $\color{#d91a1a}-0.28\%$
test_func_call_cm_runtime[False-eager] 1.0016ms 0.5318ms 1.8803 KOps/s 1.8406 KOps/s $\color{#35bf28}+2.16\%$
test_func_call_cm_runtime[False-compile] 0.6768ms 0.5081ms 1.9679 KOps/s 1.9492 KOps/s $\color{#35bf28}+0.96\%$
test_func_call_cm_runtime[False-compile-overhead] 0.7148ms 0.5090ms 1.9645 KOps/s 1.9807 KOps/s $\color{#d91a1a}-0.82\%$
test_func_call_cm_runtime[True-eager] 1.4040ms 0.9205ms 1.0864 KOps/s 1.0871 KOps/s $\color{#d91a1a}-0.07\%$
test_func_call_cm_runtime[True-compile] 1.0283ms 0.7456ms 1.3413 KOps/s 1.3311 KOps/s $\color{#35bf28}+0.76\%$
test_func_call_cm_runtime[True-compile-overhead] 0.9318ms 0.7390ms 1.3531 KOps/s 1.3198 KOps/s $\color{#35bf28}+2.53\%$
test_vmap_func_call_cm_runtime[eager] 2.7383ms 1.9315ms 517.7386 Ops/s 506.8652 Ops/s $\color{#35bf28}+2.15\%$
test_vmap_func_call_cm_runtime[compile] 2.7934ms 1.9870ms 503.2639 Ops/s 489.7777 Ops/s $\color{#35bf28}+2.75\%$
test_vmap_func_call_cm_runtime[compile-overhead] 3.3023ms 2.0034ms 499.1483 Ops/s 488.3736 Ops/s $\color{#35bf28}+2.21\%$
test_distributed 0.3073ms 0.1297ms 7.7120 KOps/s 7.6736 KOps/s $\color{#35bf28}+0.50\%$
test_tdmodule 90.0670μs 17.9980μs 55.5619 KOps/s 50.3022 KOps/s $\textbf{\color{#35bf28}+10.46\%}$
test_tdmodule_dispatch 69.5800μs 36.8207μs 27.1586 KOps/s 24.3357 KOps/s $\textbf{\color{#35bf28}+11.60\%}$
test_tdseq 46.5560μs 20.6232μs 48.4890 KOps/s 44.1162 KOps/s $\textbf{\color{#35bf28}+9.91\%}$
test_tdseq_dispatch 74.6190μs 42.7420μs 23.3962 KOps/s 22.3521 KOps/s $\color{#35bf28}+4.67\%$
test_instantiation_functorch 2.5894ms 1.5830ms 631.7114 Ops/s 627.2070 Ops/s $\color{#35bf28}+0.72\%$
test_exec_functorch 0.4337ms 0.1892ms 5.2861 KOps/s 5.3827 KOps/s $\color{#d91a1a}-1.79\%$
test_exec_functional_call 0.2688ms 0.1772ms 5.6430 KOps/s 5.6205 KOps/s $\color{#35bf28}+0.40\%$
test_exec_td_decorator 0.6035ms 0.2350ms 4.2546 KOps/s 4.3080 KOps/s $\color{#d91a1a}-1.24\%$
test_vmap_mlp_speed_decorator[True-True] 1.4562ms 0.6590ms 1.5174 KOps/s 1.5309 KOps/s $\color{#d91a1a}-0.89\%$
test_vmap_mlp_speed_decorator[True-False] 1.1141ms 0.6570ms 1.5222 KOps/s 1.4814 KOps/s $\color{#35bf28}+2.75\%$
test_vmap_mlp_speed_decorator[False-True] 0.8519ms 0.5397ms 1.8530 KOps/s 1.8583 KOps/s $\color{#d91a1a}-0.28\%$
test_vmap_mlp_speed_decorator[False-False] 0.9726ms 0.5374ms 1.8608 KOps/s 1.8655 KOps/s $\color{#d91a1a}-0.25\%$
test_to_module_speed[True] 2.5408ms 1.4536ms 687.9291 Ops/s 700.6840 Ops/s $\color{#d91a1a}-1.82\%$
test_to_module_speed[False] 1.5200ms 1.3934ms 717.6720 Ops/s 728.8604 Ops/s $\color{#d91a1a}-1.54\%$
test_tc_init 98.7640μs 47.5293μs 21.0396 KOps/s 19.7238 KOps/s $\textbf{\color{#35bf28}+6.67\%}$
test_tc_init_nested 0.1757ms 96.0922μs 10.4067 KOps/s 9.9529 KOps/s $\color{#35bf28}+4.56\%$
test_tc_first_layer_tensor 33.8430μs 1.5471μs 646.3876 KOps/s 647.4782 KOps/s $\color{#d91a1a}-0.17\%$
test_tc_first_layer_nontensor 29.2240μs 4.6780μs 213.7655 KOps/s 220.5462 KOps/s $\color{#d91a1a}-3.07\%$
test_tc_second_layer_tensor 25.2170μs 2.8414μs 351.9343 KOps/s 358.0101 KOps/s $\color{#d91a1a}-1.70\%$
test_tc_second_layer_nontensor 0.1693ms 6.0018μs 166.6175 KOps/s 169.6431 KOps/s $\color{#d91a1a}-1.78\%$
test_unbind 8.3194ms 7.7398ms 129.2020 Ops/s 73.2024 Ops/s $\textbf{\color{#35bf28}+76.50\%}$
test_full_like 23.2310ms 13.9526ms 71.6714 Ops/s 119.8505 Ops/s $\textbf{\color{#d91a1a}-40.20\%}$
test_zeros_like 15.5192ms 8.0245ms 124.6191 Ops/s 313.2111 Ops/s $\textbf{\color{#d91a1a}-60.21\%}$
test_ones_like 17.0311ms 8.1184ms 123.1768 Ops/s 120.9727 Ops/s $\color{#35bf28}+1.82\%$
test_clone 16.5694ms 10.2050ms 97.9913 Ops/s 97.2555 Ops/s $\color{#35bf28}+0.76\%$
test_squeeze 72.6950μs 12.3614μs 80.8970 KOps/s 82.6306 KOps/s $\color{#d91a1a}-2.10\%$
test_unsqueeze 0.1742ms 92.0910μs 10.8588 KOps/s 10.7289 KOps/s $\color{#35bf28}+1.21\%$
test_split 0.4974ms 0.1921ms 5.2052 KOps/s 4.9972 KOps/s $\color{#35bf28}+4.16\%$
test_permute 0.3670ms 0.2178ms 4.5903 KOps/s 4.4803 KOps/s $\color{#35bf28}+2.45\%$
test_stack 34.6362ms 28.4559ms 35.1421 Ops/s 36.3687 Ops/s $\color{#d91a1a}-3.37\%$
test_cat 36.8149ms 28.2119ms 35.4461 Ops/s 35.9415 Ops/s $\color{#d91a1a}-1.38\%$
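
For readers checking the numbers, the columns in the table above appear to be related in a simple way: the Ops column matches the reciprocal of Mean, and Change matches the relative difference between Ops and Ops on Repo HEAD. The sketch below reproduces two rows under those assumptions. The function names are hypothetical, and the 5% cutoff for "Improved"/"Worsened" is inferred from which rows are bolded and from the summary counts; it is not confirmed by the benchmark workflow itself.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change_pct(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "unchanged"


# test_plain_set_nested: Mean 24.9939 us, Ops 40.0098 KOps/s, HEAD 39.9619 KOps/s
kops = ops_per_second(24.9939e-6) / 1e3
small = relative_change_pct(40.0098, 39.9619)

# test_unlock_nested: Ops 1.8163 KOps/s, HEAD 2.4094 KOps/s
large = relative_change_pct(1.8163, 2.4094)

print(f"{kops:.4f} KOps/s, {small:+.2f}% -> {classify(small)}")
print(f"{large:+.2f}% -> {classify(large)}")
```

Under these assumptions, the first row falls inside the noise band while `test_unlock_nested` counts as a regression, which is consistent with how the two rows are marked in the table.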

github-actions bot commented Oct 10, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 218. Improved: 13. Worsened: 11.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1455ms 16.9090μs 59.1401 KOps/s 59.9299 KOps/s $\color{#d91a1a}-1.32\%$
test_plain_set_stack_nested 45.4400μs 17.0521μs 58.6437 KOps/s 59.4081 KOps/s $\color{#d91a1a}-1.29\%$
test_plain_set_nested_inplace 48.6500μs 18.1886μs 54.9795 KOps/s 55.5015 KOps/s $\color{#d91a1a}-0.94\%$
test_plain_set_stack_nested_inplace 48.8400μs 18.2093μs 54.9169 KOps/s 55.9320 KOps/s $\color{#d91a1a}-1.81\%$
test_items 25.7610μs 2.8956μs 345.3459 KOps/s 340.5744 KOps/s $\color{#35bf28}+1.40\%$
test_items_nested 0.3721ms 0.3406ms 2.9356 KOps/s 2.9664 KOps/s $\color{#d91a1a}-1.04\%$
test_items_nested_locked 0.3749ms 0.3432ms 2.9137 KOps/s 2.9372 KOps/s $\color{#d91a1a}-0.80\%$
test_items_nested_leaf 0.1137ms 63.7953μs 15.6751 KOps/s 15.9089 KOps/s $\color{#d91a1a}-1.47\%$
test_items_stack_nested 0.5136ms 0.3440ms 2.9068 KOps/s 2.9190 KOps/s $\color{#d91a1a}-0.42\%$
test_items_stack_nested_leaf 93.4410μs 64.6424μs 15.4697 KOps/s 15.5473 KOps/s $\color{#d91a1a}-0.50\%$
test_items_stack_nested_locked 0.3901ms 0.3419ms 2.9245 KOps/s 2.9203 KOps/s $\color{#35bf28}+0.15\%$
test_keys 23.3700μs 3.4059μs 293.6054 KOps/s 273.1073 KOps/s $\textbf{\color{#35bf28}+7.51\%}$
test_keys_nested 97.1610μs 71.6385μs 13.9590 KOps/s 14.1039 KOps/s $\color{#d91a1a}-1.03\%$
test_keys_nested_locked 0.8463ms 76.9576μs 12.9942 KOps/s 12.9290 KOps/s $\color{#35bf28}+0.50\%$
test_keys_nested_leaf 93.5310μs 62.2056μs 16.0757 KOps/s 16.1785 KOps/s $\color{#d91a1a}-0.64\%$
test_keys_stack_nested 0.1105ms 71.0869μs 14.0673 KOps/s 14.0431 KOps/s $\color{#35bf28}+0.17\%$
test_keys_stack_nested_leaf 91.2110μs 64.4255μs 15.5218 KOps/s 15.7982 KOps/s $\color{#d91a1a}-1.75\%$
test_keys_stack_nested_locked 0.1096ms 77.8614μs 12.8433 KOps/s 12.8803 KOps/s $\color{#d91a1a}-0.29\%$
test_values 5.1150μs 0.8373μs 1.1944 MOps/s 1.1734 MOps/s $\color{#35bf28}+1.79\%$
test_values_nested 89.9500μs 49.0904μs 20.3706 KOps/s 20.4056 KOps/s $\color{#d91a1a}-0.17\%$
test_values_nested_locked 84.7900μs 50.9008μs 19.6460 KOps/s 19.8632 KOps/s $\color{#d91a1a}-1.09\%$
test_values_nested_leaf 70.5010μs 42.8154μs 23.3561 KOps/s 23.4080 KOps/s $\color{#d91a1a}-0.22\%$
test_values_stack_nested 95.2500μs 50.7651μs 19.6986 KOps/s 19.9352 KOps/s $\color{#d91a1a}-1.19\%$
test_values_stack_nested_leaf 79.0400μs 44.2564μs 22.5956 KOps/s 22.9851 KOps/s $\color{#d91a1a}-1.69\%$
test_values_stack_nested_locked 79.1810μs 52.1548μs 19.1737 KOps/s 19.4990 KOps/s $\color{#d91a1a}-1.67\%$
test_membership 2.0945μs 0.5021μs 1.9917 MOps/s 1.9896 MOps/s $\color{#35bf28}+0.11\%$
test_membership_nested 19.6900μs 1.8538μs 539.4205 KOps/s 524.0761 KOps/s $\color{#35bf28}+2.93\%$
test_membership_nested_leaf 18.4603μs 1.8281μs 547.0020 KOps/s 553.0605 KOps/s $\color{#d91a1a}-1.10\%$
test_membership_stacked_nested 16.4610μs 1.9054μs 524.8125 KOps/s 526.3816 KOps/s $\color{#d91a1a}-0.30\%$
test_membership_stacked_nested_leaf 25.9400μs 1.9251μs 519.4411 KOps/s 520.3190 KOps/s $\color{#d91a1a}-0.17\%$
test_membership_nested_last 29.9000μs 2.9446μs 339.6010 KOps/s 335.0828 KOps/s $\color{#35bf28}+1.35\%$
test_membership_nested_leaf_last 34.7100μs 2.9578μs 338.0859 KOps/s 332.2136 KOps/s $\color{#35bf28}+1.77\%$
test_membership_stacked_nested_last 30.4700μs 2.9335μs 340.8888 KOps/s 263.8504 KOps/s $\textbf{\color{#35bf28}+29.20\%}$
test_membership_stacked_nested_leaf_last 30.8210μs 2.9597μs 337.8752 KOps/s 267.7475 KOps/s $\textbf{\color{#35bf28}+26.19\%}$
test_nested_getleaf 28.4400μs 6.1052μs 163.7942 KOps/s 165.8281 KOps/s $\color{#d91a1a}-1.23\%$
test_nested_get 36.1100μs 5.7628μs 173.5255 KOps/s 175.3144 KOps/s $\color{#d91a1a}-1.02\%$
test_stacked_getleaf 31.2200μs 5.9871μs 167.0271 KOps/s 165.9284 KOps/s $\color{#35bf28}+0.66\%$
test_stacked_get 33.4010μs 5.5842μs 179.0753 KOps/s 177.5318 KOps/s $\color{#35bf28}+0.87\%$
test_nested_getitemleaf 27.5810μs 6.1545μs 162.4827 KOps/s 163.6222 KOps/s $\color{#d91a1a}-0.70\%$
test_nested_getitem 35.7300μs 5.7886μs 172.7541 KOps/s 174.0278 KOps/s $\color{#d91a1a}-0.73\%$
test_stacked_getitemleaf 36.9400μs 6.1335μs 163.0386 KOps/s 165.0611 KOps/s $\color{#d91a1a}-1.23\%$
test_stacked_getitem 30.4900μs 5.7489μs 173.9466 KOps/s 174.6411 KOps/s $\color{#d91a1a}-0.40\%$
test_lock_nested 0.8723ms 0.4336ms 2.3061 KOps/s 2.3084 KOps/s $\color{#d91a1a}-0.10\%$
test_lock_stack_nested 0.4480ms 0.3960ms 2.5250 KOps/s 2.5353 KOps/s $\color{#d91a1a}-0.41\%$
test_unlock_nested 0.8427ms 0.3739ms 2.6742 KOps/s 2.7107 KOps/s $\color{#d91a1a}-1.35\%$
test_unlock_stack_nested 0.4002ms 0.3356ms 2.9799 KOps/s 2.9997 KOps/s $\color{#d91a1a}-0.66\%$
test_flatten_speed 0.1548ms 76.6394μs 13.0481 KOps/s 13.0112 KOps/s $\color{#35bf28}+0.28\%$
test_unflatten_speed 0.3661ms 0.3246ms 3.0811 KOps/s 3.1332 KOps/s $\color{#d91a1a}-1.66\%$
test_common_ops 1.7620ms 1.3077ms 764.7159 Ops/s 771.9915 Ops/s $\color{#d91a1a}-0.94\%$
test_creation 27.7300μs 1.5014μs 666.0490 KOps/s 668.7669 KOps/s $\color{#d91a1a}-0.41\%$
test_creation_empty 45.6000μs 16.1936μs 61.7529 KOps/s 64.0031 KOps/s $\color{#d91a1a}-3.52\%$
test_creation_nested_1 47.2100μs 18.1614μs 55.0620 KOps/s 58.3236 KOps/s $\textbf{\color{#d91a1a}-5.59\%}$
test_creation_nested_2 50.0400μs 20.9522μs 47.7277 KOps/s 50.4056 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_clone 0.1078ms 30.3450μs 32.9544 KOps/s 33.4907 KOps/s $\color{#d91a1a}-1.60\%$
test_getitem[int] 1.2715ms 15.8808μs 62.9691 KOps/s 61.9973 KOps/s $\color{#35bf28}+1.57\%$
test_getitem[slice_int] 0.1167ms 27.4818μs 36.3878 KOps/s 36.2327 KOps/s $\color{#35bf28}+0.43\%$
test_getitem[range] 0.1565ms 0.1101ms 9.0799 KOps/s 8.9529 KOps/s $\color{#35bf28}+1.42\%$
test_getitem[tuple] 0.1161ms 24.1641μs 41.3837 KOps/s 40.6976 KOps/s $\color{#35bf28}+1.69\%$
test_getitem[list] 0.2052ms 0.1011ms 9.8903 KOps/s 9.9140 KOps/s $\color{#d91a1a}-0.24\%$
test_setitem_dim[int] 72.6110μs 45.9547μs 21.7606 KOps/s 21.8131 KOps/s $\color{#d91a1a}-0.24\%$
test_setitem_dim[slice_int] 90.8410μs 67.6977μs 14.7716 KOps/s 14.6474 KOps/s $\color{#35bf28}+0.85\%$
test_setitem_dim[range] 0.1586ms 0.1300ms 7.6915 KOps/s 7.7157 KOps/s $\color{#d91a1a}-0.31\%$
test_setitem_dim[tuple] 0.1023ms 61.8220μs 16.1755 KOps/s 16.0679 KOps/s $\color{#35bf28}+0.67\%$
test_setitem 90.4100μs 44.5168μs 22.4634 KOps/s 23.2550 KOps/s $\color{#d91a1a}-3.40\%$
test_set 88.7210μs 47.0298μs 21.2631 KOps/s 24.1544 KOps/s $\textbf{\color{#d91a1a}-11.97\%}$
test_set_shared 0.3858ms 60.0430μs 16.6547 KOps/s 18.2958 KOps/s $\textbf{\color{#d91a1a}-8.97\%}$
test_update 95.1600μs 56.3266μs 17.7536 KOps/s 19.5420 KOps/s $\textbf{\color{#d91a1a}-9.15\%}$
test_update_nested 0.1045ms 64.3558μs 15.5386 KOps/s 17.0307 KOps/s $\textbf{\color{#d91a1a}-8.76\%}$
test_update__nested 0.1871ms 63.1737μs 15.8294 KOps/s 15.8930 KOps/s $\color{#d91a1a}-0.40\%$
test_set_nested 97.5310μs 45.2791μs 22.0852 KOps/s 22.3337 KOps/s $\color{#d91a1a}-1.11\%$
test_set_nested_new 99.6510μs 49.6334μs 20.1477 KOps/s 20.9941 KOps/s $\color{#d91a1a}-4.03\%$
test_select 0.1033ms 62.5070μs 15.9982 KOps/s 16.4412 KOps/s $\color{#d91a1a}-2.69\%$
test_select_nested 78.6900μs 41.6091μs 24.0332 KOps/s 23.7177 KOps/s $\color{#35bf28}+1.33\%$
test_exclude_nested 0.5184ms 60.7567μs 16.4591 KOps/s 17.1078 KOps/s $\color{#d91a1a}-3.79\%$
test_empty[True] 0.2988ms 0.2594ms 3.8545 KOps/s 3.8590 KOps/s $\color{#d91a1a}-0.12\%$
test_empty[False] 3.6480μs 0.8437μs 1.1853 MOps/s 1.3587 MOps/s $\textbf{\color{#d91a1a}-12.76\%}$
test_to 56.7500μs 27.6397μs 36.1798 KOps/s 37.6510 KOps/s $\color{#d91a1a}-3.91\%$
test_to_nonblocking 56.8800μs 26.9051μs 37.1677 KOps/s 40.8404 KOps/s $\textbf{\color{#d91a1a}-8.99\%}$
test_unbind_speed 1.2464ms 0.2854ms 3.5039 KOps/s 3.5394 KOps/s $\color{#d91a1a}-1.00\%$
test_unbind_speed_stack0 0.3210ms 0.2799ms 3.5727 KOps/s 3.5468 KOps/s $\color{#35bf28}+0.73\%$
test_unbind_speed_stack1 92.4051ms 0.7214ms 1.3861 KOps/s 1.4032 KOps/s $\color{#d91a1a}-1.22\%$
test_split 93.6898ms 2.1896ms 456.7096 Ops/s 456.7079 Ops/s $+0.00\%$
test_chunk 95.7193ms 2.1716ms 460.4883 Ops/s 449.1022 Ops/s $\color{#35bf28}+2.54\%$
test_creation[device0] 0.3299ms 0.1283ms 7.7955 KOps/s 7.3847 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_creation_from_tensor 0.3534ms 0.1297ms 7.7075 KOps/s 7.3874 KOps/s $\color{#35bf28}+4.33\%$
test_add_one[memmap_tensor0] 0.2947ms 9.1390μs 109.4206 KOps/s 112.3311 KOps/s $\color{#d91a1a}-2.59\%$
test_contiguous[memmap_tensor0] 29.9600μs 2.1996μs 454.6291 KOps/s 446.2509 KOps/s $\color{#35bf28}+1.88\%$
test_stack[memmap_tensor0] 34.7400μs 6.9111μs 144.6940 KOps/s 147.8753 KOps/s $\color{#d91a1a}-2.15\%$
test_memmaptd_index 1.1135ms 0.4357ms 2.2951 KOps/s 2.2887 KOps/s $\color{#35bf28}+0.28\%$
test_memmaptd_index_astensor 0.7584ms 0.5087ms 1.9656 KOps/s 1.9775 KOps/s $\color{#d91a1a}-0.60\%$
test_memmaptd_index_op 1.4869ms 1.0628ms 940.9472 Ops/s 957.4025 Ops/s $\color{#d91a1a}-1.72\%$
test_serialize_model 0.1320s 0.1302s 7.6818 Ops/s 7.6702 Ops/s $\color{#35bf28}+0.15\%$
test_serialize_model_pickle 1.3477s 1.2149s 0.8231 Ops/s 0.8217 Ops/s $\color{#35bf28}+0.17\%$
test_serialize_weights 0.1304s 0.1298s 7.7024 Ops/s 7.7090 Ops/s $\color{#d91a1a}-0.09\%$
test_serialize_weights_returnearly 0.2338s 63.8411ms 15.6639 Ops/s 21.7070 Ops/s $\textbf{\color{#d91a1a}-27.84\%}$
test_serialize_weights_pickle 1.3464s 1.1869s 0.8426 Ops/s 0.8212 Ops/s $\color{#35bf28}+2.60\%$
test_reshape_pytree 68.9510μs 35.9509μs 27.8157 KOps/s 27.2994 KOps/s $\color{#35bf28}+1.89\%$
test_reshape_td 64.7700μs 41.6206μs 24.0266 KOps/s 23.5485 KOps/s $\color{#35bf28}+2.03\%$
test_view_pytree 68.1410μs 35.4729μs 28.1905 KOps/s 27.7521 KOps/s $\color{#35bf28}+1.58\%$
test_view_td 87.3110μs 47.9377μs 20.8604 KOps/s 21.5910 KOps/s $\color{#d91a1a}-3.38\%$
test_unbind_pytree 62.5610μs 33.5538μs 29.8029 KOps/s 28.9633 KOps/s $\color{#35bf28}+2.90\%$
test_unbind_td 0.4897ms 43.8448μs 22.8077 KOps/s 22.6734 KOps/s $\color{#35bf28}+0.59\%$
test_split_pytree 73.4500μs 44.9214μs 22.2611 KOps/s 21.8305 KOps/s $\color{#35bf28}+1.97\%$
test_split_td 0.6885ms 55.7707μs 17.9306 KOps/s 17.8082 KOps/s $\color{#35bf28}+0.69\%$
test_add_pytree 0.1034ms 58.9430μs 16.9655 KOps/s 17.1917 KOps/s $\color{#d91a1a}-1.32\%$
test_add_td 0.1348ms 96.8157μs 10.3289 KOps/s 10.5965 KOps/s $\color{#d91a1a}-2.53\%$
test_compile_add_one_nested[tensordict-compile] 0.2103ms 0.1615ms 6.1908 KOps/s 6.1999 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_add_one_nested[tensordict-eager] 0.3412ms 0.1615ms 6.1909 KOps/s 6.1510 KOps/s $\color{#35bf28}+0.65\%$
test_compile_add_one_nested[pytree-compile] 0.2078ms 0.1553ms 6.4382 KOps/s 6.4377 KOps/s $+0.01\%$
test_compile_add_one_nested[pytree-eager] 0.5916ms 0.1941ms 5.1510 KOps/s 5.3417 KOps/s $\color{#d91a1a}-3.57\%$
test_compile_copy_nested[tensordict-compile] 0.4134ms 21.9716μs 45.5132 KOps/s 46.5544 KOps/s $\color{#d91a1a}-2.24\%$
test_compile_copy_nested[tensordict-eager] 0.4353ms 49.5398μs 20.1858 KOps/s 20.7673 KOps/s $\color{#d91a1a}-2.80\%$
test_compile_copy_nested[pytree-compile] 0.4489ms 65.4155μs 15.2869 KOps/s 15.4423 KOps/s $\color{#d91a1a}-1.01\%$
test_compile_copy_nested[pytree-eager] 0.4339ms 49.5775μs 20.1705 KOps/s 20.0369 KOps/s $\color{#35bf28}+0.67\%$
test_compile_add_one_flat[tensordict-compile] 0.3693ms 0.3211ms 3.1142 KOps/s 3.1267 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_add_one_flat[tensordict-eager] 0.6151ms 0.2325ms 4.3016 KOps/s 4.2608 KOps/s $\color{#35bf28}+0.96\%$
test_compile_add_one_flat[tensorclass-compile] 0.1834ms 0.1289ms 7.7551 KOps/s 7.7670 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_add_one_flat[tensorclass-eager] 0.4560ms 66.6373μs 15.0066 KOps/s 14.8355 KOps/s $\color{#35bf28}+1.15\%$
test_compile_add_one_flat[pytree-compile] 0.7377ms 0.3302ms 3.0280 KOps/s 3.0441 KOps/s $\color{#d91a1a}-0.53\%$
test_compile_add_one_flat[pytree-eager] 1.0559ms 0.6605ms 1.5139 KOps/s 1.5582 KOps/s $\color{#d91a1a}-2.84\%$
test_compile_add_self_flat[tensordict-eager] 0.6944ms 0.2854ms 3.5037 KOps/s 3.5109 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_add_self_flat[tensordict-compile] 0.3986ms 0.3242ms 3.0846 KOps/s 3.1045 KOps/s $\color{#d91a1a}-0.64\%$
test_compile_add_self_flat[tensorclass-eager] 0.4973ms 78.1709μs 12.7925 KOps/s 12.6191 KOps/s $\color{#35bf28}+1.37\%$
test_compile_add_self_flat[tensorclass-compile] 0.1934ms 0.1318ms 7.5890 KOps/s 7.5102 KOps/s $\color{#35bf28}+1.05\%$
test_compile_add_self_flat[pytree-eager] 0.6651ms 0.5569ms 1.7956 KOps/s 1.8717 KOps/s $\color{#d91a1a}-4.06\%$
test_compile_add_self_flat[pytree-compile] 0.3799ms 0.3279ms 3.0501 KOps/s 3.0493 KOps/s $\color{#35bf28}+0.02\%$
test_compile_copy_flat[tensordict-compile] 52.2000μs 21.1580μs 47.2634 KOps/s 46.1194 KOps/s $\color{#35bf28}+2.48\%$
test_compile_copy_flat[tensordict-eager] 94.0310μs 38.8919μs 25.7123 KOps/s 26.0708 KOps/s $\color{#d91a1a}-1.38\%$
test_compile_copy_flat[pytree-compile] 0.1142ms 70.7920μs 14.1259 KOps/s 14.2819 KOps/s $\color{#d91a1a}-1.09\%$
test_compile_copy_flat[pytree-eager] 86.8700μs 52.0999μs 19.1939 KOps/s 19.3094 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_assign_and_add[tensordict-compile] 2.3703ms 0.8291ms 1.2061 KOps/s 1.1142 KOps/s $\textbf{\color{#35bf28}+8.25\%}$
test_compile_assign_and_add[tensordict-eager] 3.3489ms 3.2448ms 308.1824 Ops/s 301.8191 Ops/s $\color{#35bf28}+2.11\%$
test_compile_assign_and_add[pytree-compile] 2.3956ms 0.8404ms 1.1899 KOps/s 1.0914 KOps/s $\textbf{\color{#35bf28}+9.03\%}$
test_compile_assign_and_add[pytree-eager] 3.4542ms 3.3412ms 299.2960 Ops/s 302.1734 Ops/s $\color{#d91a1a}-0.95\%$
test_compile_indexing[tensor-tensordict-compile] 0.2623ms 0.1207ms 8.2878 KOps/s 8.4021 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_indexing[tensor-tensordict-eager] 0.1983ms 64.7590μs 15.4419 KOps/s 15.2296 KOps/s $\color{#35bf28}+1.39\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1604ms 0.1154ms 8.6663 KOps/s 8.3779 KOps/s $\color{#35bf28}+3.44\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1087ms 47.0703μs 21.2448 KOps/s 20.5564 KOps/s $\color{#35bf28}+3.35\%$
test_compile_indexing[tensor-pytree-compile] 0.1605ms 0.1189ms 8.4136 KOps/s 8.3278 KOps/s $\color{#35bf28}+1.03\%$
test_compile_indexing[tensor-pytree-eager] 91.9010μs 46.8301μs 21.3538 KOps/s 20.6768 KOps/s $\color{#35bf28}+3.27\%$
test_compile_indexing[slice-tensordict-compile] 0.1898ms 0.1482ms 6.7498 KOps/s 6.8547 KOps/s $\color{#d91a1a}-1.53\%$
test_compile_indexing[slice-tensordict-eager] 0.1505ms 24.2609μs 41.2186 KOps/s 39.7948 KOps/s $\color{#35bf28}+3.58\%$
test_compile_indexing[slice-tensorclass-compile] 0.1910ms 0.1436ms 6.9628 KOps/s 7.1399 KOps/s $\color{#d91a1a}-2.48\%$
test_compile_indexing[slice-tensorclass-eager] 60.0510μs 20.9214μs 47.7981 KOps/s 46.9555 KOps/s $\color{#35bf28}+1.79\%$
test_compile_indexing[slice-pytree-compile] 0.2016ms 0.1452ms 6.8862 KOps/s 7.0965 KOps/s $\color{#d91a1a}-2.96\%$
test_compile_indexing[slice-pytree-eager] 66.1210μs 20.6572μs 48.4094 KOps/s 47.6845 KOps/s $\color{#35bf28}+1.52\%$
test_compile_indexing[int-tensordict-compile] 0.2798ms 0.1506ms 6.6421 KOps/s 6.7805 KOps/s $\color{#d91a1a}-2.04\%$
test_compile_indexing[int-tensordict-eager] 0.5104ms 24.0576μs 41.5669 KOps/s 40.4454 KOps/s $\color{#35bf28}+2.77\%$
test_compile_indexing[int-tensorclass-compile] 0.1908ms 0.1458ms 6.8570 KOps/s 6.7937 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[int-tensorclass-eager] 52.3010μs 20.7712μs 48.1435 KOps/s 48.1111 KOps/s $\color{#35bf28}+0.07\%$
test_compile_indexing[int-pytree-compile] 0.1875ms 0.1410ms 7.0915 KOps/s 6.8702 KOps/s $\color{#35bf28}+3.22\%$
test_compile_indexing[int-pytree-eager] 54.1600μs 20.6073μs 48.5264 KOps/s 45.3173 KOps/s $\textbf{\color{#35bf28}+7.08\%}$
test_mod_add[eager] 75.4110μs 34.0144μs 29.3993 KOps/s 29.8975 KOps/s $\color{#d91a1a}-1.67\%$
test_mod_add[compile] 0.1300ms 83.9008μs 11.9188 KOps/s 12.3315 KOps/s $\color{#d91a1a}-3.35\%$
test_mod_add[compile-overhead] 0.3086ms 0.1541ms 6.4910 KOps/s 6.3617 KOps/s $\color{#35bf28}+2.03\%$
test_mod_wrap[eager] 0.3678ms 0.2459ms 4.0664 KOps/s 3.9926 KOps/s $\color{#35bf28}+1.85\%$
test_mod_wrap[compile] 1.4557ms 0.3086ms 3.2405 KOps/s 3.2780 KOps/s $\color{#d91a1a}-1.14\%$
test_mod_wrap[compile-overhead] 7.7718ms 4.0832ms 244.9030 Ops/s 244.4356 Ops/s $\color{#35bf28}+0.19\%$
test_mod_wrap_and_backward[eager] 1.7722ms 1.3872ms 720.8765 Ops/s 677.7776 Ops/s $\textbf{\color{#35bf28}+6.36\%}$
test_mod_wrap_and_backward[compile] 1.7679ms 1.3409ms 745.7887 Ops/s 687.8552 Ops/s $\textbf{\color{#35bf28}+8.42\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3393ms 0.9089ms 1.1003 KOps/s 986.1842 Ops/s $\textbf{\color{#35bf28}+11.57\%}$
test_seq_add[eager] 0.1702ms 0.1018ms 9.8186 KOps/s 10.0245 KOps/s $\color{#d91a1a}-2.05\%$
test_seq_add[compile] 0.1392ms 91.0130μs 10.9874 KOps/s 10.9788 KOps/s $\color{#35bf28}+0.08\%$
test_seq_add[compile-overhead] 0.1685ms 0.1245ms 8.0337 KOps/s 8.0065 KOps/s $\color{#35bf28}+0.34\%$
test_seq_wrap[eager] 0.4517ms 0.3832ms 2.6099 KOps/s 2.4080 KOps/s $\textbf{\color{#35bf28}+8.38\%}$
test_seq_wrap[compile] 0.3635ms 0.3150ms 3.1750 KOps/s 3.0712 KOps/s $\color{#35bf28}+3.38\%$
test_seq_wrap[compile-overhead] 0.2637ms 0.2195ms 4.5561 KOps/s 4.4608 KOps/s $\color{#35bf28}+2.14\%$
test_func_call_runtime[False-eager] 0.8499ms 0.7504ms 1.3326 KOps/s 1.3174 KOps/s $\color{#35bf28}+1.15\%$
test_func_call_runtime[False-compile] 0.9698ms 0.7881ms 1.2689 KOps/s 1.2396 KOps/s $\color{#35bf28}+2.36\%$
test_func_call_runtime[False-compile-overhead] 0.4178ms 0.3629ms 2.7554 KOps/s 2.7390 KOps/s $\color{#35bf28}+0.60\%$
test_func_call_runtime[True-eager] 0.9675ms 0.9067ms 1.1029 KOps/s 1.0766 KOps/s $\color{#35bf28}+2.45\%$
test_func_call_runtime[True-compile] 0.9460ms 0.8134ms 1.2294 KOps/s 1.1638 KOps/s $\textbf{\color{#35bf28}+5.64\%}$
test_func_call_runtime[True-compile-overhead] 0.4321ms 0.3847ms 2.5992 KOps/s 2.6032 KOps/s $\color{#d91a1a}-0.15\%$
test_func_call_cm_runtime[False-eager] 0.8175ms 0.7386ms 1.3538 KOps/s 1.3133 KOps/s $\color{#35bf28}+3.09\%$
test_func_call_cm_runtime[False-compile] 0.8833ms 0.7911ms 1.2641 KOps/s 1.2408 KOps/s $\color{#35bf28}+1.88\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4141ms 0.3637ms 2.7497 KOps/s 2.7413 KOps/s $\color{#35bf28}+0.31\%$
test_func_call_cm_runtime[True-eager] 1.1194ms 1.0165ms 983.7746 Ops/s 959.3121 Ops/s $\color{#35bf28}+2.55\%$
test_func_call_cm_runtime[True-compile] 0.8951ms 0.8382ms 1.1931 KOps/s 1.1559 KOps/s $\color{#35bf28}+3.22\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4557ms 0.4093ms 2.4435 KOps/s 2.4247 KOps/s $\color{#35bf28}+0.77\%$
test_vmap_func_call_cm_runtime[eager] 2.5795ms 2.1220ms 471.2546 Ops/s 463.3930 Ops/s $\color{#35bf28}+1.70\%$
test_vmap_func_call_cm_runtime[compile] 0.9134ms 0.8552ms 1.1693 KOps/s 1.1600 KOps/s $\color{#35bf28}+0.80\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4682ms 0.4137ms 2.4174 KOps/s 2.4096 KOps/s $\color{#35bf28}+0.32\%$
test_distributed 0.8823ms 0.1581ms 6.3264 KOps/s 8.8268 KOps/s $\textbf{\color{#d91a1a}-28.33\%}$
test_tdmodule 35.0110μs 14.8771μs 67.2174 KOps/s 64.6674 KOps/s $\color{#35bf28}+3.94\%$
test_tdmodule_dispatch 49.6900μs 29.4736μs 33.9287 KOps/s 32.4082 KOps/s $\color{#35bf28}+4.69\%$
test_tdseq 37.7400μs 16.2875μs 61.3967 KOps/s 62.7185 KOps/s $\color{#d91a1a}-2.11\%$
test_tdseq_dispatch 55.1910μs 32.6905μs 30.5900 KOps/s 31.2895 KOps/s $\color{#d91a1a}-2.24\%$
test_instantiation_functorch 2.0362ms 1.8941ms 527.9653 Ops/s 522.6955 Ops/s $\color{#35bf28}+1.01\%$
test_exec_functorch 0.2582ms 0.2120ms 4.7170 KOps/s 4.6857 KOps/s $\color{#35bf28}+0.67\%$
test_exec_functional_call 0.2894ms 0.2075ms 4.8193 KOps/s 4.6728 KOps/s $\color{#35bf28}+3.14\%$
test_exec_td_decorator 0.4476ms 0.2629ms 3.8039 KOps/s 3.7677 KOps/s $\color{#35bf28}+0.96\%$
test_vmap_mlp_speed_decorator[True-True] 0.8220ms 0.6920ms 1.4451 KOps/s 1.4228 KOps/s $\color{#35bf28}+1.57\%$
test_vmap_mlp_speed_decorator[True-False] 0.8375ms 0.6951ms 1.4386 KOps/s 1.3882 KOps/s $\color{#35bf28}+3.63\%$
test_vmap_mlp_speed_decorator[False-True] 0.7909ms 0.6287ms 1.5906 KOps/s 1.5921 KOps/s $\color{#d91a1a}-0.10\%$
test_vmap_mlp_speed_decorator[False-False] 0.7496ms 0.6189ms 1.6157 KOps/s 1.6178 KOps/s $\color{#d91a1a}-0.13\%$
test_vmap_transformer_speed_decorator[True-True] 20.6934ms 19.9301ms 50.1752 Ops/s 49.9762 Ops/s $\color{#35bf28}+0.40\%$
test_vmap_transformer_speed_decorator[True-False] 20.1204ms 19.9235ms 50.1919 Ops/s 49.8920 Ops/s $\color{#35bf28}+0.60\%$
test_vmap_transformer_speed_decorator[False-True] 20.6135ms 19.8065ms 50.4884 Ops/s 50.2952 Ops/s $\color{#35bf28}+0.38\%$
test_vmap_transformer_speed_decorator[False-False] 20.1576ms 19.8101ms 50.4794 Ops/s 50.2582 Ops/s $\color{#35bf28}+0.44\%$
test_to_module_speed[True] 1.3865ms 0.9945ms 1.0055 KOps/s 981.5413 Ops/s $\color{#35bf28}+2.44\%$
test_to_module_speed[False] 1.4250ms 0.9870ms 1.0131 KOps/s 1.0040 KOps/s $\color{#35bf28}+0.91\%$
test_tc_init 71.3310μs 35.4989μs 28.1699 KOps/s 28.5094 KOps/s $\color{#d91a1a}-1.19\%$
test_tc_init_nested 0.1119ms 70.3091μs 14.2229 KOps/s 14.4859 KOps/s $\color{#d91a1a}-1.82\%$
test_tc_first_layer_tensor 5.6757μs 0.6759μs 1.4796 MOps/s 1.4945 MOps/s $\color{#d91a1a}-1.00\%$
test_tc_first_layer_nontensor 31.8300μs 2.2672μs 441.0716 KOps/s 451.2286 KOps/s $\color{#d91a1a}-2.25\%$
test_tc_second_layer_tensor 7.1275μs 1.3699μs 729.9750 KOps/s 723.9189 KOps/s $\color{#35bf28}+0.84\%$
test_tc_second_layer_nontensor 27.9900μs 2.9766μs 335.9504 KOps/s 337.3669 KOps/s $\color{#d91a1a}-0.42\%$
test_unbind 0.1913s 9.6159ms 103.9942 Ops/s 92.4687 Ops/s $\textbf{\color{#35bf28}+12.46\%}$
test_full_like 0.6570ms 0.5744ms 1.7410 KOps/s 1.7429 KOps/s $\color{#d91a1a}-0.11\%$
test_zeros_like 0.2611ms 0.1980ms 5.0511 KOps/s 5.0481 KOps/s $\color{#35bf28}+0.06\%$
test_ones_like 0.2421ms 0.1977ms 5.0569 KOps/s 5.0530 KOps/s $\color{#35bf28}+0.08\%$
test_clone 0.4489ms 0.4148ms 2.4106 KOps/s 2.4096 KOps/s $\color{#35bf28}+0.04\%$
test_squeeze 39.7200μs 9.7707μs 102.3469 KOps/s 99.7941 KOps/s $\color{#35bf28}+2.56\%$
test_unsqueeze 0.2202ms 73.9952μs 13.5144 KOps/s 13.3027 KOps/s $\color{#35bf28}+1.59\%$
test_split 0.3929ms 0.1578ms 6.3364 KOps/s 6.2583 KOps/s $\color{#35bf28}+1.25\%$
test_permute 0.2327ms 0.1875ms 5.3331 KOps/s 5.6208 KOps/s $\textbf{\color{#d91a1a}-5.12\%}$
test_stack 1.2855ms 0.8591ms 1.1640 KOps/s 1.1712 KOps/s $\color{#d91a1a}-0.62\%$
test_cat 1.2556ms 1.2313ms 812.1371 Ops/s 811.9746 Ops/s $\color{#35bf28}+0.02\%$

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 11, 2024
ghstack-source-id: 672dc8b82c0b025feab98d061b5241536fa040c0
Pull Request resolved: #1037