Pull requests: NVIDIA/TensorRT-LLM
Passing gpt_variant to model conversion
build
triaged
Issue has been triaged by maintainers
#2352
opened Oct 18, 2024 by
tonylek
README.md: Add 3rd Party Inference Speed Dashboard
documentation
Improvements or additions to documentation
triaged
Issue has been triaged by maintainers
#2244
opened Sep 22, 2024 by
matichon-vultureprime
Modify small-batched weight only quantization
quantization
Issue about lower bit quantization, including int8, int4, fp8
triaged
Issue has been triaged by maintainers
#2213
opened Sep 10, 2024 by
dasistwo
[examples/bert/build.py]: Load weights for BertModel and RobertaModel if --model_dir is provided
triaged
Issue has been triaged by maintainers
#2187
opened Sep 3, 2024 by
tkhanipov
Add workaround instruction for a known issue of v0.11 on Windows
Merged
#2146
opened Aug 23, 2024 by
pamelap-nvidia
fix wrong buffer for oneShotAllReduceKernel under PUSH_MODE
#2099
opened Aug 8, 2024 by
YconquestY
decoder MMHA kernel support INT8 SCALE_Q_INSTEAD_OF_K and SCALE_P_INS…
#2085
opened Aug 5, 2024 by
lishicheng1996
fix wrong arg in Engine Building Command in docs/source/performance/perf-overview.md
documentation
Improvements or additions to documentation
#2057
opened Jul 30, 2024 by
RuibaiXu
Fix default min length
triaged
Issue has been triaged by maintainers
#1935
opened Jul 11, 2024 by
akhoroshev
Bump transformers from 4.36.2 to 4.38.0 in /examples/multimodal
bug
Something isn't working
dependencies
Pull requests that update a dependency file
triaged
Issue has been triaged by maintainers
waiting for feedback
#1689
opened May 28, 2024 by
dependabot (bot)
add cached generation buffer
triaged
Issue has been triaged by maintainers
waiting for feedback
#1685
opened May 28, 2024 by
michael200892458
Fix CUDA OOM when creating Mixtral checkpoint
triaged
Issue has been triaged by maintainers
waiting for feedback
#1629
opened May 19, 2024 by
VivekBits2210
[feat]: Support weight only gemm with 2bit
triaged
Issue has been triaged by maintainers
waiting for feedback
#1568
opened May 9, 2024 by
gavinchen430
Support SDXL and its distributed inference
waiting for feedback
#1514
opened Apr 28, 2024 by
Zars19
fix: correct cudaSetDevice error when GPUs per node are fewer than their ranks in inter-node inference
#1495
opened Apr 24, 2024 by
littlefatfat