
Update summary.md #125

Merged: 1 commit merged into main from qihqi-patch-1 on Jun 17, 2024
Conversation

@qihqi (Collaborator) commented Jun 13, 2024

No description provided.

@wang2yn84 (Collaborator) left a comment:

Any reason why we don't run int8 on v5e-1?

@@ -22,6 +22,8 @@
Date | Device | dtype | batch size | cache length | max input length | max output length | throughput (tokens/sec)
---- | ------- | ------ | ---------- | ------------- | ----------------- | ------------------ | ----------------------
2024-05-14 | TPU v5e-8 | bfloat16 | 512 | 2048 | 1024 | 1024 | 8700
2024-05-14 | TPU v5e-8 | int8 | 1024 | 2048 | 1024 | 1024 | 8746
2024-06-13 | TPU v5e-1 | bfloat16 | 1024 | 2048 | 1024 | 1024 | 4249
Collaborator commented:
The v5e-1 number looks great! The gap is huge; I'm wondering whether the v5e-8 data-parallel run didn't compute each token locally.

Collaborator commented:

+1. Since we are sharding on batch for Gemma 2B, even in the naive case (duplicated weights) v5e-8 should reach 8 × 4249 ≈ 34k toks/sec.

@qihqi (Collaborator, Author) commented:

The prefills are replicated (8 chips computing the same thing) because we are sharding on batch and prefill has batch=1. There might be other issues as well. Overall, we should recommend v5e-1 for Gemma 2B instead.
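
For context on the sharding discussion above, here is a minimal, hypothetical JAX sketch (not this repo's actual serving code; the mesh layout, shapes, and names are illustrative assumptions) of why sharding on batch scales decode across a v5e-8 but leaves a batch=1 prefill replicated on every chip:

```python
# Hypothetical illustration, not the repo's serving code.
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis over all chips (8 on a v5e-8).
mesh = Mesh(jax.devices(), axis_names=("batch",))
replicated = NamedSharding(mesh, P())
batch_sharded = NamedSharding(mesh, P("batch", None))

# Stand-in for one transformer matmul; weights live on every chip.
w = jax.device_put(jnp.zeros((4096, 4096)), replicated)
layer = jax.jit(lambda x, w: x @ w)

# Decode: a batch of 1024 splits into 128 rows per chip, so all 8 chips
# do useful, disjoint work -- the source of the naive 8x estimate above.
decode_x = jax.device_put(jnp.zeros((1024, 4096)), batch_sharded)
decode_out = layer(decode_x, w)

# Prefill: batch=1 cannot be split along an 8-way batch axis, so the input
# is replicated and all 8 chips compute the identical prefill matmul;
# 7/8 of the prefill FLOPs are redundant.
prefill_x = jax.device_put(jnp.zeros((1, 4096)), replicated)
prefill_out = layer(prefill_x, w)
```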

@qihqi merged commit 7526a90 into main on Jun 17, 2024 (4 checks passed).
@qihqi deleted the qihqi-patch-1 branch on June 17, 2024 at 17:21.
@qihqi (Collaborator, Author) commented Jun 17, 2024:

> Any reason why we don't run int8 on v5e-1?

Yeah, we should; we haven't gotten to it yet.
