Google JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

v0.2.1

1 month ago

Key Changes

Support Llama3 tokenizer
JetStream Tokenizer refactor
Disaggregation preparation work

What's Changed

add sample_idx in InputRequest for debugging by @morgandu in https://github.com/google/JetStream/pull/32
Update README.md with user guides by @JoeZijunZhou in https://github.com/google/JetStream/pull/34
Update README.md with PT user guide by @JoeZijunZhou in https://github.com/google/JetStream/pull/35
Reorganize unit tests and update CICD by @JoeZijunZhou in https://github.com/google/JetStream/pull/37
Add badges for JetStream by @JoeZijunZhou in https://github.com/google/JetStream/pull/38
Bump idna from 3.6 to 3.7 by @dependabot in https://github.com/google/JetStream/pull/39
Reformat benchmark metrics by @yeandy in https://github.com/google/JetStream/pull/42
Update server host default value by @JoeZijunZhou in https://github.com/google/JetStream/pull/43
Refactor readme by @FanhaiLu1 in https://github.com/google/JetStream/pull/41
Add missing Documentation by @FanhaiLu1 in https://github.com/google/JetStream/pull/47
Update README.md to fix broken link by @charbull in https://github.com/google/JetStream/pull/50
Add np padded token support by @FanhaiLu1 in https://github.com/google/JetStream/pull/49
Format token utils and test by @FanhaiLu1 in https://github.com/google/JetStream/pull/51
Align Tokenizer in JetStream by @JoeZijunZhou in https://github.com/google/JetStream/pull/40
Do nothing for nd array in copy_to_host_async by @FanhaiLu1 in https://github.com/google/JetStream/pull/52
Add jax_padding support driver and server lib by @FanhaiLu1 in https://github.com/google/JetStream/pull/54
Update maxtext user guide by @JoeZijunZhou in https://github.com/google/JetStream/pull/56
Fix benchmark script type issue by @JoeZijunZhou in https://github.com/google/JetStream/pull/59
Fix requester flag default value by @JoeZijunZhou in https://github.com/google/JetStream/pull/60
Fix float division by zero in benchmark by @FanhaiLu1 in https://github.com/google/JetStream/pull/62
Register IFRT proxy backend when proxy is defined in the jax_platforms by @zhihaoshan-google in https://github.com/google/JetStream/pull/63
Add an abstract class for Tokenizer by @bhavya01 in https://github.com/google/JetStream/pull/53
refactor slice_to_num_chips to adapt to Cloud config by @zhihaoshan-google in https://github.com/google/JetStream/pull/65
Support llama3 tokenizer by @bhavya01 in https://github.com/google/JetStream/pull/67
Prerequisite work for supporting disaggregation by @zhihaoshan-google in https://github.com/google/JetStream/pull/68
Create __init__.py in Jetstream/third_party by @bhavya01 in https://github.com/google/JetStream/pull/69
Add tokenize_and_pad function to backward compatible by @FanhaiLu1 in https://github.com/google/JetStream/pull/70
Release v0.2.1 by @JoeZijunZhou in https://github.com/google/JetStream/pull/72
Bump tqdm from 4.66.1 to 4.66.3 in the pip group across 1 directory by @dependabot in https://github.com/google/JetStream/pull/73
Release v0.2.1 with docs update by @JoeZijunZhou in https://github.com/google/JetStream/pull/74
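
Several entries above concern padded token arrays (np padding, jax_padding, tokenize_and_pad). As a rough illustration of the general idea only, the sketch below pads a list of token ids to a fixed bucket length with NumPy so that downstream compiled code sees a small set of static shapes. The function name pad_tokens, the bucket sizes, and the pad id of 0 are all hypothetical and are not taken from JetStream's token utilities.

```python
import numpy as np


def pad_tokens(token_ids, bucket_lengths=(16, 32, 64), pad_id=0):
    """Pad (or truncate) token ids to the smallest bucket that fits them.

    Hypothetical helper for illustration; not JetStream's API.
    Returns the padded int32 array and the number of real tokens kept.
    """
    tokens = np.asarray(token_ids, dtype=np.int32)
    # Pick the smallest bucket that fits; fall back to the largest bucket.
    target = next(
        (b for b in bucket_lengths if b >= tokens.size), bucket_lengths[-1]
    )
    tokens = tokens[:target]  # truncate prompts longer than the largest bucket
    true_length = int(tokens.size)
    padded = np.full(target, pad_id, dtype=np.int32)
    padded[:true_length] = tokens
    return padded, true_length
```

Returning the true length alongside the padded array lets a caller mask out the padding positions later; whether JetStream does it this way is not stated in these notes.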

New Contributors

@dependabot made their first contribution in https://github.com/google/JetStream/pull/39
@yeandy made their first contribution in https://github.com/google/JetStream/pull/42
@charbull made their first contribution in https://github.com/google/JetStream/pull/50
@zhihaoshan-google made their first contribution in https://github.com/google/JetStream/pull/63
@bhavya01 made their first contribution in https://github.com/google/JetStream/pull/53

Full Changelog: https://github.com/google/JetStream/compare/v0.2.0...v0.2.1

v0.2.0

2 months ago

Major Changes

Support JetStream MaxText inference on Cloud TPU VM
Support JetStream PyTorch inference on Cloud TPU VM
Support Continuous Batching with interleaved mode in JetStream
Support online serving benchmarking
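
For readers unfamiliar with continuous batching, the toy simulation below may help. It is a minimal sketch of the general technique, assuming a fixed number of decode slots where prefill of waiting requests is interleaved with generate steps and a slot is reused as soon as its request finishes. It is not JetStream's orchestrator; the function name and its inputs are hypothetical.

```python
from collections import deque


def simulate_continuous_batching(output_lengths, num_slots):
    """Toy scheduler: return the decode step at which each request finishes.

    output_lengths[i] is the number of tokens request i generates
    (a hypothetical input chosen for illustration).
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    pending = deque(enumerate(output_lengths))
    slots = {}  # slot index -> [request id, tokens remaining]
    finished_at = {}
    step = 0
    while pending or slots:
        # Prefill phase: place a waiting request into every free slot.
        for slot in range(num_slots):
            if slot not in slots and pending:
                req_id, remaining = pending.popleft()
                slots[slot] = [req_id, remaining]
        # Generate phase: one decode step emits one token per active slot.
        step += 1
        for slot in list(slots):
            slots[slot][1] -= 1
            if slots[slot][1] <= 0:
                finished_at[slots[slot][0]] = step
                del slots[slot]  # the slot is free for the next prefill
    return finished_at
```

With output lengths [3, 1, 2] and two slots, the short request frees its slot after one step and the third request starts immediately, so all three finish within three decode steps. A static batch that waits for the whole batch to drain before admitting new work would need five.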

What's Changed

Add unit tests CI github action by @JoeZijunZhou in https://github.com/google/JetStream/pull/1
Refine thread in orchestrator by @JoeZijunZhou in https://github.com/google/JetStream/pull/2
Optimize maximum threads to saturate decoding capacity by @JoeZijunZhou in https://github.com/google/JetStream/pull/3
Add benchmarks maximum threads config by @JoeZijunZhou in https://github.com/google/JetStream/pull/4
First support necessary for MaxText by @rwitten in https://github.com/google/JetStream/pull/5
Support gracefully stopping orchestrator and server by @JoeZijunZhou in https://github.com/google/JetStream/pull/6
Save request outputs and add eval accuracy support by @FanhaiLu1 in https://github.com/google/JetStream/pull/8
Use parameter based num as inference request max output length by @FanhaiLu1 in https://github.com/google/JetStream/pull/10
Fix output token drop issue by @JoeZijunZhou in https://github.com/google/JetStream/pull/9
Add option to warm up by @qihqi in https://github.com/google/JetStream/pull/11
Replace token_list with generated_text in saved outputs by @FanhaiLu1 in https://github.com/google/JetStream/pull/12
Refine requester util by @JoeZijunZhou in https://github.com/google/JetStream/pull/15
Adds filtering for sharegpt based on conversation starter. by @patemotter in https://github.com/google/JetStream/pull/17
Allows more requests than available data. by @patemotter in https://github.com/google/JetStream/pull/19
Fix starvation with async server and interleaving optimization by @JoeZijunZhou in https://github.com/google/JetStream/pull/13
Add Token util unit test by @FanhaiLu1 in https://github.com/google/JetStream/pull/20
Fix llama2 decode bug in tokenizer by @FanhaiLu1 in https://github.com/google/JetStream/pull/22
Fix whitespace replacement bug by @FanhaiLu1 in https://github.com/google/JetStream/pull/24
Update benchmark to run openorca dataset by @morgandu in https://github.com/google/JetStream/pull/21
Add model ckpt conversion and AQT scripts for JetStream MaxText Serving by @JoeZijunZhou in https://github.com/google/JetStream/pull/23
Refactor to sample before tokenize by @morgandu in https://github.com/google/JetStream/pull/26
Update ckpt conversion scripts by @JoeZijunZhou in https://github.com/google/JetStream/pull/25
move tokenizer model to third party llama2 by @FanhaiLu1 in https://github.com/google/JetStream/pull/27
Support JetStream MaxText user guide by @JoeZijunZhou in https://github.com/google/JetStream/pull/28
Enable pylint linter and pyink formatter by @JoeZijunZhou in https://github.com/google/JetStream/pull/29
Update README by @JoeZijunZhou in https://github.com/google/JetStream/pull/30
Release v0.2.0 by @JoeZijunZhou in https://github.com/google/JetStream/pull/31

New Contributors

@JoeZijunZhou made their first contribution in https://github.com/google/JetStream/pull/1
@rwitten made their first contribution in https://github.com/google/JetStream/pull/5
@FanhaiLu1 made their first contribution in https://github.com/google/JetStream/pull/8
@qihqi made their first contribution in https://github.com/google/JetStream/pull/11
@patemotter made their first contribution in https://github.com/google/JetStream/pull/17
@morgandu made their first contribution in https://github.com/google/JetStream/pull/21

Full Changelog: https://github.com/google/JetStream/commits/v0.2.0