Alpaca prompt template #515

RdoubleA · 2024-03-18T18:53:16Z

Context

We need a standardized set of prompt templates for our flagship datasets, and a way for users to configure their own custom datasets, as discussed in the RFC (#493).

First, we create the PromptTemplate interface, which all templates will be based on. AlpacaTemplate is added to demonstrate the interface, and AlpacaDataset is refactored to use it.
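A minimal sketch of what such an interface could look like. The format(sample, column_map) shape follows the Args: docstring visible in the review below; the class bodies, the _with_input/_no_input attribute names and the fallback logic are illustrative assumptions, not the code merged in this PR. The two prompt strings are the standard Stanford Alpaca prompts (with and without an input field).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Mapping, Optional


class PromptTemplate(ABC):
    """Sketch of the interface: turn one dataset sample into a prompt string."""

    @abstractmethod
    def format(
        self, sample: Mapping[str, Any], column_map: Optional[Dict[str, str]] = None
    ) -> str:
        ...


class AlpacaTemplate(PromptTemplate):
    """Alpaca-style prompt; the body is an illustrative sketch, not the PR's code."""

    _with_input = (
        "Below is an instruction that describes a task, paired with an input that "
        "provides further context. Write a response that appropriately completes "
        "the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
    )
    _no_input = (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:\n"
    )

    def format(
        self, sample: Mapping[str, Any], column_map: Optional[Dict[str, str]] = None
    ) -> str:
        # column_map maps the keys this template expects to the dataset's column names;
        # keys not in the map are looked up under their own name.
        column_map = column_map or {}
        instruction = sample[column_map.get("instruction", "instruction")]
        input_text = sample.get(column_map.get("input", "input"), "")
        # Choose the prompt variant depending on whether the sample has a non-empty input.
        if input_text:
            return self._with_input.format(instruction=instruction, input=input_text)
        return self._no_input.format(instruction=instruction)


template = AlpacaTemplate()
print(template.format({"instruction": "Add 2 and 3.", "input": ""}))
```

Because both lookups go through column_map, the same template can read a dataset whose columns are named differently, which is the use case raised in the column_map thread.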

Test plan

pytest tests/torchtune/datasets/test_alpaca_dataset.py
pytest tests/torchtune/data/test_templates.py

pytorch-bot · 2024-03-18T18:53:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/515

✅ No Failures

As of commit a8be21b with merge base e164402:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

netlify · 2024-03-18T18:53:35Z

✅ Deploy Preview for torchtune-preview ready!

Name	Link
🔨 Latest commit	`a8be21b`
🔍 Latest deploy log	https://app.netlify.com/sites/torchtune-preview/deploys/65f8b882a010500008fed3cf
😎 Deploy Preview	https://deploy-preview-515--torchtune-preview.netlify.app

joecummings · 2024-03-18T19:14:28Z

torchtune/data/templates.py

@@ -0,0 +1,87 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.


This should have a leading underscore and an __init__.py file that exposes AlpacaPromptTemplate.

oops, forgot to add that

kartikayk · 2024-03-18T20:37:26Z

tests/torchtune/data/test_templates.py

+
+
+class TestAlpacaInstructTemplate:
+    def test_format(self):


I actually configured the original test to include real data from the alpaca test. Can you do the same? It helps to verify that the test passes on real data.

In fact, why not just refactor the data out of that test and reuse it here?

kartikayk · 2024-03-18T20:38:23Z

torchtune/data/_templates.py

+
+ Args:
+ sample (Mapping): a single data sample with instruction
+ column_map (Optional[Dict[str, str]]): a mapping from the expected


column_map seems like a nice generalization, but I don't quite understand the use case. Can you expand on this?

See all the prompting strategies inheriting from InstructionPromptTokenizingStrategy in Axolotl: https://github.com/OpenAccess-AI-Collective/axolotl/blob/2ea70ebbd8f1d8d46e692afd05773dcf06626601/src/axolotl/prompt_tokenizers.py#L148

column_map makes sure the right columns are used for the instruction and input. We will need it for the samsum and grammar datasets, because those datasets on the Hub don't use "instruction" or "input" as column names; they are specific types of instruct tasks.
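To make the column_map use case concrete, here is a small sketch. It assumes a template that expects an "input" field and a samsum-style row whose text lives under "dialogue" (samsum's columns are id, dialogue and summary). remap_sample and SUMMARIZE_TEMPLATE are hypothetical names invented for illustration, and the row contents are made up.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def remap_sample(
    sample: Mapping[str, Any],
    expected_keys: Iterable[str],
    column_map: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return {expected_key: value}, read from the mapped column."""
    column_map = column_map or {}
    # Keys missing from column_map fall back to a column with the same name.
    return {key: sample[column_map.get(key, key)] for key in expected_keys}


# Hypothetical summarization template that expects an "input" field.
SUMMARIZE_TEMPLATE = "Summarize this dialogue:\n{input}\n---\nSummary:\n"

# Made-up samsum-style row: the text is stored under "dialogue", not "input".
row = {"id": "0", "dialogue": "A: Hi!\nB: Hello!", "summary": "A and B greet."}
fields = remap_sample(row, ["input"], column_map={"input": "dialogue"})
prompt = SUMMARIZE_TEMPLATE.format(**fields)
print(prompt)
```

The template itself never needs to know the dataset's column names; only the mapping changes per dataset.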

SLR722 · 2024-03-18T20:57:16Z

Shall we move dataset folder under data folder?

RdoubleA · 2024-03-18T21:00:56Z

Shall we move dataset folder under data folder?

Saving this for a later PR, as that will require considerable refactoring

kartikayk

Thanks for the quick change!

add template, refactor alpaca, add test

1824f0c

RdoubleA requested review from gokulavasan, msaroufim, joecummings, ebsmothers, SLR722 and kartikayk March 18, 2024 18:53

facebook-github-bot added the CLA Signed label Mar 18, 2024

joecummings reviewed Mar 18, 2024

RdoubleA added 2 commits March 18, 2024 12:35

add init.py

15e41a3

add underscore to file

43f5127

kartikayk reviewed Mar 18, 2024

include real data in tests

67d08f0

SLR722 approved these changes Mar 18, 2024

kartikayk approved these changes Mar 18, 2024

clearer type annotation

a8be21b

RdoubleA merged commit e145b16 into main Mar 18, 2024
21 checks passed

RdoubleA deleted the rafiayub/template_interface branch March 18, 2024 22:05

joecummings mentioned this pull request Mar 18, 2024

AC does not work w/ LoRA finetunes #521

Closed

SLR722 mentioned this pull request Mar 19, 2024

Add 2 other instruction dataset templates #523

Merged
