diff --git a/README.md b/README.md
index 4bb715ae9..52a39c02c 100644
--- a/README.md
+++ b/README.md
@@ -22,15 +22,15 @@
 The general purpose micro-orchestration framework for creating [dataflows](https://en.wikipedia.org/wiki/Dataflow) from python functions! That is, your single tool to express things like data, ML, LLM pipelines/workflows, and even web request logic!
-Hamilton is a novel paradigm for specifying a flow of delayed execution in python. It works on python objects of any type and dataflows of any complexity. Core to the design of Hamilton is a clear mapping of function name to artifact, allowing you to quickly grok the relationship between the code you write and the data you produce.
+Hamilton is a novel paradigm for specifying a flow of delayed execution in python. It works on python objects of any type and dataflows of any complexity. Core to the design of Hamilton is a clear mapping of function name to artifact, allowing you to quickly grok the relationship between the code you write and the data you produce.
 This paradigm makes modifications easy to build and track, ensures code is self-documenting, and makes it natural to unit test your data transformations. When connected together, these functions form a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG), which the Hamilton framework can execute, optimize, and report on.
 ## Problems Hamilton Solves
-✅ Model a dataflow -- If you can model your problem as a DAG in python, Hamilton is the cleanest way to build it.
+✅ Model a dataflow -- If you can model your problem as a DAG in python, Hamilton is the cleanest way to build it.
 ✅ Unmaintainable spaghetti code -- Hamilton dataflows are unit testable, self-documenting, and provide lineage.
 ✅ Long iteration/experimentation cycles -- Hamilton provides a clear, quick, and methodical path to debugging/modifying/extending your code.
-✅ Reusing code across contexts -- Hamilton encourages code that is independent of infrastructure and can run regardless of execution setting.
+✅ Reusing code across contexts -- Hamilton encourages code that is independent of infrastructure and can run regardless of execution setting.
 ## Problems Hamilton Does not Solve
 ❌ Provisioning infrastructure -- you want a macro-orchestration system (see airflow, kubeflow, sagemaker, etc...).
diff --git a/contrib/docs/docs/Intro.md b/contrib/docs/docs/README.md
similarity index 93%
rename from contrib/docs/docs/Intro.md
rename to contrib/docs/docs/README.md
index 87ff16949..342d83a0d 100644
--- a/contrib/docs/docs/Intro.md
+++ b/contrib/docs/docs/README.md
@@ -5,7 +5,7 @@ sidebar_position: 1
 # Hamilton Dataflows
-Welcome!
+

Welcome!

 Here you'll find a website that curates a collection of Hamilton Dataflows that are ready to be used in your own projects. They are user-contributed and maintained, with
@@ -15,6 +15,10 @@ We expect this collection to grow over time, so check back often! As dataflows b
 will move them into the official sub-package of this site and become maintained by the Hamilton team.
+## Navigation
+👈 On the left hand you'll have the ability to find user and official dataflows.
+COMING SOON: search & filtering by tags.
+
 ## Usage
 There are two methods to get access to dataflows presented here.
@@ -23,8 +27,8 @@ Assumptions:
 1. You are familiar with Hamilton and have it installed. If not, take [15 minutes to learn Hamilton in your browser](https://www.tryhamilton.dev/) and then `pip install sf-hamilton` to get started. Come back here when you're ready to use Hamilton.
-2. The assumption is that you have the requisite python dependencies installed on your system.
-You'll get import errors if you don't. Don't know what you need? We have convenience functions to help!
+2. You have the requisite python dependencies installed on your system.
+You'll get import errors if you don't. Don't know what you need? Scroll to the bottom of a dataflow to find the requirements. We're working on convenience functions to help!
 For more extensive documentation, please see [Hamilton User Contrib documentation](https://hamilton.dagworks.io).
diff --git a/contrib/docs/sidebars.js b/contrib/docs/sidebars.js
index 4d1f909e1..ec5673d4e 100644
--- a/contrib/docs/sidebars.js
+++ b/contrib/docs/sidebars.js
@@ -14,7 +14,14 @@
 /** @type {import('@docusaurus/plugin-content-docs').SidebarsConfig} */
 const sidebars = {
   // By default, Docusaurus generates a sidebar from the docs folder structure
-  dataflowSidebar: [{type: 'autogenerated', dirName: '.'}],
+  dataflowSidebar: [
+    {
+      type: 'html',
+      value: 'Dataflows:',
+      className: 'sidebar-title',
+    },
+    {type: 'autogenerated', dirName: '.'}
+  ],
   // But you can create a sidebar manually
   /*
diff --git a/docs/how-tos/use-for-feature-engineering.rst b/docs/how-tos/use-for-feature-engineering.rst
index eff88798c..9fceadf2c 100644
--- a/docs/how-tos/use-for-feature-engineering.rst
+++ b/docs/how-tos/use-for-feature-engineering.rst
@@ -19,7 +19,6 @@ reading the Offline Feature Engineering section first, since it's the most commo
 python module structure you should be going for with Hamilton. If you need more guidance here, please reach out to us on `slack `__.
-
 Offline Feature Engineering
 ---------------------------
 To use Hamilton for offline feature engineering, a common pattern is:
@@ -53,7 +52,7 @@ Here is a sketch of the above pattern:
 Hamilton Example
-__________________
+^^^^^^^^^^^^^^^^
 We do not provide a specific example here, since most of the examples in the examples folder fall under this category.
 Some examples to browse:
@@ -63,7 +62,7 @@ Some examples to browse:
 runtime data quality checks into your feature engineering pipeline.
 * `Time-series Kaggle Example `__ shows one way to structure your code to ingest, create features, and fit a model.
-* `Feature engineering in multiple contexts `__
+* `Feature engineering in multiple contexts `__
 helps show how you can use Hamilton in multiple contexts reusing code where possible, e.g. offline, & online.
 * `PySpark UDF Map Examples `__ shows how to use Hamilton to encode map operations for use with PySpark.
@@ -97,7 +96,7 @@ Here's a sketch of how you might use Hamilton in conjunction with a Kafka Client
 Hamilton Example
-__________________
+^^^^^^^^^^^^^^^^
 Currently we don't have a streaming example. But we are working on it. We direct users to look at the online example for now, since conceptually from a modularity stand point, things would be set up in a similar way.
@@ -121,17 +120,22 @@ the `@config.*` decorator, to help you segment your feature computation dataflow
 We skip showing a sketch of structure here, and invite you to look at the examples below.
 Hamilton Example
-__________________
-We direct users to look at `Feature engineering in multiple contexts `__
+^^^^^^^^^^^^^^^^
+We direct users to look at `Feature engineering in multiple contexts `__
 that currently describes two scenarios around how you could incorporate Hamilton into an online web-service, and have it aligned with your batch offline processes. Note, these examples should give you the high level first principles view of how to do things. Since having something running in production , we didn't want to get too specific.
+Write once, run anywhere blog post:
+-----------------------------------
+For a comprehensive post on writing a feature once and using it anywhere see `this blog `__.
+The companion example code can be found `here `__.
+
 FAQ
 ----
 Q. Can I use Hamilton for feature engineering with Feast?
-__________________________________________________________
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Yes, you can use Hamilton with Feast. See our [Feast example](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/feast) and accompanying [blog post](https://blog.dagworks.io/p/featurization-integrating-hamilton). Typically people use Hamilton on the offline side to compute features that then get pushed to Feast. For the online side it varies as to how to integrate the two.