Docs fixes #396

Merged
merged 3 commits into from
Sep 28, 2023
6 changes: 3 additions & 3 deletions README.md
@@ -22,15 +22,15 @@

The general purpose micro-orchestration framework for creating [dataflows](https://en.wikipedia.org/wiki/Dataflow) from python functions! That is, your single tool to express things like data, ML, LLM pipelines/workflows, and even web request logic!

Hamilton is a novel paradigm for specifying a flow of delayed execution in python. It works on python objects of any type and dataflows of any complexity. Core to the design of Hamilton is a clear mapping of function name to artifact, allowing you to quickly grok the relationship between the code you write and the data you produce.

This paradigm makes modifications easy to build and track, ensures code is self-documenting, and makes it natural to unit test your data transformations. When connected together, these functions form a [Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG), which the Hamilton framework can execute, optimize, and report on.
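To make the "function name maps to artifact" idea concrete, here is a minimal, self-contained sketch in plain Python. It is *not* Hamilton's implementation: the `execute` helper is hypothetical and only mimics the core idea that a function's name is the artifact it produces and its parameter names are the artifacts it depends on. The standard library's `graphlib.TopologicalSorter` orders the resulting DAG. In Hamilton itself, you would instead place such functions in a module and let `hamilton.driver.Driver` build and execute the DAG.

```python
import inspect
from graphlib import TopologicalSorter


# Hamilton-style functions: the function name is the artifact produced,
# and each parameter name is an upstream artifact (or a user-provided input).
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_per_signup_doubled(spend_per_signup: float) -> float:
    return spend_per_signup * 2


def execute(functions, outputs, inputs):
    """Hypothetical toy executor, not Hamilton's API.

    Builds the DAG implied by parameter names, computes every node in
    dependency order, and returns only the requested outputs.
    """
    registry = {fn.__name__: fn for fn in functions}
    # node -> set of nodes it depends on (its parameter names)
    graph = {name: set(inspect.signature(fn).parameters) for name, fn in registry.items()}
    results = dict(inputs)
    for node in TopologicalSorter(graph).static_order():
        if node in results:
            continue  # already provided as an input
        if node not in registry:
            raise KeyError(f"missing input: {node}")
        fn = registry[node]
        kwargs = {p: results[p] for p in inspect.signature(fn).parameters}
        results[node] = fn(**kwargs)
    return {name: results[name] for name in outputs}


print(execute([spend_per_signup, spend_per_signup_doubled],
              outputs=["spend_per_signup_doubled"],
              inputs={"spend": 100.0, "signups": 4}))
# → {'spend_per_signup_doubled': 50.0}

# Each node is a plain function, so unit testing needs no framework setup:
assert spend_per_signup(10.0, 2) == 5.0
```

The last line shows why such dataflows are natural to unit test: every transformation can be called directly with plain values.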

## Problems Hamilton Solves
✅ Model a dataflow -- If you can model your problem as a DAG in python, Hamilton is the cleanest way to build it.
✅ Unmaintainable spaghetti code -- Hamilton dataflows are unit testable, self-documenting, and provide lineage.
✅ Long iteration/experimentation cycles -- Hamilton provides a clear, quick, and methodical path to debugging/modifying/extending your code.
✅ Reusing code across contexts -- Hamilton encourages code that is independent of infrastructure and can run regardless of execution setting.

## Problems Hamilton Does not Solve
❌ Provisioning infrastructure -- you want a macro-orchestration system (see Airflow, Kubeflow, SageMaker, etc.).
10 changes: 7 additions & 3 deletions contrib/docs/docs/Intro.md → contrib/docs/docs/README.md
@@ -5,7 +5,7 @@ sidebar_position: 1

# Hamilton Dataflows

Welcome!
<h3> Welcome!</h3>

This website curates a collection of Hamilton Dataflows that are
ready to be used in your own projects. They are user-contributed and maintained, with
@@ -15,6 +15,10 @@ We expect this collection to grow over time, so check back often! As dataflows b
will move them into the official sub-package of this site and become maintained by the
Hamilton team.

## Navigation
👈 On the left-hand side you can find user-contributed and official dataflows.
COMING SOON: search & filtering by tags.

## Usage
There are two ways to access the dataflows presented here.

@@ -23,8 +27,8 @@ Assumptions:
1. You are familiar with Hamilton and have it installed. If not, take
[15 minutes to learn Hamilton in your browser](https://www.tryhamilton.dev/) and then `pip install sf-hamilton` to get started.
Come back here when you're ready to use Hamilton.
2. The assumption is that you have the requisite python dependencies installed on your system.
You'll get import errors if you don't. Don't know what you need? We have convenience functions to help!
2. You have the requisite python dependencies installed on your system.
You'll get import errors if you don't. Don't know what you need? Scroll to the bottom of a dataflow to find the requirements. We're working on convenience functions to help!
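If you are unsure whether the requirements listed at the bottom of a dataflow are installed, a quick standard-library check can catch missing packages before you hit import errors. This is a generic sketch, not a Hamilton API: `missing_modules` is a hypothetical helper, and it expects top-level *import* names, which can differ from pip package names (for example, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util


def missing_modules(module_names):
    """Hypothetical helper: return the names in `module_names` that cannot be imported.

    Pass top-level import names (e.g. "sklearn"), not pip package names
    (e.g. "scikit-learn"); dotted names whose parent is missing would raise.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list; replace it with the requirements listed for your dataflow.
    missing = missing_modules(["json", "hamilton", "pandas"])
    if missing:
        print("Install these before importing the dataflow:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```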

For more extensive documentation, please see [Hamilton User Contrib documentation](https://hamilton.dagworks.io).

9 changes: 8 additions & 1 deletion contrib/docs/sidebars.js
@@ -14,7 +14,14 @@
/** @type {import('@docusaurus/plugin-content-docs').SidebarsConfig} */
const sidebars = {
// By default, Docusaurus generates a sidebar from the docs folder structure
dataflowSidebar: [{type: 'autogenerated', dirName: '.'}],
dataflowSidebar: [
{
type: 'html',
value: 'Dataflows:',
className: 'sidebar-title',
},
{type: 'autogenerated', dirName: '.'}
],

// But you can create a sidebar manually
/*
18 changes: 11 additions & 7 deletions docs/how-tos/use-for-feature-engineering.rst
@@ -19,7 +19,6 @@ reading the Offline Feature Engineering section first, since it's the most commo
python module structure you should be going for with Hamilton. If you need more guidance here, please reach out to us on
`Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-1bjs72asx-wcUTgH7q7QX1igiQ5bbdcg>`__.


Offline Feature Engineering
---------------------------
To use Hamilton for offline feature engineering, a common pattern is:
@@ -53,7 +52,7 @@ Here is a sketch of the above pattern:


Hamilton Example
__________________
^^^^^^^^^^^^^^^^
We do not provide a specific example here, since most of the examples in the examples folder fall under this category.
Some examples to browse:

@@ -63,7 +62,7 @@ Some examples to browse:
runtime data quality checks into your feature engineering pipeline.
* `Time-series Kaggle Example <https:/DAGWorks-Inc/hamilton/tree/main/examples/model_examples/time-series>`__
shows one way to structure your code to ingest, create features, and fit a model.
* `Feature engineering in multiple contexts <https:/DAGWorks-Inc/hamilton/tree/main/examples/feature_engineering_multiple_contexts>`__
* `Feature engineering in multiple contexts <https:/DAGWorks-Inc/hamilton/tree/main/examples/feature_engineering/feature_engineering_multiple_contexts>`__
helps show how you can use Hamilton in multiple contexts reusing code where possible, e.g. offline, & online.
* `PySpark UDF Map Examples <https:/DAGWorks-Inc/hamilton/tree/main/examples/spark/pyspark_udfs>`__
shows how to use Hamilton to encode map operations for use with PySpark.
@@ -97,7 +96,7 @@ Here's a sketch of how you might use Hamilton in conjunction with a Kafka Client


Hamilton Example
__________________
^^^^^^^^^^^^^^^^
We don't currently have a streaming example, but we are working on one. For now, we direct users to the online example,
since from a modularity standpoint things would be set up in a similar way.
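To illustrate the modularity point, here is a minimal sketch in plain Python. It assumes a hypothetical `consume` generator standing in for a Kafka consumer's poll loop (no Kafka client API is used), and the feature and field names are illustrative only. The feature functions are written once over lists of values, as they would be for offline batches; the streaming loop just groups messages into micro-batches and calls the same functions.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


# Feature functions written once, column-wise, as you would for offline batches.
def amount_usd(amount_cents: List[int]) -> List[float]:
    return [c / 100 for c in amount_cents]


def is_large_purchase(amount_usd: List[float], threshold_usd: float = 50.0) -> List[bool]:
    return [a > threshold_usd for a in amount_usd]


def consume(messages: Iterable[Dict]) -> Iterator[Dict]:
    """Hypothetical stand-in for a Kafka consumer poll loop."""
    yield from messages


def micro_batches(stream: Iterator[Dict], size: int) -> Iterator[List[Dict]]:
    """Group a (possibly unbounded) stream into lists of at most `size` messages."""
    while batch := list(islice(stream, size)):
        yield batch


def process_stream(messages: Iterable[Dict], batch_size: int = 2) -> List[Dict]:
    out = []
    for batch in micro_batches(consume(messages), batch_size):
        # Pivot the records into columns, then reuse the batch feature functions.
        usd = amount_usd([m["amount_cents"] for m in batch])
        large = is_large_purchase(usd)
        out.extend(
            {"id": m["id"], "amount_usd": u, "is_large_purchase": flag}
            for m, u, flag in zip(batch, usd, large)
        )
    return out


events = [{"id": 1, "amount_cents": 1250}, {"id": 2, "amount_cents": 9900}, {"id": 3, "amount_cents": 500}]
print(process_stream(events))
```

The only streaming-specific code is how messages are grouped and reshaped; the feature logic itself is unchanged from the batch case.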

@@ -121,17 +120,22 @@ the `@config.*` decorator, to help you segment your feature computation dataflow
We skip showing a sketch of structure here, and invite you to look at the examples below.

Hamilton Example
__________________
We direct users to look at `Feature engineering in multiple contexts <https:/DAGWorks-Inc/hamilton/tree/main/examples/feature_engineering_multiple_contexts>`__
^^^^^^^^^^^^^^^^
We direct users to look at `Feature engineering in multiple contexts <https:/DAGWorks-Inc/hamilton/tree/main/examples/feature_engineering/feature_engineering_multiple_contexts>`__
that currently describes two scenarios for incorporating Hamilton into an online web service while keeping it
aligned with your batch offline processes. Note that these examples are meant to give you a high-level, first-principles
view of how to do things. Because what it takes to run something in production varies so widely, we didn't want to get too specific.
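A minimal sketch of the underlying idea, in plain Python and independent of the linked examples: the feature logic is defined once over columns, the offline path applies it to many rows, and the online path wraps a single request into one-row columns and reuses the same functions. `compute_features`, `features_offline`, and `features_online` are hypothetical names; in practice a Hamilton driver would do the wiring that `compute_features` does by hand here.

```python
from typing import Dict, List

INPUT_KEYS = ("age", "total_spend", "visits")


# Feature logic defined once, over columns (lists) of values.
def age_bucket(age: List[int]) -> List[str]:
    return ["minor" if a < 18 else "adult" for a in age]


def spend_per_visit(total_spend: List[float], visits: List[int]) -> List[float]:
    # Guard against division by zero for users with no visits.
    return [s / v if v else 0.0 for s, v in zip(total_spend, visits)]


def compute_features(columns: Dict[str, list]) -> Dict[str, list]:
    """Hypothetical hand wiring of the feature functions."""
    return {
        "age_bucket": age_bucket(columns["age"]),
        "spend_per_visit": spend_per_visit(columns["total_spend"], columns["visits"]),
    }


def features_offline(rows: List[Dict]) -> Dict[str, list]:
    """Batch context: pivot many rows into columns and compute every feature."""
    return compute_features({key: [row[key] for row in rows] for key in INPUT_KEYS})


def features_online(request: Dict) -> Dict:
    """Online context: treat one request as a one-row batch, then unwrap it."""
    one_row = compute_features({key: [request[key]] for key in INPUT_KEYS})
    return {name: values[0] for name, values in one_row.items()}
```

Because both paths go through the same feature functions, online and offline values cannot drift apart; only the shaping of inputs is context-specific.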

Write once, run anywhere blog post:
-----------------------------------
For a comprehensive post on writing a feature once and using it anywhere, see `this blog <https://blog.dagworks.io/p/feature-engineering-with-hamilton>`__.
The companion example code can be found `here <https:/DAGWorks-Inc/hamilton/tree/main/examples/feature_engineering/write_once_run_everywhere_blog_post>`__.


FAQ
----

Q. Can I use Hamilton for feature engineering with Feast?
__________________________________________________________
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Yes, you can use Hamilton with Feast. See our `Feast example <https:/DAGWorks-Inc/hamilton/tree/main/examples/feast>`__ and the accompanying `blog post <https://blog.dagworks.io/p/featurization-integrating-hamilton>`__. Typically, people use Hamilton on the offline side to compute features that then
get pushed to Feast. On the online side, how the two are integrated varies.