Add a Smart 404 Feature #801

Merged
merged 8 commits into from
Sep 13, 2024
Changes from 7 commits
2 changes: 1 addition & 1 deletion .wp-env.json
@@ -1,5 +1,5 @@
{

"plugins": [".", "./tests/test-plugin", "https://downloads.wordpress.org/plugin/classic-editor.zip"],
"plugins": [".", "./tests/test-plugin", "https://downloads.wordpress.org/plugin/classic-editor.zip", "https://downloads.wordpress.org/plugin/elasticpress.zip"],
"env": {
"tests": {
"mappings": {
127 changes: 124 additions & 3 deletions README.md
@@ -24,6 +24,7 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
* Moderate incoming comments for sensitive content using [OpenAI's Moderation API](https://platform.openai.com/docs/guides/moderation)
* Convert text content into audio and output a "read-to-me" feature on the front-end to play this audio using [Microsoft Azure's Text to Speech API](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech), [Amazon Polly](https://aws.amazon.com/polly/) or [OpenAI's Text to Speech API](https://platform.openai.com/docs/guides/text-to-speech)
* Classify post content using [IBM Watson's Natural Language Understanding API](https://www.ibm.com/watson/services/natural-language-understanding/), [OpenAI's Embedding API](https://platform.openai.com/docs/guides/embeddings) or [Microsoft Azure's OpenAI service](https://azure.microsoft.com/en-us/products/ai-services/openai-service)
* Create a smart 404 page with a recommended results section that suggests relevant content based on the URL the user was trying to access, using either [OpenAI's Embedding API](https://platform.openai.com/docs/guides/embeddings) or [Microsoft Azure's OpenAI service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) in combination with [ElasticPress](https://github.com/10up/ElasticPress)
* BETA: Recommend content based on overall site traffic via [Microsoft Azure's AI Personalizer API](https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/) *(note that this service has been [deprecated by Microsoft](https://learn.microsoft.com/en-us/azure/ai-services/personalizer/) and, as such, will no longer work. We are looking to replace this with a new provider to maintain the same functionality; see [issue #392](https://github.com/10up/classifai/issues/392))*
* Generate image alt text, image tags, and smartly crop images using [Microsoft Azure's AI Vision API](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)
* Scan images and PDF files for embedded text and save for use in post meta using [Microsoft Azure's AI Vision API](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)
@@ -55,6 +56,7 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
* To utilize the Azure OpenAI Language Processing functionality, you will need an active [Microsoft Azure](https://signup.azure.com/signup) account and you will need to [apply](https://aka.ms/oai/access) for OpenAI access.
* To utilize the Google Gemini Language Processing functionality, you will need an active [Google Gemini](https://ai.google.dev/tutorials/setup) account.
* To utilize the AWS Language Processing functionality, you will need an active [AWS](https://console.aws.amazon.com/) account.
* To utilize the Smart 404 feature, you will need to use [ElasticPress](https://github.com/10up/ElasticPress) 5.0.0+ and [Elasticsearch](https://www.elastic.co/elasticsearch) 7.0+.

## Pricing

@@ -111,10 +113,10 @@ Add this repository to composer.json, specifying a release version, as shown bel
"type": "package",
"package": {
"name": "10up/classifai",
"version": "2.0.0",
"version": "3.1.1",
"type": "wordpress-plugin",
"dist": {
"url": "https://github.com/10up/classifai/archive/refs/tags/2.0.0.zip",
"url": "https://github.com/10up/classifai/archive/refs/tags/3.1.1.zip",
"type": "zip"
}
}
@@ -126,7 +128,7 @@ Finally, require the plugin, using the version number you specified in the previ

```json
"require": {
"10up/classifai": "3.0.0"
"10up/classifai": "3.1.1"
}
```

@@ -440,6 +442,125 @@ Note that [OpenAI](https://platform.openai.com/docs/guides/speech-to-text) can c
* Click the button to preview the generated speech audio for the post.
* View the post on the front-end and see a read-to-me feature has been added

## Set Up the Smart 404 Feature

### 1. Decide on Provider

* This Feature is powered by either OpenAI or Azure OpenAI.
* Once you've chosen a Provider, you'll need to create an account and get authentication details.
* When setting things up on the Azure side, ensure you choose either the `text-embedding-3-small` or `text-embedding-3-large` model. The Feature will not work with other models.

### 2. Configure Settings under Tools > ClassifAI > Language Processing > Smart 404

* Select the proper Provider in the provider dropdown.
* Enter your authentication details.
* Configure any other settings as desired.

### 3. ElasticPress configuration

Once the Smart 404 Feature is configured, set up ElasticPress to index the data.

If on a standard WordPress installation:

* Install and activate the [ElasticPress](https://github.com/10up/ElasticPress) plugin.
* Set your Elasticsearch URL in the ElasticPress settings (`ElasticPress > Settings`).
* Go to the `ElasticPress > Sync` settings page and trigger a sync, ensuring this is set to run a sync from scratch. This will send over the new schema to Elasticsearch and index all content, including creating vector embeddings for each post.
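
On a standard installation, ElasticPress can also read the Elasticsearch URL from an `EP_HOST` constant instead of the settings screen. A minimal sketch for `wp-config.php`, assuming Elasticsearch is reachable at `http://localhost:9200` (an assumed address, substitute your own):

```php
<?php
// wp-config.php (excerpt). The URL below is an assumption for a local setup;
// replace it with the address of your own Elasticsearch instance.
if ( ! defined( 'EP_HOST' ) ) {
    define( 'EP_HOST', 'http://localhost:9200' );
}
```

If WordPress itself runs inside a container, `localhost` refers to that container, so a different host name may be needed.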

If on a WordPress VIP hosted environment:

* [Enable Enterprise Search](https://docs.wpvip.com/enterprise-search/enable/)
* [Run the VIP-CLI `index` command](https://docs.wpvip.com/enterprise-search/index/). This sends the new schema to Elasticsearch and indexes all content, including creating vector embeddings for each post. Note you may need to use the `--setup` flag to ensure the schema is created correctly.

At this point all of your content should be indexed, along with the embeddings data. You'll then need to update your 404 template to display the recommended results.

### 4. Display the recommended results

The Smart 404 Feature comes with a few helper functions that can be used to display the recommended results on your 404 page:

* Directly display the results using the `Classifai\render_smart_404_results()` function.
* Get the data and then display it in your own way using the `Classifai\get_smart_404_results()` function.

You will need to directly integrate these functions into your 404 template where desired. The plugin does not automatically display the results on the 404 page for you.
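
Because the call lives in your theme, it may be worth guarding it so the 404 page still renders if ClassifAI is deactivated. A minimal sketch for a theme's `404.php`; the wrapper name `mytheme_render_smart_404` is hypothetical and not part of the plugin:

```php
<?php
/**
 * 404.php (excerpt) from a hypothetical theme.
 *
 * Returns true if the ClassifAI helper was available and called,
 * false otherwise, so the template can show its own fallback content.
 */
function mytheme_render_smart_404(): bool {
    if ( ! function_exists( 'Classifai\render_smart_404_results' ) ) {
        return false;
    }

    // An empty array means every argument falls back to the settings-page default.
    Classifai\render_smart_404_results( [] );

    return true;
}

mytheme_render_smart_404();
```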

Both functions support the following arguments. If any argument is not provided, the default value set on the settings page will be used:

* `$index` (string) - The ElasticPress index to search in. Default is `post`.
* `$num` (int) - Maximum number of results to display. Default is `5`.
* `$num_candidates` (int) - Maximum number of results to search over. Default is `5000`.
* `$rescore` (bool) - Whether to run a rescore query or not. This can give better results but is often slower. Default is `false`.
* `$score_function` (string) - The [vector scoring function](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-script-score-query.html#vector-functions) to use. Default is `cosine`. Options are `cosine`, `dot_product`, `l1_norm` and `l2_norm`.
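
To get a feel for what the four `score_function` options measure, here is a plain-PHP illustration on two made-up vectors. This is a sketch of the underlying math only, not the plugin's or Elasticsearch's implementation, and the `example_*` function names are hypothetical:

```php
<?php
// Illustration only: the vectors and function names are invented for this example.

function example_dot_product( array $a, array $b ): float {
    return (float) array_sum( array_map( fn( $x, $y ) => $x * $y, $a, $b ) );
}

function example_cosine( array $a, array $b ): float {
    // Dot product divided by the product of the two vector lengths.
    return example_dot_product( $a, $b )
        / ( sqrt( example_dot_product( $a, $a ) ) * sqrt( example_dot_product( $b, $b ) ) );
}

function example_l1_norm( array $a, array $b ): float {
    // Sum of absolute differences (Manhattan distance).
    return (float) array_sum( array_map( fn( $x, $y ) => abs( $x - $y ), $a, $b ) );
}

function example_l2_norm( array $a, array $b ): float {
    // Square root of the sum of squared differences (Euclidean distance).
    return sqrt( array_sum( array_map( fn( $x, $y ) => ( $x - $y ) ** 2, $a, $b ) ) );
}

$a = [ 3.0, 4.0 ];
$b = [ 4.0, 3.0 ];

printf( "cosine: %.2f\n", example_cosine( $a, $b ) );           // cosine: 0.96
printf( "dot_product: %.2f\n", example_dot_product( $a, $b ) ); // dot_product: 24.00
printf( "l1_norm: %.2f\n", example_l1_norm( $a, $b ) );         // l1_norm: 2.00
printf( "l2_norm: %.2f\n", example_l2_norm( $a, $b ) );         // l2_norm: 1.41
```

`cosine` and `dot_product` are similarities (higher means closer), while `l1_norm` and `l2_norm` are distances (lower means closer). For unit-length vectors, cosine similarity and dot product give the same value.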

The `Classifai\render_smart_404_results()` function also supports the following additional arguments:

* `$fallback` (bool) - Whether to run a fallback WordPress query if no results are found in Elasticsearch. These results will then be rendered. Default is `true`.

Examples:

```php
// Render the results.
Classifai\render_smart_404_results(
    [
        'index'          => 'post',
        'num'            => 3,
        'num_candidates' => 1000,
        'rescore'        => true,
        'fallback'       => true,
        'score_function' => 'dot_product',
    ]
);
```

```php
// Get the results.
$results = Classifai\get_smart_404_results(
    [
        'index'          => 'post',
        'num'            => 10,
        'num_candidates' => 8000,
        'rescore'        => false,
        'score_function' => 'cosine',
    ]
);

ob_start();

// Render the results.
foreach ( $results as $result ) {
    ?>
    <div>
        <?php if ( has_post_thumbnail( $result->ID ) ) : ?>
        <figure>
            <a href="<?php echo esc_url( get_permalink( $result->ID ) ); ?>">
                <?php echo wp_kses_post( get_the_post_thumbnail( $result->ID ) ); ?>
            </a>
        </figure>
        <?php endif; ?>
        <a href="<?php echo esc_url( get_permalink( $result->ID ) ); ?>">
            <?php echo esc_html( $result->post_title ); ?>
        </a>
    </div>
    <?php
}

$output = ob_get_clean();
echo $output;
```

### Local Quickstart

If you want to quickly test things locally, ensure you have Docker installed (Docker Desktop recommended) and then run the following command:

```bash
docker run -p 9200:9200 -d --name elasticsearch \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-e "xpack.security.http.ssl.enabled=false" \
-e "xpack.license.self_generated.type=basic" \
docker.elastic.co/elasticsearch/elasticsearch:7.9.0
```

This will download and start Elasticsearch v7.9.0 on your local machine. You can then access Elasticsearch at `http://localhost:9200`, which is also the URL to enter when configuring ElasticPress. It is recommended that you change the `Content Items per Index Cycle` setting in ElasticPress to `20` to ensure indexing doesn't time out. Also be aware of API rate limits on the OpenAI Embeddings API.
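
Before pointing ElasticPress at the container, it can help to confirm that Elasticsearch answers on that URL. A small command-line sketch in plain PHP, assuming the quickstart address `http://localhost:9200`; the helper name `example_es_version` is hypothetical:

```php
<?php
// Extract the version number from the JSON that Elasticsearch returns at its
// root endpoint. Returns null if the body does not have the expected shape.
function example_es_version( string $body ): ?string {
    $data = json_decode( $body, true );

    if ( is_array( $data ) && isset( $data['version']['number'] ) ) {
        return (string) $data['version']['number'];
    }

    return null;
}

// Assumed URL from the quickstart; the @ suppresses the warning PHP emits
// when nothing is listening on that port.
$body = @file_get_contents( 'http://localhost:9200' );

if ( false === $body ) {
    echo "Elasticsearch is not reachable\n";
} else {
    echo 'Elasticsearch version: ' . ( example_es_version( $body ) ?? 'unknown' ) . "\n";
}
```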

## Set Up Image Processing features (via Microsoft Azure)

Note that [Azure AI Vision](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home#image-requirements) can analyze and crop images that meet the following requirements:
23 changes: 23 additions & 0 deletions includes/Classifai/Admin/templates/onboarding-step-three.php
@@ -58,6 +58,18 @@
?>
</div>
<div class="classifai-setup-form">
<?php
/**
* Fires before the settings form for a feature.
*
* @since x.x.x
* @hook classifai_before_onboarding_feature_settings_form
*
* @param {string} $current_feature Current feature.
*/
do_action( 'classifai_before_onboarding_feature_settings_form', $current_feature );
?>

<input name="classifai-setup-feature" type="hidden" value="<?php echo esc_attr( $current_feature ); ?>" />
<table class="form-table">
<?php
@@ -73,6 +85,17 @@
}
?>
</table>
<?php
/**
* Fires after the settings form for a feature.
*
* @since x.x.x
* @hook classifai_after_onboarding_feature_settings_form
*
* @param {string} $current_feature Current active feature.
*/
do_action( 'classifai_after_onboarding_feature_settings_form', $current_feature );
?>
</div>
</div>
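
The two new actions let other code print markup around a feature's onboarding form. A hedged sketch of a callback for the `before` hook follows; the function name `example_onboarding_notice` and the feature identifier `'feature_smart_404'` are assumptions for illustration, so check the value `$current_feature` actually holds:

```php
<?php
// Hypothetical callback: prints a short note above the settings form, but only
// for one feature. 'feature_smart_404' is an assumed identifier.
function example_onboarding_notice( $current_feature ) {
    if ( 'feature_smart_404' !== $current_feature ) {
        return;
    }

    echo '<p class="description">Smart 404 requires ElasticPress and Elasticsearch.</p>';
}

// add_action() only exists once WordPress has loaded.
if ( function_exists( 'add_action' ) ) {
    add_action( 'classifai_before_onboarding_feature_settings_form', 'example_onboarding_notice' );
}
```

The same pattern applies to `classifai_after_onboarding_feature_settings_form`, which receives the same single argument.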
