Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Term Cleanup Feature #815

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .eslintrc.json
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{

Check warning on line 1 in .eslintrc.json

View workflow job for this annotation

GitHub Actions / eslint

File ignored by default.
"globals": {
"wp": "readonly",
"jQuery": "readonly",
Expand All @@ -21,7 +21,8 @@
"Headers": "readonly",
"requestAnimationFrame": "readonly",
"React": "readonly",
"Block": "readonly"
"Block": "readonly",
"classifai_term_cleanup_params": "readonly"
},
"extends": ["plugin:@wordpress/eslint-plugin/recommended"],
"ignorePatterns": ["*.json", "webpack.config.js"]
Expand Down
44 changes: 43 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
* Convert text content into audio and output a "read-to-me" feature on the front-end to play this audio using [Microsoft Azure's Text to Speech API](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/text-to-speech), [Amazon Polly](https://aws.amazon.com/polly/) or [OpenAI's Text to Speech API](https://platform.openai.com/docs/guides/text-to-speech)
* Classify post content using [IBM Watson's Natural Language Understanding API](https://www.ibm.com/watson/services/natural-language-understanding/), [OpenAI's Embedding API](https://platform.openai.com/docs/guides/embeddings) or [Microsoft Azure's OpenAI service](https://azure.microsoft.com/en-us/products/ai-services/openai-service)
* Create a smart 404 page that has a recommended results section that suggests relevant content to the user based on the page URL they were trying to access using either [OpenAI's Embedding API](https://platform.openai.com/docs/guides/embeddings) or [Microsoft Azure's OpenAI service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) in combination with [ElasticPress](https:/10up/ElasticPress)
* Find similar terms to merge together using either [OpenAI's Embedding API](https://platform.openai.com/docs/guides/embeddings) or [Microsoft Azure's OpenAI service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) in combination with [ElasticPress](https:/10up/ElasticPress). Note this only compares top-level terms and if you merge a term that has children, these become top-level terms as per default WordPress behavior
* BETA: Recommend content based on overall site traffic via [Microsoft Azure's AI Personalizer API](https://azure.microsoft.com/en-us/services/cognitive-services/personalizer/) *(note that this service has been [deprecated by Microsoft](https://learn.microsoft.com/en-us/azure/ai-services/personalizer/) and as such, will no longer work. We are looking to replace this with a new provider to maintain the same functionality (see [issue#392](https:/10up/classifai/issues/392))*
* Generate image alt text, image tags, and smartly crop images using [Microsoft Azure's AI Vision API](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)
* Scan images and PDF files for embedded text and save for use in post meta using [Microsoft Azure's AI Vision API](https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/)
Expand Down Expand Up @@ -56,7 +57,8 @@ Tap into leading cloud-based services like [OpenAI](https://openai.com/), [Micro
* To utilize the Azure OpenAI Language Processing functionality, you will need an active [Microsoft Azure](https://signup.azure.com/signup) account and you will need to [apply](https://aka.ms/oai/access) for OpenAI access.
* To utilize the Google Gemini Language Processing functionality, you will need an active [Google Gemini](https://ai.google.dev/tutorials/setup) account.
* To utilize the AWS Language Processing functionality, you will need an active [AWS](https://console.aws.amazon.com/) account.
* To utilize the Smart 404 feature, you will need to use [ElasticPress](https:/10up/ElasticPress) 5.0.0+ and [Elasticsearch](https://www.elastic.co/elasticsearch) 7.0+.
* To utilize the Smart 404 feature, you will need an active [OpenAI](https://platform.openai.com/signup) account or [Microsoft Azure](https://signup.azure.com/signup) account with OpenAI access and you will need to use [ElasticPress](https:/10up/ElasticPress) 5.0.0+ and [Elasticsearch](https://www.elastic.co/elasticsearch) 7.0+.
* To utilize the Term Cleanup feature, you will need an active [OpenAI](https://platform.openai.com/signup) account or [Microsoft Azure](https://signup.azure.com/signup) account with OpenAI access. For better performance, you will need [ElasticPress](https:/10up/ElasticPress) 5.0.0+ and [Elasticsearch](https://www.elastic.co/elasticsearch) 7.0+.

## Pricing

Expand Down Expand Up @@ -561,6 +563,46 @@ docker run -p 9200:9200 -d --name elasticsearch \

This will download, install and start Elasticsearch v7.9.0 to your local machine. You can then access Elasticsearch at `http://localhost:9200`, which is the same URL you can use to configure ElasticPress with. It is recommended that you change the `Content Items per Index Cycle` setting in ElasticPress to `20` to ensure indexing doesn't timeout. Also be aware of API rate limits on the OpenAI Embeddings API.

## Set Up the Term Cleanup Feature

### 1. Decide on Provider

* This Feature is powered by either OpenAI or Azure OpenAI.
* Once you've chosen a Provider, you'll need to create an account and get authentication details.
* When setting things up on the Azure side, ensure you choose either the `text-embedding-3-small` or `text-embedding-3-large` model. The Feature will not work with other models.

### 2. Configure Settings under Tools > ClassifAI > Language Processing > Term Cleanup

* Select the proper Provider in the provider dropdown.
* Enter your authentication details.
* Configure any other settings as desired.

### 3. ElasticPress configuration

It is recommended to use ElasticPress with this Feature, especially if processing more than 500 terms, as performance will be significantly better. Once the Term Cleanup Feature is configured, you can then proceed to get ElasticPress set up to index the data.

If on a standard WordPress installation:

* Install and activate the [ElasticPress](https:/10up/elasticpress) plugin.
* Set your Elasticsearch URL in the ElasticPress settings (`ElasticPress > Settings`).
* Enable the [term index](https://www.elasticpress.io/blog/2023/03/enabling-comments-and-terms-in-elasticpress-5-0/) feature.
* Go to the `ElasticPress > Sync` settings page and trigger a sync, ensuring this is set to run a sync from scratch. This will send over the new schema to Elasticsearch and index all content, including creating vector embeddings for each term.

If on a WordPress VIP hosted environment:

* [Enable Enterprise Search](https://docs.wpvip.com/enterprise-search/enable/).
* [Enable the term index](https://docs.wpvip.com/enterprise-search/enable-features/#h-terms). Example command: `vip @example-app.develop -- wp vip-search activate-feature terms`.
* [Run the VIP-CLI `index` command](https://docs.wpvip.com/enterprise-search/index/). This sends the new schema to Elasticsearch and indexes all content, including creating vector embeddings for each term. Note you may need to use the `--setup` flag to ensure the schema is created correctly.

### 4. Start the Term Cleanup Process

Once configured, the plugin will add a new submenu under the Tools menu called Term Cleanup.

* Go to the Term Cleanup page, click on your desired taxonomy, then click on the "Find similar" button.
* This initializes a background process that will compare each term to find ones that are similar.
* Once done, all the results will be displayed.
* You can then skip or merge the potential duplicate terms from the settings page.

## Set Up Image Processing features (via Microsoft Azure)

Note that [Azure AI Vision](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/home#image-requirements) can analyze and crop images that meet the following requirements:
Expand Down
294 changes: 294 additions & 0 deletions includes/Classifai/Admin/SimilarTermsListTable.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,294 @@
<?php

namespace Classifai\Admin;

use WP_List_Table;

/**
* Class for displaying the list of similar terms for a given taxonomy.
*
* @see WP_List_Table
*/
class SimilarTermsListTable extends WP_List_Table {

/**
* Taxonomy to get similar terms for.
*
* @var string
*/
protected $taxonomy;

/**
* ID of last rendered term.
*
* @var int
*/
protected $last_item_id;

/**
* Initialize the class and set its properties.
*
* @param string $taxonomy The taxonomy to get similar terms for.
*/
public function __construct( $taxonomy ) {
$this->taxonomy = $taxonomy;

// Set parent defaults.
parent::__construct(
array(
'singular' => 'similar_term',
'plural' => 'similar_terms',
'ajax' => false,
)
);
}

/**
* Gets the list of columns.
*
* @return string[] Array of column titles keyed by their column name.
*/
public function get_columns() {
$tax = get_taxonomy( $this->taxonomy );
$labels = get_taxonomy_labels( $tax );
$label = $labels->singular_name ?? __( 'Term', 'classifai' );

return array(
'term' => $label,
// translators: %s: Singular label of the taxonomy.
'similar_term' => sprintf( __( 'Similar %s', 'classifai' ), $label ),
'actions' => __( 'Action', 'classifai' ),
);
}

/**
* Prepares the list of items for displaying.
*/
public function prepare_items() {
$per_page = $this->get_items_per_page( 'edit_post_per_page' );
$columns = $this->get_columns();
$hidden = array();
$sortable = $this->get_sortable_columns();
$search = isset( $_REQUEST['s'] ) ? sanitize_text_field( wp_unslash( $_REQUEST['s'] ) ) : ''; // phpcs:ignore WordPress.Security.NonceVerification.Recommended

$this->_column_headers = array( $columns, $hidden, $sortable );

$total = wp_count_terms(
[
'taxonomy' => $this->taxonomy,
'hide_empty' => false,
'meta_key' => 'classifai_similar_terms', // phpcs:ignore WordPress.DB.SlowDBQuery.slow_db_query_meta_key
'meta_compare' => 'EXISTS',
'search' => $search,
]
);

$this->set_pagination_args(
array(
'total_items' => $total, // WE have to calculate the total number of items.
'per_page' => $per_page, // WE have to determine how many items to show on a page.
'total_pages' => ceil( $total / $per_page ), // WE have to calculate the total number of pages.
)
);

$current = $this->get_pagenum();
$offset = ( $current - 1 ) * $per_page;

$terms = get_terms(
[
'taxonomy' => $this->taxonomy,
'orderby' => 'count',
'order' => 'DESC',
'hide_empty' => false,
'fields' => 'ids',
'meta_key' => 'classifai_similar_terms', // phpcs:ignore WordPress.DB.SlowDBQuery.slow_db_query_meta_key
'meta_compare' => 'EXISTS',
'number' => $per_page,
'offset' => $offset,
'search' => $search,
]
);

$items = [];

foreach ( $terms as $term_id ) {
$similar_terms = get_term_meta( $term_id, 'classifai_similar_terms', true );

if ( ! $similar_terms ) {
continue;
}

foreach ( $similar_terms as $k => $v ) {
$similar_term = get_term( $k );
if ( $similar_term ) {
$items[] = [
'term' => get_term( $term_id ),
'similar_term' => $similar_term,
'score' => $v,
];
} else {
unset( $similar_terms[ $k ] );
update_term_meta( $term_id, 'classifai_similar_terms', $similar_terms );
}
}

if ( empty( $similar_terms ) ) {
delete_term_meta( $term_id, 'classifai_similar_terms' );
}
}

$this->items = $items;
}

/**
* Generate term html to show it in Similar terms list table
*
* @param WP_Term $term Term Object.
* @param WP_Term $similar_term Similar Term Object.
* @param float $score Similarity score.
* @return string
*/
public function generate_term_html( $term, $similar_term, $score = null ) {
$args = array(
'action' => 'classifai_merge_term',
'taxonomy' => $this->taxonomy,
'from' => $similar_term->term_id,
'to' => $term->term_id,
'paged' => $this->get_pagenum(),
's' => isset( $_REQUEST['s'] ) ? sanitize_text_field( wp_unslash( $_REQUEST['s'] ) ) : false, // phpcs:ignore WordPress.Security.NonceVerification.Recommended
);
$merge_url = add_query_arg( $args, wp_nonce_url( admin_url( 'admin-post.php' ), 'classifai_merge_term' ) );
$score = $score ? ( $score > 1 ? $score - 1 : $score ) : '';

return sprintf(
// translators: %s: Term name, %d: Term ID.
__( '<span><strong>%1$s</strong> (ID: %2$s)</span><br/><br/>', 'classifai' ) .
// translators: %s: Term slug.
__( '<span><strong>Slug:</strong> %3$s</span><br/>', 'classifai' ) .
// translators: %s: Term count.
__( '<span><strong>Used:</strong> %4$s</span><br/>', 'classifai' ) .
// translators: %s: Term parent name.
__( '<span><strong>Parent:</strong> %5$s</span><br/>', 'classifai' ) .
// translators: %s: Similarity score.
( $score ? __( '<span><strong>Similarity:</strong> %6$s</span><br/>', 'classifai' ) : '%6$s' ) .
'<a href="%7$s" class="button button-primary term-merge-button">%8$s</a>',
esc_html( $term->name ),
'<a href="' . esc_url( get_edit_term_link( $term->term_id, $term->taxonomy ) ) . '" target="_blank">' . esc_html( $term->term_id ) . '</a>',
esc_html( $term->slug ),
// translators: %d: Term count.
'<a href="' . esc_url( admin_url( 'edit.php?tag=' . $term->slug ) ) . '" target="_blank">' . esc_html( sprintf( _n( '%d time', '%d times', $term->count, 'classifai' ), $term->count ) ) . '</a>',
esc_html( $term->parent > 0 ? get_term( $term->parent )->name : 'None' ),
$score ? esc_html( round( $score * 100, 2 ) . '%' ) : '',
esc_url( $merge_url ),
esc_html__( 'Merge and keep this', 'classifai' )
);
}

/**
* Handles the term column output.
*
* @param array $item The current term item.
*/
public function column_term( $item ) {
$term = $item['term'];
$similar_term = $item['similar_term'];
$this->last_item_id = $term->term_id;

return $this->generate_term_html( $term, $similar_term );
}

/**
* Handles the similar term column output.
*
* @param array $item The current term item.
*/
public function column_similar_term( $item ) {
$term = $item['term'];
$similar_term = $item['similar_term'];

return $this->generate_term_html( $similar_term, $term, $item['score'] );
}

/**
* Handles the term actions output.
*
* @param array $item The current term item.
*/
public function column_actions( $item ) {
$term = $item['term'];
$similar_term = $item['similar_term'];

$args = array(
'action' => 'classifai_skip_similar_term',
'taxonomy' => $this->taxonomy,
'term' => $term->term_id,
'similar_term' => $similar_term->term_id,
'paged' => $this->get_pagenum(),
's' => isset( $_REQUEST['s'] ) ? sanitize_text_field( wp_unslash( $_REQUEST['s'] ) ) : false, // phpcs:ignore WordPress.Security.NonceVerification.Recommended
);
$skip_url = add_query_arg( $args, wp_nonce_url( admin_url( 'admin-post.php' ), 'classifai_skip_similar_term' ) );

return sprintf(
"<a href='%s' class='button button-secondary'>%s</a>",
esc_url( $skip_url ),
esc_html__( 'Skip', 'classifai' )
);
}

/**
* Generates content for a single row of the table
*
* @param array $item The current item.
* @param string $column_name The current column name.
*/
protected function column_default( $item, $column_name ) {
return esc_html( $item[ $column_name ] );
}

/**
* Generates custom table navigation to prevent conflicting nonces.
*
* @param string $which The location of the bulk actions: Either 'top' or 'bottom'.
*/
protected function display_tablenav( $which ) {
?>
<div class="tablenav <?php echo esc_attr( $which ); ?>">
<div class="alignleft actions bulkactions">
<?php $this->bulk_actions( $which ); ?>
</div>
<?php
$this->extra_tablenav( $which );
$this->pagination( $which );
?>
<br class="clear" />
</div>
<?php
}

/**
* Gets the name of the default primary column.
*
* @return string Name of the default primary column, in this case, 'term'.
*/
protected function get_default_primary_column_name() {
return 'term';
}

/**
* Generates content for a single row of the table.
*
* @param object|array $item The current item
*/
public function single_row( $item ) {
$term = $item['term'];
$class = 'border';

if ( $this->last_item_id === $term->term_id ) {
$class .= ' skip';
}

echo '<tr class="' . esc_attr( $class ) . '">';
$this->single_row_columns( $item );
echo '</tr>';
}
}
2 changes: 1 addition & 1 deletion includes/Classifai/Admin/templates/classifai-header.php
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
// phpcs:ignore WordPress.Security.NonceVerification.Recommended
$active_page = isset( $_GET['tab'] ) ? sanitize_text_field( wp_unslash( $_GET['tab'] ) ) : 'classifai_settings';
// phpcs:ignore WordPress.Security.NonceVerification.Recommended
$is_setup_page = isset( $_GET['page'] ) && 'classifai_setup' === sanitize_text_field( wp_unslash( $_GET['page'] ) );
$is_setup_page = isset( $_GET['page'] ) && ( 'classifai_setup' === sanitize_text_field( wp_unslash( $_GET['page'] ) ) || 'classifai-term-cleanup' === sanitize_text_field( wp_unslash( $_GET['page'] ) ) );
?>
<header id="classifai-header">
<div class="classifai-header-layout">
Expand Down
Loading
Loading