Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blind index #118

Merged
merged 22 commits into from
Mar 12, 2020
Merged

Blind index #118

merged 22 commits into from
Mar 12, 2020

Conversation

coltfred
Copy link
Member

@coltfred coltfred commented Feb 20, 2020

TODO

  • Tests
  • Clarify use of the words index and indexed terms

src/search.rs Outdated
///The required length of the salt.
const REQUIRED_LEN: usize = 32;

///The result of creating a new index as well as indexing on a particular phrase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We return this structure when they generate the index tokens for a string?

src/search.rs Outdated
pub sdk: IronSimpleSearch,
}

///Trait which gives the ability to create an index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the ability to create an index and the ability to initialize an existing index should be in separate traits. Everyone using the SDK needs initialize - the ability to create the index might not even be included in the normal import app.

src/search.rs Outdated
///If you need to index terms immediately, see `create_index_and_initialize` which will return
///the IronSimpleSearch for reuse.
async fn create_index(&self, group_id: &GroupId) -> Result<DocumentEncryptUnmanagedResult>;
///Create an index and also index the term.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't actually index anything, does it? Just returns the SDK "handle".
I'm not sure how useful it is to have these two separate. Maybe just have create_search_index and initialize_search_index?

src/search.rs Outdated
///Create an index and encrypt it to the provided group_id.
///If you need to index terms immediately, see `create_index_and_initialize` which will return
///the IronSimpleSearch for reuse.
async fn create_index(&self, group_id: &GroupId) -> Result<DocumentEncryptUnmanagedResult>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the DocumentEncryptUnmanagedResult that this returns be the input initialize_search_index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DocumentEncryptUnmanagedResult has the people it was encrypted to, errors about the sharing, etc. Maybe I should change the structure to just include the relevant data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe give it an alias that better describes its usage here?

src/search.rs Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@clintfred clintfred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review not complete, still trying to wrap my head around some of the details here, but I have some initial feedback that I hope will generate discussion and be helpful on it's own.

src/search.rs Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated
#[async_trait]
pub trait SimpleSeachInitialize {
///Given the encrypted salt and the edeks, decrypt them and give back the IronSimpleSearch object.
async fn initialize_search_sdk(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming suggestions: initialize_search

src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated
///Given the encrypted salt and the edeks, decrypt them and give back the IronSimpleSearch object.
async fn initialize_search_sdk(
&self,
encrypted_data: &[u8],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just call this the encrypted_salt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to encrypted_salt_bytes to hopefully avoid confusion between the entire "EncryptedSalt" which is the deks + bytes and just the Aes bytes.

src/search.rs Outdated

///The result of creating a new index as well as initializing an IronSimpleSdk.
///If you only want to create the index, see create_index.
pub struct CreatedIndexResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest: CreateIndexResult or SearchIndexCreateResult or IndexCreateResult

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the suggestions, I would choose IndexCreateResult

Copy link
Member

@BobWall23 BobWall23 Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't create_index return an Index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe a BlindIndex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about putting the encrypted_salt in the BlindIndexSearchSdk struct and getting rid of this altogether?

Maybe this would look like getting rid of create_index (or not) and changing create_index_and_initialize_search to also return Result<BlindIndexSearchSdk>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline that this type of construction doesn't make sense as the existence of a BlindIndexSearch doesn't necessarily mean that an EncryptedBlindIndexSalt exists.

So I'm back to suggesting IndexCreateResult as a name here, unless someone has something better. I think what bothers me about the current name is that it starts with a verb.

src/search.rs Outdated
///Create an index and encrypt it to the provided group_id.
///If you need to index terms immediately, see `create_index_and_initialize` which will return
///the IronSimpleSearch for reuse.
async fn create_index(&self, group_id: &GroupId) -> Result<DocumentEncryptUnmanagedResult>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe give it an alias that better describes its usage here?

src/search.rs Outdated

///The result of creating a new index as well as initializing an IronSimpleSdk.
///If you only want to create the index, see create_index.
pub struct CreatedIndexResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about putting the encrypted_salt in the BlindIndexSearchSdk struct and getting rid of this altogether?

Maybe this would look like getting rid of create_index (or not) and changing create_index_and_initialize_search to also return Result<BlindIndexSearchSdk>

src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated

#[async_trait]
impl BlindIndexSearchInitialize for IronOxide {
async fn initialize_search(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider rename to initialize_blind_index_search?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said "ok", but didn't change it. Intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. Pushed.

src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated
}

///Generate the list of tokens to use to find entries that match the search query, given the specified partition_id.
pub fn tokenize_query(&self, query: &str, partition_id: Option<&str>) -> Vec<u32> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what partition_id is. These two functions could use more descriptive docs, perhaps with an example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more text about it.

src/search.rs Outdated

///Generate the list of tokens to use to find entries that match the search query, given the specified partition_id.
/// query - The string you want to tokenize and hash
/// partition_id - An extra string you want to include in every hash, this allows 2 queries with different partition_ids to produce a different set of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot to end your

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what

tests/search_ops.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated
@@ -40,6 +40,7 @@ vec1 = "~1.4.0"
# ironoxide requires rt-threaded at runtime, but not at compile time
tokio = {version = "~0.2.0", features=["time", "rt-threaded"]}
dashmap = "3.4.0"
ironcore-search-helpers = { git = "https:/IronCoreLabs/search-helpers", rev = "38be96fe6160a739396cf505054a4e0dec6e0aa6", optional=true}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to still add tags to this repo so we can point to a specific tag for versioning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out we have to publish it or we can't publish ironoxide. I'll do that.


///Generate the list of tokens to use to find entries that match the search query, given the specified partition_id.
/// query - The string you want to tokenize and hash
/// partition_id - An extra string you want to include in every hash, this allows 2 queries with different partition_ids to produce a different set of tokens for the same data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks like it was just copy/pasted from tokenize_query where I assume it serves a different purpose even though the arguments are the same type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because they currently do the same thing. I'll change it when it gets updated to do something else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That time has come

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the comment. Let me know if you think it could be improved.

src/search.rs Outdated
pub struct BlindIndexCreateResult {
pub encrypted_salt: EncryptedBlindIndexSalt,
pub sdk: BlindIndexSearch,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be unused now, which matches up with what we discussed in the meeting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
src/search.rs Outdated Show resolved Hide resolved
@coltfred coltfred merged commit b88cdbc into master Mar 12, 2020
@coltfred coltfred deleted the blind-index branch March 12, 2020 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants