Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion on refactoring of some things, only comments, not intended for merge. #277

Draft
wants to merge 5 commits into
base: devel
Choose a base branch
from

Conversation

kilohsakul
Copy link
Collaborator

@kilohsakul kilohsakul commented Jul 21, 2023

Bitching about various things, for the sake of discussion.
Please comment things worth commenting, or just say ok or nothing.

@kilohsakul kilohsakul marked this pull request as draft July 21, 2023 18:26
@kilohsakul kilohsakul changed the title Discussion or refactoring some things, only comments, not intended for merge. Discussion on refactoring of some things, only comments, not intended for merge. Jul 21, 2023
Copy link
Collaborator

@Adda0 Adda0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented on majority of the suggestions/questions. If others want to discuss the comments, I would suggest continuing with the existing conversations, if the problem they want to discuss is the same as in the conversation and talks about the same comment in the code.

@@ -112,6 +132,7 @@ Simlib::Util::BinaryRelation compute_relation(
* @param[out] prod_map Mapping of pairs of the original states (lhs_state, rhs_state) to new product states.
* @return NFA as a product of NFAs @p lhs and @p rhs with ε-transitions preserved.
*/
// all parameters needed/used?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed some time ago that all functions performing some kind of renaming of states should return the renaming map. But it seems to me that we never use these. I think we have no use case for the renaming so far. There might be some in the future, but we do not have one at the moment. Do we want to keep the renaming maps, then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ask at the mejtyng what the uses can be, and remove it if there is nothing concrete.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe really just putting the one struct of the special type as the optional parameter there, with some arbitrary initial choice of optional parameters as members? That would allow us to change the insides of the struct later without having to modify the interface?

Copy link
Collaborator

@Adda0 Adda0 Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here we are talking about output parameters. When we pass the parameters, we normally do not want to be able to change the parameter class inside the function (we want to pass it by const reference). Hence, mixing the input, output, and input/output parameters in a single class would be confusing and error-prone, I believe.

But you are right that this would solve the issue with changing interfaces.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in the above comment, for these maps, I would probably put them to the key-value store

Copy link
Collaborator

@Adda0 Adda0 Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #277 (comment) for what problems I find with this approach for output parameters, as stated further in the linked comment.

include/mata/nfa/algorithms.hh Show resolved Hide resolved
Comment on lines 70 to +72
bool is_included_naive(const Nfa& smaller, const Nfa& bigger, const Alphabet* alphabet = nullptr, Run* cex = nullptr);
//naive is not a good word, maybe textbook, or explicit, construct_first
//the alphabet parameter is useless in inclusion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _basic or _simple? But I do not mind _naive. Why do you think it is not ideal? We talk about the implementations/algorithms as naive in our discussions all the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussions are private, code is public. I would go for textbook.

Copy link
Collaborator

@Adda0 Adda0 Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the algorithm, however basic, has a name, we can use the name instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the explicit. I have read a textbook with the antichain-based inclusion checking. Basically, you could append textbook to every function name in mata.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like naive because it is not really clear what it is outside our group and it is somehow pejorative.
I would like textbook, I think everybody will know, I don't think it can be really confused with antichains. Explicit is ok too, but it requires a lot of context to get the meaning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think explicit is little bit better. textbook seems a little bit weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like basic/simple, but textbook is also ok for me. I agree with Lukas, naive sounds pejorative, explicit does not really give me any information.

Comment on lines 70 to +72
bool is_included_naive(const Nfa& smaller, const Nfa& bigger, const Alphabet* alphabet = nullptr, Run* cex = nullptr);
//naive is not a good word, maybe textbook, or explicit, construct_first
//the alphabet parameter is useless in inclusion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alphabet parameter is not useless. If you already have the alphabet, you can pass it to the function and save yourself creating the alphabet in the function again. Inclusion check can be rewritten, but for the current implementation, the alphabet parameter is an optimization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need an alphabet in inclusion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because inclusion is computed using a complement of an automaton, which in our interface requires the alphabet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In inlcusion, the complement can be with respect to the symbols used in the other automaton, not with respect to an external alphabet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the alphabet can be inherited. There might be an issue when you construct the complement wrt an alphabet that do not contain all symbols from the automaton before complement (but who knows).

Copy link
Member

@jurajsic jurajsic Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @Adda0 said (and it is clearly explained in the comment for this function), alphabet is there as an optimization if you already have it. Otherwise, you have to go trough the whole delta of both automata to create the alphabet and that could be costly. But maybe not, you are going to do complement anyway, that will probably take much more time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, the efficient antichain inclusion does not need the alphabet at all, the explicit one will do much more constly operations than collecting alphabet anyway.

@@ -98,6 +102,22 @@ bool is_universal_naive(const Nfa& aut, const Alphabet& alphabet, Run* cex);
*/
bool is_universal_antichains(const Nfa& aut, const Alphabet& alphabet, Run* cex);

//the alphabet parameter should be optional? The automaton can have its alphabet too. Run should be optional.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that Run should be optional.

Making alphabet optional would mean that we can (in this order), check for existence of alphabet in:

  1. function parameter,
  2. automaton member variable alphabet, and
  3. construct alphabet from transitions of the automaton (this is optional and probably not wanted - it seems strange to check for universality from symbols on transitions of the checked automaton only - therefore, we could fail in the case that neither 1) or 2) have alphabet specified .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last time we said that automaton would always have an alphabet, so 3 would not apply

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I am not sure if it is possible to implement such an alphabet. It will need to be investigated further.

And 3. would still apply. We would check for universality for only symbols in the automaton. Therefore, it is up to us whether that seems OK or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, alphabet should be optional, if it is not there, use alphabet of the automaton. Otherwise I would throw error.

Nfa Mata::Nfa::Algorithms::minimize_brzozowski(const Nfa& aut) {
//compute the minimal deterministic automaton, Brzozovski algorithm
return determinize(revert(determinize(revert(aut))));
}

//method?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed that functions should return new instance and methods should compute inplace, if at all possible. Therefore, this function should remain a function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, algorithms that merge states could be implemented in-place. Simulation, Hopcroft, ... Not Brzozovski probably.
It could makse sense to implement them in place. ... ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we could use chaining and write something like intersection(nfa1.trim().revert(), nfa2.determinize()). This would allow us to eliminate some optional parameters from several functions, I believe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the in-place implementation brings us something here (performance wise). I don't know how to do quotienting efficiently on our structure of NFA without a fresh new construction.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, true, one could avoid some of the allocations, not all.

The main question is whether we really want this dogma : "not in place therefore function" and not use methods that return new automata.
Maybe I just don't remember what the arguments were.
I

Copy link
Collaborator

@Adda0 Adda0 Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more a matter of taste. It would be consistent if we kept the dogma, but there is not much to gain from that. I would be OK with using both approaches interchangeably. For example, we can have one (or more) from these options implemented:

Nfa Mata::Nfa::concatenate(const Nfa& nfa1, const Nfa& nfa2);
Nfa Mata::Nfa::concatenation(const Nfa& nfa1, const Nfa& nfa2);
Nfa Mata::Nfa::concatenated(const Nfa& nfa1, const Nfa& nfa2);
Nfa Mata::Nfa::Nfa::concatenate_with(const Nfa& nfa2);
Nfa Mata::Nfa::Nfa::concatenated_with(const Nfa& nfa2);

All the options are acceptable, in my opinion. But maybe for operations working with two automata, having functions is a better design, clearly showing that the original automata are not modified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like concatenate and concatenate_with. And I'm fine with concatenation. I like the idea of chaining, it could lead to elegant and clear code.

I have no strong feeling regarding the inplace algorithsm, I think @vhavlena is right, that the savings will be minimal.

src/nfa/operations.cc Show resolved Hide resolved
src/nfa/operations.cc Show resolved Hide resolved
@@ -789,6 +819,7 @@ Nfa Mata::Nfa::reduce(const Nfa &aut, bool trim_input, StateToStateMap *state_ma
return result;
}

//Might be a method, but it will not be in-place anyway, so I don't know.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed that functions should return new instance and methods should compute in-place, if at all possible. Therefore, this function should remain a function.

Copy link
Collaborator Author

@kilohsakul kilohsakul Aug 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, but, actually, would it be better to call it determinized - identification of the returned thing? determinize sounds a bit like modifying the given thing.

Copy link
Collaborator

@Adda0 Adda0 Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. But maybe then it should be a method which operates on the single instance and returns a new instance. That is the same as nfa.single_symbol_automaton(). However, we need to keep in mind that

// ...
nfa.determinized().trim();
std::cout << nfa.print_to_mata();
// ...

does not show all that clearly that what is printed is the original instance, not the determinized and trimmed one. This code is in fact a bug.

However, if we get used to nouns returning a new instance, it could work nicely, and it will be clearer.

@@ -874,6 +905,8 @@ std::ostream& std::operator<<(std::ostream& os, const Mata::Nfa::Nfa& nfa) {
return os;
}

//Are the following functions for alphabets there to stay or to be refacvtored together with alphabets?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are meant to stay (we use them in Noodler), but they will work with different alphabets and they may be refactored due to that quite a bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Maybe we would not need all the variants?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not know yet. We will have to refactor alphabets, refactor how we work with alphabets in Noodler, and finally review which of these functions we need and use and which can be deleted.

@Adda0
Copy link
Collaborator

Adda0 commented Aug 1, 2023

I would not close the conversations as resolved when they are resolved but not implemented yet. Others will not see these discussions, and we will easily skip over the entire discussion and forget to act on the issue according to the final agreement in the discussion. At least until we have a separate issue, record in notes from the meetings, etc. where it cannot be missed.

@Adda0 Adda0 mentioned this pull request Aug 16, 2023
Copy link
Member

@tfiedor tfiedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this was some volume of discussion.

My general view: I would really advocate for keeping it conservative and traditional: nouns for objects/subjects and verbs for actions, embracing as much as C++ idioms as possible. Going for sheer number of characters will not lead to clarity, quite otherwise. Beware of Curse of Knowledge; knowing right now that one_letter_aut returns something and is not a property does not mean that next year you will scream in horror, what the fuck it is. Because, since I never used this function, I hardly have any idea what it really should do.

My comments were not meant to offend anyone, I apologize, if I hit some nerves, but I'm really thinking of long term not short term future and I always regretted being too smartass with my naming convetions.

P.S. Sorry for late review >.<

Comment on lines 70 to +72
bool is_included_naive(const Nfa& smaller, const Nfa& bigger, const Alphabet* alphabet = nullptr, Run* cex = nullptr);
//naive is not a good word, maybe textbook, or explicit, construct_first
//the alphabet parameter is useless in inclusion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think explicit is little bit better. textbook seems a little bit weird.

Comment on lines +106 to +118
//actually, what about to use this, with more optional parameters through structures
//struct univ_params
//{
// Alphabet* alphabet = NULL,
// Run* cex = NULL,
//};
// bool is_universal_antichains(const Nfa& aut, univ_params={NULL,NULL});
// and call as
// bool is_universal_antichains(const Nfa& aut);
// bool is_universal_antichains(const Nfa& aut,{.alphabet = bla});
// bool is_universal_antichains(const Nfa& aut,{.cex = bli});
// bool is_universal_antichains(const Nfa& aut,{.alphabet = bla, .cex = bli});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, we could implement singleton-like configure structure (or structure with class parameters), where one could set configurations up and not use function parameters? The idea is something like following:

struct Configuration {
   string minimization_strategy = "default";
   string determinization_strategy = "default";
   void* cex = nullptr;
   void* alphabet = nullptr;
   ...
}
bool is_universal_antichains(const Nfa& aut);
Configuration.alphabet = bla;
bool is_universal_antichains(const Nfa& aut);
Configuration.cex = bli;
bool is_universal_antichains(const Nfa& aut);
Configuration.alphabet = blabla;
Configuration.cex = blibli;
bool is_universal_antichains(const Nfa& aut);

The benefits are that (1) configuration parameters are at one place only, (2) you can set some parameters for bigger blocks of code and not keep writing lots of parameters, ...
The cons are that (1) you write more code, (2) you might need to apply config to single scope only and then reset it, which could again results to some problems.

Comment on lines +20 to 21
//should we drop the create_ from all the names?
Nfa create_single_word_nfa(const std::vector<Symbol>& word);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said before, I would prefer Build:: namespace and no create_, however, for me unity is more important, and if one function does not make sense without create, then I would keep the prefix for all functions.

Comment on lines 48 to 50
Nfa construct(const Mata::Parser::ParsedSection& parsec, Alphabet* alphabet, StringToStateMap* state_map = nullptr);
//construct does not cary much info, what is better? In the comment, don't know what Parsed object is, overall I don't get what this function does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to adapt system from other libraries, that would name these kinds of fucntions as from_<FORMAT> and to_<FORMAT>. So this could be Builder::from_parsed_section(), or Builder::nfa_from_parsed_section().

Comment on lines +115 to 117
//the delta way is the cleanest, no? a private data member
//lets do it that way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong feelings either way.

using StateBoolArray = std::vector<bool>; ///< Bool array for states in the automaton.

namespace {
//Remove direct_, there are no close alternatives for NFA (unlike for Buchi). Remove "compute_" ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick to nouns = classes/structures, verbs = methods/functions. To me if we used just "noun" it would convey that fw_simulation() is a property, and it will be fast just returning some precomputed member, which in this case might not be. The compute_ fully conveys that it will take some time before it is computed.

@@ -270,6 +285,7 @@ Nfa Mata::Nfa::remove_epsilon(const Nfa& aut, Symbol epsilon) {
return result;
}

//TODO: experiment with reverts and chose which should survive
Nfa Mata::Nfa::fragile_revert(const Nfa& aut) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: it is probably better to name it revert_fragile, then if you are using autocomplete, and write revert you will see better what options you have.

@@ -558,6 +581,7 @@ std::pair<Run, bool> Mata::Nfa::get_word_for_path(const Nfa& aut, const Run& run
}

//TODO: this is not efficient
//method?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a long abandoned relict of sad past that is already surpassed by the Run structure anyway. Is it even used?

@@ -597,6 +622,7 @@ Mata::Parser::ParsedSection Mata::Nfa::serialize(
return {};
}

//method?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be explicit, while it might not be confusing for us (though I think we will eventually forget these little rules and will be angry instead) it might be confusing by new contributors, especially students.

Nfa Mata::Nfa::Algorithms::minimize_brzozowski(const Nfa& aut) {
//compute the minimal deterministic automaton, Brzozovski algorithm
return determinize(revert(determinize(revert(aut))));
}

//method?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like concatenate and concatenate_with. And I'm fine with concatenation. I like the idea of chaining, it could lead to elegant and clear code.

I have no strong feeling regarding the inplace algorithsm, I think @vhavlena is right, that the savings will be minimal.

@Adda0 Adda0 mentioned this pull request Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants