Skip to content
This repository has been archived by the owner on Jul 30, 2020. It is now read-only.

Improve formatting of comments shown in hover pop-ups #187

Closed
wants to merge 3 commits into from

Conversation

romix
Copy link
Contributor

@romix romix commented Dec 25, 2017

  • Remove different comment markers from the beginning and end of each comment line, perform some other kinds of preprocessing.
    - This is a very simplistic approach to parse comments. In the future, a more advanced real comment parser may need to be used.

  • Return two marked strings as a response to hover requests:

    • The first line is the formatted comment, which is reported as a markup string of type "text". This ensures that no C/C++ highlighting is used for the comment text.
      • The second line is the declaration/signature, which is reported as a markup string of type "c" or "c++".

} else if (lines[0].substr(0, 2) == "/*") {
// It is a /* comment. So, strip *, /* and /** from the beginning each line.
for (auto& line : lines) {
if (line.substr(0, 2) == "/*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test /** first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

if (line == "")
line = "\n";
}
if (lines.empty())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about stripping all leading / * ! and whitespace?

@jacobdufault
Copy link
Owner

Please make sure to write some unit tests for the format parser, for this type of thing they are extremely valuable.

}

/// Trim the spaces at the end of the string. Leave only one space at the end.
static std::string rtrim_spaces(std::string s) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:
if (s.size() && isspace(s.back())) {
s.erase(s.find_last_not_of(" \t") + 1, s.end());
}
return s;

@jacobdufault
Copy link
Owner

Btw comment parsing should happen in the indexer as well (instead of text_document_hover.cc) because the extractor is specific to how c++ comments are formatted.

@romix
Copy link
Contributor Author

romix commented Dec 25, 2017

@jacobdufault So, you mean we should parse format comments in the indexer so that text_document_hover.cc just uses already formatted comments stored in def.comments? Or do you mean something else?

@jacobdufault
Copy link
Owner

So, you mean we should parse format comments in the indexer so that text_document_hover.cc just uses already formatted comments stored in def.comments?

Yep.

@MaskRay
Copy link
Contributor

MaskRay commented Dec 25, 2017

For this code snippet:

  /** A database transaction.
   *	Every operation requires a transaction handle.
   */
  U y;

CINDEX_LINKAGE CXString clang_Cursor_getRawCommentText(CXCursor C); extracts

/** A database transaction.
   *	Every operation requires a transaction handle.
   */

where the leading space is removed, thus the first line is actually un-indented but indentation of others is kept. We should find someway to un-indent other lines.

I'm still concerned whether it is a good idea to format the lines.

@romix
Copy link
Contributor Author

romix commented Dec 25, 2017

@MaskRay But this PR removes all the formatting altogether. In this case it will produce one long line. We can of course teach each about new paragraphs, etc.

My concern is that some comments like Doxygen comment use their special mark-up, which we do not handle yet.

There are also different approaches how to format/render comments. E.g. we may want to show sections like like:

  • brief description
  • detailed description
  • Parameters: ...
  • Exceptions: ...
  • Returns: ...

In ideal case, it would be nice to be able to click on the types and references inside this hover and go to their definitions....

But probably we should do it slowly, step-by-step. It will definitely require multiple PRs.

@MaskRay
Copy link
Contributor

MaskRay commented Dec 25, 2017

Unless we are open to clang C++ API, what we have are only these three functions:

/**
 * \brief Given a cursor that represents a declaration, return the associated
 * comment's source range.  The range may include multiple consecutive comments
 * with whitespace in between.
 */
CINDEX_LINKAGE CXSourceRange clang_Cursor_getCommentRange(CXCursor C);

/**
 * \brief Given a cursor that represents a declaration, return the associated
 * comment text, including comment markers.
 */
CINDEX_LINKAGE CXString clang_Cursor_getRawCommentText(CXCursor C);

/**
 * \brief Given a cursor that represents a documentable entity (e.g.,
 * declaration), return the associated \\brief paragraph; otherwise return the
 * first paragraph.
 */
CINDEX_LINKAGE CXString clang_Cursor_getBriefCommentText(CXCursor C);

For

//! first TextComment
//!
//! first line in second TextComment
//! second line in second TextComment
int t;

clang -std=c++11 -fsyntax-only -Xclang -ast-dump a.cc gives:

`-VarDecl 0x55ff65dfc928 <a.cc:5:1, col:5> col:5 t 'int'
  `-FullComment 0x55ff65dfcae0 <line:1:4, line:4:37>
    |-ParagraphComment 0x55ff65dfca40 <line:1:4, col:21>
    | `-TextComment 0x55ff65dfca10 <col:4, col:21> Text=" first TextComment"
    `-ParagraphComment 0x55ff65dfcab0 <line:3:4, line:4:37>
      |-TextComment 0x55ff65dfca60 <line:3:4, col:36> Text=" first line in second TextComment"
      `-TextComment 0x55ff65dfca80 <line:4:4, col:37> Text=" second line in second TextComment"

clang has idea of comment components but clang-c loses all of this detailed information. clang_Cursor_getRawCommentText is implemented by https:/llvm-mirror/clang/blob/master/lib/AST/RawCommentList.cpp#L177 . I am playing with clang_Cursor_getCommentRange to see if that gives sufficient information to remove indentation in other lines.

Showing sections (Parameters, Exceptions, Returns...) is unlikely but i'd like to see what we can achieve with minimal effort.

Which editor does you use? For Emacs/Vim, I have reported the UI challenge to the language client plugins upstream (autozimu/LanguageClient-neovim#224 emacs-lsp/lsp-ui#17) sebastiencs has plan to add an interface for multi-line information and SpaceVim's author wsdjeg also plans to support LSP.

@romix
Copy link
Contributor Author

romix commented Dec 25, 2017

@MaskRay These days I mostly use Visual Studio Code with cquery. But I also use Spacemacs and NeoVim with ... YouCompleteMe, because I was not able to properly configure cquery in those editors ;-)

I agree that Clang's C++ API is more powerful and could be rather useful for comments processing.

Showing sections (Parameters, Exceptions, Returns...) is unlikely but i'd like to see what we can achieve with minimal effort.

I think it could be done, but I'm not sure about rendering it, as it may very much depend on the editor and how it render's hover responses with multiple strings and with different markup styles. E.g. in theory, one could produce HTML/Markdown markup for the documentation part of the hover and hope that the client would render it properly.

@romix
Copy link
Contributor Author

romix commented Dec 25, 2017

BTW, Clang comments processing based on CXCursor has its own peculiarities. For some reason, it does not report comments for macro definitions. And I'm sure there are other edge cases like this.

@romix
Copy link
Contributor Author

romix commented Dec 25, 2017

Another special case: With Clang C APIs it looks like we cannot collect comments like this:

int x; /// This is a comment for x

Even though -ast-dump shows that this comment is a child of a declaration.

@MaskRay
Copy link
Contributor

MaskRay commented Dec 25, 2017

@romix https:/llvm-mirror/clang/blob/master/test/Index/parse-all-comments.c seems to deliberately skip trailing ///. ///< markers should be used for trailing comments per https://www.stack.nl/~dimitri/doxygen/manual/docblocks.html

int trdoxyD;  // Not a Doxygen trailing comment.   trdoxyD NOT_DOXYGEN
              /// This comment doesn't get merged.   trdoxyE IS_DOXYGEN
int trdoxyE;

int trdoxyF;  /// A Doxygen non-trailing comment that gets dropped on the floor.
              // This comment will also be dropped.
int trdoxyG;  // This one won't.  trdoxyG NOT_DOXYGEN

MaskRay added a commit that referenced this pull request Dec 26, 2017
@MaskRay
Copy link
Contributor

MaskRay commented Dec 26, 2017

Adapted the multiple MarkedString part and commited 0c8e95e

@romix
Copy link
Contributor Author

romix commented Dec 26, 2017

I'm working on an improved version, which understands the typical Doxygen markup and splits complex multi-line comments into "sections". It is somewhat similar to Clang's C++ API internals, but simpler. This should in theory allow for showing sections (Parameters, Exceptions, Returns...).

@romix
Copy link
Contributor Author

romix commented Dec 26, 2017

@MaskRay Have a look at the updated version that I pushed here. It is not rebased on your commits because there are too many changes here. I'd like to hear your opinion about the overall approach.

Basically, I introduced a machinery to parse comments into a set of sections. Once the set of comment sections is created it can be formatted in any way to produce the final response for the hover request.

@MaskRay
Copy link
Contributor

MaskRay commented Dec 26, 2017

Sorry if my commits cause any trouble to you. My concern is clang has idea of different comment types (https:/llvm-mirror/clang/blob/master/lib/AST/RawCommentList.cpp https:/llvm-mirror/clang/blob/master/lib/AST/CommentBriefParser.cpp ...). By using CXString clang_Cursor_getRawCommentText(...), we lose information which is available in its backing C++ interface and doing a parser on that seems fragile and a lot of extra work that can be avoided.

@jacobdufault

@MaskRay
Copy link
Contributor

MaskRay commented Dec 26, 2017

I want to see how comments are rendered in VSCode/Emacs/Vim, but sadly the cquery plugin in my vscode no longer works (there is no indexer thread in the release/bin/cquery process for unknown reason). There is some ongoing support in lsp-ui-doc.el emacs-lsp/lsp-ui#19 (thanks so much to sebastiencs!) I really want to see how the the language server (cquery) interoperates with UIs (lsp-ui-doc, neovim floating window SpaceVim/SpaceVim#1102 )

@romix
Copy link
Contributor Author

romix commented Dec 26, 2017

@MaskRay I rebased on master, so that it is possible to build for others.

@romix
Copy link
Contributor Author

romix commented Dec 26, 2017

As an example of a produced hover, for this function:

/// This is a test function.
///
/// \param[in] x first parameter
/// @param y third param has to
///          be >= 0 
/// @exceptions ex1, ex2
/// \returns the sum of params and is >= 0
int testFunction3(int x, int y) throw(ex1, ex2) {
  return x + y;
}

it produces this hover in VSCode:
hoverwithparametersdescription

@romix
Copy link
Contributor Author

romix commented Dec 26, 2017

@jacobdufault @MaskRay BTW, one annoying thing that you can see in the hover image above is that the signature is shown without parameter names, but parameter descriptions refer to them. It would be nice if we could show parameter names in the signature as well, but I don't know if this information is stripped by cquery or clang.

@MaskRay
Copy link
Contributor

MaskRay commented Dec 26, 2017

The type you see in the image comes from var->def.detailed_name, which in turn comes from
indexer.cc:1152

      std::string type_name =
          ToString(clang_getTypeSpelling(clang_getCursorType(decl->cursor)));
CXString clang_getTypeSpelling(CXType CT) {
  QualType T = GetQualType(CT);
  if (T.isNull())
    return cxstring::createEmpty();

  CXTranslationUnit TU = GetTU(CT);
  SmallString<64> Str;
  llvm::raw_svector_ostream OS(Str);
  PrintingPolicy PP(cxtu::getASTUnit(TU)->getASTContext().getLangOpts());

  T.print(OS, PP);

  return cxstring::createDup(OS.str());
}

I think parameter names are not stored in QualType.

@romix
Copy link
Contributor Author

romix commented Dec 27, 2017

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

@jacobdufault
Copy link
Owner

I'm ok with the approach here, but the parsing code really needs tests. It shouldn't be too much work to write them.

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

How? I think that'd be a nice improvement.

@romix
Copy link
Contributor Author

romix commented Dec 27, 2017

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

@jacobdufault Here is one possible approach to create function signatures with parameter names. It is inspired by how KDevelop does it.

An alternative approach could be to take the signature that is currently produced by cquery and inject parameter names at the right positions. It would require some logic similar to the current logic used to find the place to inject a fully qualified function name.

@romix
Copy link
Contributor Author

romix commented Jan 20, 2018

@MaskRay Wow! What you found is very interesting.

Let's say we have this function declaration:

// Computes the edit distance of strings [a,a+la) and [b,b+lb) with Eugene W.
// Myers' O(ND) diff algorithm.
// Costs: insertion=1, deletion=1, no substitution.
// If the distance is larger than threshold, returns threshould + 1.
// \param a first string
// \param b second string
// \returns edit distance between \p a and \p b.
int MyersDiff(const char* a, int la, const char* b, int lb, int threshold);

I looked at the XML produced by the APIs and it looks like this:

<Function file="/home/xyz/src/cquery/src/working_files.cc" line="45" column="5">
<Name>MyersDiff</Name>
<USR>c:workingfiles.cc@aN@F@MyersDiff#*1C#I#S0#I#I#</USR>
<Declaration>int MyersDiff(const char a, int la, const char b, int lb, int threshold)</Declaration><Abstract>
<Para> Computes the edit distance of strings [a,a+la) and [b,b+lb) with Eugene W. Myers' O(ND) diff algorithm. Costs: insertion=1, deletion=1, no substitution. If the distance is larger than threshold, returns threshould + 1. </Para>
</Abstract>
<Parameters>
  <Parameter>
     <Name>a</Name>
     <Index>0</Index>
     <Direction isExplicit="0">in</Direction>
     <Discussion>
       <Para> first string </Para>
     </Discussion>
  </Parameter>
  <Parameter>
     <Name>b</Name>
     <Index>2</Index>
     <Direction isExplicit="0">in</Direction>
     <Discussion>
       <Para> second string </Para>
     </Discussion>
  </Parameter>
</Parameters>
<ResultDiscussion>
   <Para> edit distance between <monospaced>a</monospaced> and <monospaced>b.
   </monospaced></Para>
</ResultDiscussion>
</Function>

XML format seems to be the only one preserving the semantic structure of the comment. But as we can see, it is rather wasteful and it is also redundant when it comes to certain pieces of information like USR, file, Declaration.

BTW, ycmd seems to use it only to get the Declaration as far as I can see. But cquery does not have this issue, because we found another, less wasteful way to create declarations.

When it comes to comments formatting, the only thing YCMD seems to do is to take the RAW comment (and not what we see above) and strip comment markers at the beginning and at the end of all lines. They do not seem to e.g. highlight the parameter names or show them in bold, etc.

So I wonder what we should do with this API. One issue with it is that you can get the parsed comment only based on the cursor. You cannot get it e.g. from a string with the raw comment. It means that we need to obtain parsed comments during indexing. But then we need to serialize them somehow, I guess. And the only way provided by Clang for the serialization is to produce the XML. Of course, we can strip some parts of this XML, which are redundant. Or we can traverse the CXComment, extract only what we need and then serialize this extracted structural information.

Thoughts about how to proceed?

@romix
Copy link
Contributor Author

romix commented Jan 20, 2018

BTW, should we want to get the Declaration from the XML comment, it is a bit different from what we generate. It seems to basically contain the source range for the corresponding declaration rather than being reconstructed from the function types. This way, it looks more similar to what is actually written in the source code as apposed to the reconstructed view as seen by the Clang's type checker ;-)

@MaskRay
Copy link
Contributor

MaskRay commented Jan 21, 2018

We may keep current type signature which uses qualified names. But it is indeed interesting to ask to what extent shall we describe the type signature.

@romix
Copy link
Contributor Author

romix commented Jan 21, 2018

@MaskRay But what about the parsed comment itself? If we would use it, we wouldn't need a lot of complex code to parse it, split into sections, find parameter references, etc. Clang would do it for us.

@MaskRay
Copy link
Contributor

MaskRay commented Jan 21, 2018

I'm not clear what UI benefits parsed comments will bring..... (probably because I'm using Emacs and the LSP toolchain definitely lacks such thing)

@HotelCalifornia
Copy link

HotelCalifornia commented Mar 12, 2018

I'm not clear what UI benefits parsed comments will bring.....

here's an example of some formatting as it is:

note that there seem to be missing newlines (e.g. when I document errno values and between parameters).

as for doxygen parsing, I think that would be a nice-to-have (would certainly look a little nicer without the doxygen directives), but is that something the language server should be responsible for? (I might not be too clear on the division of labor here)

@edjubuh
Copy link
Contributor

edjubuh commented Mar 20, 2018

For formatting, clang gives us a formatted HTML string: https:/llvm-mirror/clang/blob/master/include/clang-c/Documentation.h#L530.

I wonder if we could make this a config option for clients to select. If the client supports rendering HTML (like VSCode and Atom), then they can enable this option and cquery will call the necessary libclang functions to get the formatted HTML docstring.

@jacobdufault
Copy link
Owner

I'm willing to merge the PR, but there is non-trivial parsing logic so I'd like to have some unit tests before doing so.

@jacobdufault
Copy link
Owner

@romix I'll close the PR for now, please reopen if you continue working on it.

MaskRay added a commit to MaskRay/ccls that referenced this pull request Mar 31, 2018
MaskRay added a commit to MaskRay/ccls that referenced this pull request Mar 31, 2018
MaskRay added a commit to MaskRay/ccls that referenced this pull request Apr 17, 2018
@philip-n
Copy link

I understand that this PR is on hiatus, as it initially lacked tests and there has been no more work invested since. I'm wondering if there is any other work being done towards this topic?

@romix Any chance you might revisit this at some time?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants