Parsing YARA files

Marek Milkovič edited this page Mar 21, 2018 · 2 revisions

Parsing

You can parse a YARA file either from the filesystem or from a memory buffer. Call yaramod::parseFile with either a string containing the file path or an input stream holding the contents of the YARA file. The function returns a valid pointer to the parsed YARA file, or nullptr on failure.

Parsing from filesystem:

std::string filePath = "/home/.../file.yar";
auto yaraFile = yaramod::parseFile(filePath);

Parsing from memory buffer:

std::istringstream input("rule xxx { ... }");
auto yaraFile = yaramod::parseFile(input);

A parsing failure writes an error message to the standard error output (std::cerr). You can override this behavior by passing a second parameter to yaramod::parseFile, either a file path or an output stream, and all error messages are printed there instead.

Output errors to file:

std::string errorLog = "/home/.../error.log";
auto yaraFile = yaramod::parseFile(input, errorLog);

Output errors to memory buffer:

std::ostringstream errorLog;
auto yaraFile = yaramod::parseFile(input, errorLog);
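
The contract described above (a null pointer on failure, diagnostics sent to a caller-supplied stream that defaults to std::cerr) can be pictured with a toy stand-in. The sketch below is not yaramod code: ToyFile and toyParse are hypothetical names, and the "parser" only checks that braces balance.

```cpp
#include <iostream>
#include <istream>
#include <iterator>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a parsed file; not a yaramod type.
struct ToyFile {
    std::string text;
};

// Hypothetical stand-in for a parse function: returns nullptr on failure
// and reports the reason to `errors` (std::cerr unless overridden).
std::unique_ptr<ToyFile> toyParse(std::istream& input, std::ostream& errors = std::cerr) {
    std::string text((std::istreambuf_iterator<char>(input)),
                     std::istreambuf_iterator<char>());

    // The only "syntax check" in this toy: curly braces must balance.
    int depth = 0;
    for (char c : text) {
        if (c == '{') {
            ++depth;
        } else if (c == '}' && --depth < 0) {
            break;  // closing brace without a matching opening brace
        }
    }
    if (depth != 0) {
        errors << "error: unbalanced braces\n";
        return nullptr;
    }
    return std::make_unique<ToyFile>(ToyFile{text});
}
```

A caller checks the returned pointer before using it and, when errors were redirected to a std::ostringstream, reads the collected messages with str().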

Includes

The YARA language supports inclusion of other files from the filesystem. The path given in an include directive is always relative to the including YARA file on disk. Since yaramod can also parse files from memory, relative paths are only allowed when parsing from an actual file.

Whenever yaramod encounters an include, it takes the content of the included file and parses it as if it appeared in place of the include directive. Included content is therefore indistinguishable from the rest of the file.
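
The splice-in-place behavior and the relative path resolution can be modeled in a few lines. This is a toy model under stated assumptions, not yaramod's implementation: FakeDisk and expandIncludes are hypothetical names, the "disk" is an in-memory map, an include must occupy a whole line, and there is no protection against include cycles.

```cpp
#include <filesystem>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Toy model only: plays the role of the disk, mapping path -> file content.
using FakeDisk = std::map<std::string, std::string>;

// Returns the content of `path` with every `include "..."` line replaced by
// the recursively expanded content of the file it names. Include paths are
// resolved against the directory of the file that contains the directive.
std::string expandIncludes(const std::string& path, const FakeDisk& files) {
    const auto found = files.find(path);
    if (found == files.end())
        throw std::runtime_error("cannot open " + path);

    const std::filesystem::path baseDir = std::filesystem::path(path).parent_path();
    const std::string prefix = "include \"";

    std::istringstream input(found->second);
    std::string line;
    std::string result;
    while (std::getline(input, line)) {
        const bool isInclude = line.size() > prefix.size() &&
                               line.compare(0, prefix.size(), prefix) == 0 &&
                               line.back() == '"';
        if (isInclude) {
            // Text between the quotes, resolved relative to the including file.
            const std::string target =
                line.substr(prefix.size(), line.size() - prefix.size() - 1);
            const std::string resolved =
                (baseDir / target).lexically_normal().generic_string();
            result += expandIncludes(resolved, files);
        } else {
            result += line + '\n';
        }
    }
    return result;
}
```

Because the expansion happens before anything else looks at the text, the consumer of the result cannot tell which rules came from which file, which mirrors the behavior described above.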

Imports

As with the original YARA, whenever you want to use functions from the available modules, you need to import them. This way, we retain compatibility with the original YARA compiler. To check which modules are imported, use the YaraFile::getImports() method, which returns a std::vector of pointers to yaramod::Module.

All imported modules:

for (const auto& module : yaraFile->getImports())
    std::cout << module->getName() << '\n';

Rules

The rules of a YARA file can be obtained with the YaraFile::getRules() method. Rules are always ordered as in the input file (including rules from included files). Each rule is represented by a yaramod::Rule object.

All rules in the file:

for (const auto& rule : yaraFile->getRules())
    std::cout << rule->getName() << '\n';

Meta information

Meta information is represented by yaramod::Meta. Each meta contains its name and its value in the form of a yaramod::Literal, which is an integer, a string or a boolean. To obtain the printable representation of a literal, use the Literal::getPureText() method. There is also a Literal::getText() method, which returns the textual representation as written in a YARA file. Integer values are unchanged, boolean values are dumped using std::boolalpha as true or false, and string values are enclosed in double quotes with all escape sequences in them escaped.
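
To make the difference between the two textual forms concrete, here is a minimal sketch built on std::variant. ToyLiteral, pureText and text are hypothetical stand-ins, not the yaramod::Literal class, and the set of escaped characters is an assumption chosen for illustration.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-in for a meta value; not the yaramod::Literal class.
using ToyLiteral = std::variant<std::string, std::int64_t, bool>;

// Printable form: the raw value with no YARA syntax around it.
std::string pureText(const ToyLiteral& lit) {
    if (const auto* s = std::get_if<std::string>(&lit))
        return *s;
    if (const auto* i = std::get_if<std::int64_t>(&lit))
        return std::to_string(*i);
    return std::get<bool>(lit) ? "true" : "false";
}

// YARA-source form: integers and booleans are unchanged, strings are quoted
// and their special characters are written as escape sequences again.
std::string text(const ToyLiteral& lit) {
    const auto* s = std::get_if<std::string>(&lit);
    if (!s)
        return pureText(lit);

    std::string out = "\"";
    for (char c : *s) {
        switch (c) {
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            default:   out += c; break;
        }
    }
    return out + "\"";
}
```

When constructing a ToyLiteral, pass std::string and std::int64_t values explicitly; with older compilers a bare string literal or int can select an unintended variant alternative.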

All meta information together with its type:

for (const auto& meta : rule->getMetas()) {
    if (meta->getValue()->isString())
        std::cout << "String meta: ";
    else if (meta->getValue()->isInt())
        std::cout << "Int meta: ";
    else if (meta->getValue()->isBool())
        std::cout << "Bool meta: ";

    std::cout << meta->getName() << " = " << meta->getValue()->getPureText() << '\n';
}

Obtaining specific meta:

if (auto meta = rule->getMetaWithName("my_meta_value")) {
    // ...
}

Strings

Strings are always in the order in which they occur in the file. The abstract class yaramod::String serves as the base for every string, and each string is an instance of one of the specialized classes yaramod::PlainString, yaramod::HexString or yaramod::Regexp.

Dump all strings and their type:

for (const auto& string : rule->getStrings()) {
    if (string->isPlain())
        std::cout << "Plain string: ";
    else if (string->isHex())
        std::cout << "Hex string: ";
    else if (string->isRegexp())
        std::cout << "Regexp: ";

    std::cout << string->getIdentifier() << " = " << string->getText() << '\n';
}

Like literals, strings also have a getPureText() method, which returns the pure content of the string without any modifiers.

The modifiers for strings are ascii, wide, fullword and nocase. They can only be associated with plain strings, but are available on all string types to allow for possible future extensions of the YARA language. Even though hex strings and regular expressions are parsed into their own smaller ASTs, there is currently no way to traverse them. The only interface provided is to get their textual representation.

Condition

A condition consists of expressions, which form another smaller AST inside the YARA file. This AST can be traversed using the visitor design pattern.

The list of all available expression types:

String expressions
  • StringExpression - reference to string in strings section ($a01, $a02, $str)
  • StringWildcardExpression - reference to multiple strings using wildcard ($a*, $*)
  • StringAtExpression - refers to $str at <offset>
  • StringInRangeExpression - refers to $str in (<offset1> .. <offset2>)
  • StringCountExpression - reference to the number of matches of a certain string identifier (#a01, #str)
  • StringOffsetExpression - reference to first match offset (or Nth match offset) of string identifier (@a01, @a01[N])
  • StringLengthExpression - reference to length of first match (or Nth match) of string identifier (!a01, !a01[N])

Unary operations

All of these provide method getOperand() to return operand of an expression.

  • NotExpression - refers to logical not operator (not (@str > 10))
  • UnaryMinusExpression - refers to unary - operator (-20)
  • BitwiseNotExpression - refers to bitwise not (~uint8(0x0))

Binary operations

All of these provide methods getLeftOperand() and getRightOperand() to return both operands of an expression.

  • AndExpression - refers to logical and ($str1 and $str2)
  • OrExpression - refers to logical or ($str1 or $str2)
  • LtExpression - refers to < operator (@str1 < @str2)
  • GtExpression - refers to > operator (@str1 > @str2)
  • LeExpression - refers to <= operator (@str1 <= @str2)
  • GeExpression - refers to >= operator (@str1 >= @str2)
  • EqExpression - refers to == operator (!str1 == !str2)
  • NeqExpression - refers to != operator (!str1 != !str2)
  • ContainsExpression - refers to contains operator (pe.sections[0] contains "text")
  • MatchesExpression - refers to matches operator (pe.sections[0] matches /(text|data)/)
  • PlusExpression - refers to + operator (@str1 + 0x100)
  • MinusExpression - refers to - operator (@str1 - 0x100)
  • MultiplyExpression - refers to * operator (@str1 * 0x100)
  • DivideExpression - refers to / operator (@str1 / 0x100)
  • ModuloExpression - refers to % operator (@str1 % 0x100)
  • BitwiseXorExpression - refers to ^ operator (uint8(0x10) ^ uint8(0x20))
  • BitwiseAndExpression - refers to & operator (pe.characteristics & pe.DLL)
  • BitwiseOrExpression - refers to | operator (pe.characteristics | pe.DLL)
  • ShiftLeftExpression - refers to << operator (uint8(0x10) << 2)
  • ShiftRightExpression - refers to >> operator (uint8(0x10) >> 2)

For expressions

All of these provide method getVariable() to return variable used for iterating over the set of values (can also be any or all), getIteratedSet() to return an iterated set (can also be them) and getBody() to return the body of a for expression. For OfExpression, getBody() always returns nullptr.

  • ForIntExpression - refers to for which operates on set of integers (for all i in (1 .. 5) : ( ... ))
  • ForStringExpression - refers to for which operates on set of string identifiers (for all of ($str1, $str2) : ( ... ))
  • OfExpression - refers to of (all of ($str1, $str2))

Identifier expressions

All of these provide method getSymbol() to return symbol of an associated identifier.

  • IdExpression - refers to identifier (rule1, pe)
  • StructAccessExpression - refers to . operator for accessing structure members (pe.number_of_sections)
  • ArrayAccessExpression - refers to [] operator for accessing items in arrays (pe.sections[0])
  • FunctionCallExpression - refers to function call (pe.exports("ExitProcess"))

Literal expressions
  • BoolLiteralExpression - refers to true or false
  • StringLiteralExpression - refers to any sequence of characters enclosed in double-quotes ("text")
  • IntLiteralExpression - refers to any integer value be it decimal, hexadecimal or with multipliers (KB, MB) (42, -42, 0x100, 100MB)
  • DoubleLiteralExpression - refers to any floating point value (72.0, -72.0)

Keyword expressions
  • FilesizeExpression - refers to keyword filesize
  • EntrypointExpression - refers to keyword entrypoint
  • AllExpression - refers to keyword all
  • AnyExpression - refers to keyword any
  • ThemExpression - refers to keyword them

Other expressions
  • SetExpression - refers to set of either integers or string identifiers ((1,2,3,4,5), ($str*,$1,$2))
  • RangeExpression - refers to range of integers ((0x100 .. 0x200))
  • ParenthesesExpression - refers to expression enclosed in parentheses (((5 + 6) * 30))
  • IntFunctionExpression - refers to special built-in functions (u)int(8|16|32) (uint16(<offset>))
  • RegexpExpression - refers to regular expression (/<regexp>/<mods>)
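
The integer literal forms listed under IntLiteralExpression (42, -42, 0x100, 100MB) all denote plain integer values. As a rough illustration, the sketch below evaluates such text to a number, using YARA's convention that KB multiplies by 1024 and MB by 1024 * 1024. evalIntLiteral is a hypothetical helper, not part of yaramod, and it is more permissive than YARA's actual grammar (for example, it accepts a multiplier after a hexadecimal number and treats a leading 0 as octal).

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <optional>
#include <string>

// Toy helper, not part of yaramod: evaluates an integer literal written in
// decimal or 0x-hexadecimal, optionally followed by the KB or MB multiplier.
// Returns std::nullopt when the text is not a number.
std::optional<std::int64_t> evalIntLiteral(std::string text) {
    std::int64_t scale = 1;
    if (text.size() > 2) {
        const std::string suffix = text.substr(text.size() - 2);
        if (suffix == "KB")
            scale = 1024;
        else if (suffix == "MB")
            scale = 1024 * 1024;
        if (scale != 1)
            text.erase(text.size() - 2);  // drop the multiplier suffix
    }

    try {
        std::size_t used = 0;
        // Base 0 lets std::stoll recognize the 0x prefix on its own.
        const std::int64_t value = std::stoll(text, &used, 0);
        if (used != text.size())
            return std::nullopt;  // trailing garbage after the number
        return value * scale;
    } catch (const std::exception&) {
        return std::nullopt;  // not a number, or out of range
    }
}
```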

Here is a small example of how to dump all function calls in a condition:

class FunctionCallDumper : public yaramod::ObservingVisitor {
public:
    void visit(yaramod::FunctionCallExpression* expr) override {
        std::cout << "Function call: " << expr->getFunction()->getText() << '\n';

        // Visit the arguments as well, because they can contain nested function calls
        for (auto& param : expr->getArguments())
            param->accept(this);
    }
};
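
For readers who want to see the whole mechanism in one place, the following self-contained sketch reproduces the same idea on a toy AST. None of these types come from yaramod: ToyVisitor, Expr, IntLit, Plus, Call and CallCollector are hypothetical names. It shows a base visitor whose default methods walk into child nodes, and a derived visitor that overrides one method, records what it sees, and then continues the traversal so that nested calls are found too.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct IntLit;
struct Plus;
struct Call;

// Toy observing visitor: the default implementations walk into child nodes.
struct ToyVisitor {
    virtual ~ToyVisitor() = default;
    virtual void visit(IntLit&) {}
    virtual void visit(Plus& e);
    virtual void visit(Call& e);
};

struct Expr {
    virtual ~Expr() = default;
    virtual void accept(ToyVisitor& v) = 0;
};
using ExprPtr = std::unique_ptr<Expr>;

struct IntLit : Expr {
    long value;
    explicit IntLit(long v) : value(v) {}
    void accept(ToyVisitor& v) override { v.visit(*this); }
};

struct Plus : Expr {
    ExprPtr left, right;
    Plus(ExprPtr l, ExprPtr r) : left(std::move(l)), right(std::move(r)) {}
    void accept(ToyVisitor& v) override { v.visit(*this); }
};

struct Call : Expr {
    std::string name;
    std::vector<ExprPtr> args;
    Call(std::string n, std::vector<ExprPtr> a) : name(std::move(n)), args(std::move(a)) {}
    void accept(ToyVisitor& v) override { v.visit(*this); }
};

void ToyVisitor::visit(Plus& e) {
    e.left->accept(*this);
    e.right->accept(*this);
}

void ToyVisitor::visit(Call& e) {
    for (auto& arg : e.args)
        arg->accept(*this);
}

// Records every call name, including calls nested inside arguments.
struct CallCollector : ToyVisitor {
    std::vector<std::string> names;

    using ToyVisitor::visit;  // keep the inherited overloads visible

    void visit(Call& e) override {
        names.push_back(e.name);
        ToyVisitor::visit(e);  // descend into the arguments
    }
};
```

Calling accept on the root node with a CallCollector fills names in pre-order, so for an expression like outer(inner(1)) + 2 the outer call is recorded before the inner one.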