
Added first four exercises of start-guide #133

Draft
wants to merge 1 commit into
base: master
from

Conversation

josh-nook
Contributor

Ignore in current state, this is for myself and another developer to go through over voice call.

Collaborator

@IgWod IgWod left a comment


Hey @josh-nook,

There are a few small comments to address and the ending needs some polish (as you already know), but otherwise I think it is quite a good tutorial. Well done!

>
> **Modification:** The altering of a program

So altogether, a DBM _Tool_ is a program that can alter a natively compiled user-space binary at runtime, with no source code required. We could take `simple_program` and pass it through to MAMBO as we did before, but instead of simply executing it, we could perform all sorts of modifications on it. Examples of these include:

Collaborator

A small detail. Technically MAMBO is a DBM framework, whereas a tool is MAMBO + a specific plugin (e.g., MAMBO memcheck). Admittedly, we have been quite bad at making that distinction ourselves, but if possible, let's try to call MAMBO a DBM framework.

>
> **Debugging:** Detecting memory faults within a program

MAMBO isn't by any means the first DBM Tool to exist. [Pin](https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-instrumentation-tool.html), [Qemu](https://www.qemu.org), and [DynamoRIO](https://dynamorio.org) are all examples of DBM-based tools. So if other options are available, what is the purpose of MAMBO?

Collaborator

Here the terminology is more important. It should say DBM frameworks, not DBM-based tools, since stuff like Pintools are clearly defined by PIN.


### Why MAMBO?

MAMBO was created as part of Cosmin Gorgovan's EPSRC-funded PhD in the School of Computer Science at the University of Manchester, with a handful of properties that distinguish it from other DBMs:

Collaborator

A nitpick. It reads like Cosmin created all those features; however, Guillermo optimised it for ARM64, and Alistair developed RISC-V support. I would say something along the lines of: "It was initially developed by Cosmin's ... with other people contributing since then".


This exercise will go through how a program like our `simple_program` is executed using MAMBO, step-by-step. It's not _necessary_ content for the rest of the tutorial, but it'll certainly help you fully grasp MAMBO if you want to contribute to the project.

This exercise will obfuscate for the sake of simplicity much of how MAMBO works, most notably with optimisations regarding branches.

Collaborator

The second part of the sentence needs a better flow. Maybe: "..., most notably branch optimisations".

</div>
<br>

For simplicity, portability, and full control over execution, DBM Tools often **load target programs within their own address space**. This cannot be done with `ld`, shown on the LHS of the diagram below, so we must implement a userspace loader which for MAMBO is `libelf`:

Collaborator

I would just add at the end: "Not to be confused with the Linux provided libelf: https://archlinux.org/packages/core/x86_64/libelf/"

Most optimisations target the main source of overhead in DBM tools: indirect branches. A full description of the optimisations is out of the scope of this tutorial, so only a handful of them are outlined below:

- **Inline hash lookups** are instrumented at the end of code blocks
- **Hot Paths** between basic blocks are identified and directly linked

Collaborator

This needs a bit of updating. Hot paths and traces are directly related, as traces are created for hot paths. Also, as far as I remember, traces can contain conditional branches. So the bullet points should be something like (not in these exact words):

  1. Indirect branch opts - I think that is correct
  2. Direct linking of direct branches (conditional and unconditional) to avoid calling the dispatcher
  3. Traces to optimise hot paths
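
For the first point, the idea behind a hash lookup for indirect branches can be modelled in a few lines of C. This is only a rough sketch of the concept: a table maps an address in the original binary to its translated address in the code cache, and a miss means control has to go back to the dispatcher. The type and function names (`cc_table_t`, `cc_insert`, `cc_lookup`), the table size, and the linear probing scheme are all assumptions made for illustration; MAMBO emits its lookup as inline machine code and its actual table layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u /* hypothetical size; must be a power of two */

typedef struct {
    uintptr_t source_addr; /* address in the original binary; 0 marks an empty slot */
    uintptr_t cache_addr;  /* address of the translated block in the code cache */
} entry_t;

typedef struct {
    entry_t slots[TABLE_SIZE];
} cc_table_t;

/* AArch64 instructions are 4-byte aligned, so the low two bits carry no information. */
static size_t slot_for(uintptr_t addr) {
    return (size_t)((addr >> 2) & (TABLE_SIZE - 1u));
}

/* Records a translation; returns 0 on success, -1 if the table is full. */
static int cc_insert(cc_table_t *t, uintptr_t source, uintptr_t cache) {
    assert(t != NULL && source != 0);
    for (size_t probe = 0; probe < TABLE_SIZE; probe++) {
        entry_t *e = &t->slots[(slot_for(source) + probe) & (TABLE_SIZE - 1u)];
        if (e->source_addr == 0 || e->source_addr == source) {
            e->source_addr = source;
            e->cache_addr = cache;
            return 0;
        }
    }
    return -1;
}

/* Returns the code cache address for source, or 0 on a miss
 * (in a DBM system, a miss would mean falling back to the dispatcher). */
static uintptr_t cc_lookup(const cc_table_t *t, uintptr_t source) {
    assert(t != NULL);
    for (size_t probe = 0; probe < TABLE_SIZE; probe++) {
        const entry_t *e = &t->slots[(slot_for(source) + probe) & (TABLE_SIZE - 1u)];
        if (e->source_addr == source) return e->cache_addr;
        if (e->source_addr == 0) return 0; /* reached an empty slot: not present */
    }
    return 0;
}
```

Running this check inline at the end of a block is what saves a full context switch into the dispatcher whenever the target has already been translated.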

- Instruction specific events


Callback functions that we write for inserting code into basic blocks (instrumentation) are registered with **scantime** with *MAMBO generated scantime events*, i.e. when our target program is passed through the code scanner:

Collaborator

What about: "... are registered on scan-time to be executed with MAMBO generated scan-time events".
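
If it helps the wording, the relationship between registering a callback and a scan-time event firing can be shown with a toy model. This is a sketch of the pattern only: every name in it (`toy_ctx_t`, `toy_register_pre_bb_cb`, `toy_scan_basic_block`, `count_block`) is hypothetical and none of it is MAMBO's plugin API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this models the idea, not MAMBO's API. */
typedef struct {
    uintptr_t block_start; /* source address of the block currently being scanned */
    unsigned seen;         /* user data that callbacks may update */
} toy_ctx_t;

typedef int (*toy_cb_t)(toy_ctx_t *ctx);

#define TOY_MAX_CBS 8
static toy_cb_t toy_pre_bb_cbs[TOY_MAX_CBS];
static size_t toy_pre_bb_count;

/* Registers a callback for the "about to scan a basic block" event.
 * Returns 0 on success, -1 if no slots are left. */
static int toy_register_pre_bb_cb(toy_cb_t cb) {
    assert(cb != NULL);
    if (toy_pre_bb_count == TOY_MAX_CBS) return -1;
    toy_pre_bb_cbs[toy_pre_bb_count++] = cb;
    return 0;
}

/* The toy scanner raises the event once per scanned block and
 * runs every registered callback in registration order. */
static void toy_scan_basic_block(toy_ctx_t *ctx, uintptr_t start) {
    assert(ctx != NULL);
    ctx->block_start = start;
    for (size_t i = 0; i < toy_pre_bb_count; i++) {
        toy_pre_bb_cbs[i](ctx);
    }
}

/* Example callback: counts how many blocks have been scanned. */
static int count_block(toy_ctx_t *ctx) {
    ctx->seen++;
    return 0;
}
```

The point for the tutorial text is the ordering: the callback is registered once up front, and it then runs each time the scanner raises the matching event, not when the translated code later executes.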

}
```

We've included also included a print statement so we can see the filename location and start address of each basic block.

Collaborator

"included" is repeated twice.


The lighter sections within the label blocks each represent a single basic block. All label blocks except `LBB0_1` contain one basic block, as they end on a branch statement.

`LBB0_1` however has two basic blocks. This is because there are two branch statements: `b.ge .LBB0_4` and `b .LBB0_2`. Since a basic block is **strictly** single entry and single branch exit, `LBB0_1` will constitute two separate basic blocks in the code cache.

Collaborator

Branch-link (bl) also splits the assembly block into 2 basic blocks.
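
To make the splitting rule concrete, here is a rough sketch in C that counts basic blocks in a straight-line listing of mnemonics by closing a block after every branch, including `bl`. The helper names and the short mnemonic list are assumptions for illustration; a real scanner decodes instruction encodings, not strings, and this sketch ignores the fact that a branch target also starts a new block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if an AArch64 mnemonic ends a basic block.
 * Only a handful of branch forms are listed here. */
static bool ends_basic_block(const char *mnemonic) {
    static const char *const exact[] = {
        "b", "bl", "br", "blr", "ret", "cbz", "cbnz", "tbz", "tbnz"
    };
    for (size_t i = 0; i < sizeof exact / sizeof exact[0]; i++) {
        if (strcmp(mnemonic, exact[i]) == 0) return true;
    }
    /* Conditional branches are written b.<cond>, for example b.ge */
    return strncmp(mnemonic, "b.", 2) == 0;
}

/* Counts the basic blocks in a listing of n mnemonics:
 * a block opens at the first instruction after a branch and closes at the next branch. */
static size_t count_basic_blocks(const char *const *mnemonics, size_t n) {
    assert(mnemonics != NULL || n == 0);
    size_t blocks = 0;
    bool open = false;
    for (size_t i = 0; i < n; i++) {
        if (!open) {
            blocks++;
            open = true;
        }
        if (ends_basic_block(mnemonics[i])) open = false;
    }
    return blocks;
}
```

Feeding it a label block that ends in `b.ge` followed by `b` gives two blocks, and a listing with a `bl` in the middle is likewise split in two.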


// TODO

// I'm unsure as to why there are only 5 basic blocks and not 6.

Collaborator

I would run objdump on the binary and look at the addresses; this may shed some light on what is happening.

@josh-nook
Contributor Author

Thanks for the feedback, I'll work on these at some point this week


2 participants