Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify IR data structures #82

Closed
13 tasks done
fabianschuiki opened this issue Jan 31, 2020 · 1 comment
Closed
13 tasks done

Simplify IR data structures #82

fabianschuiki opened this issue Jan 31, 2020 · 1 comment
Labels
A-ir Area: Intermediate representation. P-long-term Priority: Long-term endeavour.
Milestone

Comments

@fabianschuiki
Copy link
Owner

fabianschuiki commented Jan 31, 2020

Functions, Processes, Entities

Functions, processes, and entities are fairly uniform. They merely differ in their signatures (whether they have a return type or a list of output arguments), and whether they contain basic blocks or just a set of instructions.

Working with LLHD could be greatly simplified by unifying the Function, Process, and Entity structs into one Unit struct. Differences in signature are already handled, and entities can be represented by having a single implicit basic block.

CFG, DFG, Layout

The current setup with separate CFG, DFG, and layouts is rather cumbersome to work with. Rather consider consolidating ControlFlowGraph, DataFlowGraph, FunctionLayout, and InstLayout into one data structure that is shared among functions, processes, and entities.

Improve IR manipulation with a single builder

The graph-ir branch contains a quick trial on how the IR could be revamped. Part of that is a new builder/modifier scheme where the user does not directly interact with modules, functions, processes, entities, blocks, and instructions, but rather uses builder instances. These lock the module or unit in place using a lifetime and are the only means to modify these structs -- they don't provide such functionality themselves.

llhd/examples/gir.rs

Lines 47 to 85 in 75aeda7

impl Module {
/// Create a new module.
pub fn new() -> Self {
Self {
present_units: Default::default(),
units: Default::default(),
}
}
/// Modify the module.
pub fn modify(&mut self) -> ModuleBuilder {
ModuleBuilder(UnsafeCell::new(self))
}
fn alloc_unit(&self, data: UnitData) -> UnitId {
// Safe because we only add elements. This may cause the vector to grow,
// which moves the boxes around. But since they're boxes, the referred
// to data will not move.
let mut units = unsafe { &mut *self.units.get() };
for (id, slot) in units.iter_mut().enumerate() {
if slot.is_none() {
*slot = Some(Box::new(data));
return UnitId(id as u32);
}
}
let id = UnitId(units.len() as u32);
units.push(Some(Box::new(data)));
id
}
fn dealloc_unit(&mut self, unit: UnitId) -> UnitData {
// Safe because the function takes &mut, ensuring that no references to
// any of the units exist.
unsafe {
assert!(!(*self.present_units.get()).contains(unit.0));
*(*self.units.get())[unit.0 as usize].take().unwrap()
}
}
}

llhd/examples/gir.rs

Lines 107 to 155 in 75aeda7

impl<'m> ModuleBuilder<'m> {
pub fn new_function<'u>(&'u self, name: UnitName) -> UnitBuilder<'m, 'u> {
self.new_unit(UnitData::new(UnitKind::Function, name))
}
pub fn new_process<'u>(&'u self, name: UnitName) -> UnitBuilder<'m, 'u> {
self.new_unit(UnitData::new(UnitKind::Process, name))
}
pub fn new_entity<'u>(&'u self, name: UnitName) -> UnitBuilder<'m, 'u> {
self.new_unit(UnitData::new(UnitKind::Entity, name))
}
fn new_unit<'u>(&'u self, data: UnitData) -> UnitBuilder<'m, 'u> {
// Safe because we hand out pointers on the heap.
let id = self.alloc_unit(data);
let data = unsafe {
(*self.units.get())[id.0 as usize]
.as_mut()
.unwrap()
.as_mut()
};
UnitBuilder(Unit(id, data as *const _, PhantomData), data, PhantomData)
}
pub fn add_unit(&self, unit: Unit<'m>) {}
pub fn remove_unit(&self, unit: Unit<'m>) {}
/// Modify a unit in the module.
///
/// Only one module can be modified at the time through this function, since
/// we cannot guarantee that the user requests modification of the same unit
/// twice. If you want to modify units in parallel, use `modify_units`.
pub fn modify_unit<'u>(&'u mut self, unit: Unit<'m>) -> UnitBuilder<'m, 'u> {
// Safe because the function is &mut, enforcing a single reference.
let data = unsafe {
(*self.units.get())[(unit.0).0 as usize]
.as_mut()
.unwrap()
.as_mut() as *mut _
};
UnitBuilder(unit, data, PhantomData)
}
/// Modify all units in the module in parallel.
pub fn modify_units<'u>(&'u mut self) -> Vec<UnitBuilder<'m, 'u>> {
vec![]
}
}

This provides a single point of modification for the IR, and allows the code to perform a lot of bookkeeping on the side, such as pred/succ and usage tables, as well as dominator tree invalidation. The current IR would greatly benefit from such a scheme as it streamlines interaction with LLHD and provides more opportunities to make things fast on the inside.

An interesting improvement would be to make the changes applied to these builders deferred, such that the user can accumulate modifications which then only become visible upon committing them. This would allow the user to interact with analyses such as DTs and TRGs without them being invalidated by modification, with the express mental model being that the changes are delayed and as such the analyses remain valid.

An important side-effect of this is that modification through a single structure allows that structure to maintain a lot of accelerating lookup data structures, such as predecessor/successor lists, terminator maps, phi node lists, cached dominator tables, etc.

Todo

Modification through a single builder

  • Add CFG functions to builder and deprecate them in the CFG.
  • Add DFG functions to builder and deprecate them in the DFG.
  • Add layout functions to builder and deprecate them in the layout.
  • Remove &mut self functions from Unit trait
  • Remove &self functions from UnitBuilder trait
  • Unify Entity with the rest by adding a single default block
  • Collapse Entity, Process, and Function into a UnitData struct
  • Remove ModUnits and replace with explicit list of units and declarations
  • Convert Unit and UnitBuilder into structs
  • Change dump() to take Unit instead of DFG and optional CFG
  • Switch DominatorTree and PredecessorTable to use Unit
  • Move FunctionInsertPos functions into Unit
  • Make cfg, dfg, and layout private
@fabianschuiki fabianschuiki added A-ir Area: Intermediate representation. P-long-term Priority: Long-term endeavour. labels Jan 31, 2020
@fabianschuiki fabianschuiki added this to the v0.13 milestone Feb 8, 2020
@fabianschuiki fabianschuiki changed the title Simply CFG, DFG, layout Simply IR data structures Feb 9, 2020
@fabianschuiki fabianschuiki changed the title Simply IR data structures Simplify IR data structures Feb 9, 2020
@fabianschuiki
Copy link
Owner Author

Improved IR node handling moved to #107.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ir Area: Intermediate representation. P-long-term Priority: Long-term endeavour.
Projects
None yet
Development

No branches or pull requests

1 participant