Simplify IR data structures #82

fabianschuiki · 2020-01-31T14:43:58Z

Functions, Processes, Entities

Functions, processes, and entities are fairly uniform. They merely differ in their signatures (whether they have a return type or a list of output arguments), and whether they contain basic blocks or just a set of instructions.

Working with LLHD could be greatly simplified by unifying the Function, Process, and Entity structs into one Unit struct. Differences in signature are already handled, and entities can be represented by having a single implicit basic block.

CFG, DFG, Layout

The current setup with separate CFG, DFG, and layouts is rather cumbersome to work with. Rather consider consolidating ControlFlowGraph, DataFlowGraph, FunctionLayout, and InstLayout into one data structure that is shared among functions, processes, and entities.

Improve IR manipulation with a single builder

The graph-ir branch contains a quick trial on how the IR could be revamped. Part of that is a new builder/modifier scheme where the user does not directly interact with modules, functions, processes, entities, blocks, and instructions, but rather uses builder instances. These lock the module or unit in place using a lifetime and are the only means to modify these structs -- they don't provide such functionality themselves.

llhd/examples/gir.rs

Lines 47 to 85 in 75aeda7

 impl Module { 

 /// Create a new module. 

 pub fn new() -> Self { 

 Self { 

 present_units: Default::default(), 

 units: Default::default(), 

 } 

 } 

 /// Modify the module. 

 pub fn modify(&mut self) -> ModuleBuilder { 

 ModuleBuilder(UnsafeCell::new(self)) 

 } 

 fn alloc_unit(&self, data: UnitData) -> UnitId { 

 // Safe because we only add elements. This may cause the vector to grow, 

 // which moves the boxes around. But since they're boxes, the referred 

 // to data will not move. 

 let mut units = unsafe { &mut *self.units.get() }; 

 for (id, slot) in units.iter_mut().enumerate() { 

 if slot.is_none() { 

 *slot = Some(Box::new(data)); 

 return UnitId(id as u32); 

 } 

 } 

 let id = UnitId(units.len() as u32); 

 units.push(Some(Box::new(data))); 

 id 

 } 

 fn dealloc_unit(&mut self, unit: UnitId) -> UnitData { 

 // Safe because the function takes &mut, ensuring that no references to 

 // any of the units exist. 

 unsafe { 

 assert!(!(*self.present_units.get()).contains(unit.0)); 

 *(*self.units.get())[unit.0 as usize].take().unwrap() 

 } 

 } 

 }

llhd/examples/gir.rs

Lines 107 to 155 in 75aeda7

 impl<'m> ModuleBuilder<'m> { 

 pub fn new_function<'u>(&'u self, name: UnitName) -> UnitBuilder<'m, 'u> { 

 self.new_unit(UnitData::new(UnitKind::Function, name)) 

 } 

 pub fn new_process<'u>(&'u self, name: UnitName) -> UnitBuilder<'m, 'u> { 

 self.new_unit(UnitData::new(UnitKind::Process, name)) 

 } 

 pub fn new_entity<'u>(&'u self, name: UnitName) -> UnitBuilder<'m, 'u> { 

 self.new_unit(UnitData::new(UnitKind::Entity, name)) 

 } 

 fn new_unit<'u>(&'u self, data: UnitData) -> UnitBuilder<'m, 'u> { 

 // Safe because we hand out pointers on the heap. 

 let id = self.alloc_unit(data); 

 let data = unsafe { 

 (*self.units.get())[id.0 as usize] 

 .as_mut() 

 .unwrap() 

 .as_mut() 

 }; 

 UnitBuilder(Unit(id, data as *const _, PhantomData), data, PhantomData) 

 } 

 pub fn add_unit(&self, unit: Unit<'m>) {} 

 pub fn remove_unit(&self, unit: Unit<'m>) {} 

 /// Modify a unit in the module. 

 /// 

 /// Only one module can be modified at the time through this function, since 

 /// we cannot guarantee that the user requests modification of the same unit 

 /// twice. If you want to modify units in parallel, use `modify_units`. 

 pub fn modify_unit<'u>(&'u mut self, unit: Unit<'m>) -> UnitBuilder<'m, 'u> { 

 // Safe because the function is &mut, enforcing a single reference. 

 let data = unsafe { 

 (*self.units.get())[(unit.0).0 as usize] 

 .as_mut() 

 .unwrap() 

 .as_mut() as *mut _ 

 }; 

 UnitBuilder(unit, data, PhantomData) 

 } 

 /// Modify all units in the module in parallel. 

 pub fn modify_units<'u>(&'u mut self) -> Vec<UnitBuilder<'m, 'u>> { 

 vec![] 

 } 

 }

This provides a single point of modification for the IR, and allows the code to perform a lot of bookkeeping on the side, such as pred/succ and usage tables, as well as dominator tree invalidation. The current IR would greatly benefit from such a scheme as it streamlines interaction with LLHD and provides more opportunities to make things fast on the inside.

An interesting improvement would be to make the changes applied to these builders deferred, such that the user can accumulate modifications which then only become visible upon committing them. This would allow the user to interact with analyses such as DTs and TRGs without them being invalidated by modification, with the express mental model being that the changes are delayed and as such the analyses remain valid.

An important side-effect of this is that modification through a single structure allows that structure to maintain a lot of accelerating lookup data structures, such as predecessor/successor lists, terminator maps, phi node lists, cached dominator tables, etc.

Todo

Modification through a single builder

The text was updated successfully, but these errors were encountered:

fabianschuiki · 2020-04-12T07:09:45Z

Improved IR node handling moved to #107.

fabianschuiki added A-ir Area: Intermediate representation. P-long-term Priority: Long-term endeavour. labels Jan 31, 2020

fabianschuiki mentioned this issue Jan 31, 2020

Simplify functions, processes, and entities #83

Closed

fabianschuiki added improvement and removed improvement labels Feb 6, 2020

fabianschuiki added this to the v0.13 milestone Feb 8, 2020

fabianschuiki changed the title ~~Simply CFG, DFG, layout~~ Simply IR data structures Feb 9, 2020

fabianschuiki changed the title ~~Simply IR data structures~~ Simplify IR data structures Feb 9, 2020

fabianschuiki mentioned this issue Feb 9, 2020

Improve IR manipulation with a single builder #85

Closed

fabianschuiki closed this as completed Apr 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify IR data structures #82

Simplify IR data structures #82

fabianschuiki commented Jan 31, 2020 •

edited

Loading

fabianschuiki commented Apr 12, 2020

Simplify IR data structures #82

Simplify IR data structures #82

Comments

fabianschuiki commented Jan 31, 2020 • edited Loading

Functions, Processes, Entities

CFG, DFG, Layout

Improve IR manipulation with a single builder

Todo

Modification through a single builder

fabianschuiki commented Apr 12, 2020

fabianschuiki commented Jan 31, 2020 •

edited

Loading