Motivation

Memoization (or caching in general) is a common technique for optimizing algorithms such as recursive Fibonacci calculators, pathfinding routines, etc. In simple cases it is fairly obvious where memoization can be applied and whether it is worthwhile. In other cases, however, it can be unintuitive and hard to tell whether any of the calls can be memoized. Our tool, MemoFinder, aims to help users understand whether memoization (or caching in general) can optimize their algorithm. The tool performs dynamic code analysis that tracks function calls and processing time, and uses that data to create a visualization that helps the user understand where memoization may be appropriate. Furthermore, the tool adds suggestions as to where memoization is most likely to optimize the code.
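As a concrete illustration (our own example, not MemoFinder output), here is the classic Fibonacci case in Python: a naive recursive version, and the same function memoized with the standard-library functools.lru_cache.

```python
from functools import lru_cache

def fib(n):
    # Naive recursion: fib(k) is recomputed many times for the same k.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Each distinct n is computed once; repeated calls hit the cache.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)
```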

The main reason we went with the dynamic path is that in some cases it is impossible to know whether memoization will help until we actually run the code. For instance, take a recursive, brute-force solution for calculating Fibonacci numbers. If the application only ever calculates the first two numbers, memoization will have no effect at all. This could be seen as an edge case, but it happens very often with web applications, or any application that serves data to users. For example, say you have a video streaming service and it takes a long time to render videos on the server. You decide to cache some of the videos, but how can you know which ones to cache? Our application can track usage and suggest the correct places for caching so your service runs smoothly.

User Studies

User Study 1

Notes

The main goal of the first user study was to evaluate the usefulness of the tool and whether it actually helps users understand where memoization is beneficial.

Summary

  • The tree graph was successful in helping users understand whether memoization is necessary
  • Users liked the graph's ease of use and intuitiveness
  • However, they found the graph hard to use when it became too big or wide

Changes Based on Feedback

  • We decided to add a new grid view
  • The grid consisted of nodes grouped by function calls with the same arguments and return values
  • The size of each node represented the number of calls and the execution time
  • Extra information also appeared on hover
  • This should solve the case where the tree graph gets too big to use
  • We also added a table view that lists the information from our analysis
  • This should give the user even more information to work with
  • After the tool has finished running, we also add a comment on the line that causes the function to require memoization

User Study 2

Notes

The second user study aimed to evaluate the usefulness of the new grid view and table view.

Summary

  • Users found the grid view not very intuitive at first, but after hovering over nodes for extra information, they quickly understood its use
  • All the users found the table view extremely helpful and intuitive
  • Some users were confused by the meaning of the Memoization Score
  • The grid and tree views were hard to read when function names and argument lists were too long

Changes Based on Feedback

  • Added text wrapping to both the grid and tree views to ensure readability with long function names
  • Added a description of how we determine which functions need to be memoized
  • Changed the table view to be more readable and intuitive: the memoization score is now hidden, and the user gets a direct suggestion about which functions need to be memoized
  • Changed the analysis comments to be more verbose and informative

Implementation Overview

The analysis happens in 3 stages:

  1. We augment the Python AST to track function calls and processing time. The overhead is minimal, as this stage only uses Python built-in modules
  2. Once the data is available, it is filtered and evaluated. Several metrics are calculated at this stage, such as the memoization score, which represents how much memoizing (or caching) a function call would actually improve the performance of the algorithm. More on how this is done here. A sketch of this tracking-and-scoring idea follows this list
  3. Using the evaluated data, we produce different types of graphs that should allow for a better understanding of the function calls. We also insert comments at the position we believe is the most appropriate place to memoize the code
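MemoFinder gathers this data by rewriting the AST itself, but the underlying idea of stages 1 and 2 can be sketched with a plain decorator. This is only a minimal illustration under our own assumptions; the names track, crude_score, and stats are hypothetical and not part of the tool's actual API.

```python
import time
from collections import defaultdict
from functools import wraps

# (function name, argument tuple) -> [number of calls, total seconds]
stats = defaultdict(lambda: [0, 0.0])

def track(func):
    """Record call count and wall-clock time per distinct argument tuple."""
    @wraps(func)
    def wrapper(*args):
        key = (func.__name__, args)
        start = time.perf_counter()
        result = func(*args)
        stats[key][0] += 1
        stats[key][1] += time.perf_counter() - start
        return result
    return wrapper

def crude_score(key):
    """Rough lower bound on the time a cache would have saved: every call
    after the first returns instantly. Recorded times include nested calls,
    so this is only an estimate."""
    calls, total = stats[key]
    return 0.0 if calls < 2 else total * (calls - 1) / calls

@track
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(10)
# Rank (function, args) pairs by how much caching would have helped.
for key in sorted(stats, key=crude_score, reverse=True)[:3]:
    print(key, "calls:", stats[key][0], "estimated saving: %.6fs" % crude_score(key))
```

Here a pair like ('fib', (8,)) shows up with multiple calls, which is exactly the signal that caching would pay off.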

Final Evaluation

We found the final version of the tool to be easy to use, versatile, and fast. All the views were intuitive, and the extra hover information gives additional insight. Although the tool was originally meant for memoization, we were able to make it work for cases where general caching is required (see the Video Service problem as an example). Finally, because the tool introduces very little overhead to the original code, it is extremely fast.

There are some points we wish we could have worked on more, but time constraints did not allow it. Here are some of them:

  • Our tool does not consider the case where a function changes global state. If a function changes global state, we cannot memoize or cache it, as doing so might change the final output (see the sketch after this list)
  • The tree view could use some work, such as ensuring it is always zoomed in at the center and adding extra tools to keep it usable when the tree is extremely big. However, since it is still usable in its current state, we decided to focus on the other views
  • Our current estimate for time savings is a very crude lower bound. A more sophisticated algorithm could have given us a tighter lower bound; however, we found that it would be too slow, especially for larger programs. Still, a better estimate could have been found given more time
  • Our probability model for determining whether a function is memoized, and our memoization score, were not rigorously proven to be the best models for these questions. As such, there are times when we get false positives and false negatives. If we had more time, we could have further fine-tuned these to get more consistent results
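To illustrate the first limitation, here is a hypothetical function (our example, not tool output) whose result depends on mutable global state; a cache keyed only on its arguments would return stale values:

```python
counter = 0

def next_id(prefix):
    # Reads and mutates global state: two calls with the same argument must
    # return different values, so caching on the argument alone would be wrong.
    global counter
    counter += 1
    return f"{prefix}-{counter}"

print(next_id("user"))  # user-1
print(next_id("user"))  # user-2 (a cache keyed on "user" would wrongly repeat user-1)
```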

Team Management

The team decided to split into sub-teams, each focusing on a specific aspect of the implementation. Approximate team setup:

  • AST team - create the Python AST augmentation algorithm
    • Kobi
    • Akim
  • API/backend
    • Valdi
  • Frontend
    • Asad
    • Alex

Although everyone was separated into sub-teams, we continued to help each other through code reviews and debugging.