
Extract file names #14

Merged
merged 13 commits into from
Jan 10, 2023

Conversation

lauraporta
Member

No description provided.

@lauraporta lauraporta linked an issue Dec 12, 2022 that may be closed by this pull request
@lauraporta lauraporta marked this pull request as ready for review December 19, 2022 15:40
@lauraporta lauraporta changed the base branch from developement to main December 19, 2022 15:41
@lauraporta lauraporta changed the base branch from main to developement December 19, 2022 15:41
@lauraporta
Member Author

lauraporta commented Jan 6, 2023

What is in this PR?

This PR deals with the part of the codebase dedicated to loading the data. The feature implemented here makes it possible to collect the names of the files to be loaded.

There are two ways to load the data:

  1. use allen-dff, a file that contains all the data in a single MATLAB object. In this case only one file name is needed.
  2. load various single files separately. One way is to perform a recursive search of the file system for the correct paths of the files. This option allows for project-specific customization, and its logic is implemented in a project-specific parser.

I implemented both.

Description

The loading module creates a Specifications object which, in addition to the saved configurations, also holds the paths to the data to be loaded.
When instantiated, specs creates an instance of FolderNamingSpecs and calls its method extract_file_names(), which is the core of this PR.
Its main logic, in pseudocode, is:

if allen-dff
	save allen-dff file path as "File" object
else
	for pre saved folder paths
		recursively search in the given folder
		save file path as "File" object

For the recursive search I am using path.glob("**/*").
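The pseudocode above might translate into something like the following sketch (function signature and argument names are illustrative, not the actual implementation):

```python
from pathlib import Path

def extract_file_names(use_allen_dff: bool, allen_dff_path: Path,
                       folder_paths: list[Path]) -> list[Path]:
    """Collect the paths of the files to load (simplified sketch)."""
    if use_allen_dff:
        # All data live in a single MATLAB object: one file path suffices.
        return [allen_dff_path]
    paths = []
    for folder in folder_paths:
        # Recursive search, as in the PR: path.glob("**/*")
        paths.extend(p for p in folder.glob("**/*") if p.is_file())
    return paths
```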

Pre saved folder paths

There are three pre-saved folder paths:

  • experimental folder
  • stimulus AI schedule files folder
  • serial2p folder

Their specifics are handled in the parser Parser2pRSP; they depend on the paths specified in the configuration, which right now are pretty explicit. I would like them to be hidden, but that goes beyond the scope of this PR.

File object

File is a new object that stores the file path both as a Path object and as a string, and classifies it according to DataType (allen-dff or other?) and AnalysisType (some files are built for only one kind of analysis).
I thought that this object could make it easier later on to find a specific file in a group. If the file does not belong to any DataType in use, File throws an exception that is then caught by search_file_paths() in FolderNamingSpecs. This exception is used to identify file paths that should not be stored.
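A stripped-down sketch of this classify-and-filter pattern (the DataType values and the matching rules are illustrative, not the PR's actual ones):

```python
from enum import Enum
from pathlib import Path

class DataType(Enum):
    ALLEN_DFF = "allen_dff"
    SIGNAL = "signal"

class File:
    def __init__(self, path: Path):
        self.path = path
        self._path_str = str(path)
        self.datatype = self._get_data_type()

    def _get_data_type(self) -> DataType:
        if "allen_dff.mat" in self._path_str:
            return DataType.ALLEN_DFF
        elif self._path_str.endswith(".tif"):  # illustrative rule
            return DataType.SIGNAL
        raise ValueError("File not to be used")

def search_file_paths(candidates: list[Path]) -> list["File"]:
    files = []
    for path in candidates:
        try:
            files.append(File(path))
        except ValueError:
            pass  # path matches no DataType in use: do not store it
    return files
```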


server: 'ssh.swc.ucl.ac.uk'

paths:
winstor: '/Volumes/your_server/'
imaging: '/Volumes/path/to/imaging/data/'
winstor: '/Volumes/winstor/'
Member

General point, but I guess we don't want these paths hard coded in the repo.

Member Author

I agree, I could add it to .gitignore to just stop tracking it.


config_path = Path(__file__).parent / "config/config.yml"
config_path = Path(__file__).parents[1] / "config/config.yml"
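For reference, the change in the diff above moves the config lookup one directory up the tree (the module location below is hypothetical):

```python
from pathlib import Path

# Hypothetical module location inside the repository.
p = Path("/repo/package/load.py")

print(p.parent / "config/config.yml")      # /repo/package/config/config.yml
print(p.parents[1] / "config/config.yml")  # /repo/config/config.yml
```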
Member

As above, (eventually) the configs should live somewhere else and not be tracked by git.

winstor: '/Volumes/winstor/'
imaging: '/Volumes/winstor/swc/margrie/Chryssanthi/imaging'
allen-dff: '/Volumes/winstor/swc/margrie/Chryssanthi/imaging/allen_dff'
serial2p: '/Volumes/winstor/swc/margrie/Chryssanthi/imaging/serial2p'
Member

I don't think I understand why serial2p is being processed by this repo? Isn't this tool just for functional imaging?

Member Author

"Serial2p" here refers to a folder in which serial2p output is saved and that is then loaded. imaging, allen-dff, serial2p, stimulus-ai-schedule are names of folders from which I might want to load the data.

self.path: Path = path
self._path_str = str(path)
self.datatype = self._get_data_type()
self.analysistype = self._get_analysis_type()
Member

(very) minor point, but for consistency and readability, I'd call this self.analysis_type

elif "allen_dff.mat" in self._path_str:
return self.DataType.ALLEN_DFF
else:
raise ValueError("File not to be used")
Member

May be worth logging these kinds of things at some point, as well as throwing an error.

Member Author

I log it in a very summarized form in a function that instantiates File. I could track the paths of all discarded files; the only downside is that there would be quite a lot of them and the log file would be enormous.
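One way to keep the log small is to count the discarded paths and emit a single summary line rather than one entry per file (a sketch; the function name and filter rule are illustrative):

```python
import logging
from pathlib import Path

def filter_paths(candidates: list[Path]) -> list[Path]:
    """Keep recognised data files; log a single summary for the rest."""
    kept, discarded = [], 0
    for path in candidates:
        if path.suffix in {".mat", ".tif"}:  # illustrative filter
            kept.append(path)
        else:
            discarded += 1
    # One summary line instead of one log entry per discarded file.
    logging.info("Discarded %d file paths", discarded)
    return kept
```

This would be called on the result of the recursive search, e.g. filter_paths(list(folder.glob("**/*"))).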

Member

If it raises an error, presumably it shouldn't be raised often?

Member Author

I used the exception as a way to block the instantiation of a File object and to count the number of discarded files. But maybe that's improper; I can think of a better way to do it.

Member Author

From a quick look online, it seems that there is a cleaner way to do it with __new__
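A sketch of the __new__-based alternative: if __new__ returns None instead of an instance, __init__ is never called and no exception is needed (the substring check below is illustrative):

```python
from pathlib import Path

class File:
    def __new__(cls, path: Path):
        # Returning None blocks instantiation: no exception needed.
        if "allen_dff.mat" not in str(path):  # illustrative check
            return None
        return super().__new__(cls)

    def __init__(self, path: Path):
        self.path = path

# Callers filter out the None results instead of catching an exception:
candidates = [Path("data/allen_dff.mat"), Path("data/notes.txt")]
files = [f for p in candidates if (f := File(p)) is not None]
```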

pass

@abstractmethod
def get_path_to_stimulus_AI_schedule_files(self) -> Path:
Member

I'm sure it's obvious, but I wouldn't use abbreviations (e.g. AI) if it wouldn't be totally clear what this means to anyone reading the code.

Member Author

I'll double check the precise meaning of AI in this case and make it more explicit.

@adamltyson
Member

Looks good :)

@lauraporta lauraporta merged commit d7338f1 into developement Jan 10, 2023
@lauraporta lauraporta deleted the extract-file-names branch February 7, 2023 01:20

Successfully merging this pull request may close these issues.

Extract all file names of raw data
2 participants