
Extract file names #14

Merged
merged 13 commits into from
Jan 10, 2023

Conversation

lauraporta
Member

No description provided.

@lauraporta lauraporta linked an issue Dec 12, 2022 that may be closed by this pull request
@lauraporta lauraporta marked this pull request as ready for review December 19, 2022 15:40
@lauraporta lauraporta changed the base branch from developement to main December 19, 2022 15:41
@lauraporta lauraporta changed the base branch from main to developement December 19, 2022 15:41
@lauraporta
Member Author

lauraporta commented Jan 6, 2023

What is in this PR?

This PR deals with the part of the codebase dedicated to loading the data. The feature implemented here makes it possible to collect the names of the files to be loaded.

There are two ways to load the data:

  1. use allen-dff, a file that contains all the data in a single MATLAB object. In this case only one file name is needed.
  2. load various single files separately. One way is to perform a recursive search of the file system for the correct paths of the files. This option allows for project-specific customization, and its logic is implemented in a project-specific parser.

I implemented both.

Description

The loading module creates a Specifications object which, in addition to the saved configurations, also holds the paths to the data to be loaded.
When instantiated, specs creates an instance of FolderNamingSpecs and calls its method extract_file_names(), which is the core of this PR.
Its main logic, in pseudocode, is:

if allen-dff
	save allen-dff file path as "File" object
else
	for pre saved folder paths
		recursively search in the given folder
		save file path as "File" object

For the recursive search I am using path.glob("**/*").
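The pseudocode above might translate into something like the following sketch (function signature and argument names are illustrative, not the actual implementation):

```python
from pathlib import Path

def extract_file_names(use_allen_dff: bool, allen_dff_path: Path,
                       folder_paths: list[Path]) -> list[Path]:
    """Collect the paths of the files to load (simplified sketch)."""
    if use_allen_dff:
        # All data live in a single MATLAB object: one file path suffices.
        return [allen_dff_path]
    paths = []
    for folder in folder_paths:
        # Recursive search, as in the PR: path.glob("**/*")
        paths.extend(p for p in folder.glob("**/*") if p.is_file())
    return paths
```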

Pre saved folder paths

There are three pre-saved folder paths:

  • experimental folder
  • stimulus AI schedule files folder
  • serial2p folder

Their specifics are handled in the parser Parser2pRSP; they depend on the paths specified in the configuration, which right now are pretty explicit. I would like them to be hidden, but that goes beyond the scope of this PR.

File object

File is a new object that stores the file path both as a Path object and as a string, and classifies it according to DataType (allen-dff or other?) and AnalysisType (some files are built for only one kind of analysis).
I thought that this object could make it easier later on to find a specific file in a group. If the file does not belong to any DataType in use, File throws an exception that is then caught by search_file_paths() in FolderNamingSpecs. This exception is used to identify file paths that should not be stored.
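A stripped-down sketch of this classify-and-filter pattern (the DataType values and the matching rules are illustrative, not the PR's actual ones):

```python
from enum import Enum
from pathlib import Path

class DataType(Enum):
    ALLEN_DFF = "allen_dff"
    SIGNAL = "signal"

class File:
    def __init__(self, path: Path):
        self.path = path
        self._path_str = str(path)
        self.datatype = self._get_data_type()

    def _get_data_type(self) -> DataType:
        if "allen_dff.mat" in self._path_str:
            return DataType.ALLEN_DFF
        elif self._path_str.endswith(".tif"):  # illustrative rule
            return DataType.SIGNAL
        raise ValueError("File not to be used")

def search_file_paths(candidates: list[Path]) -> list["File"]:
    files = []
    for path in candidates:
        try:
            files.append(File(path))
        except ValueError:
            pass  # path matches no DataType in use: do not store it
    return files
```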


server: 'ssh.swc.ucl.ac.uk'

paths:
winstor: '/Volumes/your_server/'
imaging: '/Volumes/path/to/imaging/data/'
winstor: '/Volumes/winstor/'
Member

General point, but I guess we don't want these paths hard coded in the repo.

Member Author

I agree, I could add it to .gitignore to just stop tracking it.


config_path = Path(__file__).parent / "config/config.yml"
config_path = Path(__file__).parents[1] / "config/config.yml"
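For reference, the change in the diff above moves the config lookup one directory up the tree (the module location below is hypothetical):

```python
from pathlib import Path

# Hypothetical module location inside the repository.
p = Path("/repo/package/load.py")

print(p.parent / "config/config.yml")      # /repo/package/config/config.yml
print(p.parents[1] / "config/config.yml")  # /repo/config/config.yml
```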
Member

As above, (eventually) the configs should live somewhere else and not be tracked by git.

winstor: '/Volumes/winstor/'
imaging: '/Volumes/winstor/swc/margrie/Chryssanthi/imaging'
allen-dff: '/Volumes/winstor/swc/margrie/Chryssanthi/imaging/allen_dff'
serial2p: '/Volumes/winstor/swc/margrie/Chryssanthi/imaging/serial2p'
Member

I don't think I understand why serial2p is being processed by this repo? Isn't this tool just for functional imaging?

Member Author

"Serial2p" here refers to a folder in which serial2p output is saved and that is then loaded. imaging, allen-dff, serial2p, stimulus-ai-schedule are names of folders from which I might want to load the data.

self.path: Path = path
self._path_str = str(path)
self.datatype = self._get_data_type()
self.analysistype = self._get_analysis_type()
Member

(very) minor point, but for consistency and readability, I'd call this self.analysis_type

elif "allen_dff.mat" in self._path_str:
return self.DataType.ALLEN_DFF
else:
raise ValueError("File not to be used")
Member

May be worth logging these kinds of things at some point, as well as throwing an error.

Member Author

I log it in a very summarized form in a function that instantiates File. I could track the paths of all discarded files; the only downside is that there would be quite a lot of them and the log file would be enormous.
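One way to keep the log small is to count the discarded paths and emit a single summary line rather than one entry per file (a sketch; the function name and filter rule are illustrative):

```python
import logging
from pathlib import Path

def filter_paths(candidates: list[Path]) -> list[Path]:
    """Keep recognised data files; log a single summary for the rest."""
    kept, discarded = [], 0
    for path in candidates:
        if path.suffix in {".mat", ".tif"}:  # illustrative filter
            kept.append(path)
        else:
            discarded += 1
    # One summary line instead of one log entry per discarded file.
    logging.info("Discarded %d file paths", discarded)
    return kept
```

This would be called on the result of the recursive search, e.g. filter_paths(list(folder.glob("**/*"))).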

Member

If it raises an error, presumably it shouldn't be raised often?

Member Author

I used the exception as a way to block the instantiation of a File object and to count the number of discarded files. But maybe that's improper; I can think of a better way to do it.

Member Author

From a quick look online, it seems that there is a cleaner way to do it with __new__
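A sketch of the __new__-based alternative: if __new__ returns None instead of an instance, __init__ is never called and no exception is needed (the substring check below is illustrative):

```python
from pathlib import Path

class File:
    def __new__(cls, path: Path):
        # Returning None blocks instantiation: no exception needed.
        if "allen_dff.mat" not in str(path):  # illustrative check
            return None
        return super().__new__(cls)

    def __init__(self, path: Path):
        self.path = path

# Callers filter out the None results instead of catching an exception:
candidates = [Path("data/allen_dff.mat"), Path("data/notes.txt")]
files = [f for p in candidates if (f := File(p)) is not None]
```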

pass

@abstractmethod
def get_path_to_stimulus_AI_schedule_files(self) -> Path:
Member

I'm sure it's obvious, but I wouldn't use abbreviations (e.g. AI) if it wouldn't be totally clear what this means to anyone reading the code.

Member Author

I'll double check the precise meaning of AI in this case and make it more explicit.

@adamltyson
Member

Looks good :)

@lauraporta lauraporta merged commit d7338f1 into developement Jan 10, 2023
@lauraporta lauraporta deleted the extract-file-names branch February 7, 2023 01:20

Successfully merging this pull request may close these issues.

Extract all file names of raw data
2 participants