Joaquín Chaves committed
# SeaBASS Task Tools

The ```seabass-task-tools``` module provides tools for loading groups of SeaBASS files into standardized data structures, along with helper functions for analyzing, plotting, and manipulating the bundled data.

The ```BundleSB``` class bundles and standardizes data from multiple SeaBASS files. It takes a list of file paths pointing to separate SeaBASS files, extracts user-specified variables and key metadata, and collates the data into a structured dictionary format.

The goal of ```BundleSB``` is to make it easier to load, analyze, and visualize groups of related SeaBASS data files in a consistent manner. Different SeaBASS files frequently contain slightly different variables, depths, timestamps, etc., which makes aggregation tricky. ```BundleSB``` abstracts these inconsistencies away behind a clean dictionary interface.

In addition to the core ```BundleSB``` class for bundling SeaBASS data, this Python module includes a suite of helper functions for analyzing, plotting, and working with ```BundleSB``` objects and SeaBASS files in general. For example, functions like ```list_sb()``` and ```plot_depth_profile()``` reduce boilerplate code for common tasks like file listing and profile visualization. There are also specialized plotting functions such as ```plot_spectra_subset()```, which produce tailored spectral plots from ```BundleSB``` subsets meeting certain wavelength, depth, or other criteria. Lower-level functions assist with direct manipulation of variables within ```BundleSB``` objects.

Having these helpers alongside the main ```BundleSB``` class fills gaps in the workflow from raw SeaBASS files to publication plots and journal exports. It avoids reinventing the wheel by moving repetitive data wrangling and visualization steps into reusable code. Additional helper functions will continue to be added over time, improving interoperability between modules that use the standardized Bundle data structure. Together, this functionality moves closer to an integrated, end-to-end SeaBASS analysis toolkit.

## Example

Example usage:

    import sb_task_tools.sb_task_tools as sb

    bb3_file_list = sb.list_sb('./data', matching_re='*BB3*.sb')
    bb3_bundle = sb.BundleSB(bb3_file_list, user_variables=['vsfp', 'vsfg', 'sal'])

    bb3_bundle.variables
    OrderedDict([('lat', ('lat', 'degrees')),
                 ('lon', ('lon', 'degrees')),
                 ('station', ('station', 'none')),
                 ('depth', ('depth', 'meters')),
                 ('time', ('time', 'datetime')),
                 ('sal', ('sal', 'psu')),
                 ('vsfg527_124ang', ('vsfg527_124ang', '1/m/sr')),
                 ('filename', 'filename'),
                 ('vsfg469_124ang', ('vsfg469_124ang', '1/m/sr')),
                 ('vsfg652_124ang', ('vsfg652_124ang', '1/m/sr')),
                 ('vsfp527_124ang', ('vsfp527_124ang', '1/m/sr')),
                 ('vsfp652_124ang', ('vsfp652_124ang', '1/m/sr')),
                 ('vsfp469_124ang', ('vsfp469_124ang', '1/m/sr'))])

    bb3_bundle.wavelengths
    OrderedDict([('sal', [nan]),
                 ('vsfg', [469.0, 527.0, 652.0]),
                 ('vsfp', [469.0, 527.0, 652.0])])

After initialization, the object contains these main attribute dictionaries:

```data```: The bundled variable data extracted from the files.\
```variables```: Metadata associated with each variable, such as units.\
```wavelengths``` and ```angles```: Wavelengths and angles for each radiometric variable, if any. Variables whose names do not embed wavelength or angle information are assigned NaN.\
```size_class```: Particle size classes associated with each variable, typically for PSD (particle size distribution) data. Variables whose names do not contain size class information are assigned NaN.\
```parsed_variables```: List of the variables that were successfully extracted.
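
To illustrate how wavelength and angle information can be recovered from a variable name such as ```vsfg527_124ang```, here is a minimal sketch. It is not the module's actual parser: the function name ```split_variable_name``` and the regular expression are assumptions based only on the variable names shown in the example above.

```python
import math
import re

# Hypothetical pattern: a base name, an optional wavelength, and an
# optional "_<angle>ang" suffix, e.g. "vsfg527_124ang".
_NAME_RE = re.compile(
    r"^(?P<base>[a-z_]+?)(?P<wl>\d+(?:\.\d+)?)?(?:_(?P<ang>\d+(?:\.\d+)?)ang)?$",
    re.IGNORECASE,
)


def split_variable_name(name):
    """Return (base, wavelength, angle); parts not found in the name are NaN."""
    match = _NAME_RE.match(name)
    if match is None:
        return name, math.nan, math.nan
    wl = float(match.group("wl")) if match.group("wl") else math.nan
    ang = float(match.group("ang")) if match.group("ang") else math.nan
    return match.group("base"), wl, ang
```

Under these assumptions, ```split_variable_name('vsfg527_124ang')``` yields the base name ```'vsfg'``` with wavelength 527 and angle 124, while a name like ```'sal'``` yields NaN for both.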

Additionally, the class handles a number of common data management tasks automatically:\
Appending filename tags to keep track of which measurements came from which file.\
Extracting location, depth, station, and timestamp data into separate entries, if available from either the data columns or the headers.\
Optionally extracting the standard deviation alongside each main variable.\
Handling variables that are missing from some files by padding with NaNs to keep the data aligned.\
Checking for and warning about any negative variable values.
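
The NaN padding, filename tagging, and negative-value check can be pictured with the following sketch. It only illustrates the idea and is not the ```BundleSB``` implementation: the function ```bundle_columns``` and its input layout (a dictionary of per-file column dictionaries) are hypothetical.

```python
import math
import warnings


def bundle_columns(per_file_data, variables):
    """Concatenate per-file columns, padding absent variables with NaN."""
    bundled = {var: [] for var in variables}
    bundled["filename"] = []
    for filename, columns in per_file_data.items():
        # Assumes all columns within one file share the same length.
        n_rows = len(next(iter(columns.values()), []))
        for var in variables:
            # A variable missing from this file is padded with NaN rows.
            bundled[var].extend(columns.get(var, [math.nan] * n_rows))
        # Tag every row with the file it came from.
        bundled["filename"].extend([filename] * n_rows)
    for var in variables:
        # NaN compares False against 0, so padding never triggers a warning.
        if any(value < 0 for value in bundled[var]):
            warnings.warn(f"negative values found in '{var}'")
    return bundled
```

Because every variable is extended by the same number of rows per file, all bundled lists stay aligned index by index.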

```BundleSB``` solves the problem of pulling together groups of separate but related SeaBASS data sources into one easy-to-use Python dictionary interface. This enables simpler plotting, analysis, and data sharing workflows compared to handling many one-off SeaBASS files.

## Installation

    pip install git+https://oceandata.sci.gsfc.nasa.gov/rcs/joaquin/seabass-task-tools.git

## Additional functions

    list_sb(path, sb_file_ext='.sb', matching_re='')

The ```list_sb``` function is designed to generate a list of SeaBASS file paths from a specified directory. It handles several common tasks when programmatically accessing groups of SeaBASS files:

Changing to the target data directory if needed.\
Listing all files matching the standard ```'.sb'``` SeaBASS extension by default.\
Allowing a custom file extension pattern if the data uses something non-standard.\
Accepting an optional regular expression to return only a filtered subgroup of files.\
Automatically prepending full directory paths to file names.\
Returning to the original working directory after listing.\
Raising an error if no files are found, to avoid silent failures.

The goal is to reduce the effort required to get a Python list of SeaBASS file locations compared to calling ```glob.glob()``` directly.

Example usage:

    sb_files = list_sb('../data')
    # Returns full paths for all .sb files in the ../data dir

    sb_files = list_sb('../data', sb_file_ext='*.sb')
    # Returns paths for all .sb files in the ../data dir

    sb_files = list_sb('../data', matching_wldcard='*BB3*.sb')
    # Returns paths for all .sb files in the ../data dir
    # that contain the substring 'BB3', using bash-style wildcards

    sb_files = list_sb('../data', matching_re='^fluoro')
    # Returns only files starting with 'fluoro', using a regex

## Plotting data
[Work in progress: more functions need documentation here.]

    plot_depth_profile(sb_bundle, var, xlabel=None, filename=None)

The ```plot_depth_profile``` function creates a standard depth profile plot from oceanographic data contained in a SeaBASS Bundle instance. Depth profile plots have depth on the y-axis and the variable value on the x-axis.

This function handles several common tasks when making a publication-quality depth profile plot:

Extracting the ```'depth'``` and specified variable data from the SeaBASS Bundle.\
Handling missing data or large depth gaps by inserting ```NaN``` breaks.\
Plotting depth inverted on the y-axis, so the shallowest values are at the top.\
Setting the x-axis ticks and labels to the top.\
Adding gridlines.\
Tightly fitting the figure size to the plot area.\
Accepting an optional custom x-axis label.\
Saving the figure to a file if a filename is provided.

The goal is to reduce the effort required for consistent, polished oceanographic profile plots compared to general matplotlib use.
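
The ```NaN``` break step can be sketched as follows. This is an illustration only, not the code used by ```plot_depth_profile```: the function ```insert_gap_breaks``` and its ```max_gap``` threshold are hypothetical.

```python
import math


def insert_gap_breaks(depths, values, max_gap):
    """Insert a NaN point wherever consecutive depths differ by more than max_gap."""
    out_depths, out_values = [], []
    for i, (depth, value) in enumerate(zip(depths, values)):
        if i > 0 and abs(depth - depths[i - 1]) > max_gap:
            # A NaN pair marks the gap between the two neighbouring samples.
            out_depths.append(math.nan)
            out_values.append(math.nan)
        out_depths.append(depth)
        out_values.append(value)
    return out_depths, out_values
```

Matplotlib does not draw line segments through NaN points, so the profile line is interrupted at each gap instead of connecting distant samples.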

Example usage:
```
sb_bundle = sb.BundleSB(file_list)
ax = sb.plot_depth_profile(sb_bundle, 'sal', filename='salinity_profile')
```
##

    plot_spectra_subset(var, sb_bundle, depth_threshold,
                        ylabel, title, filename, angle=None, plot_cv=False, alpha=0.5)

The ```plot_spectra_subset``` function generates spectral plots from selected subsets of data contained within a SeaBASS Bundle instance.

It handles several common tasks when visualizing spectral data:

Extracting data for only user-specified wavelengths rather than full spectra.\
Optionally extracting angular or coefficient of variation data.\
Filtering samples in the Bundle by depth threshold.\
Reshaping the extracted data into a matrix for plotting.\
Setting titles, axis labels, and transparency.\
Saving the figure to a file if a filename is provided.

The goal is to simplify the process of generating publication-quality spectral plots from SeaBASS Bundles based on specific user criteria.
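
The filtering and reshaping steps listed above can be sketched as below. This is not the ```plot_spectra_subset``` implementation. It assumes a plain dictionary of equal-length lists, variable names formed as base name plus integer wavelength (for example a hypothetical ```bb470```), and that samples at or shallower than the threshold are kept.

```python
def spectra_matrix(data, base, wavelengths, depth_threshold):
    """Build a samples-by-wavelengths matrix from rows at or above depth_threshold."""
    # Indices of samples whose depth does not exceed the threshold.
    keep = [i for i, depth in enumerate(data["depth"]) if depth <= depth_threshold]
    # One column per requested wavelength, e.g. "bb470" for base "bb" and 470.0.
    columns = [data[f"{base}{int(wl)}"] for wl in wavelengths]
    return [[column[i] for column in columns] for i in keep]
```

Each row of the result is one spectrum, so it can be plotted against the list of wavelengths one row at a time.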

Example usage:

    # var, depth_threshold, ylabel, title, and filename are assumed to be defined earlier
    angle = 124
    ax = sb.plot_spectra_subset(var, bb3_bundle, depth_threshold,
                                ylabel, title, filename, angle=angle)