API Reference
This section provides detailed documentation for all modules in the cryoblob package.
Main Package
Module: cryoblob
JAX-based, JIT-compiled, scalable code for detecting amorphous blobs in low-SNR cryo-EM images.
Submodules
- adapt:
Adaptive image processing methods that take advantage of JAX’s automatic differentiation capabilities. The functions are:
- adaptive_wiener:
Adaptive Wiener filter that optimizes the noise estimate using gradient descent.
- adaptive_threshold:
Adaptively optimizes thresholding parameters using gradient descent to produce a differentiably thresholded image.
- blobs:
Contains the core blob detection algorithms. The functions are:
- find_connected_components:
Pure JAX implementation of 3D connected components labeling.
- center_of_mass_3d:
Calculate center of mass for each labeled region in a 3D image.
- find_particle_coords:
Find particle coordinates using connected components and center of mass.
- preprocessing:
Pre-processes low SNR images to improve contrast of blobs.
- blob_list_log:
Detects blobs in an input image using the Laplacian of Gaussian (LoG) method.
- files:
Interfacing with data files. The functions are:
- file_params:
Get the parameters for the file organization.
- load_mrc:
Reads an MRC-format cryo-EM file, extracting image data and metadata.
- process_single_file:
Process a single file for blob detection with memory optimization.
- process_batch_of_files:
Process a batch of files in parallel with memory optimization.
- folder_blobs:
Process a folder of images for blob detection with memory optimization.
- estimate_batch_size:
Estimate optimal batch size for processing MRC files based on available memory.
- estimate_memory_usage:
Estimate memory usage in GB for processing a single MRC file.
- get_optimal_batch_size:
Get optimal batch size by sampling multiple files from the list.
- image:
Utility functions for image processing. The functions are:
- image_resizer:
Resize an image using a fast resizing algorithm implemented in JAX.
- resize_x:
Resize image along y-axis by independently resampling each column.
- gaussian_kernel:
Create a normalized 2D Gaussian kernel.
- apply_gaussian_blur:
Apply Gaussian blur to an image using convolution in JAX.
- difference_of_gaussians:
Applies Difference of Gaussians (DoG) filtering to enhance circular blobs.
- laplacian_of_gaussian:
Applies Laplacian of Gaussian (LoG) filtering to an input image.
- laplacian_kernel:
Create a Laplacian kernel for edge detection in a JAX-compatible manner.
- exponential_kernel:
Create an exponential kernel for image processing.
- perona_malik:
Perform edge-preserving denoising using the Perona-Malik anisotropic diffusion.
- histogram:
Calculate the histogram of an image.
- equalize_hist:
Perform histogram equalization on an image using JAX.
- equalize_adapthist:
Perform adaptive histogram equalization on an image using JAX.
- wiener:
Perform Wiener filtering on an image using JAX.
- plots:
Plotting functions for visualizing MRC images and blob detection results. The functions are:
- plot_mrc:
Plot an MRC image using Matplotlib with an optional scaling mode and scalebar.
- types:
Type aliases and PyTrees. The types are:
- scalar_float:
Zero-dimensional floating-point number.
- scalar_int:
Zero-dimensional integer.
- scalar_num:
Zero-dimensional number that can be either a floating-point number or an integer.
- non_jax_number:
A number that is not a JAX array. This type is needed because JAX stores even single numbers as 0-D arrays.
The PyTrees are:
- MRC_Image:
A PyTree structure for MRC images. Contains the image data and metadata.
The factory functions are:
- make_MRC_Image:
Factory function to create an MRC_Image instance.
- valid:
Pydantic models for data validation and configuration management. The classes are:
- PreprocessingConfig:
Configuration for image preprocessing parameters.
- BlobDetectionConfig:
Configuration for blob detection parameters.
- AdaptiveFilterConfig:
Configuration for adaptive filtering parameters.
- FileProcessingConfig:
Configuration for file processing and batch operations.
- MRCMetadata:
Validation for MRC file metadata.
- ValidationPipeline:
Main pipeline class for validating all configurations.
- class cryoblob.AdaptiveFilterConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for adaptive filtering parameters.
Validates parameters used in the adaptive_wiener and adaptive_threshold functions.
- validate_kernel_size()
Ensure kernel size is odd for proper centering.
- class cryoblob.BlobDetectionConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for blob detection parameters.
Validates parameters used in the blob_list_log function.
- validate_max_blob_size()
Ensure max_blob_size > min_blob_size.
- class cryoblob.FileProcessingConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for file processing and batch operations.
Validates parameters used in the folder_blobs function.
- validate_folder_exists()
Ensure the folder exists and is accessible.
- class cryoblob.MRCMetadata(*args, **kwargs)
Bases: BaseModel
Validation model for MRC file metadata.
Ensures MRC file headers contain valid values.
- validate_data_range()
Ensure data_max > data_min.
- validate_mean_in_range()
Ensure data_mean is between data_min and data_max.
- class cryoblob.Path(*args, **kwargs)
Bases: PurePath
PurePath subclass that can make system calls. This is the standard-library pathlib.Path, listed here because it is imported into the cryoblob namespace.
Path represents a filesystem path but, unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
- classmethod cwd()
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- classmethod home()
Return a new path pointing to the user's home directory (as returned by os.path.expanduser('~')).
- samefile(other_path)
Return whether other_path refers to the same file as this path (as returned by os.path.samefile()).
- iterdir()
Iterate over the files in this directory. Does not yield any result for the special paths '.' and '..'.
- glob(pattern)
Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- rglob(pattern)
Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- absolute()
Return an absolute version of this path by prepending the current working directory. No normalization or symlink resolution is performed.
Use resolve() to get the canonical path to a file.
- resolve(strict=False)
Make the path absolute, resolving all symlinks on the way and also normalizing it.
- stat(*, follow_symlinks=True)
Return the result of the stat() system call on this path, like os.stat() does.
- open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)
Open the file pointed to by this path and return a file object, as the built-in open() function does.
- read_text(encoding=None, errors=None)
Open the file in text mode, read it, and close the file.
- write_text(data, encoding=None, errors=None, newline=None)
Open the file in text mode, write to it, and close the file.
- touch(mode=438, exist_ok=True)
Create this file with the given access mode (438 is octal 0o666), if it doesn't exist.
- lchmod(mode)
Like chmod(), except if the path points to a symlink, the symlink's permissions are changed, rather than its target's.
- unlink(missing_ok=False)
Remove this file or link. If the path is a directory, use rmdir() instead.
- lstat()
Like stat(), except if the path points to a symlink, the symlink's status information is returned, rather than its target's.
- rename(target)
Rename this path to the target path.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- replace(target)
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- symlink_to(target, target_is_directory=False)
Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.
- hardlink_to(target)
Make this path a hard link pointing to the same file as target.
Note the order of arguments (self, target) is the reverse of os.link's.
- link_to(target)
Make the target path a hard link pointing to this path.
Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.
Deprecated since Python 3.10 and removed in Python 3.12. Use hardlink_to() instead.
- class cryoblob.PreprocessingConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for image preprocessing parameters.
This validates all parameters used in the preprocessing function to ensure they are within valid ranges and types before being passed to JAX-compiled functions.
- validate_sigma_values()
Ensure sigma values are reasonable for image processing.
- validate_conflicting_options()
Ensure conflicting preprocessing options aren’t both enabled.
- class cryoblob.ValidationPipeline(*args, **kwargs)
Bases: BaseModel
Main validation pipeline that combines all configuration models.
This provides a single entry point for validating complete processing configurations.
- validate_for_single_image()
Validate configuration for single image processing.
- Returns:
- preprocessing_config (PreprocessingConfig) – Validated preprocessing parameters.
- blob_config (BlobDetectionConfig) – Validated blob detection parameters.
- Return type:
Tuple[PreprocessingConfig, BlobDetectionConfig]
- validate_for_batch_processing()
Validate configuration for batch file processing.
- Returns:
- preprocessing_config (PreprocessingConfig) – Validated preprocessing parameters.
- blob_config (BlobDetectionConfig) – Validated blob detection parameters.
- file_config (FileProcessingConfig) – Validated file processing parameters.
- Raises:
ValueError – If the file_processing configuration is not provided.
- Return type:
Tuple[PreprocessingConfig, BlobDetectionConfig, FileProcessingConfig]
- validate_for_adaptive_processing()
Validate configuration for adaptive filtering.
- Returns:
- preprocessing_config (PreprocessingConfig) – Validated preprocessing parameters.
- adaptive_config (AdaptiveFilterConfig) – Validated adaptive filtering parameters.
- Raises:
ValueError – If the adaptive_filtering configuration is not provided.
- Return type:
Tuple[PreprocessingConfig, AdaptiveFilterConfig]
- to_preprocessing_kwargs()
Convert the preprocessing configuration to a kwargs dict for the existing preprocessing function.
- Returns:
kwargs – Dictionary compatible with the existing preprocessing function.
- Return type:
dict
- cryoblob.adaptive_threshold(img, target, initial_threshold=0.5, initial_slope=10.0, learning_rate=0.01, iterations=100)
Description
Adaptively optimizes thresholding parameters using gradient descent to produce a differentiably thresholded image.
- param - img (Float[Array, “h w”]):
The input image to threshold.
- param - target (Float[Array, “h w”]):
A reference binary image for supervised parameter optimization.
- param - initial_threshold (scalar_float, optional):
Initial guess for the threshold parameter. Default is 0.5.
- param - initial_slope (scalar_float, optional):
Initial guess for the slope controlling sigmoid steepness. Default is 10.0.
- param - learning_rate (scalar_float, optional):
The learning rate used during gradient optimization. Default is 0.01.
- param - iterations (scalar_int, optional):
Number of iterations for gradient optimization. Default is 100.
- returns:
thresholded_img (Float[Array, “h w”]) – The image after differentiable thresholding using optimized parameters.
optimized_threshold (scalar_float) – The optimized threshold parameter.
optimized_slope (scalar_float) – The optimized slope parameter.
Flow
- sigmoid_threshold – Applies a sigmoid function to the input image.
- threshold_loss_fn – Computes the loss between the thresholded image and the target.
- step – Performs a single optimization step.
- optimized_params – Optimizes the threshold and slope parameters.
- thresholded_img – Applies the optimized thresholding parameters to the input image.
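The Flow above can be sketched in JAX as follows. This is a minimal illustration under stated assumptions, not cryoblob's source: the names sigmoid_threshold, threshold_loss and adaptive_threshold_sketch are hypothetical, and the mean-squared-error loss and plain gradient-descent update are assumptions about what threshold_loss_fn and step do.

```python
import jax
import jax.numpy as jnp


def sigmoid_threshold(img, threshold, slope):
    # Soft, differentiable threshold: ~0 below `threshold`, ~1 above it
    return jax.nn.sigmoid(slope * (img - threshold))


def threshold_loss(params, img, target):
    # Assumed loss: mean squared error against the binary reference image
    threshold, slope = params
    return jnp.mean((sigmoid_threshold(img, threshold, slope) - target) ** 2)


@jax.jit
def step(params, img, target, learning_rate):
    # One plain gradient-descent update on (threshold, slope)
    grads = jax.grad(threshold_loss)(params, img, target)
    return params - learning_rate * grads


def adaptive_threshold_sketch(img, target, initial_threshold=0.5,
                              initial_slope=10.0, learning_rate=0.01,
                              iterations=100):
    params = jnp.array([initial_threshold, initial_slope])
    for _ in range(iterations):
        params = step(params, img, target, learning_rate)
    threshold, slope = params
    return sigmoid_threshold(img, threshold, slope), threshold, slope
```

Starting from a threshold that is too low, the updates move it toward the gap between background and foreground intensities while the slope slowly sharpens the sigmoid.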
- cryoblob.adaptive_wiener(img, target, kernel_size=3, initial_noise=0.1, learning_rate=0.01, iterations=100)
Adaptive Wiener filter that optimizes the noise estimate using gradient descent.
- Parameters:
- img (Float[Array, “h w”]) – Noisy input image.
- target (Float[Array, “h w”]) – A target or reference image used for optimization.
- kernel_size (scalar_int | Tuple[int, int], optional) – Window size for the Wiener filter. Default is 3.
- initial_noise (scalar_float, optional) – Initial guess for the noise parameter. Default is 0.1.
- learning_rate (scalar_float, optional) – Learning rate for optimization. Default is 0.01.
- iterations (scalar_int, optional) – Number of optimization steps. Default is 100.
- Returns:
filtered_img (Float[Array, “h w”]) – Wiener-filtered image with the optimized noise parameter.
optimized_noise (scalar_float) – The optimized noise parameter.
- Return type:
Tuple[Float[Array, “h w”], Union[float, Float[Array, “”]]]
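A minimal JAX sketch of the idea behind adaptive_wiener, assuming the classic local-statistics Wiener filter (output = local mean + gain × (pixel − local mean), with gain = max(var − noise, 0) / var) and a mean-squared-error loss against target. Edge padding, the loss, and the helper names local_moments, wiener_filter, wiener_loss and adaptive_wiener_sketch are assumptions for illustration; unlike the documented signature, this sketch accepts only an odd integer kernel_size.

```python
import jax
import jax.numpy as jnp
from jax.scipy.signal import convolve2d


def local_moments(img, kernel_size):
    # Edge-pad, then a "valid" box filter gives local mean/variance per pixel
    pad = kernel_size // 2
    padded = jnp.pad(img, pad, mode="edge")
    window = jnp.ones((kernel_size, kernel_size)) / kernel_size ** 2
    mean = convolve2d(padded, window, mode="valid")
    var = convolve2d(padded ** 2, window, mode="valid") - mean ** 2
    return mean, var


def wiener_filter(img, noise, kernel_size=3):
    mean, var = local_moments(img, kernel_size)
    # Gain ~1 where local variance >> noise, 0 where variance <= noise
    gain = jnp.maximum(var - noise, 0.0) / jnp.maximum(var, 1e-12)
    return mean + gain * (img - mean)


def wiener_loss(noise, img, target, kernel_size):
    # Assumed objective: mean squared error against the reference image
    return jnp.mean((wiener_filter(img, noise, kernel_size) - target) ** 2)


def adaptive_wiener_sketch(img, target, kernel_size=3, initial_noise=0.1,
                           learning_rate=0.01, iterations=100):
    # kernel_size drives Python-level shapes, so it must be static under jit
    grad_fn = jax.jit(jax.grad(wiener_loss), static_argnums=3)
    noise = jnp.asarray(initial_noise, dtype=jnp.float32)
    for _ in range(iterations):
        noise = noise - learning_rate * grad_fn(noise, img, target, kernel_size)
    return wiener_filter(img, noise, kernel_size), noise
```

In this sketch the gradient vanishes wherever the local variance is already below the noise estimate, so a large initial_noise can stall the optimization.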
- cryoblob.apply_gaussian_blur(image, sigma=1.0, kernel_size=5, mode='same')
Description
Apply Gaussian blur to an image using convolution in JAX.
- param - image (Real[Array, “y x”]):
Input image.
- param - sigma (scalar_float, optional):
Standard deviation of the Gaussian kernel. Defaults to 1.0.
- param - kernel_size (scalar_int, optional):
Size of the Gaussian kernel. Must be odd. Defaults to 5.
- param - mode (Literal[“full”, “valid”, “same”], optional):
Convolution mode. Defaults to “same”.
- returns:
Blurred image.
- rtype:
blurred (Float[Array, “yp xp”])
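A NumPy sketch of a normalized Gaussian kernel and a "same"-mode blur. It relies on separability (a 2D Gaussian is the outer product of two 1D Gaussians), so it filters rows and then columns instead of doing a full 2D convolution. The helper names are hypothetical, boundaries are zero-padded by np.convolve, and only the "same" mode is covered.

```python
import numpy as np


def gaussian_kernel_1d(size, sigma):
    # An odd size keeps the kernel centred on a single pixel
    if size % 2 == 0:
        raise ValueError("size must be odd")
    x = np.arange(size) - size // 2
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_kernel_2d(size, sigma):
    # Outer product of two normalised 1D kernels is a normalised 2D kernel
    k1 = gaussian_kernel_1d(size, sigma)
    return np.outer(k1, k1)


def gaussian_blur(image, sigma=1.0, kernel_size=5):
    # Separable "same"-mode blur: every row, then every column.
    # np.convolve zero-pads, so values near the border are pulled toward 0
    k1 = gaussian_kernel_1d(kernel_size, sigma)
    rows = np.apply_along_axis(np.convolve, 1, image, k1, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k1, mode="same")
```

Blurring a single bright pixel reproduces the 2D kernel around it, which is a quick way to check the separable implementation.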
- cryoblob.blob_list_log(mrc_image, min_blob_size=5, max_blob_size=20, blob_step=1, downscale=4, std_threshold=6)
Description
Detect blobs of varying sizes in an MRC image using the Laplacian of Gaussian (LoG) method.
- param - mrc_image (MRC_Image):
The PyTree containing the image data and metadata.
- param - min_blob_size (scalar_num, optional):
Minimum blob size to detect. Defaults to 5.
- param - max_blob_size (scalar_num, optional):
Maximum blob size to detect. Defaults to 20.
- param - blob_step (scalar_num, optional):
Step size between consecutive blob scales. Defaults to 1.
- param - downscale (scalar_num, optional):
Factor by which the image is downscaled before detection. Defaults to 4.
- param - std_threshold (scalar_num, optional):
Threshold, in standard deviations, for blob detection. Defaults to 6.
- returns:
Array of blob coordinates and sizes, shape [n, 3]. Columns represent (Y, X, Blob size in pixels).
- rtype:
scaled_coords (Float[Array, “n 3”])
- cryoblob.center_of_mass_3d(image, labels, num_labels)
Description
Calculate center of mass for each labeled region in a 3D image.
- param - image (Float[Array, “x y z”]):
3D image array.
- param - labels (Integer[Array, “x y z”]):
Integer array of labels.
- param - num_labels (int):
Number of labels (excluding background).
- returns:
Array of centroid coordinates for each label.
- rtype:
centroids (Float[Array, “n 3”])
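A NumPy sketch of an intensity-weighted centroid per label, using np.bincount to accumulate per-label sums in a single pass over the voxels. It assumes label 0 is background (consistent with num_labels excluding background); the function name is hypothetical and this is not cryoblob's JAX implementation.

```python
import numpy as np


def center_of_mass_3d_sketch(image, labels, num_labels):
    flat_labels = np.asarray(labels).ravel()
    weights = np.asarray(image, dtype=float).ravel()
    n_bins = num_labels + 1  # bin 0 collects the background
    # Total intensity ("mass") of every label
    mass = np.bincount(flat_labels, weights=weights, minlength=n_bins)
    # Voxel coordinates, one row per axis, in the same order as ravel()
    coords = np.indices(np.shape(image)).reshape(np.ndim(image), -1)
    # Intensity-weighted coordinate sums per label, shape (n_bins, ndim)
    moments = np.stack(
        [np.bincount(flat_labels, weights=weights * c, minlength=n_bins) for c in coords],
        axis=1,
    )
    # Drop the background bin; empty labels divide by 1 instead of 0
    mass = mass[1:]
    return moments[1:] / np.where(mass > 0, mass, 1.0)[:, None]
```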
- cryoblob.create_high_quality_pipeline()
Create a validation pipeline optimized for high-quality blob detection.
- cryoblob.difference_of_gaussians(image, sigma1, sigma2, sampling=1, hist_stretch=True, normalized=True)
Description
Applies Difference of Gaussians (DoG) filtering to enhance circular blobs.
- param - image (Real[Array, “y x”]):
Input 2D image.
- param - sigma1 (scalar_num):
Standard deviation of the first (smaller) Gaussian.
- param - sigma2 (scalar_num):
Standard deviation of the second (larger) Gaussian.
- param - sampling (scalar_num, optional):
Downsampling factor; 1 means no resizing. Default is 1.
- param - hist_stretch (bool, optional):
Apply histogram stretching if True. Default is True.
- param - normalized (bool, optional):
Normalize the filtered output by sigma2 if True. Default is True.
- returns:
dog_filtered (Float[Array, “y x”]) – DoG-filtered image.
Flow
- Downsample the image if sampling ≠ 1 (in a JIT-safe way).
- Apply histogram stretching if requested.
- Create the arithmetic-enforced DoG kernel.
- Convolve the image with the DoG kernel.
- Normalize the output if required.
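A NumPy sketch of the core DoG step: build the kernel as a narrow Gaussian minus a wide Gaussian on a shared grid, then convolve. The grid half-width of 4 × sigma2, reflect padding, and the helper names are assumptions; the sketch leaves out the resampling, histogram-stretch, and sigma2-normalization steps listed in the Flow.

```python
import numpy as np


def gaussian_2d(size, sigma):
    # Normalised 2D Gaussian sampled on a size x size grid centred at 0
    x = np.arange(size) - size // 2
    xx, yy = np.meshgrid(x, x)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def dog_kernel(sigma1, sigma2, truncate=4.0):
    # Odd grid wide enough for the larger Gaussian
    half = int(np.ceil(truncate * sigma2))
    size = 2 * half + 1
    # Narrow minus wide: both parts sum to 1, so the kernel sums to ~0
    return gaussian_2d(size, sigma1) - gaussian_2d(size, sigma2)


def dog_filter(image, sigma1, sigma2):
    image = np.asarray(image, dtype=float)
    k = dog_kernel(sigma1, sigma2)
    half = k.shape[0] // 2
    padded = np.pad(image, half, mode="reflect")
    h, w = image.shape
    out = np.zeros_like(image)
    # Direct sliding-window sum; k is symmetric, so correlation == convolution
    for dy in range(k.shape[0]):
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

Because the kernel sums to zero, flat regions produce no response, while a bright blob whose width lies between sigma1 and sigma2 gives a positive peak at its centre.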
- cryoblob.equalize_adapthist(image, kernel_size=8, clip_limit=0.01, nbins=256)
Description
Perform Contrast Limited Adaptive Histogram Equalization (CLAHE).
- param - image (Real[Array, “h w”]):
Input image.
- param - kernel_size (scalar_int, optional):
Size of local regions for histogram equalization. Default is 8.
- param - clip_limit (scalar_float, optional):
Clipping limit for the histogram. Higher values amplify contrast more strongly. Default is 0.01.
- param - nbins (scalar_int, optional):
Number of bins for the histogram. Default is 256.
- returns:
Image after applying CLAHE.
- rtype:
equalized_final (Float[Array, “h w”])
Notes
CLAHE performs localized histogram equalization to improve image contrast without excessively amplifying noise. The algorithm:
- Divides the image into small regions (tiles).
- Performs local histogram equalization on each tile separately.
- Clips histograms at the specified limit to prevent noise amplification.
- Interpolates results to produce a smoothly equalized image.
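The contrast-limiting step can be sketched in NumPy: clip each tile histogram, redistribute the excess, and turn the clipped histogram into a CDF lookup table. Here clip_limit is an absolute per-bin count (an assumption; cryoblob's default of 0.01 suggests a normalized value), the redistribution is a single uniform pass, so clipped bins can end slightly above the limit, and the tile interpolation step is omitted. The function names are hypothetical.

```python
import numpy as np


def clip_histogram(hist, clip_limit):
    # Cap every bin at `clip_limit` counts, then share the clipped excess
    # equally among all bins so the total count is preserved
    hist = np.asarray(hist, dtype=float)
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    return np.minimum(hist, clip_limit) + excess / hist.size


def tile_lookup_table(tile, clip_limit, nbins=256):
    # Contrast-limited equalization mapping for one tile with values in [0, 1]
    hist, _ = np.histogram(tile, bins=nbins, range=(0.0, 1.0))
    cdf = np.cumsum(clip_histogram(hist, clip_limit))
    return cdf / cdf[-1]  # bin index -> equalized intensity in [0, 1]
```

For example, a histogram [100, 0, 0, 0] clipped at 40 becomes [55, 15, 15, 15]: the spike is flattened but the total of 100 counts is unchanged.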
- cryoblob.equalize_hist(image, nbins=256, mask=None)
Description
Perform histogram equalization on an image using JAX.
- param - image (Real[Array, “h w”]):
Input image to equalize.
- param - nbins (scalar_int, optional):
Number of bins for the histogram. Default is 256.
- param - mask (Real[Array, “h w”], optional):
Optional mask for selective equalization. Default is None (use all pixels).
- returns:
Histogram equalized image
- rtype:
equalized (Float[Array, “h w”])
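Global histogram equalization maps each pixel through the normalized cumulative histogram. A NumPy sketch, assuming linear interpolation between bin centres and that mask only selects which pixels build the histogram (every pixel is still remapped); the function name is hypothetical.

```python
import numpy as np


def equalize_hist_sketch(image, nbins=256, mask=None):
    image = np.asarray(image, dtype=float)
    # Histogram of the selected pixels only (every pixel when mask is None)
    values = image[np.asarray(mask, dtype=bool)] if mask is not None else image.ravel()
    hist, edges = np.histogram(values, bins=nbins)
    # Normalised cumulative distribution function in (0, 1]
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Remap every pixel through the CDF, interpolating between bin centres
    return np.interp(image.ravel(), centers, cdf).reshape(image.shape)
```

Because the CDF is non-decreasing, the mapping preserves intensity order while spreading a skewed distribution across the full [0, 1] range.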
- cryoblob.estimate_batch_size(sample_file_path, target_memory_gb=4.0, safety_factor=0.7, processing_overhead=3.0)
Description
Estimate optimal batch size for processing MRC files based on available memory and file characteristics. This function analyzes a sample file to estimate memory requirements and calculates the maximum number of files that can be processed simultaneously without exceeding memory limits.
- param - sample_file_path (str):
Path to a representative MRC file for size estimation.
- param - target_memory_gb (scalar_float, optional):
Target GPU memory usage in GB. Default is 4.0.
- param - safety_factor (scalar_float, optional):
Safety factor to prevent memory overflow (0.0-1.0). Default is 0.7 (use 70% of the target memory).
- param - processing_overhead (scalar_float, optional):
Memory overhead multiplier for processing operations. Default is 3.0 (processing uses 3x the raw data size).
- returns:
Recommended batch size for processing
- rtype:
batch_size (scalar_int)
Notes
The estimation considers:
- Raw file size in memory (dtype conversion)
- Preprocessing operations (filtering, transformations)
- Blob detection memory requirements
- JAX compilation overhead
- Intermediate array storage
Memory estimation formula:
    per_file_memory = file_size * processing_overhead
    available_memory = target_memory_gb * safety_factor * 1e9
    batch_size = max(1, available_memory // per_file_memory)
Examples
>>> batch_size = estimate_batch_size("sample.mrc", target_memory_gb=8.0)
>>> print(f"Recommended batch size: {batch_size}")
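The memory estimation formula can be written as a small pure-Python function. This hypothetical helper takes the per-file size in bytes (file_size_bytes) as an input instead of reading a sample file, and uses decimal gigabytes (1e9 bytes) as the formula does.

```python
def estimate_batch_size_from_bytes(file_size_bytes, target_memory_gb=4.0,
                                   safety_factor=0.7, processing_overhead=3.0):
    # Memory one file is expected to need once intermediate arrays are counted
    per_file_memory = file_size_bytes * processing_overhead
    # Usable budget in bytes (decimal GB: 1 GB = 1e9 bytes)
    available_memory = target_memory_gb * safety_factor * 1e9
    # Never return less than 1, even if a single file exceeds the budget
    return max(1, int(available_memory // per_file_memory))


# A 4096 x 4096 float32 image occupies 4096 * 4096 * 4 bytes
print(estimate_batch_size_from_bytes(4096 * 4096 * 4))  # 13
```

With the defaults, each 67,108,864-byte image needs about 201 MB and the budget is 2.8e9 bytes, so 13 files fit; raising target_memory_gb to 8.0 gives 27.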
- cryoblob.estimate_memory_usage(file_path, include_preprocessing=True, include_blob_detection=True)
Description
Estimate memory usage in GB for processing a single MRC file.
- param - file_path (str):
Path to the MRC file.
- param - include_preprocessing (bool, optional):
Include memory for preprocessing operations. Default is True.
- param - include_blob_detection (bool, optional):
Include memory for blob detection. Default is True.
- returns:
Estimated memory usage in GB
- rtype:
memory_gb (scalar_float)
- cryoblob.exponential_kernel(arr, k)
Description
Create an exponential kernel for image processing.
- param - arr (Float[Array, “H W”]):
Input array.
- param - k (scalar_float):
Exponential decay constant.
- returns:
Exponential kernel
- rtype:
kernel (Float[Array, “H W”])
- cryoblob.file_params()
Description
Run this at the beginning to generate the dict. It gives both the absolute and relative paths describing how the files are organized.
- returns:
main_directory (str) – the main directory where the package is located.
folder_structure (dict) – where the files and data are stored, as read from the organization.json file.
- cryoblob.find_connected_components(binary_image, connectivity=6)
Description
Pure JAX implementation of 3D connected components labeling. Uses a two-pass algorithm.
- param - binary_image (Bool[Array, “x y z”]):
Binary image where True/1 indicates foreground.
- param - connectivity (int, optional):
Either 6 (face-connected) or 26 (fully connected). Default is 6.
- returns:
labels (Integer[Array, “x y z”]) – Array where each connected component has unique integer label
num_labels (int) – Number of connected components found
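A NumPy sketch of 3D connected-component labelling with 6- or 26-connectivity. Note that it uses iterative minimum-label propagation rather than the two-pass algorithm named above: every foreground voxel repeatedly takes the smallest label among itself and its foreground neighbours until nothing changes, and the surviving labels are then renumbered 1..n. The helper names are hypothetical and this is not cryoblob's implementation.

```python
import itertools

import numpy as np


def neighbour_offsets(connectivity):
    # All 26 non-zero offsets in a 3x3x3 cube; keep the 6 face offsets if asked
    offsets = [o for o in itertools.product((-1, 0, 1), repeat=3) if any(o)]
    if connectivity == 6:
        return [o for o in offsets if sum(map(abs, o)) == 1]
    if connectivity == 26:
        return offsets
    raise ValueError("connectivity must be 6 or 26")


def shift(arr, offset, fill):
    # out[p] = arr[p + offset], with `fill` where p + offset is out of bounds
    padded = np.pad(arr, 1, constant_values=fill)
    index = tuple(slice(1 + o, 1 + o + n) for o, n in zip(offset, arr.shape))
    return padded[index]


def connected_components_3d(binary_image, connectivity=6):
    binary = np.asarray(binary_image, dtype=bool)
    background = binary.size + 1  # larger than any provisional label
    # Provisional label: a unique positive id for every foreground voxel
    ids = np.arange(1, binary.size + 1).reshape(binary.shape)
    labels = np.where(binary, ids, background)
    offsets = neighbour_offsets(connectivity)
    while True:
        new = labels
        for off in offsets:
            new = np.minimum(new, shift(labels, off, background))
        new = np.where(binary, new, background)
        if np.array_equal(new, labels):
            break
        labels = new
    # Renumber surviving labels to 1..n and set the background to 0
    fg = labels < background
    unique = np.unique(labels[fg])
    out = np.zeros(binary.shape, dtype=int)
    out[fg] = np.searchsorted(unique, labels[fg]) + 1
    return out, int(unique.size)
```

Two voxels that touch only at a corner are separate components under 6-connectivity but a single component under 26-connectivity.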
- cryoblob.find_particle_coords(results_3D, max_filtered, image_thresh)
Description
Find particle coordinates using connected components and center of mass. Pure JAX implementation.
- param - results_3D (Float[Array, “x y z”]):
3D array of filter responses.
- param - max_filtered (Float[Array, “x y z”]):
Maximum-filtered array.
- param - image_thresh (scalar_float):
Threshold for peak detection.
- returns:
Array of particle coordinates
- rtype:
coords (Float[Array, “n 3”])
- cryoblob.folder_blobs(folder_location, file_type='mrc', blob_downscale=7.0, target_memory_gb=4.0, stream_large_files=True, **kwargs)
Description
Process a folder of MRC images for blob detection with memory optimization and validated preprocessing configuration. Automatically manages batch processing and memory usage to prevent GPU memory overflow.
- param - folder_location (str):
Path to the folder containing MRC images to process.
- param - file_type (Literal[“mrc”], optional):
File extension to search for in the folder. Default is “mrc”.
- param - blob_downscale (scalar_float, optional):
Downscaling factor applied during blob detection. Default is 7.0.
- param - target_memory_gb (scalar_float, optional):
Target GPU memory usage in GB for batch size optimization. Default is 4.0.
- param - stream_large_files (bool, optional):
Whether to use memory-mapped file access for large files. Default is True.
- param - **kwargs:
Additional preprocessing parameters passed to PreprocessingConfig. Valid options: exponential, logarizer, gblur, background, apply_filter.
- returns:
DataFrame containing detected blob information with columns: [‘File Location’, ‘Center Y (nm)’, ‘Center X (nm)’, ‘Size (nm)’]
- rtype:
blob_dataframe (pd.DataFrame)
- raises ValueError:
If preprocessing parameters are invalid according to PreprocessingConfig validation.
Notes
Memory management:
- Uses batch processing to control memory usage
- Automatically adjusts batch size based on available memory
- Clears device memory between batches
- Streams large files if needed
- Efficiently handles intermediate results
The function processes files in batches to prevent memory overflow and provides a progress bar to track processing status. Empty folders return an empty DataFrame with the expected column structure.
- cryoblob.gaussian_kernel(size, sigma)
Description
Create a normalized 2D Gaussian kernel.
- param - size (scalar_int):
Kernel size (size x size). Must be odd.
- param - sigma (scalar_float):
Standard deviation of the Gaussian distribution.
- returns:
Normalized 2D Gaussian kernel.
- rtype:
kernel (Float[Array, “size size”])
- cryoblob.get_optimal_batch_size(file_list, target_memory_gb=4.0, sample_fraction=0.1)
Description
Get optimal batch size by sampling multiple files from the list.
- param - file_list (list[str]):
List of file paths to process.
- param - target_memory_gb (scalar_float, optional):
Target memory usage in GB. Default is 4.0.
- param - sample_fraction (scalar_float, optional):
Fraction of files to sample for estimation. Default is 0.1.
- returns:
Optimal batch size
- rtype:
batch_size (scalar_int)
- cryoblob.histogram(image, bins=256, range_limits=None)
Calculate histogram from input image data.
- Parameters:
- image (Real[Array, “...”]) – Input array (any shape), flattened internally.
- bins (scalar_int, optional) – Number of histogram bins. Default is 256.
- range_limits (Tuple[scalar_float, scalar_float], optional) – Min and max range for the bins. Default is None.
- Returns:
Histogram counts per bin.
- Return type:
hist (Num[Array, “bins”])
- cryoblob.image_resizer(orig_image, new_sampling)
Description
Resize an image using a fast resizing algorithm implemented in JAX. If a 3D stack is provided, the function will sum along the last dimension.
- param - orig_image (Real[Array, “y x”] | Real[Array, “y x c”]):
The original image to be resized. It should be a 2D JAX array or a 3D stack.
- param - new_sampling (scalar_num | Real[Array, “2”]):
The new sampling rate for resizing the image. It can be a single value or a pair of values giving the sampling rates for the x and y axes respectively.
- If a single value is provided, it is applied to both axes.
- If new_sampling is greater than 1, the image is downsampled.
- If new_sampling is less than 1, the image is upsampled.
- returns:
The resized image.
- rtype:
resampled_image (Float[Array, “a b”])
- cryoblob.load_mrc(filepath)
Description
Reads an MRC-format cryo-EM file from the specified path, extracting image data and relevant metadata. All numeric data are converted into JAX arrays and wrapped into a structured MRC_Image PyTree, compatible with JAX’s functional programming paradigm.
- param - filepath (str):
Path to the MRC file to be loaded.
- returns:
A PyTree containing:
- image_data: Image array (2D or 3D).
- voxel_size: Array containing voxel dimensions in Å (Z, Y, X).
- origin: Array indicating the origin coordinates from the header (Z, Y, X).
- data_min: Minimum pixel value.
- data_max: Maximum pixel value.
- data_mean: Mean pixel value.
- mode: Integer code representing the data type (e.g., 0=int8, 1=int16, 2=float32).
- rtype:
MRC_Image
Examples
>>> mrc_image = load_mrc("example.mrc")
>>> print(mrc_image.voxel_size)
Array([1.2, 1.2, 1.2], dtype=float32)
Notes
- This function uses the mrcfile library for parsing MRC files.
- The resulting PyTree structure (MRC_Image) is explicitly designed for use in JAX-based image processing pipelines.
- cryoblob.log_kernel(size, sigma, kernel_min=3)
Description
Create a Laplacian of Gaussian kernel for edge detection.
- param - size (int):
Kernel size, enforced positive and odd for ‘gaussian’ mode.
- param - sigma (scalar_float):
Gaussian standard deviation for the LoG kernel.
- param - kernel_min (int, optional):
Minimum kernel size, used to enforce a lower bound on the kernel size. Default is 3.
- returns:
Laplacian of Gaussian kernel.
- rtype:
kernel (Float[Array, “size size”])
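A NumPy sketch of a LoG kernel built from the analytic Laplacian of a normalized 2D Gaussian, ∇²G(x, y) = −(1 / (π σ⁴)) (1 − r² / (2σ²)) exp(−r² / (2σ²)) with r² = x² + y². Clamping to kernel_min, rounding even sizes up to odd, and subtracting the mean so the kernel sums to zero are assumptions about reasonable behaviour, not documented cryoblob details; the function name is hypothetical.

```python
import numpy as np


def log_kernel_sketch(size, sigma, kernel_min=3):
    # Clamp to the minimum size, then round up to odd so there is a centre pixel
    size = max(int(size), int(kernel_min))
    if size % 2 == 0:
        size += 1
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x ** 2 + y ** 2) / (2.0 * sigma ** 2)  # r^2 / (2 sigma^2)
    # Analytic Laplacian of a normalised 2D Gaussian
    kernel = -(1.0 / (np.pi * sigma ** 4)) * (1.0 - r2) * np.exp(-r2)
    # Subtract the mean so a flat image gives exactly zero response
    return kernel - kernel.mean()
```

The centre coefficient is the most negative, so bright blobs of matching scale produce strong negative responses (the sign is often flipped before peak finding).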
- cryoblob.make_MRC_Image(image_data, voxel_size, origin, data_min, data_max, data_mean, mode)
Description
Factory function to create an MRC_Image instance.
- param - image_data (Num[Array, “H W”] | Num[Array, “D H W”]):
The image data array from the MRC file. Can be 2D or 3D.
- param - voxel_size (Float[Array, “3”]):
Voxel size in the order (Z, Y, X).
- param - origin (Float[Array, “3”]):
Origin coordinates from the MRC file header (Z, Y, X).
- param - data_min (scalar_float):
Minimum value of image data (as stored in header).
- param - data_max (scalar_float):
Maximum value of image data (as stored in header).
- param - data_mean (scalar_float):
Mean value of image data (as stored in header).
- param - mode (scalar_int):
Data type mode from MRC header (e.g., 0: int8, 2: float32).
- returns:
An instance of the MRC_Image PyTree structure.
- rtype:
MRC_Image
- cryoblob.perona_malik(image, num_iter, kappa, gamma=0.1, conduction_fn=jaxtyping.jaxtyped)
Perform edge-preserving denoising using the Perona-Malik anisotropic diffusion.
- Parameters:
image (Float[Array, “H W”]) – Input noisy image.
num_iter (scalar_int) – Number of diffusion iterations.
kappa (scalar_float) – Conductance coefficient controlling sensitivity to edges.
gamma (scalar_float, optional) – Diffusion rate (0 < gamma <= 0.25 for stability). Default is 0.1.
conduction_fn (Callable, optional) – Conductivity function. Defaults to exponential.
- Returns:
Edge-preserved denoised image.
- Return type:
denoised_image (Float[Array, “H W”])
Notes
The Perona-Malik equation is $\partial u / \partial t = \nabla \cdot (c \, \nabla u)$. Each iteration applies the explicit update
$$u_{t+1} = u_t + \gamma \, \nabla \cdot \left( c \, \nabla u_t \right)$$
where:
- u is the image, initialized to the input image
- t is the iteration index (discrete time)
- γ (gamma) is the diffusion rate
- c is the conductivity function
- ∇ (grad) is the gradient operator
- ∇· (div) is the divergence operator
The conductivity function c is typically an exponential function, $c(\delta) = \exp(-\delta^2 / \kappa^2)$, where δ is the difference between neighboring pixels.
Perona, Pietro, Takahiro Shiota, and Jitendra Malik. “Anisotropic diffusion.” Geometry-driven diffusion in computer vision (1994): 73-92.
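The explicit update described in the Notes can be illustrated with the minimal NumPy sketch below. It assumes the common four-neighbour discretization (consistent with the gamma <= 0.25 stability bound), replicated borders so that no flux leaves the image, and the exponential conductivity. `perona_malik_sketch` is a hypothetical name; this is not the package's JAX implementation.

```python
import numpy as np


def perona_malik_sketch(image, num_iter, kappa, gamma=0.1):
    """Explicit 4-neighbour Perona-Malik diffusion (NumPy sketch).

    Uses the exponential conductivity c(delta) = exp(-(delta / kappa) ** 2).
    Borders are replicated, so no flux crosses the image boundary.
    """
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(num_iter):
        padded = np.pad(u, 1, mode="edge")
        # Differences to the north, south, east and west neighbours.
        d_n = padded[:-2, 1:-1] - u
        d_s = padded[2:, 1:-1] - u
        d_e = padded[1:-1, 2:] - u
        d_w = padded[1:-1, :-2] - u
        # Weight each difference by the conductivity, then take one explicit step.
        flux = sum(np.exp(-((d / kappa) ** 2)) * d for d in (d_n, d_s, d_e, d_w))
        u = u + gamma * flux
    return u
```

With a small kappa the conductivity nearly vanishes across strong edges, so a step edge survives many iterations while small fluctuations are smoothed. Because each neighbour exchange is antisymmetric, the image mean is conserved.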
- cryoblob.plot_mrc(mrc_image, image_size=(15, 15), cmap='magma', mode='plain')
Description
Plot an MRC image using Matplotlib with an optional scaling mode and scalebar.
- param - mrc_image (MRC_Image):
The PyTree structure containing image data and voxel metadata.
- param - image_size (Tuple[scalar_int, scalar_int], optional):
Size of the plotted figure (width, height) in inches. Default is (15, 15).
- param - cmap (str, optional):
The Matplotlib colormap to use. Default is “magma”.
- param - mode (str, optional):
Mode of visualization. Default is “plain”.
- “plain”: Plot image data without modifications.
- “log”: Plot logarithmically scaled image data.
- “exp”: Plot exponentially scaled image data.
- returns:
Displays the plot.
- rtype:
None
Examples
>>> plot_mrc(mrc_image, image_size=(10, 10), cmap="viridis", mode="log")
- cryoblob.preprocessing(image_orig, return_params=False, exponential=True, logarizer=False, gblur=2, background=0, apply_filter=0)
Description
Pre-processing of low SNR images to improve contrast of blobs.
- param - image_orig (Float[Array, “y x”]):
An input image represented as a 2D JAX array.
- param - return_params (bool, optional):
Whether to return the processing parameters. Default is False.
- param - exponential (bool, optional):
Whether to apply an exponential function to the image. Default is True.
- param - logarizer (bool, optional):
Whether to apply a log function to the image. Default is False.
- param - gblur (int, optional):
The standard deviation of the Gaussian filter. Default is 2.
- param - background (int, optional):
The standard deviation of the Gaussian filter used for background subtraction. Default is 0.
- param - apply_filter (int, optional):
If greater than 1, a Wiener filter is applied to the image. Default is 0.
- returns:
The pre-processed image
- rtype:
image_proc (Float[Array, “y x”])
- cryoblob.process_batch_of_files(file_batch, preprocessing_config, blob_downscale)
Process a batch of files in parallel with memory optimization.
- Parameters:
file_batch (List[str]) – List of file paths to process.
preprocessing_config (Dict) – Preprocessing parameters.
blob_downscale (float) – Downscaling factor.
- Returns:
List of (blobs, file_path) tuples
- Return type:
results (List[Tuple[Array, str]])
- cryoblob.process_single_file(file_path, preprocessing_config, blob_downscale, stream_mode=True)
Description
Process a single MRC file for blob detection with memory optimization and validated preprocessing configuration.
- param - file_path (str):
Path to the MRC image file to process
- param - preprocessing_config (PreprocessingConfig):
Validated preprocessing configuration containing all processing parameters
- param - blob_downscale (scalar_float):
Downscaling factor applied during blob detection to reduce computational load
- param - stream_mode (bool, optional):
Whether to use memory-mapped file access for large files to reduce memory usage. Default is True.
- returns:
scaled_blobs (Float[Array, “n 3”]) – Array of detected blob coordinates and sizes where each row contains
[Y_position_nm, X_position_nm, Size_nm]
file_path (str) – Original file path for tracking processed files
- raises Exception:
Not propagated. If processing fails, the exception is caught, its message is printed to the console, and an empty array is returned together with the original file path.
Notes
The function uses streaming mode for large files to reduce memory usage and immediately releases file handles after reading. All intermediate arrays are explicitly deleted to manage GPU memory efficiently.
- cryoblob.resize_x(x_image, new_x_len)
Description
Resize an image along the x-axis by independently resampling each row. Uses lax.scan over the y-dimension, then vmap over the x-dimension.
- param - x_image (Num[Array, “y x”]):
Image to resize (y by x).
- param - new_x_len (scalar_int):
Target number of columns.
- returns:
Resized image (y by new_x).
- rtype:
resized (Float[Array, “y new_x”])
- cryoblob.validate_mrc_metadata(voxel_size, origin, data_min, data_max, data_mean, mode, image_shape)
Validate MRC metadata and return validated model.
- Parameters:
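voxel_size – Voxel size in the order (Z, Y, X).
origin – Origin coordinates from the MRC file header (Z, Y, X).
data_min – Minimum value of the image data (as stored in the header).
data_max – Maximum value of the image data (as stored in the header).
data_mean – Mean value of the image data (as stored in the header).
mode – Data type mode from the MRC header.
image_shape – Shape of the image data array.
- Returns:
metadata – The validated MRC metadata model.
- Raises:
ValidationError – If any metadata values are invalid.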
- cryoblob.wiener(img, kernel_size=3, noise=None)
Description
JAX implementation of Wiener filter for noise reduction. This is similar to scipy.signal.wiener.
- param - img (Float[Array, “h w”]):
The input image to be filtered.
- param - kernel_size (int or tuple, optional):
The size of the sliding window for local statistics. If a tuple, it represents (height, width). Default is 3.
- param - noise (scalar_float, optional):
The noise power. If None, the average of the local variance is used. Default is None.
- returns:
The filtered output with the same shape as input
- rtype:
filtered (Float[Array, “h w”])
Notes
The Wiener filter is optimal in terms of the mean square error. It estimates the local mean and variance around each pixel.
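For reference, the local-statistics formulation used by scipy.signal.wiener can be sketched in NumPy as below. The helper names are hypothetical, odd window sizes and zero padding at the borders are assumed, and this is not the package's JAX code.

```python
import numpy as np


def box_mean(img, size):
    """Mean over a size[0] x size[1] window centred on each pixel (odd sizes, zero padding)."""
    kh, kw = size
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return windows.mean(axis=(-2, -1))


def wiener_sketch(img, kernel_size=3, noise=None):
    """Local-statistics Wiener filter in the style of scipy.signal.wiener."""
    img = np.asarray(img, dtype=np.float64)
    size = (kernel_size, kernel_size) if np.isscalar(kernel_size) else tuple(kernel_size)
    local_mean = box_mean(img, size)
    local_var = box_mean(img**2, size) - local_mean**2
    if noise is None:
        noise = local_var.mean()  # estimate the noise power from the image itself
    # Shrink each pixel toward its local mean by (1 - noise / local_var); where the
    # local variance does not exceed the noise power, return the local mean instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        filtered = local_mean + (1.0 - noise / local_var) * (img - local_mean)
    return np.where(local_var <= noise, local_mean, filtered)
```

In flat regions the local variance is close to the noise power, so the output falls back to the local mean; near edges the local variance is large and the pixel is left nearly unchanged.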
Adaptive Processing Module
Module: adapt
Contains adaptive image processing methods that take advantage of JAX’s automatic differentiation capabilities.
Functions
- adaptive_wiener:
Adaptive Wiener filter that optimizes the noise estimate using gradient descent.
- adaptive_threshold:
Adaptively optimizes thresholding parameters using gradient descent to produce a differentiably thresholded image.
- cryoblob.adapt.adaptive_wiener(img, target, kernel_size=3, initial_noise=0.1, learning_rate=0.01, iterations=100)
Adaptive Wiener filter that optimizes the noise estimate using gradient descent.
- Parameters:
img (Float[Array, “h w”]) – Noisy input image.
target (Float[Array, “h w”]) – A target or reference image used for optimization.
kernel_size (scalar_int | Tuple[int, int], optional) – Window size for the Wiener filter. Default is 3.
initial_noise (scalar_float, optional) – Initial guess for the noise parameter. Default is 0.1.
learning_rate (scalar_float, optional) – Learning rate for optimization. Default is 0.01.
iterations (scalar_int, optional) – Number of optimization steps. Default is 100.
- Returns:
filtered_img (Float[Array, “h w”]) – Wiener filtered image with optimized noise parameter.
optimized_noise (scalar_float) – The optimized noise parameter.
- Return type:
Tuple[Float[Array, “h w”], float | Float[Array, “”]]
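A minimal sketch of the idea, assuming a mean-squared-error loss between the filtered image and target (the loss actually used by adaptive_wiener is not stated here) and an integer kernel_size. The package relies on JAX's automatic differentiation; this NumPy version writes the derivative of the Wiener output with respect to the noise power by hand instead. All function names are hypothetical.

```python
import numpy as np


def local_stats(img, k):
    """Local mean and variance over a k x k window (k odd, zero padding)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.pad(img, k // 2), (k, k))
    mean = windows.mean(axis=(-2, -1))
    return mean, (windows**2).mean(axis=(-2, -1)) - mean**2


def wiener_with_noise(img, mean, var, noise):
    """Wiener output for a given noise power and its derivative w.r.t. that power."""
    active = var > noise  # pixels where the shrinkage branch is used
    safe_var = np.where(active, var, 1.0)  # avoid dividing by tiny variances
    out = np.where(active, mean + (1.0 - noise / safe_var) * (img - mean), mean)
    d_out = np.where(active, -(img - mean) / safe_var, 0.0)
    return out, d_out


def adaptive_wiener_sketch(img, target, kernel_size=3, initial_noise=0.1,
                           learning_rate=0.01, iterations=100):
    img = np.asarray(img, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mean, var = local_stats(img, kernel_size)  # independent of the noise power
    noise = float(initial_noise)
    for _ in range(iterations):
        out, d_out = wiener_with_noise(img, mean, var, noise)
        # d/d(noise) of mean((out - target) ** 2), via the chain rule.
        grad = np.mean(2.0 * (out - target) * d_out)
        noise = max(noise - learning_rate * grad, 0.0)  # keep the noise power non-negative
    out, _ = wiener_with_noise(img, mean, var, noise)
    return out, noise
```

The local mean and variance do not depend on the noise power, so they are computed once outside the loop; only the cheap shrinkage step is re-evaluated per iteration.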
- cryoblob.adapt.adaptive_threshold(img, target, initial_threshold=0.5, initial_slope=10.0, learning_rate=0.01, iterations=100)
Description
Adaptively optimizes thresholding parameters using gradient descent to produce a differentiably thresholded image.
- param - img (Float[Array, “h w”]):
The input image to threshold.
- param - target (Float[Array, “h w”]):
A reference binary image for supervised parameter optimization.
- param - initial_threshold (scalar_float, optional):
Initial guess for the threshold parameter. Default is 0.5.
- param - initial_slope (scalar_float, optional):
Initial guess for the slope controlling sigmoid steepness. Default is 10.0.
- param - learning_rate (scalar_float, optional):
The learning rate used during gradient optimization. Default is 0.01.
- param - iterations (scalar_int, optional):
Number of iterations for gradient optimization. Default is 100.
- returns:
thresholded_img (Float[Array, “h w”]) – The image after differentiable thresholding using optimized parameters.
optimized_threshold (scalar_float) – The optimized threshold parameter.
optimized_slope (scalar_float) – The optimized slope parameter.
Flow
- sigmoid_threshold – Applies a sigmoid function to the input image.
- threshold_loss_fn – Computes the loss between the thresholded image and the target.
- step – Performs a single optimization step.
- optimized_params – Optimizes the threshold and slope parameters.
- thresholded_img – Applies the optimized thresholding parameters to the input image.
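The Flow above can be sketched in NumPy with hand-written gradients, assuming a mean-squared-error loss for threshold_loss_fn and plain gradient descent for step. The helper reuses the name sigmoid_threshold from the Flow for readability, but its body, like adaptive_threshold_sketch, is a hypothetical reimplementation; jax.grad would compute the same two partial derivatives automatically.

```python
import numpy as np


def sigmoid_threshold(img, threshold, slope):
    """Differentiable threshold: values well above `threshold` map to ~1, well below to ~0."""
    return 1.0 / (1.0 + np.exp(-slope * (img - threshold)))


def adaptive_threshold_sketch(img, target, initial_threshold=0.5, initial_slope=10.0,
                              learning_rate=0.01, iterations=100):
    img = np.asarray(img, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    threshold, slope = float(initial_threshold), float(initial_slope)
    for _ in range(iterations):
        soft = sigmoid_threshold(img, threshold, slope)
        # Chain rule for mean((soft - target) ** 2): the sigmoid derivative is soft * (1 - soft).
        d_soft = 2.0 * (soft - target) * soft * (1.0 - soft)
        grad_threshold = np.mean(d_soft * -slope)
        grad_slope = np.mean(d_soft * (img - threshold))
        threshold -= learning_rate * grad_threshold
        slope -= learning_rate * grad_slope
    return sigmoid_threshold(img, threshold, slope), threshold, slope
```

Using a sigmoid instead of a hard step keeps the output differentiable in both parameters; a larger slope approaches a hard threshold but also makes the gradients steeper, which limits the usable learning rate.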
Blob Detection Module
Module: blobs
Code for detecting the blobs. Image processing and data I/O live in separate modules; this module only handles preprocessing the data and counting blobs.
Functions
- find_connected_components:
Pure JAX implementation of 3D connected components labeling.
- center_of_mass_3d:
Calculate center of mass for each labeled region in a 3D image.
- find_particle_coords:
Find particle coordinates using connected components and center of mass.
- preprocessing:
Pre-processes low SNR images to improve contrast of blobs.
- blob_list_log:
Detects blobs in an input image using the Laplacian of Gaussian (LoG) method.
- cryoblob.blobs.find_connected_components(binary_image, connectivity=6)
Description
Pure JAX implementation of 3D connected components labeling. Uses a two-pass algorithm.
- param - binary_image (Bool[Array, “x y z”]):
Binary image where True/1 indicates foreground.
- param - connectivity (int, optional):
Either 6 (face-connected) or 26 (fully connected). Default is 6.
- returns:
labels (Integer[Array, “x y z”]) – Array where each connected component has unique integer label
num_labels (int) – Number of connected components found
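To illustrate the classic two-pass approach, here is a plain Python sketch using a union-find table: the first pass assigns provisional labels in raster order and records which labels touch, and the second pass maps every provisional label to a consecutive final label. It is a CPU reference under those assumptions, not the package's pure-JAX implementation, and the function name is hypothetical.

```python
import itertools

import numpy as np


def find_connected_components_sketch(binary_image, connectivity=6):
    """Two-pass 3D connected-component labeling with a union-find table."""
    binary = np.asarray(binary_image, dtype=bool)
    if connectivity == 6:
        # Face neighbours that precede the current voxel in raster order.
        offsets = [(-1, 0, 0), (0, -1, 0), (0, 0, -1)]
    elif connectivity == 26:
        # The 13 of the 26 neighbours that precede the current voxel in raster order.
        offsets = [o for o in itertools.product((-1, 0, 1), repeat=3) if o < (0, 0, 0)]
    else:
        raise ValueError("connectivity must be 6 or 26")

    parent = [0]  # parent[k] is the union-find parent of provisional label k; 0 is background

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving keeps the trees shallow
            k = parent[k]
        return k

    labels = np.zeros(binary.shape, dtype=np.int64)
    # First pass: provisional labels plus recorded equivalences.
    for idx in zip(*np.nonzero(binary)):  # np.nonzero yields voxels in raster order
        neighbour_labels = []
        for off in offsets:
            n = tuple(int(i + o) for i, o in zip(idx, off))
            if all(0 <= c < s for c, s in zip(n, binary.shape)) and labels[n] > 0:
                neighbour_labels.append(int(labels[n]))
        if not neighbour_labels:
            parent.append(len(parent))  # a new provisional label is its own root
            labels[idx] = len(parent) - 1
        else:
            root = min(find(k) for k in neighbour_labels)
            labels[idx] = root
            for k in neighbour_labels:
                parent[find(k)] = root  # merge the touching label trees
    # Second pass: map each provisional label to a consecutive final label.
    roots = sorted({find(k) for k in range(1, len(parent))})
    remap = np.zeros(len(parent), dtype=np.int64)
    for new_label, root in enumerate(roots, start=1):
        remap[root] = new_label
    for k in range(1, len(parent)):
        remap[k] = remap[find(k)]
    return remap[labels], len(roots)
```

Only already-visited neighbours are inspected in the first pass, which is what lets a single raster scan plus a relabeling pass cover every adjacency.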
- cryoblob.blobs.center_of_mass_3d(image, labels, num_labels)
Description
Calculate center of mass for each labeled region in a 3D image.
- param - image (Float[Array, “x y z”]):
3D image array.
- param - labels (Integer[Array, “x y z”]):
Integer array of labels.
- param - num_labels (int):
Number of labels (excluding background)
- returns:
Array of centroid coordinates for each label
- rtype:
centroids (Float[Array, “n 3”])
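A NumPy sketch, assuming the centroid is weighted by image intensities (as in scipy.ndimage.center_of_mass) and that labels run from 1 to num_labels with 0 as background. np.bincount accumulates the per-label sums in a single pass per axis. The function name is hypothetical.

```python
import numpy as np


def center_of_mass_3d_sketch(image, labels, num_labels):
    """Intensity-weighted centroid of each label 1..num_labels, in index units."""
    image = np.asarray(image, dtype=np.float64)
    flat_labels = np.asarray(labels).ravel()
    weights = image.ravel()
    # Total intensity per label; index 0 is the background and is dropped below.
    total = np.bincount(flat_labels, weights=weights, minlength=num_labels + 1)
    coords = np.indices(image.shape).reshape(image.ndim, -1)  # (3, n_voxels)
    weighted_sums = np.stack(
        [np.bincount(flat_labels, weights=weights * c, minlength=num_labels + 1)
         for c in coords],
        axis=-1,
    )  # (num_labels + 1, 3)
    return weighted_sums[1:] / total[1:, None]
```

A label whose total intensity is zero yields NaN coordinates in this sketch.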
- cryoblob.blobs.find_particle_coords(results_3D, max_filtered, image_thresh)
Description
Find particle coordinates using connected components and center of mass. Pure JAX implementation.
- param - results_3D (Float[Array, “x y z”]):
3D array of filter responses.
- param - max_filtered (Float[Array, “x y z”]):
Maximum-filtered array.
- param - image_thresh (scalar_float):
Threshold for peak detection
- returns:
Array of particle coordinates
- rtype:
coords (Float[Array, “n 3”])
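One plausible reading of this pipeline, sketched in NumPy under stated assumptions: a voxel counts as a peak when its response equals the local maximum in max_filtered and exceeds image_thresh; peak voxels touching under 26-connectivity are grouped; and each group is reported by its response-weighted centroid. The function name is hypothetical, and the exact peak criterion used by find_particle_coords may differ.

```python
from collections import deque

import numpy as np


def find_particle_coords_sketch(results_3D, max_filtered, image_thresh):
    """Centroids of thresholded local maxima, grouped by 26-connectivity."""
    response = np.asarray(results_3D, dtype=np.float64)
    # Peak candidates: voxels equal to their local maximum and above the threshold.
    peaks = (response == np.asarray(max_filtered)) & (response > image_thresh)
    visited = np.zeros(peaks.shape, dtype=bool)
    offsets = [(a, b, c) for a in (-1, 0, 1) for b in (-1, 0, 1) for c in (-1, 0, 1)
               if (a, b, c) != (0, 0, 0)]
    coords = []
    for start in zip(*np.nonzero(peaks)):
        if visited[start]:
            continue
        # Flood-fill one connected group of peak voxels (e.g. a plateau).
        visited[start] = True
        queue, members = deque([start]), []
        while queue:
            voxel = queue.popleft()
            members.append(voxel)
            for off in offsets:
                n = tuple(int(v + o) for v, o in zip(voxel, off))
                inside = all(0 <= i < s for i, s in zip(n, peaks.shape))
                if inside and peaks[n] and not visited[n]:
                    visited[n] = True
                    queue.append(n)
        members = np.array(members, dtype=np.float64)
        weights = response[tuple(members.astype(int).T)]
        coords.append((members * weights[:, None]).sum(axis=0) / weights.sum())
    return np.array(coords).reshape(-1, 3)
```

Grouping connected peak voxels matters for flat-topped maxima: a two-voxel plateau is reported once, at its midpoint, rather than as two particles.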
- cryoblob.blobs.preprocessing(image_orig, return_params=False, exponential=True, logarizer=False, gblur=2, background=0, apply_filter=0)
Description
Pre-processing of low SNR images to improve contrast of blobs.
- param - image_orig (Float[Array, “y x”]):
An input image represented as a 2D JAX array.
- param - return_params (bool, optional):
Whether to return the processing parameters. Default is False.
- param - exponential (bool, optional):
Whether to apply an exponential function to the image. Default is True.
- param - logarizer (bool, optional):
Whether to apply a log function to the image. Default is False.
- param - gblur (int, optional):
The standard deviation of the Gaussian filter. Default is 2.
- param - background (int, optional):
The standard deviation of the Gaussian filter used for background subtraction. Default is 0.
- param - apply_filter (int, optional):
If greater than 1, a Wiener filter is applied to the image. Default is 0.
- returns:
The pre-processed image
- rtype:
image_proc (Float[Array, “y x”])
- cryoblob.blobs.blob_list_log(mrc_image, min_blob_size=5, max_blob_size=20, blob_step=1, downscale=4, std_threshold=6)
Description
Detect blobs of varying sizes in an MRC image using the Laplacian of Gaussian (LoG) method.
- param - mrc_image (MRC_Image):
The PyTree containing the image data and metadata.
- param - min_blob_size (scalar_num, optional):
Minimum blob size to detect. Defaults to 5.
- param - max_blob_size (scalar_num, optional):
Maximum blob size to detect. Defaults to 20.
- param - blob_step (scalar_num, optional):
Step size between consecutive blob scales. Defaults to 1.
- param - downscale (scalar_num, optional):
Factor by which the image is downscaled before detection. Defaults to 4.
- param - std_threshold (scalar_num, optional):
Threshold in standard deviations for blob detection. Defaults to 6.
- returns:
Array of blob coordinates and sizes, shape [n, 3]. Columns represent (Y, X, Blob size in pixels).
- rtype:
scaled_coords (Float[Array, “n 3”])
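The principle behind LoG blob detection is scale selection: for a bright disk of radius r, the scale-normalized response -σ²∇²G at the disk centre is largest near σ = r/√2. The NumPy sketch below demonstrates that property on a synthetic disk. It does not reproduce blob_list_log; how that function maps min_blob_size, max_blob_size, blob_step and std_threshold onto scales and thresholds is not assumed here, and the helper names are hypothetical.

```python
import numpy as np


def normalized_log_kernel(sigma):
    """-sigma**2 * LoG sampled on a (2 * ceil(3 * sigma) + 1)^2 grid, made zero-sum."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    r2 = (x**2 + y**2) / (2.0 * sigma**2)
    log = -(1.0 - r2) * np.exp(-r2) / (np.pi * sigma**4)
    # Negate and scale by sigma**2 so bright blobs give positive, scale-comparable responses.
    return -(sigma**2) * (log - log.mean())


def center_response(image, sigma):
    """Normalized LoG response evaluated at the central pixel of `image`."""
    kernel = normalized_log_kernel(sigma)
    half = kernel.shape[0] // 2
    cy, cx = image.shape[0] // 2, image.shape[1] // 2
    patch = image[cy - half:cy + half + 1, cx - half:cx + half + 1]
    return float((patch * kernel).sum())


# Synthetic bright disk of radius 8 pixels on a dark 81 x 81 background.
yy, xx = np.mgrid[-40:41, -40:41]
disk = (xx**2 + yy**2 <= 8**2).astype(np.float64)

sigmas = np.arange(1, 13)
responses = [center_response(disk, s) for s in sigmas]
best_sigma = sigmas[int(np.argmax(responses))]
estimated_radius = np.sqrt(2) * best_sigma  # invert sigma = r / sqrt(2)
```

In a full detector the same normalized response is typically computed at every pixel and scale, giving a (scale, y, x) stack in which blobs appear as local maxima above a threshold.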
File I/O Module
Module: files
Contains the code for interfacing with data files. One goal is to keep plain Python code separate from JAX code, so most of the outward-facing code, which must be written in Python, lives here.
Functions
- file_params:
Get the parameters for the file organization.
- load_mrc:
Reads an MRC-format cryo-EM file, extracting image data and metadata.
- process_single_file:
Process a single file for blob detection with memory optimization.
- process_batch_of_files:
Process a batch of files in parallel with memory optimization.
- folder_blobs:
Process a folder of images for blob detection with memory optimization.
- estimate_batch_size:
Estimate optimal batch size for processing MRC files based on available memory.
- estimate_memory_usage:
Estimate memory usage in GB for processing a single MRC file.
- get_optimal_batch_size:
Get optimal batch size by sampling multiple files from the list.
- cryoblob.files.file_params()
Description
Run this at the beginning to generate the dictionary. It gives both the absolute and relative paths describing how the files are organized.
- returns:
main_directory (str) – The main directory where the package is located.
folder_structure (dict) – Where the files and data are stored, as read from the organization.json file.
- cryoblob.files.load_mrc(filepath)
Description
Reads an MRC-format cryo-EM file from the specified path, extracting image data and relevant metadata. All numeric data are converted into JAX arrays and wrapped into a structured MRC_Image PyTree, compatible with JAX’s functional programming paradigm.
- param - filepath (str):
Path to the MRC file to be loaded.
- returns:
An MRC_Image PyTree containing:
- image_data: Image array (2D or 3D).
- voxel_size: Array containing voxel dimensions in Å (Z, Y, X).
- origin: Array indicating the origin coordinates from the header (Z, Y, X).
- data_min: Minimum pixel value.
- data_max: Maximum pixel value.
- data_mean: Mean pixel value.
- mode: Integer code representing data type (e.g., 0=int8, 1=int16, 2=float32).
- rtype:
MRC_Image
Examples
>>> mrc_image = load_mrc("example.mrc")
>>> mrc_image.voxel_size
Array([1.2, 1.2, 1.2], dtype=float32)
Notes
- This function uses the mrcfile library for parsing MRC files.
- The resulting PyTree structure (MRC_Image) is explicitly designed for use in JAX-based image processing pipelines.
- cryoblob.files.process_single_file(file_path, preprocessing_config, blob_downscale, stream_mode=True)
Description
Process a single MRC file for blob detection with memory optimization and validated preprocessing configuration.
- param - file_path (str):
Path to the MRC image file to process
- param - preprocessing_config (PreprocessingConfig):
Validated preprocessing configuration containing all processing parameters
- param - blob_downscale (scalar_float):
Downscaling factor applied during blob detection to reduce computational load
- param - stream_mode (bool, optional):
Whether to use memory-mapped file access for large files to reduce memory usage. Default is True.
- returns:
scaled_blobs (Float[Array, “n 3”]) – Array of detected blob coordinates and sizes where each row contains
[Y_position_nm, X_position_nm, Size_nm]
file_path (str) – Original file path for tracking processed files
- raises Exception:
Not propagated. If processing fails, the exception is caught, its message is printed to the console, and an empty array is returned together with the original file path.
Notes
The function uses streaming mode for large files to reduce memory usage and immediately releases file handles after reading. All intermediate arrays are explicitly deleted to manage GPU memory efficiently.
- cryoblob.files.process_batch_of_files(file_batch, preprocessing_config, blob_downscale)
Process a batch of files in parallel with memory optimization.
- Parameters:
file_batch (List[str]) – List of file paths to process.
preprocessing_config (Dict) – Preprocessing parameters.
blob_downscale (float) – Downscaling factor.
- Returns:
List of (blobs, file_path) tuples
- Return type:
results (List[Tuple[Array, str]])
- cryoblob.files.folder_blobs(folder_location, file_type='mrc', blob_downscale=7.0, target_memory_gb=4.0, stream_large_files=True, **kwargs)
Description
Process a folder of MRC images for blob detection with memory optimization and validated preprocessing configuration. Automatically manages batch processing and memory usage to prevent GPU memory overflow.
- param - folder_location (str):
Path to folder containing MRC images to process
- param - file_type (Literal[“mrc”], optional):
File extension to search for in the folder. Default is “mrc”.
- param - blob_downscale (scalar_float, optional):
Downscaling factor applied during blob detection. Default is 7.0.
- param - target_memory_gb (scalar_float, optional):
Target GPU memory usage in GB for batch size optimization. Default is 4.0.
- param - stream_large_files (bool, optional):
Whether to use memory-mapped file access for large files. Default is True.
- param - **kwargs:
Additional preprocessing parameters passed to PreprocessingConfig. Valid options: exponential, logarizer, gblur, background, apply_filter
- returns:
DataFrame containing detected blob information with columns: [‘File Location’, ‘Center Y (nm)’, ‘Center X (nm)’, ‘Size (nm)’]
- rtype:
blob_dataframe (pd.DataFrame)
- raises ValueError:
If the preprocessing parameters are invalid according to PreprocessingConfig validation.
Notes
Memory management:
- Uses batch processing to control memory usage.
- Automatically adjusts the batch size based on available memory.
- Clears device memory between batches.
- Streams large files if needed.
- Handles intermediate results efficiently.
The function processes files in batches to prevent memory overflow and provides a progress bar to track processing status. Empty folders return an empty DataFrame with the expected column structure.
- cryoblob.files.estimate_batch_size(sample_file_path, target_memory_gb=4.0, safety_factor=0.7, processing_overhead=3.0)
Description
Estimate optimal batch size for processing MRC files based on available memory and file characteristics. This function analyzes a sample file to estimate memory requirements and calculates the maximum number of files that can be processed simultaneously without exceeding memory limits.
- param - sample_file_path (str):
Path to a representative MRC file for size estimation
- param - target_memory_gb (scalar_float, optional):
Target GPU memory usage in GB. Default is 4.0.
- param - safety_factor (scalar_float, optional):
Safety factor to prevent memory overflow (0.0-1.0). Default is 0.7 (use 70% of the target memory).
- param - processing_overhead (scalar_float, optional):
Memory overhead multiplier for processing operations. Default is 3.0 (processing uses 3x the raw data size).
- returns:
Recommended batch size for processing
- rtype:
batch_size (scalar_int)
Notes
The estimation considers:
- Raw file size in memory (dtype conversion)
- Preprocessing operations (filtering, transformations)
- Blob detection memory requirements
- JAX compilation overhead
- Intermediate array storage
Memory estimation formula:
    per_file_memory = file_size * processing_overhead
    available_memory = target_memory_gb * safety_factor * 1e9
    batch_size = max(1, available_memory // per_file_memory)
Examples
>>> batch_size = estimate_batch_size("sample.mrc", target_memory_gb=8.0)
>>> print(f"Recommended batch size: {batch_size}")
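The memory estimation formula can be checked by hand. The sketch below assumes that file_size means the in-memory size of the image (pixels times bytes per pixel) and uses a hypothetical helper, sketch_batch_size, that takes the image shape directly instead of reading an MRC file. For a 4096 x 4096 float32 image, file_size is 67,108,864 bytes, per_file_memory is 201,326,592 bytes, and with the defaults available_memory is 2.8e9 bytes, so 2.8e9 // 201,326,592 gives a batch size of 13.

```python
def sketch_batch_size(shape, itemsize, target_memory_gb=4.0, safety_factor=0.7,
                      processing_overhead=3.0):
    """Evaluate the documented batch-size formula for an image of the given shape."""
    file_size = itemsize
    for dim in shape:
        file_size *= dim  # bytes needed to hold the raw image in memory
    per_file_memory = file_size * processing_overhead
    available_memory = target_memory_gb * safety_factor * 1e9
    # Never return less than 1, even if a single file exceeds the budget.
    return max(1, int(available_memory // per_file_memory))
```

The max(1, ...) guard means a single oversized file is still processed on its own rather than yielding a batch size of zero.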
- cryoblob.files.estimate_memory_usage(file_path, include_preprocessing=True, include_blob_detection=True)
Description
Estimate memory usage in GB for processing a single MRC file.
- param - file_path (str):
Path to MRC file
- param - include_preprocessing (bool, optional):
Include memory for preprocessing operations. Default is True.
- param - include_blob_detection (bool, optional):
Include memory for blob detection. Default is True.
- returns:
Estimated memory usage in GB
- rtype:
memory_gb (scalar_float)
- cryoblob.files.get_optimal_batch_size(file_list, target_memory_gb=4.0, sample_fraction=0.1)
Description
Get optimal batch size by sampling multiple files from the list.
- param - file_list (list[str]):
List of file paths to process
- param - target_memory_gb (scalar_float, optional):
Target memory usage in GB. Default is 4.0.
- param - sample_fraction (scalar_float, optional):
Fraction of files to sample for estimation. Default is 0.1.
- returns:
Optimal batch size
- rtype:
batch_size (scalar_int)
Image Processing Module
Module: image
Contains the basic functions for image processing, such as resizing and filtering. These functions are used for data preprocessing.
Functions:
- image_resizer:
Resize an image using a fast resizing algorithm implemented in JAX.
- resize_x:
Resize an image along the x-axis by independently resampling each row.
- gaussian_kernel:
Create a normalized 2D Gaussian kernel.
- apply_gaussian_blur:
Apply Gaussian blur to an image using convolution in JAX.
- difference_of_gaussians:
Applies Difference of Gaussians (DoG) filtering to enhance circular blobs.
- laplacian_of_gaussian:
Applies Laplacian of Gaussian (LoG) filtering to an input image.
- laplacian_kernel:
Create a Laplacian kernel for edge detection in a JAX-compatible manner.
- exponential_kernel:
Create an exponential kernel for image processing.
- perona_malik:
Perform edge-preserving denoising using the Perona-Malik anisotropic diffusion.
- histogram:
Calculate the histogram of an image.
- equalize_hist:
Perform histogram equalization on an image using JAX.
- equalize_adapthist:
Perform adaptive histogram equalization on an image using JAX.
- wiener:
Perform Wiener filtering on an image using JAX.
- cryoblob.image.image_resizer(orig_image, new_sampling)
Description
Resize an image using a fast resizing algorithm implemented in JAX. If a 3D stack is provided, the function will sum along the last dimension.
- param - orig_image (Real[Array, “y x”] | Real[Array, “y x c”]):
The original image to be resized. It should be a 2D JAX array or a 3D stack.
- param - new_sampling (scalar_num | Real[Array, “2”]):
The new sampling rate for resizing the image. It can be a single float value or a tuple of two float values representing the sampling rates for the x and y axes, respectively.
- If a single value is provided, it is applied to both axes.
- If new_sampling is greater than 1, the image is downsampled.
- If new_sampling is less than 1, the image is upsampled.
- returns:
The resized image.
- rtype:
resampled_image (Float[Array, “a b”])
- cryoblob.image.resize_x(x_image, new_x_len)
Description
Resize an image along the x-axis by independently resampling each row. Uses lax.scan over the y-dimension, then vmap over the x-dimension.
- param - x_image (Num[Array, “y x”]):
Image to resize (y by x).
- param - new_x_len (scalar_int):
Target number of columns.
- returns:
Resized image (y by new_x).
- rtype:
resized (Float[Array, “y new_x”])
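A NumPy sketch of row-wise resampling, assuming (as the parameter name new_x_len and the return shape indicate) that resize_x acts along x. Linear interpolation via np.interp is a stand-in chosen for illustration; the resampling kernel used by the package is not stated here. A full 2D resize can reuse the same routine on the transpose, as in the hypothetical resize_2d_sketch.

```python
import numpy as np


def resize_x_sketch(x_image, new_x_len):
    """Resample every row of a (y, x) image to new_x_len columns by linear interpolation."""
    img = np.asarray(x_image, dtype=np.float64)
    old_x = np.linspace(0.0, 1.0, img.shape[1])
    new_x = np.linspace(0.0, 1.0, new_x_len)
    # Each row is resampled independently; the first and last samples are kept fixed.
    return np.stack([np.interp(new_x, old_x, row) for row in img])


def resize_2d_sketch(image, new_shape):
    """Resize along x, then along y by applying the same routine to the transpose."""
    tmp = resize_x_sketch(image, new_shape[1])
    return resize_x_sketch(tmp.T, new_shape[0]).T
```

Splitting a 2D resize into two 1D passes is what makes a per-row formulation attractive: each pass is a batch of independent 1D problems that map naturally onto vectorized execution.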
- cryoblob.image.gaussian_kernel(size, sigma)
Description
Create a normalized 2D Gaussian kernel.
- param - size (scalar_int):
Kernel size (size x size). Must be odd.
- param - sigma (scalar_float):
Standard deviation of the Gaussian distribution.
- returns:
Normalized 2D Gaussian kernel.
- rtype:
kernel (Float[Array, “size size”])
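A NumPy sketch of such a kernel, assuming it is sampled at integer offsets from the centre pixel. It uses the separability G(x, y) = g(x) g(y) and then rescales so that the entries sum to 1. The name is hypothetical.

```python
import numpy as np


def gaussian_kernel_sketch(size, sigma):
    """Normalized 2D Gaussian on a size x size grid centred on the middle pixel."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    ax = np.arange(-half, half + 1, dtype=np.float64)
    g = np.exp(-(ax**2) / (2.0 * sigma**2))
    kernel = np.outer(g, g)  # separable: G(x, y) = g(x) * g(y)
    return kernel / kernel.sum()  # normalize so blurring preserves overall intensity
```

Normalizing after sampling, rather than using the continuous 1/(2πσ²) prefactor, guarantees that a truncated kernel still sums exactly to 1.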
- cryoblob.image.apply_gaussian_blur(image, sigma=1.0, kernel_size=5, mode='same')
Description
Apply Gaussian blur to an image using convolution in JAX.
- param - image (Real[Array, “y x”]):
Input image.
- param - sigma (scalar_float, optional):
Standard deviation for the Gaussian kernel. Defaults to 1.0.
- param - kernel_size (scalar_int, optional):
Size of the Gaussian kernel. Must be odd. Defaults to 5.
- param - mode (Literal[“full”, “valid”, “same”], optional):
Convolution mode. Defaults to “same”.
- returns:
Blurred image.
- rtype:
blurred (Float[Array, “yp xp”])
- cryoblob.image.difference_of_gaussians(image, sigma1, sigma2, sampling=1, hist_stretch=True, normalized=True)
Description
Applies Difference of Gaussians (DoG) filtering to enhance circular blobs.
- param - image (Real[Array, “y x”]):
Input 2D image.
- param - sigma1 (scalar_num):
Standard deviation of the first (smaller) Gaussian.
- param - sigma2 (scalar_num):
Standard deviation of the second (larger) Gaussian.
- param - sampling (scalar_num, optional):
Downsampling factor; 1 means no resizing. Default is 1.
- param - hist_stretch (bool, optional):
Apply histogram stretching if True. Default is True.
- param - normalized (bool, optional):
Normalize the filtered output by sigma2 if True. Default is True.
- returns:
dog_filtered (Float[Array, “y x”]) – DoG-filtered image.
Flow
- Downsample the image if sampling ≠ 1 (in a JIT-safe way).
- Apply histogram stretching if requested.
- Create an arithmetic-enforced DoG kernel.
- Convolve the image with the DoG kernel.
- Normalize the output if required.
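The kernel-construction and convolution steps of the Flow can be sketched in NumPy as follows. Downsampling, histogram stretching and normalization by sigma2 are omitted, and the 3·sigma2 kernel support and reflect padding are assumptions rather than documented behaviour. The names are hypothetical.

```python
import numpy as np


def gaussian_2d(half, sigma):
    """Normalized Gaussian on a (2 * half + 1)^2 grid."""
    ax = np.arange(-half, half + 1, dtype=np.float64)
    g = np.exp(-(ax**2) / (2.0 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()


def dog_filter_sketch(image, sigma1, sigma2):
    """Filter `image` with G(sigma1) - G(sigma2) built on a shared grid ('same' size output)."""
    half = int(np.ceil(3 * max(sigma1, sigma2)))
    # Both Gaussians sum to 1, so the DoG kernel sums to 0: flat regions map to 0.
    kernel = gaussian_2d(half, sigma1) - gaussian_2d(half, sigma2)
    padded = np.pad(np.asarray(image, dtype=np.float64), half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    # The kernel is symmetric, so correlation and convolution coincide.
    return np.einsum("ijkl,kl->ij", windows, kernel)
```

Because the narrow Gaussian is subtracted from the wide one on the same grid, the filter acts as a band-pass: it suppresses the flat background and the finest noise while responding most strongly to bright spots whose size lies between the two scales.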
- cryoblob.image.log_kernel(size, sigma, kernel_min=3)
Description
Create a Laplacian of Gaussian kernel for edge detection.
- param - size (int):
Kernel size; enforced to be positive and odd.
- param - sigma (scalar_float):
Gaussian standard deviation for the LoG kernel.
- param - kernel_min (int, optional):
Minimum kernel size; size is raised to at least this value. Default is 3.
- returns:
Laplacian kernel.
- rtype:
kernel (Float[Array, “size size”])
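A sketch of one common LoG kernel construction, assuming the analytic formula -(1 / (pi * sigma^4)) * (1 - r^2 / (2 * sigma^2)) * exp(-r^2 / (2 * sigma^2)) followed by mean subtraction so the truncated kernel sums to zero. The sign convention, the mean subtraction and the name log_kernel_sketch are assumptions; cryoblob's log_kernel may scale or normalize differently.

```python
import jax.numpy as jnp


def log_kernel_sketch(size, sigma, kernel_min=3):
    """Hypothetical LoG kernel builder; not cryoblob's exact code."""
    # Enforce a minimum size, then an odd size so the kernel has a center pixel.
    size = max(int(size), kernel_min)
    if size % 2 == 0:
        size += 1
    ax = jnp.arange(size) - (size - 1) / 2.0
    xx, yy = jnp.meshgrid(ax, ax)
    r2 = (xx**2 + yy**2) / (2.0 * sigma**2)  # r^2 / (2 sigma^2)
    kernel = -(1.0 / (jnp.pi * sigma**4)) * (1.0 - r2) * jnp.exp(-r2)
    # Remove the mean so the truncated kernel sums to zero (flat regions map to 0).
    return kernel - kernel.mean()


small = log_kernel_sketch(2, 1.0)  # raised to kernel_min: shape (3, 3)
even = log_kernel_sketch(4, 1.0)   # bumped to odd: shape (5, 5)
k = log_kernel_sketch(9, 1.4)
```

With this sign convention the kernel center is negative, so bright blobs of matching scale give strongly negative responses; blob detectors typically negate the filtered image or search for minima.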
- cryoblob.image.exponential_kernel(arr, k)
Description
Create an exponential kernel for image processing.
- param - arr (Float[Array, “H W”]):
Input array.
- param - k (scalar_float):
Exponential decay constant.
- returns:
Exponential kernel
- rtype:
kernel (Float[Array, “H W”])
- cryoblob.image.perona_malik(image, num_iter, kappa, gamma=0.1, conduction_fn=jaxtyping.jaxtyped)
Perform edge-preserving denoising using Perona-Malik anisotropic diffusion.
- Parameters:
- image (Float[Array, “H W”]) – Input noisy image.
- num_iter (scalar_int) – Number of diffusion iterations.
- kappa (scalar_float) – Conductance coefficient controlling sensitivity to edges.
- gamma (scalar_float, optional) – Diffusion rate (0 < gamma <= 0.25 for stability). Default is 0.1.
- conduction_fn (Callable, optional) – Conductivity function. Defaults to exponential.
- Returns:
Edge-preserved denoised image.
- Return type:
denoised_image (Float[Array, “H W”])
Notes
The Perona-Malik equation is u_t = div(c * grad(u)). It is solved iteratively with the explicit update u_{n+1} = u_n + gamma * div(c * grad(u_n)), where:
- u is the input image
- t is time
- gamma is the diffusion rate (the time step of the explicit update)
- c is the conductivity function
- grad is the gradient operator
- div is the divergence operator
The conductivity function c is typically an exponential function: c(delta) = exp(-delta^2 / kappa^2) where delta is the difference between neighboring pixels.
Perona, Pietro, Takahiro Shiota, and Jitendra Malik. “Anisotropic diffusion.” Geometry-driven diffusion in computer vision (1994): 73-92.
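The explicit update in the Notes can be sketched in a few lines. This is a minimal sketch, assuming a 4-neighbor discretization of div(c * grad(u)), edge-replicated boundaries, a conduction_fn(delta, kappa) call signature, and the exponential conductivity c(delta) = exp(-delta^2 / kappa^2); perona_malik_sketch and exp_conductivity are hypothetical names, and cryoblob's boundary handling may differ.

```python
import jax
import jax.numpy as jnp


def exp_conductivity(delta, kappa):
    """c(delta) = exp(-delta^2 / kappa^2), as in the Notes above."""
    return jnp.exp(-((delta / kappa) ** 2))


def perona_malik_sketch(image, num_iter, kappa, gamma=0.1, conduction_fn=exp_conductivity):
    """Hypothetical explicit Perona-Malik scheme on a 4-neighbor stencil."""

    def step(_, u):
        # Replicate edge pixels so boundary differences are zero (no flux out of the image).
        p = jnp.pad(u, 1, mode="edge")
        diffs = (
            p[:-2, 1:-1] - u,  # north neighbor minus center
            p[2:, 1:-1] - u,   # south
            p[1:-1, 2:] - u,   # east
            p[1:-1, :-2] - u,  # west
        )
        # Discrete div(c * grad(u)): sum of conductivity-weighted neighbor differences.
        flux = sum(conduction_fn(d, kappa) * d for d in diffs)
        return u + gamma * flux

    u0 = jnp.asarray(image, dtype=jnp.float32)
    return jax.lax.fori_loop(0, num_iter, step, u0)


noisy = jax.random.normal(jax.random.PRNGKey(0), (32, 32))
smoothed = perona_malik_sketch(noisy, num_iter=10, kappa=2.0)
```

With c even in delta, every flux leaving one pixel enters its neighbor, so the image mean is preserved while the noise variance drops; gamma above 0.25 can make this 4-neighbor scheme oscillate.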
- cryoblob.image.histogram(image, bins=256, range_limits=None)
Calculate histogram from input image data.
- Parameters:
- image (Real[Array, “...”]) – Input array (any shape), flattened internally.
- bins (scalar_int, optional) – Number of histogram bins. Default is 256.
- range_limits (Tuple[scalar_float, scalar_float], optional) – Min and max range for the bins. Default is None.
- Returns:
Histogram counts per bin.
- Return type:
hist (Num[Array, “bins”])
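A sketch of the same semantics on top of jnp.histogram, assuming range_limits maps onto its range argument (with None meaning the data minimum and maximum). histogram_sketch is a hypothetical name.

```python
import jax.numpy as jnp


def histogram_sketch(image, bins=256, range_limits=None):
    """Hypothetical equivalent of cryoblob.image.histogram built on jnp.histogram."""
    flat = jnp.ravel(image)  # any input shape is flattened first
    # With range=None, jnp.histogram spans [flat.min(), flat.max()].
    counts, _edges = jnp.histogram(flat, bins=bins, range=range_limits)
    return counts


data = jnp.array([[0.0, 1.0], [2.0, 3.0]])
counts = histogram_sketch(data, bins=4, range_limits=(0.0, 4.0))
```

In the example each of the four unit-width bins receives exactly one count.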
- cryoblob.image.equalize_hist(image, nbins=256, mask=None)
Description
Perform histogram equalization on an image using JAX.
- param - image (Real[Array, “h w”]):
Input image to equalize.
- param - nbins (scalar_int, optional):
Number of bins for the histogram. Default is 256.
- param - mask (Real[Array, “h w”], optional):
Optional mask for selective equalization. Default is None (use all pixels).
- returns:
Histogram equalized image
- rtype:
equalized (Float[Array, “h w”])
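Global equalization maps each pixel through the normalized cumulative histogram. The sketch below assumes interpolation between bin centers (the approach used by scikit-image's equalize_hist) and omits the mask argument; equalize_hist_sketch is a hypothetical name, not the cryoblob implementation.

```python
import jax.numpy as jnp


def equalize_hist_sketch(image, nbins=256):
    """Hypothetical global histogram equalization (mask handling omitted)."""
    flat = jnp.ravel(image)
    hist, edges = jnp.histogram(flat, bins=nbins)
    # Normalized cumulative histogram: fraction of pixels at or below each bin.
    cdf = jnp.cumsum(hist).astype(jnp.float32)
    cdf = cdf / cdf[-1]
    # Map each pixel through the CDF, interpolating between bin centers.
    centers = (edges[:-1] + edges[1:]) / 2.0
    return jnp.interp(flat, centers, cdf).reshape(image.shape)


skewed = jnp.linspace(0.0, 1.0, 64).reshape(8, 8) ** 3
equalized = equalize_hist_sketch(skewed, nbins=16)
```

Because the cumulative histogram is non-decreasing, the mapping preserves pixel ordering while spreading the strongly skewed input values over [0, 1].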
- cryoblob.image.equalize_adapthist(image, kernel_size=8, clip_limit=0.01, nbins=256)
Description
Perform Contrast Limited Adaptive Histogram Equalization (CLAHE).
- param - image (Real[Array, “h w”]):
Input image.
- param - kernel_size (scalar_int, optional):
Size of local regions for histogram equalization. Default is 8.
- param - clip_limit (scalar_float, optional):
Clipping limit for the histogram. Higher values amplify contrast more strongly. Default is 0.01.
- param - nbins (scalar_int, optional):
Number of bins for the histogram. Default is 256.
- returns:
Image after applying CLAHE.
- rtype:
equalized_final (Float[Array, “h w”])
Notes
CLAHE performs localized histogram equalization to improve image contrast without excessively amplifying noise. The algorithm:
- Divides the image into small regions (tiles).
- Clips each tile’s histogram at the specified limit to prevent noise amplification.
- Performs histogram equalization on each tile separately, using its clipped histogram.
- Interpolates between tiles to produce a smoothly equalized image.
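The clipping step that distinguishes CLAHE from plain equalization can be sketched for a single tile. This assumes tile values in [0, 1], interprets clip_limit as a fraction of the tile's pixel count (with a floor of one count), and redistributes the clipped excess uniformly in a single pass; clip_histogram and tile_equalization_lut are hypothetical names, and the tiling and inter-tile interpolation steps are omitted.

```python
import jax.numpy as jnp


def clip_histogram(hist, limit):
    """Clip bins at `limit` and spread the clipped excess evenly (single pass)."""
    hist = hist.astype(jnp.float32)
    excess = jnp.sum(jnp.maximum(hist - limit, 0.0))
    # Total count is preserved: what is removed above the limit is added back uniformly.
    return jnp.minimum(hist, limit) + excess / hist.shape[0]


def tile_equalization_lut(tile, clip_limit=0.01, nbins=256):
    """Lookup table (one value per bin) for one tile, from its clipped histogram."""
    hist, _ = jnp.histogram(tile.ravel(), bins=nbins, range=(0.0, 1.0))
    # Assumption: clip_limit is a fraction of the tile's pixel count, floored at 1 count.
    limit = jnp.maximum(clip_limit * tile.size, 1.0)
    clipped = clip_histogram(hist, limit)
    # Normalized cumulative sum of the clipped histogram is the per-tile mapping.
    return jnp.cumsum(clipped) / jnp.sum(clipped)


flat_tile = jnp.full((8, 8), 0.5)
lut = tile_equalization_lut(flat_tile)
```

For a perfectly flat tile, plain equalization would make the lookup table jump from 0 to 1 at the single occupied bin; after clipping, the table rises gradually (its value just below the occupied bin is roughly 0.49), so uniform regions are not stretched into amplified noise.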
- cryoblob.image.wiener(img, kernel_size=3, noise=None)
Description
JAX implementation of the Wiener filter for noise reduction, similar to scipy.signal.wiener.
- param - img (Float[Array, “h w”]):
The input image to be filtered.
- param - kernel_size (int or tuple, optional):
The size of the sliding window for local statistics. If a tuple, it gives (height, width). Default is 3.
- param - noise (scalar_float, optional):
The noise power. If None, the average of the local variance is used. Default is None.
- returns:
The filtered output with the same shape as input
- rtype:
filtered (Float[Array, “h w”])
Notes
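The Wiener filter is optimal in the mean-squared-error sense. It estimates the local mean and variance around each pixel; in the scipy.signal.wiener formulation, each pixel is pulled toward its local mean by the fraction noise / local variance, and is replaced by the local mean wherever the local variance is below the noise power.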
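A sketch of the local-statistics filter in the scipy.signal.wiener style, assuming zero-padded box windows via jax.scipy.signal.convolve2d; wiener_sketch is a hypothetical name. Instead of scipy's explicit where(), it clamps the variance from below so the gain drops to 0 (output equals the local mean) wherever the local variance does not exceed the noise power, which also avoids 0/0 on perfectly flat input.

```python
import jax
import jax.numpy as jnp
from jax.scipy.signal import convolve2d


def wiener_sketch(img, kernel_size=3, noise=None):
    """Hypothetical Wiener filter following the scipy.signal.wiener formulation."""
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    img = jnp.asarray(img, dtype=jnp.float32)
    window = jnp.ones(kernel_size, dtype=jnp.float32) / (kernel_size[0] * kernel_size[1])
    # Local mean and variance over the sliding window (zero padding at the borders).
    local_mean = convolve2d(img, window, mode="same")
    local_var = convolve2d(img**2, window, mode="same") - local_mean**2
    if noise is None:
        noise = jnp.mean(local_var)  # average local variance as the noise estimate
    # Clamping var at >= noise makes the gain 0 (pure local mean) where var <= noise;
    # the tiny floor avoids 0 / 0 when both are zero.
    safe_var = jnp.maximum(local_var, jnp.maximum(noise, 1e-12))
    gain = 1.0 - noise / safe_var
    return local_mean + gain * (img - local_mean)


flat = jnp.ones((16, 16))
noisy = jax.random.normal(jax.random.PRNGKey(1), (32, 32))
filtered = wiener_sketch(noisy)
```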
Plotting Module
Module: plots
Contains the codes for visualizing MRC image data with Matplotlib. Like the files module, it holds outward-facing Python code, kept separate from the JAX code.
Functions
- plot_mrc:
Plot MRC image data using Matplotlib with optional scaling and scalebar.
- cryoblob.plots.plot_mrc(mrc_image, image_size=(15, 15), cmap='magma', mode='plain')
Description
Plot an MRC image using Matplotlib with an optional scaling mode and scalebar.
- param - mrc_image (MRC_Image):
The PyTree structure containing image data and voxel metadata.
- param - image_size (Tuple[scalar_int, scalar_int], optional):
Size of the plotted figure (width, height) in inches. Default is (15, 15).
- param - cmap (str, optional):
The Matplotlib colormap to use. Default is “magma”.
- param - mode (str, optional):
Mode of visualization: “plain” plots the image data without modification, “log” plots logarithmically scaled image data, and “exp” plots exponentially scaled image data. Default is “plain”.
- returns:
Displays the plot.
- rtype:
None
Examples
>>> plot_mrc(mrc_image, image_size=(10, 10), cmap="viridis", mode="log")
Type Definitions Module
Module: types
A single location for storing commonly used type aliases and PyTrees along with factory functions for creating them.
Types
- scalar_float:
Zero-dimensional floating-point number.
- scalar_int:
Zero-dimensional integer.
- scalar_num:
Zero-dimensional number that can be either a floating-point number or an integer.
- non_jax_number:
A number that is not a JAX array. This distinction is needed because even single numbers are often stored as 0D JAX arrays.
PyTrees
- MRC_Image:
A PyTree structure for MRC images. Contains the image data and metadata.
Factory Functions
- make_MRC_Image:
Factory function to create an MRC_Image instance.
- cryoblob.types.make_MRC_Image(image_data, voxel_size, origin, data_min, data_max, data_mean, mode)
Description
Factory function to create an MRC_Image instance.
- param - image_data (Num[Array, “H W”] | Num[Array, “D H W”]):
The image data array from the MRC file. Can be 2D or 3D.
- param - voxel_size (Float[Array, “3”]):
Voxel size in the order (Z, Y, X).
- param - origin (Float[Array, “3”]):
Origin coordinates from the MRC file header (Z, Y, X).
- param - data_min (scalar_float):
Minimum value of image data (as stored in header).
- param - data_max (scalar_float):
Maximum value of image data (as stored in header).
- param - data_mean (scalar_float):
Mean value of image data (as stored in header).
- param - mode (scalar_int):
Data type mode from MRC header (e.g., 0: int8, 2: float32).
- returns:
An instance of the MRC_Image PyTree structure.
- rtype:
MRC_Image
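Because a PyTree is any nested container JAX knows how to flatten, a plain NamedTuple of arrays is enough to carry image data and header fields through jax.jit. The sketch below is hypothetical: MRCImageSketch and make_mrc_image_sketch are illustrative names, and cryoblob's MRC_Image may be defined differently (for example as a registered custom class).

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class MRCImageSketch(NamedTuple):
    """Hypothetical stand-in for MRC_Image: a NamedTuple is already a JAX PyTree."""

    image_data: jax.Array
    voxel_size: jax.Array  # (Z, Y, X)
    origin: jax.Array      # (Z, Y, X)
    data_min: jax.Array
    data_max: jax.Array
    data_mean: jax.Array
    mode: jax.Array


def make_mrc_image_sketch(image_data, voxel_size, origin, data_min, data_max, data_mean, mode):
    # Convert every field to a JAX array so the whole record can cross jit boundaries.
    return MRCImageSketch(
        image_data=jnp.asarray(image_data),
        voxel_size=jnp.asarray(voxel_size, dtype=jnp.float32),
        origin=jnp.asarray(origin, dtype=jnp.float32),
        data_min=jnp.asarray(data_min, dtype=jnp.float32),
        data_max=jnp.asarray(data_max, dtype=jnp.float32),
        data_mean=jnp.asarray(data_mean, dtype=jnp.float32),
        mode=jnp.asarray(mode, dtype=jnp.int32),
    )


@jax.jit
def mean_intensity(mrc):
    # jit flattens the PyTree into its leaves and rebuilds it inside the trace.
    return jnp.mean(mrc.image_data)


mrc = make_mrc_image_sketch(jnp.ones((4, 4)), (1.0, 2.0, 2.0), (0.0, 0.0, 0.0), 1.0, 1.0, 1.0, 2)
```

Converting every field to an array in the factory makes all seven fields PyTree leaves, so the record can be passed to, and returned from, compiled functions.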
Validation Module
Module: valid
Pydantic models for data validation and configuration management in the cryoblob preprocessing pipeline. This module provides type-safe validation for preprocessing parameters, file paths, and blob detection configurations.
Classes
- PreprocessingConfig:
Configuration for image preprocessing parameters
- BlobDetectionConfig:
Configuration for blob detection parameters
- FileProcessingConfig:
Configuration for file processing and batch operations
- MRCMetadata:
Validation for MRC file metadata
- AdaptiveFilterConfig:
Configuration for adaptive filtering parameters
- ValidationPipeline:
Main pipeline class for validating all configurations
- class cryoblob.valid.PreprocessingConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for image preprocessing parameters.
This validates all parameters used in the preprocessing function to ensure they are within valid ranges and types before being passed to JAX-compiled functions.
- validate_sigma_values()
Ensure sigma values are reasonable for image processing.
- validate_conflicting_options()
Ensure conflicting preprocessing options aren’t both enabled.
- class cryoblob.valid.BlobDetectionConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for blob detection parameters.
Validates the parameters used by the blob_list_log function.
- validate_max_blob_size()
Ensure max_blob_size > min_blob_size.
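A cross-field rule such as validate_max_blob_size is naturally expressed with a Pydantic v2 model validator that runs after both fields are parsed. This is a hypothetical two-field model, assuming Pydantic v2 and float sizes; the real BlobDetectionConfig has more fields and may implement the check differently.

```python
from pydantic import BaseModel, ValidationError, model_validator


class BlobSizeConfigSketch(BaseModel):
    """Hypothetical two-field stand-in for BlobDetectionConfig."""

    min_blob_size: float
    max_blob_size: float

    @model_validator(mode="after")
    def validate_max_blob_size(self):
        # Runs after both fields are parsed and type-checked, so it can compare them.
        if self.max_blob_size <= self.min_blob_size:
            raise ValueError("max_blob_size must be greater than min_blob_size")
        return self


ok = BlobSizeConfigSketch(min_blob_size=5, max_blob_size=20)
try:
    BlobSizeConfigSketch(min_blob_size=20, max_blob_size=5)
    rejected = False
except ValidationError:
    rejected = True
```

Raising ValueError inside the validator surfaces to the caller as a pydantic ValidationError, so invalid ranges are rejected before any JAX-compiled code runs.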
- class cryoblob.valid.FileProcessingConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for file processing and batch operations.
Validates the parameters used by the folder_blobs function.
- validate_folder_exists()
Ensure the folder exists and is accessible.
- class cryoblob.valid.MRCMetadata(*args, **kwargs)
Bases: BaseModel
Validation model for MRC file metadata.
Ensures MRC file headers contain valid values.
- validate_data_range()
Ensure data_max > data_min.
- validate_mean_in_range()
Ensure data_mean is between data_min and data_max.
- class cryoblob.valid.AdaptiveFilterConfig(*args, **kwargs)
Bases: BaseModel
Configuration model for adaptive filtering parameters.
Validates the parameters used by the adaptive_wiener and adaptive_threshold functions.
- validate_kernel_size()
Ensure kernel size is odd for proper centering.
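A single-field rule such as validate_kernel_size fits a Pydantic v2 field validator. This is a hypothetical sketch assuming an integer kernel_size field constrained to be positive; the real AdaptiveFilterConfig may declare the field and its bounds differently.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class KernelConfigSketch(BaseModel):
    """Hypothetical stand-in for the kernel_size check in AdaptiveFilterConfig."""

    kernel_size: int = Field(gt=0)

    @field_validator("kernel_size")
    @classmethod
    def validate_kernel_size(cls, value: int) -> int:
        # An odd window has a single center pixel, so the filtered output stays aligned.
        if value % 2 == 0:
            raise ValueError("kernel_size must be odd")
        return value


def is_valid_kernel(size):
    try:
        KernelConfigSketch(kernel_size=size)
        return True
    except ValidationError:
        return False
```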
- class cryoblob.valid.ValidationPipeline(*args, **kwargs)
Bases: BaseModel
Main validation pipeline that combines all configuration models.
This provides a single entry point for validating complete processing configurations.
- validate_for_single_image()
Validate configuration for single image processing.
- Returns:
- preprocessing_config – Validated preprocessing parameters.
- blob_config – Validated blob detection parameters.
- Return type:
Tuple[PreprocessingConfig, BlobDetectionConfig]
- validate_for_batch_processing()
Validate configuration for batch file processing.
- Returns:
- preprocessing_config – Validated preprocessing parameters.
- blob_config – Validated blob detection parameters.
- file_config – Validated file processing parameters.
- Raises:
ValueError – If the file_processing configuration is not provided.
- Return type:
Tuple[PreprocessingConfig, BlobDetectionConfig, FileProcessingConfig]
- validate_for_adaptive_processing()
Validate configuration for adaptive filtering.
- Returns:
- preprocessing_config – Validated preprocessing parameters.
- adaptive_config – Validated adaptive filtering parameters.
- Raises:
ValueError – If the adaptive_filtering configuration is not provided.
- Return type:
Tuple[PreprocessingConfig, AdaptiveFilterConfig]
- to_preprocessing_kwargs()
Convert the preprocessing config to a kwargs dict for existing functions.
- Returns:
kwargs – Dictionary compatible with the existing preprocessing function.
- Return type:
dict
- cryoblob.valid.create_default_pipeline()
Create a validation pipeline with default settings.
- cryoblob.valid.create_high_quality_pipeline()
Create a validation pipeline optimized for high-quality blob detection.
- cryoblob.valid.validate_mrc_metadata(voxel_size, origin, data_min, data_max, data_mean, mode, image_shape)
Validate MRC metadata and return the validated model.
- Parameters:
- voxel_size
- origin
- data_min
- data_max
- data_mean
- mode
- image_shape
- Returns:
metadata – Validated MRC metadata model.
- Raises:
ValidationError – If any metadata values are invalid.