eugene.interpret.generate_pfms_sdata

eugene.interpret.generate_pfms_sdata(model, sdata, seq_var, layer_name, kernel_size=None, activations=None, seqs=None, num_seqlets=100, padding=0, activation_threshold=None, num_filters=None, batch_size=None, device=None, num_workers=None, prefetch_factor=None, transforms=None, prefix='', suffix='', copy=False)

Generate position frequency matrices (PFMs) for a given layer in a PyTorch model.

Parameters:

model (torch.nn.Module) – The model to generate PFMs for.
sdata (xr.Dataset) – The dataset to use for generating PFMs.
seq_var (str) – The name of the sequence variable in the dataset.
layer_name (str) – The name of the layer to generate PFMs for.
kernel_size (int, optional) – The size of the kernel to use for generating PFMs. If not specified, the kernel size will be inferred from the layer.
activations (torch.Tensor, optional) – The activations to use for generating PFMs. If not specified, the activations will be computed using the dataset and layer.
seqs (List[str], optional) – The sequences to use for generating PFMs. If not specified, the sequences will be inferred from the dataset.
num_seqlets (int, optional) – The number of seqlets (short subsequences) to use for generating PFMs. Defaults to 100.
padding (int, optional) – The amount of padding to use when generating seqlets. Defaults to 0.
activation_threshold (float, optional) – The threshold to use for selecting seqlets based on their activation values.
num_filters (int, optional) – The number of filters to use for generating PFMs. If not specified, all filters will be used.
batch_size (int, optional) – The batch size to use when generating PFMs.
device (str, optional) – The device to use for generating PFMs.
num_workers (int, optional) – The number of workers to use for generating PFMs.
prefetch_factor (int, optional) – The prefetch factor to use when generating PFMs.
transforms (Dict[str, Any], optional) – The transforms to apply to the dataset.
prefix (str, optional) – The prefix to use for the output file.
suffix (str, optional) – The suffix to use for the output file.
copy (bool, optional) – Whether to copy the dataset before generating PFMs.
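
To make the roles of activations, seqs, kernel_size and num_seqlets more concrete, here is a minimal NumPy sketch of one common way to pick seqlets for a single filter: rank every position by activation and keep the subsequences under the top hits. This illustrates the general technique only and is not EUGENe's implementation. The helper name top_seqlets and the assumption that activation position j covers seqs[i][j : j + kernel_size] are hypothetical, and padding and activation_threshold are not modelled.

```python
import numpy as np


def top_seqlets(activations, seqs, kernel_size, num_seqlets=100):
    """Return the subsequences under the highest activations of one filter.

    activations: array of shape (num_seqs, num_positions) for a single filter.
    Assumption: position j of sequence i covers seqs[i][j : j + kernel_size].
    """
    # Rank all (sequence, position) pairs by activation, highest first.
    flat_order = np.argsort(activations, axis=None)[::-1][:num_seqlets]
    seq_idx, pos_idx = np.unravel_index(flat_order, activations.shape)
    # Slice out the window of length kernel_size at each selected position.
    return [seqs[i][j:j + kernel_size] for i, j in zip(seq_idx, pos_idx)]


# Toy data: two sequences of length 6 and a kernel of size 3 give 4 positions each.
toy_seqs = ["ACGTAC", "TTGACA"]
toy_activations = np.array([
    [0.1, 0.9, 0.2, 0.0],
    [0.3, 0.0, 0.8, 0.5],
])
toy_seqlets = top_seqlets(toy_activations, toy_seqs, kernel_size=3, num_seqlets=2)
print(toy_seqlets)  # ['CGT', 'GAC']
```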

Returns:

pfms – The position frequency matrices.

Return type:

np.ndarray
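
For readers unfamiliar with the term, a position frequency matrix counts how often each nucleotide occurs at each position across a set of aligned seqlets. The sketch below builds one with NumPy for a single filter. It is an illustration under stated assumptions, not EUGENe's code: the helper name seqlets_to_pfm, the "ACGT" column order and the (length, alphabet) orientation are all hypothetical, and the array returned by generate_pfms_sdata may be laid out differently.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed column order; the library's own order may differ


def seqlets_to_pfm(seqlets, alphabet=ALPHABET):
    """Count nucleotide occurrences per position over equal-length seqlets."""
    length = len(seqlets[0])
    pfm = np.zeros((length, len(alphabet)), dtype=int)
    for seqlet in seqlets:
        for pos, base in enumerate(seqlet):
            col = alphabet.find(base)
            if col >= 0:  # skip characters outside the alphabet, e.g. "N"
                pfm[pos, col] += 1
    return pfm


pfm = seqlets_to_pfm(["CGT", "CGA", "CAT"])
print(pfm)
# [[0 3 0 0]
#  [1 0 2 0]
#  [1 0 0 2]]
```

Each row sums to the number of seqlets that contributed a valid base at that position, so dividing a row by its sum gives per-position nucleotide probabilities.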