eugene.interpret.generate_pfms_sdata

eugene.interpret.generate_pfms_sdata(model, sdata, seq_var, layer_name, kernel_size=None, activations=None, seqs=None, num_seqlets=100, padding=0, activation_threshold=None, num_filters=None, batch_size=None, device=None, num_workers=None, prefetch_factor=None, transforms=None, prefix='', suffix='', copy=False)

Generate position frequency matrices (PFMs) for a given layer in a PyTorch model.

Parameters:
  • model (torch.nn.Module) – The model to generate PFMs for.

  • sdata (xr.Dataset) – The dataset to use for generating PFMs.

  • seq_var (str) – The name of the sequence variable in the dataset.

  • layer_name (str) – The name of the layer to generate PFMs for.

  • kernel_size (int, optional) – The size of the kernel to use for generating PFMs. If not specified, the kernel size will be inferred from the layer.

  • activations (torch.Tensor, optional) – The activations to use for generating PFMs. If not specified, the activations will be computed using the dataset and layer.

  • seqs (List[str], optional) – The sequences to use for generating PFMs. If not specified, the sequences will be inferred from the dataset.

  • num_seqlets (int, optional) – The number of seqlets to use for generating PFMs. Defaults to 100.

  • padding (int, optional) – The amount of padding to use when generating seqlets. Defaults to 0.

  • activation_threshold (float, optional) – The threshold to use for selecting seqlets based on their activation values.

  • num_filters (int, optional) – The number of filters to use for generating PFMs. If not specified, all filters will be used.

  • batch_size (int, optional) – The batch size to use when generating PFMs.

  • device (str, optional) – The device to use for generating PFMs.

  • num_workers (int, optional) – The number of workers to use for generating PFMs.

  • prefetch_factor (int, optional) – The prefetch factor to use when generating PFMs.

  • transforms (Dict[str, Any], optional) – The transforms to apply to the dataset.

  • prefix (str, optional) – The prefix to use for the output file.

  • suffix (str, optional) – The suffix to use for the output file.

  • copy (bool, optional) – Whether to copy the dataset before generating PFMs. Defaults to False.

Returns:

pfms – The position frequency matrices.

Return type:

np.ndarray
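
The parameters above describe building PFMs from the seqlets that most strongly activate a filter. The following is a minimal NumPy sketch of that general idea for a single filter, not EUGENe's actual implementation. The helper name pfm_from_top_seqlets, the A/C/G/T column order, and the way padding shifts the window start are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed column order of the resulting PFM


def pfm_from_top_seqlets(activations, seqs, kernel_size, num_seqlets=100, padding=0):
    """Hypothetical helper: count bases in the windows that most activate one filter.

    activations: array of shape (num_seqs, num_positions) for a single filter
    seqs: list of strings over A/C/G/T
    """
    activations = np.asarray(activations, dtype=float)
    # Flattened indices of the highest activations, largest first
    order = np.argsort(activations, axis=None)[::-1][:num_seqlets]
    seq_idx, pos_idx = np.unravel_index(order, activations.shape)

    pfm = np.zeros((kernel_size, len(ALPHABET)), dtype=int)
    for s, p in zip(seq_idx, pos_idx):
        # Assumption: activation position p maps to sequence position p - padding
        start = p - padding
        seqlet = seqs[s][max(start, 0):start + kernel_size]
        if start < 0 or len(seqlet) < kernel_size:
            continue  # skip windows that fall off either end of the sequence
        for i, base in enumerate(seqlet):
            j = ALPHABET.find(base)
            if j >= 0:  # ignore characters outside the alphabet (e.g. N)
                pfm[i, j] += 1
    return pfm


# Toy data: the filter fires on "ACG" in the first two sequences
seqs = ["ACGTAC", "TTACGT", "GGGGGG"]
acts = np.zeros((3, 4))
acts[0, 0] = 5.0  # "ACG" at position 0 of seqs[0]
acts[1, 2] = 4.0  # "ACG" at position 2 of seqs[1]
example_pfm = pfm_from_top_seqlets(acts, seqs, kernel_size=3, num_seqlets=2)
```

Each row of the result is a position in the seqlet and each column a base, so normalizing the rows to sum to one would give a position probability matrix.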