eugene
API¶
preprocess¶
from eugene import preprocess
This module lets users interact with and modify SeqData objects to prepare them for model training and other steps of the workflow. There are three main classes of preprocessing functions.
Sequence preprocessing¶
|
Make a set of unique IDs for each sequence in a SeqData object and store them as a new xarray variable. |
|
Pad sequences in a SeqData object. |
|
One-hot encode sequences in a SeqData object. |
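To illustrate what padding and one-hot encoding do to a single sequence, here is a minimal sketch in plain Python. The names pad_seq and ohe_seq, the "N" pad character, and the A, C, G, T channel order are assumptions made for this example; they are not EUGENe's API, and the library functions operate on whole SeqData objects rather than single strings.

```python
ALPHABET = "ACGT"  # assumed channel order for this sketch


def pad_seq(seq, length, pad_char="N"):
    """Right-pad (or truncate) a sequence to a fixed length."""
    return seq[:length].ljust(length, pad_char)


def ohe_seq(seq, alphabet=ALPHABET):
    """One-hot encode a sequence as a list of rows, one row per position.

    Characters outside the alphabet (e.g. the "N" pad) become all-zero rows.
    """
    return [[1 if base == letter else 0 for letter in alphabet] for base in seq]


padded = pad_seq("ACG", 5)
encoded = ohe_seq(padded)
```

Padding before encoding gives every sequence the same number of rows, which is what allows sequences of different lengths to be stacked into one batch.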
Train-test splitting¶
|
Add a variable labeling sequences as part of the train or test split based on chromosome. |
|
Add a variable labeling sequences as part of the train or test split, splitting by homology. |
|
Add a variable labeling sequences as part of the train or test split, splitting randomly. |
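The idea behind the chromosome-based and random splits can be sketched as follows. This is an illustration only: the function names, the boolean train label, and the default test fraction are hypothetical and do not reflect EUGENe's signatures. Homology-based splitting needs a sequence-similarity step and is not covered here.

```python
import random


def split_by_chrom(chroms, test_chroms):
    """Label each sequence True (train) or False (test) by its chromosome."""
    held_out = set(test_chroms)
    return [chrom not in held_out for chrom in chroms]


def split_randomly(n_seqs, test_frac=0.2, seed=0):
    """Label sequences True (train) or False (test) using a seeded shuffle."""
    rng = random.Random(seed)
    indices = list(range(n_seqs))
    rng.shuffle(indices)
    n_test = int(round(n_seqs * test_frac))
    test_idx = set(indices[:n_test])
    return [i not in test_idx for i in range(n_seqs)]


chroms = ["chr1", "chr2", "chr8", "chr1", "chr9"]
is_train_chrom = split_by_chrom(chroms, test_chroms=["chr8", "chr9"])
is_train_random = split_randomly(10, test_frac=0.2, seed=0)
```

Holding out whole chromosomes keeps nearby or overlapping genomic regions from ending up on both sides of the split, which a purely random split cannot guarantee.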
Target preprocessing¶
|
Clamp targets to a given percentile in a SeqData object. |
|
Scale targets in a SeqData object. |
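As a rough picture of target clamping and scaling, the sketch below caps values at an upper percentile and then standardizes them. The nearest-rank percentile definition and z-score scaling are assumptions chosen for this example; the page does not state which definitions EUGENe uses, and the function names here are hypothetical.

```python
import math
import statistics


def clamp_targets(values, percentile=95):
    """Cap values at the given upper percentile (nearest-rank definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


def scale_targets(values):
    """Standardize to zero mean and unit population variance.

    Assumes the values are not all identical (standard deviation > 0).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


targets = [1.0, 2.0, 3.0, 4.0, 100.0]
clamped = clamp_targets(targets, percentile=80)
scaled = scale_targets(clamped)
```

Clamping first keeps a single extreme target from dominating the mean and standard deviation used in the scaling step.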
dataload¶
from eugene import dataload
This module helps users prepare their SeqData objects for model training and other steps of the workflow (e.g. augmentation).
SeqData utilities¶
|
Concatenate multiple SeqDatas into one. |
|
Add observational metadata to a SeqData. |
Augmentation¶
|
Randomly applies a reverse-complement transformation to each sequence in a training batch. |
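A reverse-complement augmentation of this kind can be sketched on plain strings as shown below. The function names, the per-sequence probability p, and the use of string batches are assumptions for illustration; the EUGENe transform itself works on training batches and its exact interface is not shown on this page.

```python
import random

# Characters not in the table (e.g. "N") are left unchanged by translate().
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def random_rc_batch(batch, p=0.5, rng=None):
    """Reverse-complement each sequence independently with probability p."""
    rng = rng or random.Random()
    return [reverse_complement(s) if rng.random() < p else s for s in batch]


batch = ["AACG", "TTTT", "ACGN"]
always_rc = random_rc_batch(batch, p=1.0)
never_rc = random_rc_batch(batch, p=0.0)
```

Because regulatory DNA can usually be read from either strand, showing the model both orientations during training encourages strand-independent predictions.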
models¶
from eugene import models
This module is designed to allow users to easily build and initialize several neural network architectures that are designed for biological sequences.
Blocks¶
Blocks are composed to create architectures in EUGENe. You can find all the arguments that would be passed into the dense_kwargs and recurrent_kwargs arguments of all built-in models in the DenseBlock and RecurrentBlock classes, respectively. See the towers section for more information on the conv_kwargs argument.
|
A block for dense layers |
|
Flexible block for convolutional models |
|
A block for recurrent layers |
Towers¶
The Conv1DTower class is currently used for all built-in CNNs. This will be deprecated in the future in favor of the more general Tower class. For now, you can find all the arguments that would be passed into the cnn_kwargs argument of all built-in CNNs in the Conv1DTower class.
|
A tower of blocks. |
|
Generates a PyTorch module for multiple convolutional layers |
LightningModules¶
|
Base LightningModule class for EUGENe that handles models that predict single tensor outputs. |
|
LightningModule class for training models that predict profile data (both shape and count). |
Initialization¶
|
Initialize the weights of a model. |
|
Initialize the convolutional kernel of choice using a set of motifs |
Zoo¶
Arguments for the cnn_kwargs, recurrent_kwargs and dense_kwargs of all models can be found in the Conv1DTower, RecurrentBlock and DenseBlock classes, respectively. See the blocks section and the towers section for more information. The Satori architecture currently uses the MultiHeadAttention layer, which can be found at eugene.models.base._layers; see that class for more information on the mha_kwargs argument.
|
Basic fully connected network |
|
Basic FCN model with reverse complement |
|
Basic convolutional network |
|
Basic CNN model with reverse complement |
|
Basic recurrent network |
|
Basic RNN model with reverse complement |
|
Basic hybrid network |
|
Basic hybrid network with reverse complement |
|
Tutorial CNN model |
|
DeepBind architecture implemented from Alipanahi et al 2015 in PyTorch |
|
ResidualBind architecture implemented from Koo et al 2021 in PyTorch |
|
Custom convolutional model used in the Kopp et al. 2021 paper. |
|
DeepSEA model implementation from Zhou et al 2015 in PyTorch |
|
Basset model implementation from Kelley et al 2016 in PyTorch |
|
Factorized Basset model implementation from Wnuk et al 2017 in PyTorch |
|
DanQ model from Quang and Xie 2016 in PyTorch |
|
Satori model from Ullah and Ben-Hur 2021 in PyTorch |
|
Custom convolutional model used in the Jores et al. 2021 paper. |
|
DeepSTARR model from de Almeida et al 2022 |
|
This nn.Module was taken without permission from a Mr. |
|
DeepMEL model implementation from Minnoye et al 2020 in PyTorch |
|
scBasset model implementation from Yuan et al 2022 in PyTorch |
Utilities¶
|
List all layers in a model |
|
Get a layer from a model by name |
|
Instantiate a module or architecture from a config file |
train¶
from eugene import train
Training procedures for data and models.
|
Fit a model using PyTorch Lightning. |
|
Fit a SequenceModule using PyTorch Lightning. |
evaluate¶
from eugene import evaluate
Evaluation functions for trained models, including both prediction helpers and metrics.
Predictions¶
|
Predictions from a model and dataloader. |
|
Predictions for a SequenceModule model and SeqData. |
|
Predictions from a model and train/val dataloaders. |
|
Predictions for a SequenceModule model and SeqData. |
interpret¶
from eugene import interpret
Interpretation suite of EUGENe, currently divided into filter visualization, feature attribution and in silico experimentation.
Filter interpretation¶
|
Generate position frequency matrices (PFMs) for a given layer in a PyTorch model. |
|
Convert position frequency matrices (PFMs) to a MEME motif file. |
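For readers unfamiliar with PFMs, the sketch below counts letters per position across a set of equal-length subsequences and normalizes the counts to per-position frequencies. One common way to obtain such subsequences is to collect the input windows that most strongly activate a convolutional filter; whether EUGENe selects them this way is not stated on this page, and the function names here are hypothetical.

```python
def build_pfm(subseqs, alphabet="ACGT"):
    """Count letters per position across equal-length subsequences."""
    width = len(subseqs[0])
    pfm = [[0] * len(alphabet) for _ in range(width)]
    for seq in subseqs:
        for pos, base in enumerate(seq):
            if base in alphabet:  # skip ambiguous characters such as "N"
                pfm[pos][alphabet.index(base)] += 1
    return pfm


def pfm_to_ppm(pfm):
    """Normalize each position's counts so that the row sums to 1."""
    ppm = []
    for row in pfm:
        total = sum(row)
        ppm.append([count / total if total else 0.0 for count in row])
    return ppm


pfm = build_pfm(["ACG", "ACT", "AAG"])
ppm = pfm_to_ppm(pfm)
```

A motif file format such as MEME stores per-position probabilities of this kind, which is why the count matrix is normalized before export.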
Attribution analysis¶
|
Compute attributions for model and SeqData combination. |
Global importance analysis (GIA)¶
|
Implant a feature into all sequences in an xarray dataset and return the model predictions. |
|
Calculate the dependence of the model predictions on the distance between two motifs. |
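The core of a global importance analysis is to implant a feature into background sequences and measure how predictions change. The sketch below shows that idea with plain strings and a toy scoring function standing in for a trained model; implant_motif, positional_effect and toy_score are hypothetical names for this example, not EUGENe functions.

```python
import statistics


def implant_motif(seq, motif, position):
    """Overwrite part of seq with motif, starting at a 0-based position."""
    if position < 0 or position + len(motif) > len(seq):
        raise ValueError("motif does not fit at this position")
    return seq[:position] + motif + seq[position + len(motif):]


def positional_effect(seqs, motif, score_fn):
    """Mean change in score from implanting motif at each valid position."""
    n_positions = len(seqs[0]) - len(motif) + 1
    effects = []
    for pos in range(n_positions):
        deltas = [score_fn(implant_motif(s, motif, pos)) - score_fn(s) for s in seqs]
        effects.append(statistics.fmean(deltas))
    return effects


def toy_score(seq):
    """Stand-in for a model prediction: counts occurrences of GATA."""
    return float(seq.count("GATA"))


backgrounds = ["CCCCCCCC", "AAAAAAAA"]
effects = positional_effect(backgrounds, "GATA", toy_score)
```

Averaging over many background sequences is what makes the estimate "global": it measures the effect of the feature itself rather than its effect in one particular sequence context.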
Generative¶
|
In silico evolve a set of sequences that are stored in a SeqData object. |
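In silico evolution is often implemented as a greedy search: in each round, every single-base substitution is scored and the best one is kept. The sketch below follows that scheme with a toy GC-count score in place of a trained model. The greedy strategy, the function names, and the scoring function are assumptions for this example and may differ from what EUGENe does.

```python
def evolve(seq, score_fn, rounds=3, alphabet="ACGT"):
    """Greedily apply the best single-base substitution for a number of rounds."""
    current = seq
    for _ in range(rounds):
        best, best_score = current, score_fn(current)
        for pos in range(len(current)):
            for base in alphabet:
                if base == current[pos]:
                    continue
                candidate = current[:pos] + base + current[pos + 1:]
                candidate_score = score_fn(candidate)
                if candidate_score > best_score:
                    best, best_score = candidate, candidate_score
        if best == current:  # no substitution improves the score
            break
        current = best
    return current


def gc_count(seq):
    """Toy score standing in for a model prediction."""
    return seq.count("G") + seq.count("C")


evolved = evolve("AATT", gc_count, rounds=2)
```

Each round changes at most one base, so the number of rounds bounds how far the evolved sequence can drift from the starting sequence.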
plot¶
from eugene import plot
Plotting suite in EUGENe for multiple aspects of the workflow.
Categorical plotting¶
|
Plots a countplot of variables in a SeqData object using Seaborn. |
|
Plots a histogram of variables in a SeqData object using Seaborn. |
|
Plots a boxplot of variables in a SeqData object using Seaborn. |
|
Plots a violinplot of variables in a SeqData object using Seaborn. |
|
Plots a scatterplot of two columns in seqs_annot using Seaborn. |
Training summaries¶
|
Plots the loss curves from a PyTorch Lightning (PL) training run. |
|
Plots the loss curves from a PyTorch Lightning (PL) training run. |
|
Plots the training summary from a given training run |
Performance¶
|
Plot a scatter plot of the performance of the model on a subset of the sequences. |
|
Plot a confusion matrix for given targets and predictions within SeqData |
|
Plot the area under the receiver operating characteristic curve for one or more predictions against one or more targets. |
|
Plot the area under the precision-recall curve for one or more predictions against one or more targets. |
|
Plot a performance summary across model predictions for a passed in metric |
Sequences¶
|
Plot a track of the importance scores for a sequence using the logomaker package |
|
Plot the saliency tracks for multiple sequences across multiple importance scores in one plot. |
|
Plot the PFM for a single filter in a SeqData object's uns dictionary as a PWM logo |
|
Plot multiple filters in a SeqData object's uns dictionary as PWM logos. |
Global importance analysis (GIA)¶
|
Plot a lineplot for each position of the sequence after implanting a feature. |
|
Plot the median predicted cooperativity as a function of motif pair distance. |
utils¶
File I/O¶
|
Make a directory if it doesn't exist. |
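The behavior described here, creating a directory only when it is missing, can be reproduced with the standard library as in the sketch below. The name make_dir and its return value are assumptions for illustration and are not taken from EUGENe.

```python
import os
import tempfile


def make_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "eugene_output", "models")
    make_dir(target)
    make_dir(target)  # the second call is a no-op rather than an error
    created = os.path.isdir(target)
```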