eugene.preprocess.ohe_seqs_sdata
- eugene.preprocess.ohe_seqs_sdata(sdata, alphabet='DNA', seq_var='seq', ohe_var='ohe_seq', fill_value=0, copy=False)
One-hot encode sequences in a SeqData object.
Wraps the ohe function from SeqPro and applies it to the sequences in a SeqData object. Adds a new variable holding the one-hot encoded sequences, named by ohe_var (“ohe_seq” by default), with dimensions (“_sequence”, “length”, “_ohe”). Any existing variable with the same name is overwritten.
- Parameters:
sdata (xr.Dataset) – SeqData object.
alphabet (str, optional) – Alphabet to use for one-hot encoding, by default “DNA”
seq_var (str, optional) – Name of the variable holding the sequences to be encoded, by default “seq”
ohe_var (str, optional) – Name of the variable to store the one-hot encoded sequences in, by default “ohe_seq”
fill_value (Union[int, float], optional) – Value to fill the one-hot encoded sequences with, by default 0
copy (bool, optional) – Whether to return a copy of the SeqData object, by default False
- Returns:
SeqData object with one-hot encoded sequences. If copy is True, a copy of the SeqData object is returned, else the original SeqData object is modified in place.
- Return type:
xr.Dataset
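To make the (“_sequence”, “length”, “_ohe”) layout concrete, here is a minimal pure-Python sketch of one-hot encoding. It is not the SeqPro or EUGENe implementation: the helper name one_hot_encode, the alphabet order "ACGT", and the choice to leave characters outside the alphabet as all-zero rows are all assumptions made for illustration.

```python
# Illustrative sketch only: not the SeqPro or EUGENe implementation.
DNA_ALPHABET = "ACGT"  # assumed character order, for illustration


def one_hot_encode(seqs, alphabet=DNA_ALPHABET):
    """Return nested lists shaped (n_sequences, length, len(alphabet))."""
    index = {char: i for i, char in enumerate(alphabet)}
    encoded = []
    for seq in seqs:
        rows = []
        for char in seq:
            row = [0] * len(alphabet)
            # Assumption: characters outside the alphabet (e.g. "N")
            # are left as an all-zero row.
            if char in index:
                row[index[char]] = 1
            rows.append(row)
        encoded.append(rows)
    return encoded


seqs = ["ACGT", "TTNA"]  # equal-length sequences
ohe = one_hot_encode(seqs)

# Axes correspond to ("_sequence", "length", "_ohe").
shape = (len(ohe), len(ohe[0]), len(ohe[0][0]))
print(shape)
```

Each sequence becomes a length-by-alphabet grid with a single 1 per recognized character. With EUGENe installed, calling ohe_seqs_sdata(sdata) with the defaults documented above would be expected to store an array of this layout under “ohe_seq” in the SeqData object.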