eugene.preprocess.train_test_homology_split

eugene.preprocess.train_test_homology_split(sdata, seq_var, train_var='train_val', test_size=0.1, nucleotide=True)

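Add a variable that labels each sequence as belonging to the train or the test split. The split is made by sequence homology (via graph-part), so closely related sequences are kept on the same side of the split.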

Parameters:
  • sdata (xr.Dataset) – SeqData object.

  • seq_var (str) – Variable containing the sequences.

  • train_var (str, optional) – Name of the variable holding the labels, by default “train_val”.

  • test_size (float, optional) – Proportion of sequences to place in the test set, by default 0.1.

  • nucleotide (bool, optional) – Whether the input sequences are nucleotide sequences (as opposed to protein sequences), by default True.

Raises:

ImportError – If [graph-part](https://github.com/graph-part/graph-part) is not installed.
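To illustrate what "splitting by homology" means, here is a minimal, self-contained sketch. It is not graph-part's algorithm and not eugene's implementation. As a toy stand-in for alignment-based sequence identity it uses difflib's SequenceMatcher ratio. It groups sequences into single-linkage clusters, then assigns whole clusters to the test set, so no pair at or above the similarity threshold straddles the split. The names homology_split and cluster_by_similarity, the threshold parameter, and the True = train label encoding are all hypothetical. The encoding of the real train_var is not documented above, and the nucleotide setting is not modeled here.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    # Toy stand-in for alignment-based identity; graph-part computes something else.
    return SequenceMatcher(None, a, b).ratio()


def cluster_by_similarity(seqs: List[str], threshold: float) -> List[List[int]]:
    """Single-linkage clustering with union-find: any pair >= threshold shares a cluster."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def homology_split(seqs: List[str], test_size: float = 0.1, threshold: float = 0.8) -> List[bool]:
    """Hypothetical helper: one label per sequence, True = train, False = test."""
    target = round(test_size * len(seqs))
    is_train = [True] * len(seqs)
    n_test = 0
    # Smallest clusters first; a cluster is only moved to test if it fits,
    # so the test set never exceeds the target (it may fall short of it).
    for cluster in sorted(cluster_by_similarity(seqs, threshold), key=len):
        if n_test + len(cluster) <= target:
            for i in cluster:
                is_train[i] = False
            n_test += len(cluster)
    return is_train
```

Because whole clusters move together, the realized test fraction can be smaller than test_size when clusters are large. This is the usual trade-off of homology-aware splitting compared with a purely random split.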