eugene.preprocess.train_test_random_split

eugene.preprocess.train_test_random_split(sdata, dim, train_var='train_val', groups=None, test_size=0.1, random_state=None)

Add a boolean variable that labels each sequence as belonging to the train or test split, with the assignment made at random.

Parameters:
  • sdata (xr.Dataset) – SeqData object.

  • dim (str) – Dimension to split randomly.

  • train_var (str, optional) – Name of the variable that stores the split labels (True = train, False = test), by default 'train_val'

  • groups (ArrayLike, optional) – Groups to stratify the splits by, by default None

  • test_size (float, optional) – Proportion of data to put in the test set, by default 0.1

  • random_state (int, optional) – Random seed, by default None
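
The labeling convention described above (True = train, False = test, with a fraction test_size of entries going to the test set) can be illustrated with a small standalone sketch. This is not eugene's implementation: the helper name random_split_labels is hypothetical, it works on a plain count of sequences rather than on a SeqData object, it ignores groups, and rounding the test-set size to the nearest integer is an assumption.

```python
import random


def random_split_labels(n, test_size=0.1, random_state=None):
    """Return n booleans where True marks train and False marks test.

    Hypothetical helper for illustration only; not part of eugene.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    # A dedicated generator keeps the split reproducible for a given seed.
    rng = random.Random(random_state)
    # Assumption: the test-set size is rounded to the nearest integer.
    n_test = round(n * test_size)
    # Draw the test indices without replacement; everything else is train.
    test_idx = set(rng.sample(range(n), n_test))
    return [i not in test_idx for i in range(n)]


labels = random_split_labels(20, test_size=0.1, random_state=0)
n_train = sum(labels)
n_test = len(labels) - n_train
```

With 20 sequences and test_size=0.1, two entries are labeled False and the remaining 18 are labeled True. Passing the same random_state again yields the same labels, which is the role the random_state parameter plays in the signature above.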