AttOmics

AttOmics: Attention-based architecture for diagnosis and prognosis from Omics data

AttOmics architecture

The increasing availability of high-throughput omics data allows for considering a new medicine centered on individual patients. Precision medicine relies on exploiting these high-throughput data with machine-learning models, especially the ones based on deep-learning approaches, to improve diagnosis.
Due to the high-dimensional small-sample nature of omics data, current deep-learning models end up with many parameters and have to be fitted with a limited training set. Furthermore, interactions between molecular entities inside an omics profile are not patient-specific but are the same for all patients. In this article, we propose AttOmics, a new deep-learning architecture based on the self-attention mechanism. First, we decompose each omics profile into a set of groups, where each group contains related features. Then, by applying the self-attention mechanism to the set of groups, we can capture the different interactions specific to a patient. The results of different experiments carried out in this paper show that our model can accurately predict the phenotype of a patient with fewer parameters than deep neural networks. Visualizing the attention maps can provide new insights into the essential groups for a particular phenotype.

Installation

  1. Install miniconda
  2. Clone this repository: git clone https://forge.ibisc.univ-evry.fr/abeaude/AttOmics.git
  3. Navigate to the AttOmics folder: cd AttOmics
  4. Create a conda environment: conda env create -f environment.yml
  5. Activate the newly created environment: conda activate attomics

Data Format

Each omics file contains the expression matrix of the patients. Each row represents a patient and each column represents a feature.
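
The expected layout can be checked before building a dataset. The following is a minimal sketch, assuming the matrix and the labels are held as NumPy arrays; `check_omics_arrays` is a hypothetical helper and not part of AttOmics, and the demo arrays are synthetic.

```python
import numpy as np


def check_omics_arrays(omics, label):
    """Validate the (n_patients, n_features) layout; hypothetical helper."""
    omics = np.asarray(omics)
    label = np.asarray(label)
    if omics.ndim != 2:
        raise ValueError(f"expected a 2-D matrix, got {omics.ndim} dimension(s)")
    if omics.shape[0] != label.shape[0]:
        raise ValueError(
            f"{omics.shape[0]} patients in the matrix but {label.shape[0]} labels"
        )
    n_patients, n_features = omics.shape
    return n_patients, n_features


# Synthetic stand-in for an expression matrix: 4 patients (rows), 6 features (columns).
X_demo = np.random.default_rng(0).normal(size=(4, 6)).astype(np.float32)
Y_demo = np.array([0, 1, 1, 0])
n_patients, n_features = check_omics_arrays(X_demo, Y_demo)
```

A transposed matrix (features as rows) would typically fail the second check, because the number of rows would no longer match the number of labels.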

Here is an example of a PyTorch dataset that can be used with AttOmics:

import torch
from torch.utils.data import Dataset

class OmicsDataset(Dataset):
    """Expression matrix with labels and optional event indicators."""

    def __init__(self, omics, label, event=None):
        self.omics = omics
        self.label = label
        self.event = event

    def __len__(self):
        return self.label.shape[0]

    def __getitem__(self, index):
        if torch.is_tensor(index):
            index = index.tolist()

        sample = {"x": self.omics[index], "label": self.label[index]}
        if self.event is not None:
            sample.update({"event": self.event[index]})

        sample = {k: torch.as_tensor(v) for k,v in sample.items()}
        return sample

Load the training data and wrap the dataset in a DataLoader:

import numpy as np
from torch.utils.data import DataLoader

X = np.load("rnaseq_train.npy")
Y = np.load("label_train.npy")

# create dataset 
dataset = OmicsDataset(omics=X, label=Y)

# create dataloader
train_loader = DataLoader(dataset, batch_size=256, shuffle=True, drop_last=True)

You can repeat this for the other splits (validation and test). You can download the data here

Supported grouping strategies

You can add support for new grouping strategies. The function must have the following signature:

  • Input arguments

    • in_features: int
    • proj_size: int
    • n_group: int
    • train_data: DataFrame = None
    • **kwargs
  • Output

    • idx_in: List[Tensor]. Element i of the list contains the indices of the features in group i
    • group_name: List[str]. Names of the different groups
    • grp_proj_dim: List[List[int]]. The dimensions used to encode each group in the gFCN module.

If you add support for a new grouping strategy, please update the GeneGroupCreation dictionary to register your method.

GeneGroupCreation.update({"my_new_method": new_method_fun})
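
A function with the signature described above could look like the following minimal sketch. It splits the features into contiguous blocks of near-equal size. The name `contiguous_groups` is hypothetical, and the choice of one encoding layer of width `proj_size` per group is an assumption about how `grp_proj_dim` is interpreted, not something confirmed by the AttOmics code.

```python
from typing import List, Tuple

import torch
from torch import Tensor


def contiguous_groups(
    in_features: int,
    proj_size: int,
    n_group: int,
    train_data=None,  # unused here; kept to match the documented signature
    **kwargs,
) -> Tuple[List[Tensor], List[str], List[List[int]]]:
    """Split feature indices 0..in_features-1 into n_group contiguous blocks."""
    if not 0 < n_group <= in_features:
        raise ValueError("n_group must be between 1 and in_features")
    # tensor_split allows uneven splits: the first groups receive one extra feature
    idx_in = list(torch.tensor_split(torch.arange(in_features), n_group))
    group_name = [f"group_{i}" for i in range(n_group)]
    # Assumption: each group is encoded by a single layer of width proj_size
    grp_proj_dim = [[proj_size] for _ in range(n_group)]
    return idx_in, group_name, grp_proj_dim


idx_in, group_name, grp_proj_dim = contiguous_groups(
    in_features=10, proj_size=4, n_group=3
)
```

Such a function would then be registered in the same way as shown above, for example under a key like "contiguous".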

Create your model

model = AttOmics(
    n_group=10,
    n_layers=1,
    num_heads=1,
    attention_norm="layer_norm",
    grouping_method="random",
    head_norm="layer_norm",
    sa_residual_connection=True,
    head_residual_connection=False,
    head_dropout=0.0,
    head_batch_norm=False,
    reuse_grp=True,
    constant_group_size=False,
    head_input_dim=500,
    head_hidden_ratio=[0.5],
    input_dim=X.shape[1],  # a dict of dimension
    num_classes=n_class,
    label_type="cancer_type",
    class_weights=class_weights,
    train_data=X,
    optimizer_init=optimizer,
    scheduler_init=lr_scheduler,
)
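
The call above uses `n_class` and `class_weights` without defining them. One possible way to derive them from the training labels is sketched below, assuming integer-encoded labels and inverse-frequency ("balanced") weighting. `inverse_frequency_weights` is a hypothetical helper, and the type AttOmics expects for `class_weights` is not stated in this document, so returning a plain list is an assumption.

```python
import numpy as np


def inverse_frequency_weights(labels):
    """Return (n_class, weights) with weights[c] = n_samples / (n_class * count_c)."""
    labels = np.asarray(labels)
    # np.unique returns the classes in sorted order with their counts
    classes, counts = np.unique(labels, return_counts=True)
    n_class = len(classes)
    weights = labels.shape[0] / (n_class * counts)
    return n_class, weights.tolist()


# Toy labels: class 0 appears three times, class 1 once.
n_class, class_weights = inverse_frequency_weights([0, 0, 0, 1])
```

The weights are ordered by sorted class value, so they line up with class indices only when the labels are encoded as 0 to n_class - 1. Rare classes receive larger weights, which is a common way to compensate for class imbalance.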

Training a model

We use pytorch_lightning to train our models. To train a model, first set up a Trainer.

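from pytorch_lightning import Trainer
from pytorch_lightning.loggers import MLFlowLogger

# Train on the first GPU and log to a local MLflow directory
trainer = Trainer(
    gpus=[0],
    logger=MLFlowLogger(experiment_name="AttOmics", save_dir="./logs"),
)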

Fit the model on the training set:

trainer.fit(model, train_dataloader=train_loader, val_dataloaders=val_loader)

Now you can evaluate your model on the test set:

trainer.test(model, test_dataloader=test_loader)

Authors

AttOmics was developed by:

  • Aurélien Beaude
  • Milad R. Vahid
  • Franck Augé
  • Farida Zehraoui
  • Blaise Hanczar

License

AttOmics is licensed under the GNU GPL, version 3 or (at your option) any later version. AttOmics is Copyright (2023-) by the authors.