The EncoderDecoderModel can be used to initialize a sequence-to-sequence model with any pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.

The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks was shown in Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

After such an EncoderDecoderModel has been trained/fine-tuned, it can be saved/loaded just like any other model (see the examples for more information).

An application of this architecture could be to leverage two pretrained BertModel as the encoder and decoder for a summarization model, as was shown in Text Summarization with Pretrained Encoders by Yang Liu and Mirella Lapata.

class transformers.EncoderDecoderModel

( config : typing.Optional = None encoder : typing.Optional = None decoder : typing.Optional = None )

This class can be used to initialize a sequence-to-sequence model with any pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder. The encoder is loaded via the from_pretrained() function and the decoder is loaded via the from_pretrained() function. Cross-attention layers are automatically added to the decoder and should be fine-tuned on a downstream generative task, like summarization.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads etc.).

Examples:

>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

>>> # Initializing a BERT bert-base-uncased style configuration
>>> config_encoder = BertConfig()
>>> config_decoder = BertConfig()

>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

>>> # Initializing a Bert2Bert model from the bert-base-uncased style configurations
>>> model = EncoderDecoderModel(config=config)

>>> # Accessing the model configuration
>>> config_encoder = model.config.encoder
>>> config_decoder = model.config.decoder

>>> # set decoder config to causal lm
>>> config_decoder.is_decoder = True
>>> config_decoder.add_cross_attention = True

>>> # Saving the model, including its configuration
>>> model.save_pretrained('my-model')

>>> # loading model and config from pretrained folder
>>> encoder_decoder_config = EncoderDecoderConfig.from_pretrained('my-model')
>>> model = EncoderDecoderModel.from_pretrained('my-model', config=encoder_decoder_config)
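The overview above also mentions warm-starting from two pretrained checkpoints, e.g. two BertModel checkpoints for summarization. A minimal sketch of that initialization path using from_encoder_decoder_pretrained() (the bert-base-uncased checkpoint name is only an example):

>>> from transformers import EncoderDecoderModel

>>> # initialize a bert2bert model from two pretrained BERT checkpoints;
>>> # the cross-attention layers added to the decoder are randomly initialized,
>>> # so the model should be fine-tuned on a downstream task before use
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

>>> # once trained/fine-tuned, it saves and loads like any other model
>>> model.save_pretrained('bert2bert')
>>> model = EncoderDecoderModel.from_pretrained('bert2bert')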
The from_pretrained() method currently doesn't support initializing the model from a pytorch checkpoint. Passing from_pt=True to this method will throw an exception. If there are only pytorch checkpoints for a particular encoder-decoder model, a workaround is:
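The workaround itself is not spelled out here. The sketch below follows the pattern documented elsewhere in Transformers for the TensorFlow variant of this class: load the PyTorch checkpoint with the PyTorch class, save the encoder and decoder separately, then reload them while converting the weights. That TFEncoderDecoderModel is the class this caveat refers to is an assumption, and the checkpoint name is a placeholder:

>>> from transformers import EncoderDecoderModel, TFEncoderDecoderModel

>>> # load the pytorch-only checkpoint with the PyTorch class
>>> # ('pytorch-only-bert2bert' is a placeholder checkpoint name)
>>> _model = EncoderDecoderModel.from_pretrained('pytorch-only-bert2bert')

>>> # save encoder and decoder as standalone pytorch checkpoints
>>> _model.encoder.save_pretrained('./encoder')
>>> _model.decoder.save_pretrained('./decoder')

>>> # reload each half into a TensorFlow model, converting from the pytorch weights
>>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
...     './encoder', './decoder', encoder_from_pt=True, decoder_from_pt=True
... )

>>> # carry over the original encoder-decoder configuration
>>> model.config = _model.config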