
So, I want to perform text classification using XLNet.

I initialized the XLNet model and added two layers on top (one fully connected layer and one softmax):

from transformers import XLNetTokenizer, XLNetModel
from tensorflow.keras import Model
from tensorflow.keras import layers
import tensorflow as tf

def create_model():

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-large-cased')
    xlnet_model = XLNetModel.from_pretrained('xlnet-large-cased')
    text = ["Hello, my dog is cute", "I'm very happy", "I'm sad"]
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    outputs = xlnet_model(**inputs)

    # outputs.last_hidden_state is a PyTorch tensor, so I detach it to numpy
    flatten = layers.Flatten()(outputs.last_hidden_state.detach().numpy())

    fc1 = layers.Dense(units=256, activation="relu")(flatten)
    softmax = layers.Dense(units=3, activation="softmax")(fc1)
    txt_model = Model(inputs=tf.keras.Input(outputs.last_hidden_state.detach().numpy()), outputs=softmax)

    return xlnet_model, txt_model

def main():
    
    xlnet_model, txt_model = create_model()

I intend to train the FC layer, which is why I'm initializing the model, and this is where the problem occurs. The input of the model would be the output of XLNet's last layer, and the output of the model would be the output of the softmax layer.

I have trouble initializing the model's input with the output of XLNet's last layer.

My guesses about what is wrong (in the code):

  1. Problem with the input to txt_model: I'm passing in the values directly (instead of something like model.layer.output in tf.keras.Model), but XLNetModel from transformers is a PyTorch model and returns a PyTorch tensor.
  2. Problem with the output of txt_model: I believe that since I'm using detach on outputs.last_hidden_state, an error might arise because the metadata for the previous layers is lost. (I have to use detach, as outputs.last_hidden_state is a PyTorch tensor.)
So, presuming my guesses about the problems are right, I need the right method to initialize the txt_model input. Any suggestions are welcome.
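For context, this is the direction I've been experimenting with (a sketch, not verified end-to-end; the shapes below are placeholders for whatever the tokenizer actually produces): build the Keras head from a symbolic tf.keras.Input instead of passing the concrete detached values into the layers, then feed the numpy features in afterwards.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model, layers

# Stand-in for outputs.last_hidden_state.detach().numpy():
# (batch=3, seq_len=7, hidden=1024 for xlnet-large-cased) -- placeholder shapes.
features = np.random.rand(3, 7, 1024).astype("float32")

# Build the head from a symbolic Input (the batch dimension is omitted
# from shape=), rather than calling the layers on concrete values.
inp = tf.keras.Input(shape=features.shape[1:])
x = layers.Flatten()(inp)
x = layers.Dense(units=256, activation="relu")(x)
out = layers.Dense(units=3, activation="softmax")(x)
txt_model = Model(inputs=inp, outputs=out)

# The XLNet features are then just data fed into the head.
preds = txt_model(features)
print(preds.shape)  # (3, 3)
```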