﻿<?xml version="1.0" encoding="utf-8"?>
<ArticleSet>
  <ARTICLE>
    <Journal>
      <PublisherName>Regional Information Center for Science and Technology</PublisherName>
      <JournalTitle>Journal of Information Systems and Telecommunication (JIST)</JournalTitle>
      <ISSN>2322-1437</ISSN>
      <Volume>11</Volume>
      <Issue>43</Issue>
      <PubDate PubStatus="epublish">
        <Year>2023</Year>
        <Month>8</Month>
        <Day>20</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>Deep Transformer-based Representation for Text Chunking</ArticleTitle>
    <VernacularTitle>Deep Transformer-based Representation for Text Chunking</VernacularTitle>
    <FirstPage>176</FirstPage>
    <LastPage>184</LastPage>
    <ELocationID EIdType="doi">10.61186/jist.19894.11.43.176</ELocationID>
    <Language>en</Language>
    <AuthorList>
      <Author>
        <FirstName>Parsa</FirstName>
        <LastName>Kavehzadeh</LastName>
        <Affiliation>Amirkabir University of Technology</Affiliation>
      </Author>
      <Author>
        <FirstName>Mohammad Mahdi</FirstName>
        <LastName>Abdollah Pour</LastName>
        <Affiliation>Amirkabir University of Technology</Affiliation>
      </Author>
      <Author>
        <FirstName>Saeedeh</FirstName>
        <LastName>Momtazi</LastName>
        <Affiliation>Amirkabir University of Technology</Affiliation>
      </Author>
    </AuthorList>
    <History PubStatus="received">
      <Year>2021</Year>
      <Month>7</Month>
      <Day>27</Day>
    </History>
    <Abstract>Text chunking is one of the basic tasks in natural language processing. Most models proposed in recent years were applied to chunking and other sequence labeling tasks simultaneously, and they were mostly based on Recurrent Neural Networks (RNN) and Conditional Random Fields (CRF). In this article, we use state-of-the-art transformer-based models in combination with a CRF, a Long Short-Term Memory (LSTM)-CRF, and a simple dense layer to study the impact of different pre-trained models on overall text chunking performance. To this end, we evaluate BERT, RoBERTa, Funnel Transformer, XLM, XLM-RoBERTa, BART, and GPT2 as candidate contextualized models. Our experiments show that all transformer-based models except GPT2 achieve similarly high scores on text chunking. Because of its unidirectional architecture, GPT2 performs relatively poorly on text chunking compared with the bidirectional transformer-based architectures. Our experiments also reveal that adding an LSTM layer to transformer-based models does not significantly improve the results, since the LSTM does not contribute features that help the model extract more information from the input than the deep contextualized models already provide.</Abstract>
    <ObjectList>
      <Object Type="Keyword">
        <Param Name="Value">Text Chunking</Param>
      </Object>
      <Object Type="Keyword">
        <Param Name="Value">Sequence labeling</Param>
      </Object>
      <Object Type="Keyword">
        <Param Name="Value">Contextualized Word Representation</Param>
      </Object>
      <Object Type="Keyword">
        <Param Name="Value">Deep learning</Param>
      </Object>
      <Object Type="Keyword">
        <Param Name="Value">Transformers</Param>
      </Object>
    </ObjectList>
    <ArchiveCopySource DocType="Pdf">http://jist.ir/fa/Article/Download/19894</ArchiveCopySource>
  </ARTICLE>
</ArticleSet>