Universal Dependencies for Learner English

We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpu...

Main authors: Berzak, Yevgeni; Kenney, Jessica; Spadine, Carolyn; Wang, Jing Xian; Lam, Lucia; Mori, Keiko Sophie; Garza, Sebastian; Katz, Boris
Format: Technical Report
Language: en_US
Published: Center for Brains, Minds and Machines (CBMM), arXiv 2016
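Subjects: Treebank of Learner English (TLE); English as Second Language (ESL); Universal Dependency (UD); Cambridge First Certificate in English (FCE)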
Online access: http://hdl.handle.net/1721.1/103401
https://lib.cdythadong.edu.vn/handle/HDMC/24926
id CDYTHD-HDMC-24926
record_format dspace
spelling CDYTHD-HDMC-24926 2024-12-03T14:41:58Z
Funding: This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
Dates: 2016-06-30T20:31:54Z; 2016-08-01; 2023-04-13T10:03:45Z; 2024-12-03T07:41:58Z
Type: Technical Report; Working Paper; Other
Identifiers: http://hdl.handle.net/1721.1/103401; arXiv:1605.04278v2 [cs.CL]; https://lib.cdythadong.edu.vn/handle/HDMC/24926
Series: CBMM Memo Series;052
License: Attribution-NonCommercial-ShareAlike 3.0 United States, http://creativecommons.org/licenses/by-nc-sa/3.0/us/
File: CBMM-Memo-052.pdf (application/pdf)
institution Trường Cao đẳng Y tế Hà Đông
collection DSpace
language en_US
topic Treebank of Learner English (TLE)
English as Second Language (ESL)
Universal Dependency (UD)
Cambridge First Certificate in English (FCE)
description We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error-corrected versions of each sentence. Further on, we delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. Finally, we benchmark POS tagging and dependency parsing performance on the TLE dataset and measure the effect of grammatical errors on parsing accuracy. We envision the treebank to support a wide range of linguistic and computational research on second language acquisition as well as automatic processing of ungrammatical language.
format Technical Report
author Berzak, Yevgeni
Kenney, Jessica
Spadine, Carolyn
Wang, Jing Xian
Lam, Lucia
Mori, Keiko Sophie
Garza, Sebastian
Katz, Boris
author_sort Berzak, Yevgeni
title Universal Dependencies for Learner English
title_sort universal dependencies for learner english
publisher Center for Brains, Minds and Machines (CBMM), arXiv
publishDate 2016
url http://hdl.handle.net/1721.1/103401
https://lib.cdythadong.edu.vn/handle/HDMC/24926
_version_ 1817535795568836608