{"@attributes":{"version":"2.0"},"channel":{"title":{},"description":"A blog about technology and stuff related","link":"https:\/\/roomylee.github.io\/","pubDate":"Sun, 12 Dec 2021 14:51:51 +0000","lastBuildDate":"Sun, 12 Dec 2021 14:51:51 +0000","generator":"Jekyll v3.9.0","item":[{"title":"DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling (EMNLP 2020)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/2010.03099\">https:\/\/arxiv.org\/abs\/2010.03099<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh<\/li>\n      <li>Google Research<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>EMNLP 2020<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>BERT \uac19\uc740 pre-trained models\uc774 \ub9ce\uc740 NLP\/IR \ud0dc\uc2a4\ud06c\uc5d0\uc11c \ub192\uc740 \uc131\ub2a5\uc744 \ubcf4\uc5ec\uc8fc\uace0 \uc788\uc9c0\ub9cc, \uc9c0\ub098\uce5c computational cost\ub85c \uc778\ud574 \uc2e4\uc81c \uc11c\ube44\uc2a4\uc5d0 deploy \ub418\uae30\ub294 \uc5b4\ub824\uc6c0<\/li>\n  <li>Knowledge Distillation (Hinton et al., 2015)\uc744 \ud1b5\ud574\uc11c \uac00\ubcbc\uc6b4 \ubaa8\ub378\uc744 \ub9cc\ub4e4 \uc218\ub294 \uc788\uc9c0\ub9cc, \ud604\uc7ac pairs (or tuples) of text\ub97c \uc704\ud55c \uc5f0\uad6c\ub294 \uc81c\ub300\ub85c \uc5c6\uc74c<\/li>\n  <li>Text pair \ud0dc\uc2a4\ud06c\ub97c \uc704\ud55c \uc0c8\ub85c\uc6b4 Distillation Framework\uc778 DiPair\ub97c \uc81c\uc548\ud568<\/li>\n  <li>DiPair\ub294 scalable\ud558\uba70 quality\uc640 speed\ub97c \ubaa8\ub450 \uac1c\uc120\ud588\uc74c<\/li>\n  <li>Academic\uacfc real-world\uc758 e-commerce benchmark\ub97c \ud1b5\ud574\uc11c \uc131\ub2a5\uc744 \ud3c9\uac00\ud588\uace0, Cross-attention BERT \ubaa8\ub378\uc5d0 \ube44\ud574 350\ubc30 \uc774\uc0c1\uc758 \uc18d\ub3c4 \ud5a5\uc0c1\uc774 
\uc788\uc5c8\uc74c<\/li>\n<\/ul>\n\n<h2 id=\"motivation\">Motivation<\/h2>\n\n<ul>\n  <li>\ud604\uc2e4\uc758 text matching task\ub294 trillion+ text pairs\uc5d0 \ub300\ud574\uc11c \uc810\uc218\ub97c \uacc4\uc0b0\ud574\uc57c \ud558\uace0 \uae30\uc874\uc758 BERT\uc640 \uac19\uc740 Cross-encoder \uad6c\uc870\uc758 \ubaa8\ub378\uc740 \uc774\ub97c \uacc4\uc0b0\ud558\ub294\ub370 \uba87 \ub144\uc774 \uac78\ub9b4 \uc218\ub3c4 \uc788\uc74c<\/li>\n  <li>Solution 1:\n    <ul>\n      <li>\uc774\ub7f0 \ubb38\uc81c\ub97c \ud574\uacb0\ud558\uae30 \uc704\ud55c \ub300\ud45c\uc801\uc778 \ubc29\ubc95\uc73c\ub85c Knowledge Distillation \uae30\ubc95\uc774 \uc788\uc74c. \ubaa8\ub378\uc758 \uc131\ub2a5\uc744 \ucd5c\ub300\ud55c \uc720\uc9c0\ud558\uba74\uc11c \uac00\ubccd\uac8c \ub9cc\ub4e4\uc5b4\uc11c inference speed\ub97c \ub192\uc774\ub294 \uac83<\/li>\n      <li>\ud558\uc9c0\ub9cc \uc5ec\uc804\ud788 \uac00\ubcbc\uc6b4\ub9cc\ud07c \uc131\ub2a5\uc774 \ub9ce\uc774 \ub5a8\uc5b4\uc9c0\ub294 quality-speed trade-off\uac00 \uc788\uc74c (BERT-Tiny; Turc et al., 2019)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Solution 2:\n    <ul>\n      <li>\ub610\ub2e4\ub978 \ubc29\ubc95\ub860\uc73c\ub85c\ub294 pair of text\ub97c \ub3c5\ub9bd\uc801\uc73c\ub85c \ubaa8\ub378\ub9c1\ud558\ub294 Dual-encoder \uad6c\uc870\ub97c \uc0ac\uc6a9\ud558\uae30\ub3c4 \ud568<\/li>\n      <li>\uc774 \uacbd\uc6b0 text\uc758 embedding\uc744 \ubbf8\ub9ac \uacc4\uc0b0\ud558\uc5ec caching\/indexing \ud560 \uc218 \uc788\uae30 \ub54c\ubb38\uc5d0 \ub9e4\uc6b0 \ube60\ub978 inference\uac00 \uac00\ub2a5\ud568<\/li>\n      <li>\ud558\uc9c0\ub9cc \uc544\ubb34\ub798\ub3c4 pair \uc0ac\uc774\uc758 interaction\uc774 \uc5c6\uc5b4\uc11c \uc131\ub2a5\uc774 \ub9ce\uc774 \ub5a8\uc5b4\uc9d0<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Proposed Method (DiPair): \ub458\uc758 \uc7a5\uc810\uc744 \ucde8\ud574\uc11c Dual-encoder \uad6c\uc870\uc5d0\uc11c Cross-encoder\ub85c Distillation\ud574\uc11c \uc131\ub2a5\uacfc \uc18d\ub3c4\ub97c 
\uac1c\uc120\ud558\uc790<\/strong><\/li>\n<\/ul>\n\n<h2 id=\"method-dipair\">Method: DiPair<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-10-22-dipair\/figure2.png\" alt=\"figure2\" \/><\/p>\n\n<h4 id=\"dual-encoder\">Dual-Encoder<\/h4>\n\n<ul>\n  <li>Initialize with pre-trained BERT<\/li>\n  <li>Pair\ub97c \ub3c5\ub9bd\uc801\uc73c\ub85c \uc778\ucf54\ub529\ud558\ubbc0\ub85c\uc11c Head\uc5d0 \ub530\ub77c \uc18d\ub3c4\uac00 \uacb0\uc815\ub428. \ub530\ub77c\uc11c encoder\uc758 capacity\ub97c \ud0a4\uc6cc\ub3c4 inference time\uc774 \uc99d\uac00\ud558\uc9c0 \uc54a\uc74c<\/li>\n<\/ul>\n\n<h4 id=\"truncated-output-sequences\">Truncated Output Sequences<\/h4>\n\n<ul>\n  <li>Transformer \uacc4\uc5f4\uc758 \ubaa8\ub378\uc744 input sequence \uae38\uc774\uc5d0 \ub530\ub77c quadratic\ud558\uac8c \uc18d\ub3c4\uac00 \ub290\ub824\uc9d0<\/li>\n  <li>\uadf8\ub807\ub2e4\uace0 input\uc744 truncate\ud558\uba74 \uc131\ub2a5\uc774 \uae09\uaca9\ud558\uac8c \ub5a8\uc5b4\uc9d0 (see Figure 4 in the paper)<\/li>\n  <li>Dual-encoder + Head \uad6c\uc870\ub97c \uace0\uc548\ud568\n    <ul>\n      <li>Dual-encoder\uc758 output\ub4e4\uc744 truncate and merge(concatenate)\ud574\uc11c \uc5bb\uc740 truncated output sequence\ub97c Head\uc758 input\uc73c\ub85c \uc0ac\uc6a9<\/li>\n    <\/ul>\n  <\/li>\n  <li>\ubcd1\ubaa9\uc740 \uacb0\uad6d Head\uc778\ub370 input\uc758 \uae38\uc774\ub97c \uc9e7\uac8c \uc904\uc600\uae30\uc5d0 \ube60\ub978 inference\uac00 \uac00\ub2a5\ud568<\/li>\n<\/ul>\n\n<h4 id=\"projection-layer\">Projection Layer<\/h4>\n\n<ul>\n  <li>Encoder\uc758 output\uc744 \ub354 \uc791\uc740 \ucc28\uc6d0\uc73c\ub85c projection \uc2dc\ud0a4\ub294 layer\ub97c \ucd94\uac00\ud568<\/li>\n  <li>\uc774\ub7ec\uba74 1) caching\ud560 \uba54\ubaa8\ub9ac\ub97c \uc904\uc77c \uc218 \uc788\uace0, 2) Head\uc758 \uc18d\ub3c4\ub97c \ud5a5\uc0c1\uc2dc\ud0ac \uc218 \uc788\uc74c<\/li>\n<\/ul>\n\n<h4 id=\"transformer-based-head\">Transformer-Based Head<\/h4>\n\n<ul>\n  <li>BERT\n    <ul>\n      <li>Add position 
embedding and segment embedding<\/li>\n      <li>First token embedding (i.e., CLS embedding)\uc744 \ucd5c\uc885 input pair\uc758 representation\uc73c\ub85c \uc0ac\uc6a9<\/li>\n    <\/ul>\n  <\/li>\n  <li>FFNN\n    <ul>\n      <li>Feedforward neural network (FFNN)\uc740 transformer-head\ubcf4\ub2e4 \ube60\ub974\uba74\uc11c\ub3c4 \uaf64 \uad1c\ucc2e\uc740 \uc131\ub2a5\uc744 \ubcf4\uc784<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"a-two-stage-training-approach\">A Two-Stage Training Approach<\/h2>\n\n<ul>\n  <li>DiPair\uc758 \uad6c\uc870\ub294 pre-trained \ubaa8\ub378 \uc704\uc5d0 random initialized layer\uac00 \ucd94\uac00\ub41c \uac83\uc774\uae30 \ub54c\ubb38\uc5d0, \ubaa8\ub378 \uc804\uccb4\ub97c \ud55c\ubc88\uc5d0 \ud559\uc2b5\uc2dc\ud0a4\uba74 \ucd5c\ub300\uc758 \uc131\ub2a5\uc774 \ub098\uc624\uc9c0 \uc54a\uc74c\n    <ul>\n      <li>\ud559\uc2b5 \ucd08\uae30 \ub2e8\uacc4\uc5d0\uc11c random initialied weight\ub85c \uc778\ud574 \uc774\uc0c1\ud55c \uc608\uce21 \ubc0f gradient\uac00 \ubc1c\uc0dd\ud558\uace0 \uc774\ub85c \uc778\ud574 \uae30\uc874\uc758 pre-trained weight\uc5d0 \ub0b4\uc81c\ub41c knowledge\uac00 \ud30c\uad34\ub420 \uc218 \uc788\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc774\ub7f0 \ubb38\uc81c\ub97c \ud574\uacb0\ud558\uae30 \uc704\ud574 two-stage training strategy\ub97c \uc81c\uc548\ud568 (similar with Wang et al., 2019)\n    <ol>\n      <li>\ucc98\uc74c\uc5d0\ub294 dual-encoder \ubd80\ubd84\uc744 freeze\ud558\uace0 \uc0c8\ub86d\uac8c \ucd94\uac00\ub41c parameters (Head, Projection \ub4f1)\ub9cc \ud559\uc2b5\uc744 \uc2dc\ud0b4<\/li>\n      <li>\uc5b4\ub290\uc815\ub3c4 \uc218\ub834\ub41c \ud6c4\uc5d0 dual-encoder\ub97c unfreeze\ud574\uc11c \uc804\uccb4 \ubaa8\ub378\uc744 \ud559\uc2b5\uc2dc\ud0b4<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<h2 id=\"main-results\">Main Results<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-10-22-dipair\/table3.png\" alt=\"table2\" \/><\/p>\n\n<ul>\n  <li>Q2P-MAT is a binary classification task derived from the <a 
href=\"https:\/\/microsoft.github.io\/MSMARCO-Passage-Ranking\/\">MSMARCO Passage Ranking\ndata<\/a><\/li>\n  <li>DiPiar + TSF \ubaa8\ub378\uc774 \uac00\uc7a5 \uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc600\uc73c\uba70, BERT-Tiny\uc640\ub294 \ube44\uc2b7\ud558\uc9c0\ub9cc \uc18d\ub3c4\ub294 8\ubc30 \ube68\ub790\uc74c<\/li>\n  <li>DE-Cos\ub294 Dual-encoder \uacb0\uacfc\ub97c dot-product\ub9cc \ud558\uae30 \ub54c\ubb38\uc5d0 \uc18d\ub3c4\ub294 \ub9e4\uc6b0 \ube68\ub790\uc9c0\ub9cc, input embedding \uc0ac\uc774\uc758 interaction\uc774 \ucda9\ubd84\uce58 \uc54a\uae30 \ub54c\ubb38\uc5d0 \uac00\uc7a5 \uc548\uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc784<\/li>\n<\/ul>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<blockquote>\n  <p>In this work, we reveal the importance of customizing models for problems with pairwise\/n-ary input and propose a new framework, DiPair, as an effective solution. This framework is flexible, and we can easily achieve more than 350x speedup over a BERT-based teacher model with no significant quality drop.<\/p>\n<\/blockquote>\n","pubDate":"Tue, 11 Aug 2020 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/dipair\/","guid":"https:\/\/roomylee.github.io\/dipair\/","category":["text-matching","pair-modeling","dipair","dual-encoder","distillation","large-scale","blog"]},{"title":"Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring (ICLR 2020)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1905.01969\">https:\/\/arxiv.org\/abs\/1905.01969<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston<\/li>\n      <li>Facebook AI Research<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>ICLR 2020<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Sequence \uac04\uc758 pairwise \ube44\uad50\ub97c \ud558\ub294 \ud0dc\uc2a4\ud06c\uc5d0 \ub300\ud574 
\ubcf4\ud1b5 1) sequence pair\ub97c \ud55c\ubc88\uc5d0 \uc778\ucf54\ub529(full self-attention)\ud558\ub294 Cross-encoder \ubc29\uc2dd\uacfc 2) sequence pair\ub97c \uac01\uac01 \uc778\ucf54\ub529\ud558\ub294 Bi-encoder \ubc29\uc2dd\uc744 \uc0ac\uc6a9\ud569\ub2c8\ub2e4.<\/li>\n  <li>Cross-encoder\ub294 self-attention \uacfc\uc815\uc5d0\uc11c \ub450 sequence\uac00 \uc11c\ub85c\ub97c token-level\ub85c \ucc38\uc870\ud560 \uc218 \uc788\uae30 \ub54c\ubb38\uc5d0 \uc131\ub2a5\uc774 \uc88b\uc9c0\ub9cc, pair\uc5d0 \ub300\ud55c \uc778\ucf54\ub529\uc744 \ud574\uc57c \ud558\uae30 \ub54c\ubb38\uc5d0 \uc2e4\uc81c\ub85c\ub294 \ub108\ubb34 \ub290\ub824\uc11c \uc0ac\uc6a9\ud558\uae30 \uc5b4\ub835\uc2b5\ub2c8\ub2e4.<\/li>\n  <li>Bi-encoder\ub294 \uac01 \ubb38\uc7a5\uc5d0 \ub300\ud574\uc11c\ub9cc \uc778\ucf54\ub529\ud558\uba74 \ub418\uae30 \ub54c\ubb38\uc5d0 \ube60\ub974\uc9c0\ub9cc \uc131\ub2a5\uc774 \uc57d\uac04 \ub5a8\uc5b4\uc9c0\ub294 \ub2e8\uc810\uc774 \uc788\uc2b5\ub2c8\ub2e4.<\/li>\n  <li>\uc774\ub7f0 \ub450 approach \ud55c\uacc4\ub97c \ucee4\ubc84\ud558\ub294 Poly-encoder\ub97c \uc81c\uc548\ud569\ub2c8\ub2e4.<\/li>\n  <li>Poly-encoder\ub97c \ud3ec\ud568\ud55c 3\uac1c\uc758 \uc778\ucf54\ub354\uc640 \ud559\uc2b5 \ubc29\ubc95\uc5d0 \ub300\ud574\uc11c \uc2e4\ud5d8\uc744 \ud1b5\ud574 \ube44\uad50\ud558\uc600\uace0, Cross-encoder\ubcf4\ub2e4\ub294 \ube60\ub974\uace0 Bi-encoder \ubcf4\ub2e4\ub294 \uc815\ud655\ud55c \uc131\ub2a5\uc744 \uc5bb\uc5c8\uc2b5\ub2c8\ub2e4.<\/li>\n  <li>\ub610\ud55c Poly-encoder\ub294 4\uac1c\uc758 \ud0dc\uc2a4\ud06c\uc5d0 \ub300\ud574\uc11c state-of-the-art \uc131\ub2a5\uc744 \uae30\ub85d\ud558\uc600\uc2b5\ub2c8\ub2e4.<\/li>\n<\/ul>\n\n<h2 id=\"tasks\">Tasks<\/h2>\n\n<ul>\n  <li>Sentence selection in dialogue: \uc8fc\uc5b4\uc9c4 \ub300\ud654 \ucee8\ud14d\uc2a4\ud2b8\uc758 \ub2e4\uc74c\uc5d0 \uc62c \ub9d0\ub85c \uc801\uc808\ud55c \uac83 \ucc3e\uae30 (\uac1d\uad00\uc2dd 10~20\uac1c)\n    <ul>\n      <li>ConvAI2<\/li>\n      <li>DSTC7 challenge Track 1<\/li>\n   
   <li>Unbuntu V2 corpus<\/li>\n    <\/ul>\n  <\/li>\n  <li>Article search in IR: \uc8fc\uc5b4\uc9c4 \ubb38\uc7a5\uc774 \ub4f1\uc7a5\ud55c article \ucc3e\uae30 (\uac1d\uad00\uc2dd 10000\uac1c)\n    <ul>\n      <li>Wikipedia Article Search<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-08-11-poly-encoder\/table1.png\" alt=\"table1\" \/><\/p>\n\n<h2 id=\"methods\">Methods<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-08-11-poly-encoder\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<h3 id=\"bi-encoder\">Bi-encoder<\/h3>\n\n<ul>\n  <li>\n    <p>Figure 1 (a) \ucc98\ub7fc Context Encoder\uc640 Candidate Encoder\uac00 \uac01\uac01 context \ubb38\uc7a5\uacfc \ud574\ub2f9 context \ub2e4\uc74c\uc5d0 \uc62c \ud6c4\ubcf4 \ubb38\uc7a5\uc744 \uc778\ucf54\ub529\ud558\ub294 \uad6c\uc870\uc785\ub2c8\ub2e4. \uc218\ud559\uc801\uc73c\ub85c \ub2e4\uc74c\uacfc \uac19\uc2b5\ub2c8\ub2e4.<\/p>\n\n\\[y_{ctxt} = red(T_1 (ctxt)), \\quad y_{cand} = red(T_2 (cand))\\]\n\n    <ul>\n      <li>$T(x) = h_1, \u2026, h_N$ \ub294 Transformer Encoder\uc758 output\uc744 \uc758\ubbf8\ud558\uba70, $red(\\cdot)$ \uc740 \uc774\ub7f0 sequence output\uc744 \ud558\ub098\uc758 \ubca1\ud130\ub85c \ubcc0\ud658\uc2dc\ud0b5\ub2c8\ub2e4.<\/li>\n      <li>\ucd5c\uc885\uc801\uc73c\ub85c \uc5bb\uc5b4\uc9c0\ub294 $y$ \ub294 \uac01 Encoder\uc758 Contextualized Embedding\uc744 \uc758\ubbf8\ud569\ub2c8\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>\n    <p>\uc778\ucf54\ub529 \uacb0\uacfc\ub85c Context Embedding, Candidate Embedding\uc744 \uc5bb\uc744 \uc218 \uc788\uace0, \ub450 \ubca1\ud130\uc758 \ub0b4\uc801(dot-product)\uc744 \ud1b5\ud574\uc11c \ub2e4\uc74c \ubb38\uc7a5\uc73c\ub85c \uc801\uc808\ud55c\uac00\uc5d0 \ub300\ud55c \uc810\uc218\ub97c \uacc4\uc0b0\ud569\ub2c8\ub2e4. 
Mathematically:<\/p>\n\n\\[s(ctxt, cand_i) = y_{ctxt} \\cdot y_{cand_{i}}\\]\n  <\/li>\n  <li>Scores are computed this way for the given $n$ candidates, and the highest-scoring candidate is taken to be the sentence that follows the given context.<\/li>\n  <li>Training uses the other samples in the batch as negatives and minimizes a cross entropy loss.\n    <ul>\n      <li>A batch is composed as [($ctxt_1$, $cand_1$), ($ctxt_2$, $cand_2$), \u2026, ($ctxt_n$, $cand_n$)].<\/li>\n      <li>For the first context $ctxt_1$, $cand_1$ is the positive and the remaining $cand_{j={2, \u2026, n}}$ are negatives; that is, with target [1, 0, \u2026, 0] a cross entropy loss can be computed. This is computed for every $ctxt$, and training minimizes the summed loss.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Because the Bi-encoder encodes contexts and candidates independently, it has the advantage that each sentence's embedding can be computed in advance when running inference for a real service.
This is covered in detail in the <a href=\"#inference\">Inference<\/a> section.<\/li>\n<\/ul>\n\n<h3 id=\"cross-encoder\">Cross-encoder<\/h3>\n\n<ul>\n  <li>The Cross-encoder follows the standard BERT recipe. As shown in Figure 1 (b), $ctxt$ and $cand$ are concatenated and fed through a single Encoder, and the score for how suitable the candidate is as the next sentence is computed via regression.<\/li>\n  <li>Unlike the Bi-encoder, self-attention between context and candidate can be applied during encoding, so the relationship between the two can be captured far more deeply; its quality is also usually better than the Bi-encoder approach's.<\/li>\n  <li>Training, as with the Bi-encoder above, minimizes a cross entropy loss using the samples within the batch.<\/li>\n<\/ul>\n\n<h3 id=\"poly-encoder\">Poly-encoder<\/h3>\n\n<ul>\n  <li>The Poly-encoder proposed in the paper aims to keep the strengths of the Bi-encoder and the Cross-encoder while compensating for their weaknesses.<\/li>\n  <li>Its architecture is shown in Figure 1 (c); like the Bi-encoder, it encodes the context and the candidate independently.<\/li>\n  <li>What differs from the Bi-encoder are the components on top of the Context Encoder, examined one at a time below.\n    <ul>\n      <li>\n        <p>Previously the Context Encoder output was reduced directly to a single vector via $red(\\cdot)$; the Poly-encoder instead produces multiple vectors through attention with code vectors. (The Candidate Encoder is unchanged.)<\/p>\n\n\\[y_{ctxt}^i = \\sum_j w_j^{c_i} h_j, \\quad \\text{where} \\quad (w_1^{c_i}, ..., w_N^{c_i}) = \\text{softmax}(c_i \\cdot h_1, ..., c_i \\cdot h_N)\\]\n      <\/li>\n      <li>The code vectors can be viewed as a kind of latent variable; they are randomly initialized at the start of training and learned along the way as trainable parameters.\n        <ul>\n          <li>Interpreted semantically, the code vectors seem to play the role of capturing the several meanings of the context.<\/li>\n        <\/ul>\n      <\/li>\n      <li>\n        <p>The vectors obtained this way (Emb 1 ~ Emb m in Figure 1 (c)) are combined once more through attention with the Candidate Embedding to produce the final Context Embedding.<\/p>\n\n\\[y_{ctxt} = \\sum_i w_i y_{ctxt}^i, \\quad \\text{where} \\quad (w_1, ..., w_m) = \\text{softmax}(y_{cand_i} \\cdot y_{ctxt}^1, ..., y_{cand_i} \\cdot y_{ctxt}^m)\\]\n      <\/li>\n      <li>The final score is computed via the dot product of the Context Embedding obtained above and the Candidate Embedding.<\/li>\n    <\/ul>\n  <\/li>\n  <li>This way, like the Cross-encoder, the relationship between context and candidate can be captured more deeply, while, as with the Bi-encoder, part of the sentence embeddings can be precomputed, making it very fast and effective at inference time.<\/li>\n<\/ul>\n\n<h2 id=\"inference\">Inference<\/h2>\n\n<ul>\n  <li>In a real service, inference must find the appropriate next sentence among millions or tens of millions of candidate sentences.<\/li>\n  <li>Here, the Bi-encoder approach has the advantage of very fast inference speed.\n    <ul>\n      <li>No matter how many candidate sentences there are, the per-sentence Candidate Embeddings can be encoded in advance.<\/li>\n      <li>Then the context arriving as a query only needs to be encoded exactly once to compute its Context Embedding, followed by nothing more than dot products against the millions of precomputed Candidate Embeddings.\n        <ul>\n          <li>Compared with the cost of running a Transformer Encoder, the dot product of two vectors is extremely cheap, so doing as little encoding as possible is the point of the speedup.<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>The Cross-encoder approach, in contrast, is slow to the point of being practically unusable in a real service.\n    <ul>\n      <li>Because the Cross-encoder computes over the (context, candidate) pair, no computation can be done in advance before the context is given.<\/li>\n      <li>Only once a context arrives as a query can it be encoded as a pair with each candidate to score how suitable that candidate is as the next sentence.<\/li>\n      <li>In other words, handling a single query requires running Transformer Encoder inference millions of times.
That is effectively infeasible.<\/li>\n    <\/ul>\n  <\/li>\n  <li>The Poly-encoder aims to take the advantages of both.\n    <ul>\n      <li>Context and candidates pass through the Transformer Encoder independently, so the Candidate Embeddings can be computed and stored in advance.<\/li>\n      <li>At serving time, one inference pass is run on the context that arrives as a query, and the final score is computed with only a few attention operations over the precomputed Candidate Embeddings.\n        <ul>\n          <li>Naturally, the extra attention steps make it slower than the Bi-encoder, but the added computation is small enough to be acceptable.<\/li>\n        <\/ul>\n      <\/li>\n      <li>In exchange for the small loss in speed, there is a gain in accuracy.\n        <ul>\n          <li>Unlike the Bi-encoder, an attention operation spanning context and candidate is performed before the final score is computed, so the relationship between the two is understood better and, naturally, more appropriate next sentences can be found.<\/li>\n        <\/ul>\n      <\/li>\n      <li>It thus suitably combines the Bi-encoder's strength, inference speed, with the Cross-encoder's strength, accuracy.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"experiments\">Experiments<\/h2>\n\n<ul>\n  <li>Results of experiments on the datasets introduced earlier (ConvAI2, DSTC7, Ubuntu v2, etc.) are as follows.<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-08-11-poly-encoder\/table4.png\" alt=\"table4\" \/><\/p>\n\n<ul>\n  <li>The Poly-encoder performed better than the Bi-encoder but fell short of the Cross-encoder.<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-08-11-poly-encoder\/table2.png\" alt=\"table2\" \/><\/p>\n\n<ul>\n  <li>All of the models introduced above train with the samples in the batch as negatives, and performance improved as the batch size grew.<\/li>\n  <li>Since finding the true positive among many negatives is harder, increasing the batch size, and thus the number of negatives, appears to have trained the model more effectively.<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-08-11-poly-encoder\/table5.png\" alt=\"table5\" \/><\/p>\n\n<ul>\n  <li>The table above measures the time taken to run inference with each model in CPU and GPU environments.<\/li>\n  <li>The Bi-encoder is the fastest, and the Poly-encoder is nearly as fast.<\/li>\n  <li>The Cross-encoder takes an absurdly long time compared with these two. Building a service that retrieves sentences for a query with the Cross-encoder approach therefore looks practically infeasible.<\/li>\n<\/ul>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<ul>\n  <li>Proposes the Poly-encoder, which overcomes the limitations of the existing Bi-encoder and Cross-encoder<\/li>\n  <li>Achieves state-of-the-art performance, better than the Bi-encoder, on several tasks<\/li>\n  <li>Shows inference speed close to the Bi-encoder's, demonstrating its applicability to real services<\/li>\n<\/ul>\n","pubDate":"Tue, 11 Aug 2020 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/poly-encoder\/","guid":"https:\/\/roomylee.github.io\/poly-encoder\/","category":["poly-encoder","representation","blog"]},{"title":"ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR 2020)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/openreview.net\/pdf?id=r1xMH1BtvB\">https:\/\/openreview.net\/pdf?id=r1xMH1BtvB<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning<\/li>\n      <li>Stanford University &amp; Google Brain<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>ICLR 2020<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Masked language modeling, with BERT as the representative example, is a pre-training method that replaces input tokens with $[MASK]$ and reconstructs the original tokens<\/li>\n  <li>It yields good performance when transferred to downstream tasks, but it requires a large amount of compute<\/li>\n  <li>To address this, proposes a sample-efficient pre-training task called <strong>replaced token detection<\/strong><\/li>\n  <li>Instead of masking the input, this technique replaces tokens with plausible tokens sampled from a small generator network<\/li>\n  <li>Then, instead of training a model to predict the originals of the masked tokens, a discriminative model is trained to determine whether each token is a real token or a fake produced by the generator<\/li>\n  <li>Because the replaced token detection task learns from all input tokens, it is far more efficient than existing MLM, which learns from only the small masked subset of tokens; this is demonstrated through thorough experiments<\/li>\n  <li>As a result, it outperformed BERT given the same model size, data, and compute<\/li>\n  <li>The gains are even more pronounced for small models: a model trained with this technique for 4 days on a single GPU beats GPT on GLUE (even though GPT used 30x more compute)<\/li>\n  <li>It also matches RoBERTa and XLNet with 1\/4 of their compute, and outperforms them when trained with the same amount<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Current state-of-the-art representation learning methods can be seen as learning a kind of denoising autoencoder<\/li>\n  <li>They typically train via MLM, a task that masks about 15% of the tokens in the original input and reconstructs them<\/li>\n  <li>Compared with existing (autoregressive) language modeling, this enables effective learning in that it considers bidirectional information<\/li>\n  <li>But these MLM-based techniques have problems\n    <ol>\n      <li>Only 15% of the tokens contribute to learning<\/li>\n      <li>(Consequently) training is very expensive<\/li>\n      <li>The model conditions on $[MASK]$ tokens when predicting during training, but $[MASK]$ tokens never appear at inference time<\/li>\n    <\/ol>\n  <\/li>\n  <li>To resolve these issues, a new pre-training task called <em>replaced token detection<\/em> is proposed\n    <ul>\n      <li><strong>Replaced token detection<\/strong>: some tokens of the real input are swapped for plausible fake tokens via a generator, and a discriminator predicts for each token whether it is a real token from the original or a fake generated (sampled) by the generator<\/li>\n    <\/ul>\n  <\/li>\n  <li>Because learning covers all tokens of the input sentence rather than 15%, it is considerably more efficient and effective<\/li>\n  <li>\n    <p>At first glance this resembles a GAN, but it is not adversarial, since the generator is trained with maximum likelihood<\/p>\n  <\/li>\n  <li>The pre-trained LM trained as above is named ELECTRA (for \u201cEfficiently Learning an Encoder that Classifies Token Replacements Accurately\u201d)<\/li>\n  <li>As explained above, ELECTRA can learn from every token, so it trains much faster than BERT, and once fully trained it performs better on downstream tasks<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>Figure 1 shows that ELECTRA improves much faster than the other approaches<\/li>\n  <li>Compared against models of the same size, it shows higher GLUE performance throughout training, including the final score<\/li>\n  <li>ELECTRA-Small can be trained in 4 days on a single GPU (which corresponds to 1\/20 the parameters and 1\/135 the compute of BERT-Large)\n    <ul>\n      <li>Despite this, it scores 5 points higher than BERT-Small on GLUE, and even beats GPT<\/li>\n    <\/ul>\n  <\/li>\n  <li>It performs well at large scale too\n    <ul>\n      <li>ELECTRA-Large was trained with fewer parameters and 1\/4 the compute of RoBERTa and XLNet, yet shows comparable performance<\/li>\n    <\/ul>\n  <\/li>\n  <li>It also outperformed ALBERT (Lan et al., 2019) on GLUE and set a new SOTA on SQuAD 2.0<\/li>\n  <li>Overall, the proposed discriminative task has the model learn from harder negative samples, making learning more effective and efficient than existing language representation learning approaches<\/li>\n<\/ul>\n\n<h2 id=\"2-method\">2. 
Method<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/figure2.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>Replaced token detection\uc73c\ub85c \ud559\uc2b5\ud558\uae30 \uc704\ud574\uc11c generator $G$\uc640 discriminator $D$, \ub450 \uac1c\uc758 network\uac00 \ud544\uc694\ud568<\/li>\n  <li>\ub450 network\ub294 <strong>Transformer Encoder \uad6c\uc870<\/strong>\uc774\uba70, sequence of input tokens $\\textbf{x} = [x_1, x_2, \u2026, x_n]$\uc744 \uc785\ub825\uc73c\ub85c \ubc1b\uc544\uc11c sequence of contextualized vector representations $h(\\textbf{x}) = [h_1, h_2, \u2026, h_n]$\ub85c \ub9e4\ud551\uc2dc\ud0b4<\/li>\n<\/ul>\n\n<h4 id=\"generator\">Generator<\/h4>\n\n<ul>\n  <li><strong>Generator $G$\ub294 BERT\uc758 MLM\uacfc \ub3d9\uc77c\ud558\uac8c \ud559\uc2b5\ud568<\/strong>\n    <ol>\n      <li>Input $\\textbf{x} = [x_1, x_2, \u2026, x_n]$\uc5d0 \ub300\ud574\uc11c masking\ud560 position set $\\textbf{m} = [m_1, m_2, \u2026, m_k]$\uc744 \uacb0\uc815\ud568\n        <ul>\n          <li>Position\uc740 integers between 1 and $n$\uc774\uba70, \uc544\ub798\uc640 \uac19\uc774 \uc218\ud559\uc801\uc73c\ub85c \ud45c\ud604\ud560 \uc218 \uc788\uc74c\n            <ul>\n              <li>$m_i \\sim \\text{unif}\\{1, n\\} \\; \\text{for} \\; i = 1 \\; \\text{to} \\; k$<\/li>\n            <\/ul>\n          <\/li>\n          <li>Masking\ud560 \uac1c\uc218 $k$\ub294 \ubcf4\ud1b5 $0.15n$\uc744 \uc0ac\uc6a9 (\uc804\uccb4 token\uc758 15%)<\/li>\n        <\/ul>\n      <\/li>\n      <li>\uacb0\uc815\ud55c position\uc5d0 \ud574\ub2f9\ud558\ub294 input token\uc744 $[MASK]$\ub85c \uce58\ud658\ud568\n        <ul>\n          <li>\uc774 \uacfc\uc815\uc744 $\\textbf{x}^{masked} = \\text{REPLACE}(\\textbf{x}, \\textbf{m}, [MASK])$\uc640 \uac19\uc774 \ud45c\ud604<\/li>\n        <\/ul>\n      <\/li>\n      <li>Masked input $\\textbf{x}^{masked}$\uc5d0 \ub300\ud574\uc11c generator\ub294 \uc544\ub798\uc640 \uac19\uc774 \uc6d0\ub798 token\uc774 
\ubb34\uc5c7\uc774\uc5c8\uc744\uc9c0\ub97c \uc608\uce21\ud568\n        <ul>\n          <li>\n            <p>\uc774\ub7f0 \uacfc\uc815\uc744 \uc218\ud559\uc801\uc73c\ub85c \ud45c\ud604\ud558\uba74 \uc544\ub798\uc640 \uac19\uc74c ($t$ \ubc88\uc9f8 token\uc5d0 \ub300\ud55c \uc608\uce21).<\/p>\n\n\\[p_G (x_t | \\textbf{x}^{masked}) = \\exp(e(x_t)^T h_G(\\textbf{x}^{masked})_t) \/ \\sum_{x'} \\exp(e(x')^T h_G(\\textbf{x}^{masked})_t)\\]\n          <\/li>\n          <li>\n            <p>\ub610\ud55c $e(\\cdot)$\ub294 embedding\uc744 \uc758\ubbf8\ud568. \uc989, \uc704\uc758 \uc2dd\uc740 LM\uc758 output layer\ub97c embedding layer\uc640 tying(weight sharing)\ud558\uaca0\ub2e4\ub294 \uc758\ubbf8<\/p>\n          <\/li>\n        <\/ul>\n      <\/li>\n      <li>\ucd5c\uc885\uc801\uc73c\ub85c \uc544\ub798\uc640 \uac19\uc740 MLM loss\ub97c \ud1b5\ud574 \ud559\uc2b5<\/li>\n    <\/ol>\n\n\\[\\mathcal{L}_{\\text{MLM}}(\\textbf{x}, \\theta_G) = \\mathbb{E} \\left( \\sum_{i \\in \\textbf{m}} -\\log p_G (x_i | \\textbf{x}^{masked}) \\right)\\]\n  <\/li>\n<\/ul>\n\n<h4 id=\"discriminator\">Discriminator<\/h4>\n\n<ul>\n  <li><strong>Discriminator $D$\ub294 input tokens\uc5d0 \ub300\ud574\uc11c \uac01 token\uc774 <em>original<\/em>\uc778\uc9c0 <em>replaced<\/em>\uc778\uc9c0 binary classification\uc73c\ub85c \ud559\uc2b5\ud568<\/strong>\n    <ol>\n      <li>Generator\ub97c \uc774\uc6a9\ud574\uc11c masked input token\uc5d0 \ub300\ud55c \uc608\uce21\uc744 \uc9c4\ud589\ud568 (\uc704\uc758 generator\uc758 1~3\ub2e8\uacc4)<\/li>\n      <li>Generator\uc5d0\uc11c masking\ud560 position set $\\textbf{m}$\uc5d0 \ud574\ub2f9\ud558\ub294 \uc704\uce58\uc758 token\uc744 $[MASK]$\uac00 \uc544\ub2cc generator\uc758 output distribution $p_G(x_t | \\textbf{x}^{masked})$\uc5d0 \ub300\ud574 sampling\ud55c token\uc73c\ub85c \uce58\ud658\ud568. \uc774\ub97c corrupt\uc2dc\ud0a8\ub2e4\uace0 \ud568\n        <ul>\n          <li>Original Input: [\u201cthe\u201d, \u201cchef\u201d, \u201ccooked\u201d, \u201cthe\u201d, \u201cmeal\u201d]<\/li>\n          <li>Input for generator: [\u201c$[MASK]$\u201d, \u201cchef\u201d, \u201c$[MASK]$\u201d, \u201cthe\u201d, \u201cmeal\u201d]<\/li>\n          <li>Input for discriminator: [\u201cthe\u201d, \u201cchef\u201d, \u201cate\u201d, \u201cthe\u201d, \u201cmeal\u201d]\n            <ul>\n              <li>\uccab\ubc88\uc9f8 \ub2e8\uc5b4\ub294 generator\uac00 \uc62c\ubc14\ub974\uac8c \u201cthe\u201d\ub77c\uace0 \uc608\uce21\ud568<\/li>\n              <li>\uc138\ubc88\uc9f8 \ub2e8\uc5b4\ub294 generator\uac00 \uc6d0\ub798 \u201ccooked\u201d\uc778\ub370 \u201cate\u201d\ub77c\uace0 \uc798\ubabb \uc608\uce21\ud568<\/li>\n            <\/ul>\n          <\/li>\n          <li>\n            <p>\uc774 \uce58\ud658 \uacfc\uc815\uc740 \uc218\ud559\uc801\uc73c\ub85c \ub2e4\uc74c\uacfc \uac19\uc74c<\/p>\n\n\\[\\textbf{x}^{corrupt} = \\text{REPLACE}(\\textbf{x}, \\textbf{m}, \\hat{\\textbf{x}})\n\\\\\n\\hat{x}_i \\sim p_G (x_i | \\textbf{x}^{masked}) \\; \\text{for} \\; i \\in \\textbf{m}\\]\n          <\/li>\n        <\/ul>\n      <\/li>\n      <li>Corrupted input $\\textbf{x}^{corrupt}$\uc5d0 \ub300\ud574\uc11c discriminator\ub294 \uc544\ub798\uc640 \uac19\uc774 \uac01 token\uc774 original input\uacfc \ub3d9\uc77c\ud55c\uc9c0 \ubcc0\ud615(corrupt)\uc774 \ub41c \uac83\uc778\uc9c0 \uc608\uce21(binary classification)\ud568\n        <ul>\n          <li>Target classes (2)\n            <ul>\n              <li><em>original<\/em>: \uc774 \uc704\uce58\uc5d0 \ud574\ub2f9\ud558\ub294 token\uc740 \uc6d0\ubcf8 \ubb38\uc7a5\uc758 token\uacfc \uac19\uc740 \uac83<\/li>\n              <li><em>replaced<\/em>: \uc774 \uc704\uce58\uc5d0 \ud574\ub2f9\ud558\ub294 token\uc740 Generator\uc5d0 \uc758\ud574\uc11c \ubcc0\ud615\ub41c 
\uac83<\/li>\n            <\/ul>\n          <\/li>\n          <li>\n            <p>\uc774\ub7f0 \uacfc\uc815\uc744 \uc218\ud559\uc801\uc73c\ub85c \ud45c\ud604\ud558\uba74 \uc544\ub798\uc640 \uac19\uc74c ($t$ \ubc88\uc9f8 token\uc5d0 \ub300\ud55c \uc608\uce21).<\/p>\n\n\\[D(\\textbf{x}^{corrupt}, t) = \\text{sigmoid}(w^T h_D(\\textbf{x}^{corrupt})_t)\\]\n          <\/li>\n        <\/ul>\n      <\/li>\n      <li>\ucd5c\uc885\uc801\uc73c\ub85c \uc544\ub798\uc640 \uac19\uc740 loss\ub97c \ud1b5\ud574 \ud559\uc2b5<\/li>\n    <\/ol>\n\n\\[\\mathcal{L}_{Disc} (\\textbf{x}, \\theta_{D}) = \\mathbb{E} \\left( \\sum_{t=1}^{n} -\\mathbb{1}(x_{t}^{corrupt} = x_t) \\log D(\\textbf{x}^{corrupt}, t) - \\mathbb{1}(x_{t}^{corrupt} \\neq x_t) \\log (1-D(\\textbf{x}^{corrupt}, t)) \\right)\\]\n  <\/li>\n<\/ul>\n\n<h4 id=\"gan\uacfc\uc758-\ucc28\uc774\uc810\">GAN\uacfc\uc758 \ucc28\uc774\uc810<\/h4>\n\n<ul>\n  <li>\uc704\uc758 training objective\uac00 GAN\uacfc \uc720\uc0ac\ud558\uc9c0\ub9cc \uba87 \uac00\uc9c0 \ub2e4\ub978 \uc810\uc774 \uc788\uc74c\n    <ol>\n      <li>Generator\uac00 original token\uacfc \ub3d9\uc77c\ud55c token\uc744 \uc0dd\uc131\ud588\ub2e4\uba74 \uc774\ub294 discriminator\uc5d0\uc11c positive sample\ub85c \ucc98\ub9ac (GAN\uc5d0\uc11c\ub294 \uadf8\ub798\ub3c4 fake\ub85c \ucc98\ub9ac\ud568)<\/li>\n      <li>Generator\uac00 discriminator\ub97c \uc18d\uc774\uae30 \uc704\ud574 adversarial\ud558\uac8c \ud559\uc2b5\ud558\ub294 \uac8c \uc544\ub2c8\uace0 \uadf8\ub0e5 maximum likelihood\ub85c \ud559\uc2b5\ud568. 
\uc131\ub2a5\ub3c4 maximum likelihood\uac00 \ub354 \uc88b\uc74c\n        <ul>\n          <li>\uc77c\ub2e8 generator\ub85c\ubd80\ud130 sampling\ud558\ub294 \uacfc\uc815 \ub54c\ubb38\uc5d0 adversarial\ud558\uac8c generator\ub97c \ud559\uc2b5\ud558\ub294 \uac8c \uc5b4\ub824\uc6c0 (back-propagation \ubd88\uac00\ub2a5)<\/li>\n          <li>\uadf8\ub798\uc11c reinforcement learning\uc73c\ub85c \uc774\ub97c \uad6c\ud604\ud574\ubcf4\uc558\uc9c0\ub9cc maximum likelihood\ub85c \ud559\uc2b5\uc2dc\ud0a4\ub294 \uac83\ubcf4\ub2e4 \uc131\ub2a5\uc774 \ubcc4\ub85c\uc600\uc74c (see Appendix F)<\/li>\n        <\/ul>\n      <\/li>\n      <li>Generator\uc758 input\uc73c\ub85c noise vector\ub97c \ub123\uc5b4\uc8fc\uc9c0 \uc54a\uc74c<\/li>\n    <\/ol>\n  <\/li>\n  <li>\n    <p>\ucd5c\uc885\uc801\uc73c\ub85c\ub294 large corpus $\\mathcal{X}$\uc5d0 \ub300\ud574\uc11c \uc704\uc758 generator\uc640 discriminator\uc758 loss\ub97c \ud569\uccd0\uc11c \ud559\uc2b5\ud568<\/p>\n\n\\[\\min_{\\theta_G, \\theta_D} \\sum_{\\textbf{x} \\in \\mathcal{X}} \\mathcal{L}_{\\text{MLM}}(\\textbf{x}, \\theta_G) + \\lambda \\mathcal{L}_{Disc} (\\textbf{x}, \\theta_{D})\\]\n\n    <ul>\n      <li>$\\lambda$\ub294 50\uc744 \uc37c\ub2e4\uace0 Appendix A\uc5d0 \ub098\uc640\uc788\uc74c. 
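참고로, 위의 전체 objective(masking → generator sampling으로 corrupt → discriminator labeling → $\mathcal{L}_{\text{MLM}} + \lambda \mathcal{L}_{Disc}$)를 아주 단순화한 toy 스케치로 나타내면 아래와 같음. 실제 Transformer 대신 vocabulary에 대한 uniform 분포를 generator 자리에 놓았고, 함수/변수 이름과 수치는 모두 설명을 위한 가정임.

```python
import math
import random

random.seed(0)

VOCAB = ["the", "chef", "cooked", "ate", "a", "meal"]

def mask_positions(tokens, ratio=0.15):
    # k ~= 0.15 * n positions, sampled uniformly without replacement
    k = max(1, round(ratio * len(tokens)))
    return sorted(random.sample(range(len(tokens)), k))

def corrupt(tokens, positions, p_G):
    # REPLACE(x, m, x_hat): masked positions get samples from the generator
    out = list(tokens)
    for i in positions:
        out[i] = random.choices(VOCAB, weights=[p_G(w, i) for w in VOCAB])[0]
    return out

# toy generator distribution standing in for p_G(x_t | x^masked): uniform over VOCAB
p_G = lambda w, i: 1.0 / len(VOCAB)

x = ["the", "chef", "cooked", "the", "meal"]
m = mask_positions(x)
x_corrupt = corrupt(x, m, p_G)
labels = [int(a != b) for a, b in zip(x_corrupt, x)]  # 1 = replaced, 0 = original

def combined_loss(mlm_probs, disc_probs, labels, lam=50.0):
    # L_MLM: -log p_G over masked positions only; L_Disc: binary CE over all positions
    l_mlm = sum(-math.log(p) for p in mlm_probs)
    l_disc = sum(-math.log(d) if y else -math.log(1.0 - d)
                 for d, y in zip(disc_probs, labels))
    return l_mlm + lam * l_disc
```

실제 구현에서는 sampling 단계가 미분 불가능하므로, 이 스케치에서도 discriminator loss가 generator 쪽으로 back-propagate되지 않는다는 구조만 같게 유지한 것임.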
Discriminator\ub294 binary classification\uc774\uace0 generator\ub294 30000-way classification\uc774\uc5b4\uc11c \uc804\ubc18\uc801\uc73c\ub85c discriminator\uc758 loss\uac00 generator\uc5d0 \ube44\ud574 \ub9e4\uc6b0 \uc791\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc55e\uc5d0\uc11c \uc124\uba85\ud588\ub4ef\uc774 <strong>sampling \uacfc\uc815\uc774 \uc788\uae30 \ub54c\ubb38\uc5d0 discriminator\uc758 loss\ub294 generator\uc5d0\uac8c back-propagate \ub418\uc9c0 \uc54a\uc74c<\/strong><\/li>\n  <li>\uc704\uc758 \uad6c\uc870\ub85c pre-training\uc744 \ub9c8\uce5c \ub4a4, <strong>generator\ub294 \ubc84\ub9ac\uace0 discriminator\ub9cc \ucde8\ud574\uc11c downstream task\uc5d0 \ub300\ud55c fine-tuning\uc744 \uc9c4\ud589<\/strong>\ud568<\/li>\n<\/ul>\n\n<h2 id=\"experiments\">Experiments<\/h2>\n\n<h3 id=\"31-experimental-setup\">3.1. Experimental Setup<\/h3>\n\n<ul>\n  <li>GLUE, SQuAD\ub85c \uc2e4\ud5d8<\/li>\n  <li>\ub370\uc774\ud130(Wikipedia, BooksCorpus), \ubaa8\ub378 \ud06c\uae30, hyperparameter \ub4f1 \ub300\ubd80\ubd84\uc758 \uc2e4\ud5d8 \uc138\ud305\uc744 BERT\uc640 \ub3d9\uc77c\ud558\uac8c \uac00\uc838\uac10<\/li>\n  <li>Large \ubaa8\ub378\uc758 \uacbd\uc6b0 XLNet\uacfc \ub3d9\uc77c\ud558\uac8c \ub9de\ucda4<\/li>\n  <li>English\uc5d0 \ub300\ud574\uc11c\ub9cc \ud588\uace0, multilingual data\ub294 future work<\/li>\n  <li>\uc131\ub2a5\uc740 median of 10 fine-tuning runs from same pre-trained checkpoint\ub85c \uce21\uc815<\/li>\n  <li>Appendix\uc5d0 further training details and hyperparameter values\uac00 \uc788\uc74c<\/li>\n<\/ul>\n\n<h3 id=\"32-model-extensions\">3.2. 
Model Extensions<\/h3>\n\n<h4 id=\"weight-sharing\">Weight sharing<\/h4>\n\n<ul>\n  <li>Pre-training\uc758 \ud6a8\uc728\uc744 \ud5a5\uc0c1\uc2dc\ud0a4\uae30 \uc704\ud574\uc11c generator\uc640 discriminator\uc758 weight\ub97c sharing\ud558\ub3c4\ub85d \ud574\ubd04\n    <ul>\n      <li>Generator\uc640 discriminator\uac00 \ub3d9\uc77c\ud55c \ud06c\uae30\uc758 Transformer\ub77c\uba74 \ubaa8\ub4e0 weight\ub97c tying\ud560 \uc218 \uc788\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\ud558\uc9c0\ub9cc \ub3d9\uc77c\ud55c \ud06c\uae30\uc758 \ubaa8\ub378\uc744 tying\ud574\uc11c \uc4f0\ub294 \uac83\ubcf4\ub2e4 small generator\ub97c \uc4f0\ub294 \uac8c \ub354 \ud6a8\uc728\uc801\uc774\ub77c\ub294 \uc0ac\uc2e4\uc744 \uc54c\uac8c \ub428\n    <ul>\n      <li>\uc774 \uacbd\uc6b0\uc5d0\ub294 token and positional embedding\ub9cc sharing\ud558\ub3c4\ub85d \ud568<\/li>\n      <li>Embedding size\ub294 discriminator\uc758 hidden size\ub85c \uc7a1\uc558\uace0 generator\uc758 \uacbd\uc6b0 linear layer\ub97c \ub46c\uc11c generator\uc758 hidden size\uc5d0 \ub9de\uac8c projection\uc2dc\ud0b4<\/li>\n    <\/ul>\n  <\/li>\n  <li>Generator\uc758 input\uacfc output token embedding\uc740 BERT\uc640 \ub9c8\ucc2c\uac00\uc9c0\ub85c \ud56d\uc0c1 tying \ucc98\ub9ac<\/li>\n  <li>GLUE scores\n    <ul>\n      <li>No tying: 83.5<\/li>\n      <li>Tying token embeddings: 84.3<\/li>\n      <li>Tying all weights: 84.4<\/li>\n    <\/ul>\n  <\/li>\n  <li>Discriminator\ub294 input\uc73c\ub85c \ub4e4\uc5b4\uc628 token\ub9cc \ud559\uc2b5\ud558\uac8c \ub428. 
\ud558\uc9c0\ub9cc generator\ub294 output layer\uc5d0\uc11c softmax\ub97c \ud1b5\ud574 vocab\uc5d0 \uc788\ub294 \ubaa8\ub4e0 token\uc5d0 \ub300\ud574 densely \ud559\uc2b5\uc744 \ud568\n    <ul>\n      <li>\uadf8\ub798\uc11c embedding\uc744 sharing\ud558\ub294 \uac8c \ub3c4\uc6c0\uc774 \ub9ce\uc774 \ub418\uc9c0 \uc54a\uc558\uc744\uae4c \uc2f6\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\ubc18\uba74, \ubaa8\ub4e0 weight\ub97c sharing\ud558\ub294 \uac83\uc740 \uc57d\uac04\uc758 \uc131\ub2a5 \ud5a5\uc0c1\uc774 \uc788\uc9c0\ub9cc generator\uc640 discriminator\ub97c \uac19\uc740 \ud06c\uae30\ub85c \ub9de\ucdb0\uc57c \ud55c\ub2e4\ub294 \ud070 \ub2e8\uc810\uc774 \uc788\uc74c<\/li>\n  <li>\uadf8\ub798\uc11c \uc774\ud6c4 \uc2e4\ud5d8\uc5d0\uc11c\ub294 embedding\ub9cc sharing\ud558\ub3c4\ub85d \uc138\ud305\ud568<\/li>\n<\/ul>\n\n<h4 id=\"smaller-generators\">Smaller Generators<\/h4>\n\n<ul>\n  <li>Generator\uc640 discriminator\uc758 \ud06c\uae30\uac00 \uac19\ub2e4\uba74 ELECTRA\ub97c \ud559\uc2b5\ud558\uae30 \uc704\ud574\uc11c \uc77c\ubc18 MLM \ubaa8\ub378\uc5d0 \ube44\ud574 \uac70\uc758 \ub450 \ubc30\uc758 \uacc4\uc0b0\ub7c9\uc774 \ud544\uc694\ud574\uc9d0<\/li>\n  <li>\uc774 \ubb38\uc81c\ub97c \ud574\uacb0\ud558\uae30 \uc704\ud574\uc11c generator\uc758 \ud06c\uae30\ub97c \uc904\uc5ec\ubd04. 
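앞의 Weight sharing에서 설명한 구성(embedding size = discriminator hidden size, generator 쪽은 linear projection으로 축소)을 parameter 수 관점에서 간단히 스케치하면 다음과 같음. 아래의 vocabulary/hidden size 수치는 논문의 정확한 설정이 아니라 설명을 위한 가정임.

```python
# Sketch: a single token-embedding table sized to the discriminator's hidden size,
# shared by both networks, plus a linear projection down to the smaller generator.
# All sizes below are illustrative assumptions, not the paper's exact configs.

VOCAB_SIZE = 30000   # e.g. a 30000-way output vocabulary
DISC_HIDDEN = 768    # discriminator hidden size == shared embedding size
GEN_HIDDEN = 256     # smaller generator hidden size (here 1/3 of the discriminator)

def embedding_params(vocab, dim):
    return vocab * dim

def projection_params(d_in, d_out):
    return d_in * d_out + d_out  # weight matrix + bias

shared = embedding_params(VOCAB_SIZE, DISC_HIDDEN)   # one table used by both networks
proj = projection_params(DISC_HIDDEN, GEN_HIDDEN)    # generator-side down-projection
separate = (embedding_params(VOCAB_SIZE, DISC_HIDDEN)
            + embedding_params(VOCAB_SIZE, GEN_HIDDEN))

# sharing one table + a small projection is far cheaper than two separate tables
saved = separate - (shared + proj)
```

요지는 vocabulary 크기가 hidden size보다 훨씬 크기 때문에, embedding table을 하나만 두고 작은 projection을 추가하는 쪽이 parameter 면에서 크게 이득이라는 것임.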
\uad6c\uccb4\uc801\uc73c\ub85c\ub294 \ub2e4\ub978 hyperparameter\ub294 \uadf8\ub300\ub85c \ub450\uace0 layer\uc758 \ud06c\uae30\ub9cc \uc904\uc784\n    <ul>\n      <li>\uc5ec\uae30\uc11c layer\uc758 \ud06c\uae30\ub780, hidden size, FFN size, # of attention heads\ub97c \uc758\ubbf8<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uac70\uae30\uc5d0 \ucd94\uac00\ub85c \ud559\uc2b5 corpus\uc5d0 \ub4f1\uc7a5\ud558\ub294 unigram\uc758 distribution(frequency) \uae30\ubc18\uc73c\ub85c sampling\ud558\ub294 \ub9e4\uc6b0 \ub9e4\uc6b0 \uac04\ub2e8\ud55c unigram generator\ub85c\ub3c4 \uc2e4\ud5d8\uc744 \ud574\ubcf4\uc558\uc74c<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/figure3.png\" alt=\"figure3\" \/><\/p>\n\n<ul>\n  <li>\uc2e4\ud5d8\uc758 \uacb0\uacfc\ub294 \uc704\uc758 Figure 3\uc758 \uc67c\ucabd\uacfc \uac19\uc558\uc73c\uba70 \ubaa8\ub450 \ub3d9\uc77c\ud558\uac8c 500K steps \ud559\uc2b5\uc2dc\ucf30\uc74c\n    <ul>\n      <li>\ubaa8\ub450 500K steps\ub9cc\ud07c \ud559\uc2b5\ud588\uae30\uc5d0 \uc791\uc740 \ubaa8\ub378 \uc785\uc7a5\uc5d0\uc11c\ub294 \uacc4\uc0b0\ub7c9 \ub300\ube44 \uc190\ud574\ub97c \ubcf8 \uc148. 
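위에서 언급한 unigram generator(학습 corpus의 unigram frequency에 비례해, context 없이 token을 sampling)는 대략 아래처럼 스케치할 수 있음. toy corpus와 이름들은 설명을 위한 가정임.

```python
import collections
import random

random.seed(0)

# toy corpus standing in for the pre-training corpus (assumption for illustration)
corpus = "the chef cooked the meal the chef ate the meal".split()

# a unigram "generator" samples tokens proportionally to corpus frequency,
# ignoring context entirely
counts = collections.Counter(corpus)
tokens, weights = zip(*counts.items())

def unigram_sample(k=1):
    return random.choices(tokens, weights=weights, k=k)

samples = unigram_sample(k=1000)
# "the" appears 4/10 times in the toy corpus, so it should dominate the samples
```

이렇게 단순한 generator조차 비교 대상으로 실험되었다는 점이, generator의 capacity가 핵심 변수라는 본문의 논의와 이어짐.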
\ub611\uac19\uc740 \uacc4\uc0b0\ub7c9, \uc2dc\uac04\ub9cc\ud07c \ud559\uc2b5\ud588\ub2e4\uace0 \ud558\uba74 \uc791\uc740 \ubaa8\ub378\uc740 \ub354 \ub9ce\uc740 step\uc744 \ub3cc \uc218 \uc788\uae30 \ub54c\ubb38\uc5d0<\/li>\n    <\/ul>\n  <\/li>\n  <li>(\uadf8\ub7fc\uc5d0\ub3c4 \ubd88\uad6c\ud558\uace0) 1\/4\uc5d0\uc11c 1\/2 \ud06c\uae30\uc758 generator\ub97c \uc37c\uc744 \ub54c \uac00\uc7a5 \uc131\ub2a5\uc774 \uc88b\uc558\uc74c<\/li>\n  <li>\uc65c \uc774\ub7f0 \uacb0\uacfc\uac00 \ubc1c\uc0dd\ud588\uc744\uae4c?\n    <ul>\n      <li>\uc544\ub9c8 generator\uac00 \ub108\ubb34 \uac15\ub825\ud558\uba74 discriminator\uc758 task\uac00 \ub108\ubb34 \uc5b4\ub824\uc6cc\uc838\uc11c \uc774\ub7f0 \ud604\uc0c1\uc774 \ubc1c\uc0dd\ud558\ub294 \uac8c \uc544\ub2d0\uae4c \ucd94\uce21\ud568<\/li>\n      <li>\uac8c\ub2e4\uac00 discriminator\uc758 parameter\ub97c \uc2e4\uc81c \ub370\uc774\ud130 \ubd84\ud3ec\uac00 \uc544\ub2cc generator\ub97c \ubaa8\ub378\ub9c1\ud558\ub294\ub370 \uc0ac\uc6a9\ud558\uac8c \ub420 \uc218\ub3c4 \uc788\uc74c. 
(Generator\uac00 \uac15\ub825\ud558\uba74 output \ubd84\ud3ec\uac00 \uce58\uc6b0\uccd0\uc838 \uc788\uc744 \uac00\ub2a5\uc131\uc774 \ub192\uace0 sampling\uc758 \uacb0\uacfc\uac00 \ub2e4\uc591\ud558\uc9c0 \uc54a\uc544\uc9c8 \uc218 \uc788\uc5b4\uc11c generator\uc5d0 \ud53c\ud305\ub41c\ub2e4\ub294 \ub9d0\uc774 \uc544\ub2d0\uae4c \uc2f6\uc74c)<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h4 id=\"training-algorithms\">Training Algorithms<\/h4>\n\n<ul>\n  <li>\uae30\ubcf8\uc801\uc73c\ub85c \uc55e\uc11c \uc18c\uac1c\ud55c training objective\ub85c generator\uc640 discriminator\ub97c jointly \ud559\uc2b5\ud568<\/li>\n  <li>\uc774 \ubc29\ubc95\uacfc \ub2e4\ub974\uac8c \uc544\ub798\uc640 \uac19\uc740 \ubc29\ubc95(two-stage)\uc73c\ub85c\ub3c4 \ud559\uc2b5\uc744 \uc2dc\ucf1c\ubd04\n    <ol>\n      <li>Generator\ub9cc $\\mathcal{L}_{\\text{MLM}}$\uc73c\ub85c $n$ steps \ud559\uc2b5\uc2dc\ud0b4<\/li>\n      <li>Discriminator\ub97c generator\uc758 trained weight\ub85c initialize\ud558\uace0 $\\mathcal{L}_{\\text{Disc}}$\ub85c generator\ub294 freeze\ud558\uace0 discriminator\ub9cc $n$ steps \ud559\uc2b5\uc2dc\ud0b4<\/li>\n    <\/ol>\n  <\/li>\n  <li>\ub610\ud55c GAN\ucc98\ub7fc adversarial training\ub3c4 \ud574\ubd04 (Appendix F\uc5d0 \uc790\uc138\ud788 \ub098\uc634)<\/li>\n  <li>\uacb0\uacfc\ub294 Figure 3\uc758 \uc624\ub978\ucabd\uacfc \uac19\uc73c\uba70 \uadf8\ub0e5 joint training\uc774 \uac00\uc7a5 \uc88b\uc558\uc74c<\/li>\n  <li>\uc704\uc758 two-stage \ubc29\ubc95\uc5d0\uc11c discriminative objective\ub85c \ubc14\uafb8\ub2c8\uae4c \uc131\ub2a5\uc774 \ucb49 \uc62c\ub790\ub2e4\ub294 \uac83\uc744 \ubcfc \uc218 \uc788\uc74c<\/li>\n  <li>Adversarial training\uc774 maximum likelihood training\ubcf4\ub2e4 underperform \ud55c\ub2e4\ub294 \uac83\ub3c4 \uc54c \uc218 \uc788\uc5c8\uc74c. 
\uc774\ub7f0 \ud604\uc0c1\uc5d0 \ub300\ud55c \uc6d0\uc778\uc740 \ub2e4\uc74c\uacfc \uac19\uc74c\n    <ol>\n      <li>Masked language modeling \uc131\ub2a5\uc774 \uc548\uc88b\uc544\uc11c\n        <ul>\n          <li>MLM \uc131\ub2a5\uc740 58% \ubc16\uc5d0 \uc548\ub428. Maximum likelihood\ub85c \ud559\uc2b5\ud55c generator\ub294 65%<\/li>\n        <\/ul>\n      <\/li>\n      <li>\ud559\uc2b5\ub41c generator\uac00 \ub9cc\ub4dc\ub294 distribution\uc758 entropy\uac00 \ub0ae\uc544\uc11c\n        <ul>\n          <li>Output distribution\uc740 \ud558\ub098\uc758 token\uc5d0 \ud655\ub960\uc774 \uc3e0\ub824\uc788\uace0, \uc774\ub7ec\uba74 sampling\ud560 \ub54c \ub2e4\uc591\uc131\uc774 \ub9ce\uc774 \ub5a8\uc5b4\uc9d0<\/li>\n        <\/ul>\n      <\/li>\n    <\/ol>\n  <\/li>\n  <li>Text\ub97c \uc704\ud55c GAN\uc758 \uc774\uc804 \uc5f0\uad6c\uc5d0\uc11c \uc704 \ubb38\uc81c\uac00 \ubc1c\uacac\ub41c \ubc14 \uc788\uc74c (Caccia et al., 2018)<\/li>\n<\/ul>\n\n<h3 id=\"33-small-models\">3.3. Small Models<\/h3>\n\n<ul>\n  <li>\uc774 \uc5f0\uad6c\uc758 \ubaa9\uc801\uc740 pre-training\uc758 \ud6a8\uc728\uc131 \ud5a5\uc0c1\uc5d0 \uc788\uc74c. 
\uadf8\ub798\uc11c single GPU\ub85c\ub3c4 \ube60\ub974\uac8c \ud559\uc2b5\ud560 \uc218 \uc788\ub294 \uc218\uc900\uc73c\ub85c \uc791\uc740 \ubaa8\ub378\uc744 \ub9cc\ub4e4\uc5b4\ubd04<\/li>\n  <li>\n    <p>ELECTRA-Small \ubaa8\ub378\uc758 hyperparameters\ub294 \uc544\ub798\uc758 Table 6\uacfc \uac19\uc73c\uba70, \ub9e4\uc6b0 \uc791\ub2e4\ub294 \uac78 \uc54c \uc218 \uc788\uc74c<\/p>\n\n    <p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/table6.png\" alt=\"table6\" \/><\/p>\n  <\/li>\n  <li>\ub610\ud55c \uacf5\uc815\ud55c \ube44\uad50\ub97c \uc704\ud574 training FLOPs\ub97c \ub9de\ucdb0\uc11c BERT-Small\uc740 1.5M step, ELECTRA-Small\uc740 1M step \ud559\uc2b5\uc744 \uc2dc\ud0b4<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/table1.png\" alt=\"table1\" \/><\/p>\n\n<ul>\n  <li>\uacb0\uacfc\ub294 \uc704\uc758 Table 1\uacfc \uac19\uc774 ELECTRA-Small\uc774 BERT-Small\ubcf4\ub2e4 \uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc600\uace0, \uc2ec\uc9c0\uc5b4\ub294 \ud6e8\uc52c \ud070 \ubaa8\ub378\uc778 GPT \ubcf4\ub2e4\ub3c4 \uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc784<\/li>\n  <li>\ub610\ud55c \ub9e4\uc6b0 \ube60\ub978 \uc218\ub834 \uc18d\ub3c4\ub97c \ubcf4\uc784. Single GPU\ub85c 6\uc2dc\uac04 \ub9cc\uc5d0 \uaf64 \uc4f8\ub9cc\ud55c \uc131\ub2a5\uc744 \ubcf4\uc784<\/li>\n  <li>ELECTRA-Base \uc5ed\uc2dc BERT-Base\ub97c \ub2a5\uac00\ud588\uc744 \ubfd0 \uc544\ub2c8\ub77c \uc2ec\uc9c0\uc5b4 BERT-Large \ubcf4\ub2e4\ub3c4 \ub354 \uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc784<\/li>\n<\/ul>\n\n<h3 id=\"34-large-models\">3.4. Large Models<\/h3>\n\n<ul>\n  <li>Large \ubaa8\ub378\uc5d0 \ub300\ud574\uc11c\ub3c4 \uc2e4\ud5d8\uc744 \ud574\ubcf4\uc558\uc74c. 
ELECTRA-Large\uc758 \ud06c\uae30\ub294 \uc5ed\uc2dc BERT-Large\uc758 \ud06c\uae30\uc5d0 \ub9de\ucdb0\uc11c \uc2e4\ud5d8\ud558\uc600\uace0 XLNet\uc5d0\uc11c \uc0ac\uc6a9\ud55c \ub370\uc774\ud130\ub97c \uc0ac\uc6a9\ud568<\/li>\n  <li>\uacb0\uacfc\ub294 \uc544\ub798\uc758 Table 2(dev), Table 3(test)\uc640 \uac19\uc558\uc74c. \ubaa8\ub378\uba85 \uc606\uc5d0 \uc22b\uc790\ub294 \ud559\uc2b5 step\uc744 \uc758\ubbf8\ud568<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/table2.png\" alt=\"table2\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/table3.png\" alt=\"table3\" \/><\/p>\n\n<ul>\n  <li>ELECTRA-400K\ub294 RoBERTa(-500K)\ub098 XLNet\uc5d0 \ube44\ud574\uc11c 1\/4\uc758 \uacc4\uc0b0\ub7c9(FLOPs)\uc73c\ub85c comparable\ud55c \uc131\ub2a5\uc744 \ubcf4\uc784<\/li>\n  <li>\ub354 \ub9ce\uc774 \ud559\uc2b5\uc2dc\ud0a8 ELECTRA-1.75M\uc740 \uc774\ub4e4\uc744 \ub6f0\uc5b4\ub118\ub294 \uc131\ub2a5\uc744 \ubcf4\uc600\uace0, \uc774 \uc5ed\uc2dc\ub3c4 \uacc4\uc0b0\ub7c9\uc740 \ub450 \ubaa8\ub378\ubcf4\ub2e4 \uc791\uc74c<\/li>\n  <li>SQuAD\uc5d0\uc11c\ub3c4 \ub9c8\ucc2c\uac00\uc9c0\ub85c \uac00\uc7a5 \uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc784 (\ub17c\ubb38\uc758 Table 4 \ucc38\uace0)<\/li>\n<\/ul>\n\n<h3 id=\"35-efficiency-analysis\">3.5. 
Efficiency Analysis<\/h3>\n\n<ul>\n  <li>ELECTRA\uac00 \uc65c \uc798\ub418\ub294\uc9c0 \uc880 \ub354 \uc790\uc138\ud788 \uc774\ud574\ud574\ubcf4\uae30 \uc704\ud574 \ub2e4\uc74c\uc758 \uc2e4\ud5d8\uc744 \uc138\ud305\ud568 (\uc77c\uc885\uc758 ablation study \ub290\ub08c)\n    <ul>\n      <li><strong>ELECTRA 15%<\/strong>: ELECTRA\uc758 \uad6c\uc870\ub97c \uc720\uc9c0\ud558\ub418, discriminator loss\ub97c input tokens\uc758 15%\uc5d0 \ub300\ud574\uc11c\ub9cc \uacc4\uc0b0\ud558\ub3c4\ub85d \uc138\ud305\n        <ul>\n          <li>\ubaa9\uc801: \ud559\uc2b5 \ud6a8\uc728(15% vs 100%)\ub85c \uc778\ud574 \uc131\ub2a5 \ucc28\uc774\uac00 \uc0dd\uacbc\ub2e4\ub294 \uac83\uc744 \ubcf4\uc774\uae30 \uc704\ud574\uc11c<\/li>\n        <\/ul>\n      <\/li>\n      <li><strong>Replace MLM<\/strong>: Discriminator\ub97c MLM \ud559\uc2b5\uc744 \ud558\ub418, $[MASK]$\ub85c \uce58\ud658\ud558\ub294 \uac8c \uc544\ub2c8\uace0 generator\uac00 \ub9cc\ub4e0 token\uc73c\ub85c \uce58\ud658\n        <ul>\n          <li>\ubaa9\uc801: Pre-training \ud560 \ub54c\ub9cc \uc4f0\uace0 fine-tuning \ub54c\ub294 \uc5c6\ub294 $[MASK]$ token\uc5d0 \ub300\ud55c \ubb38\uc81c\ub85c \uc778\ud55c \uc131\ub2a5 \ucc28\uc774\ub97c \ubcf4\uc774\uae30 \uc704\ud574\uc11c<\/li>\n        <\/ul>\n      <\/li>\n      <li><strong>All-Tokens MLM<\/strong>: Replace MLM\ucc98\ub7fc \ud558\ub418, \uc77c\ubd80(15%) token\ub9cc \uce58\ud658\ud558\ub294 \uac8c \uc544\ub2c8\uace0 \ubaa8\ub4e0 token\uc744 generator\uac00 \uc0dd\uc131\ud55c token\uc73c\ub85c \uce58\ud658\n        <ul>\n          <li>BERT\uc640 ELECTRA\ub97c \ud569\uce5c \ubc84\uc804<\/li>\n          <li>\uc774 \ubaa8\ub378\uc758 \uc131\ub2a5\uc744 \uc880 \ub354 \uac1c\uc120\ud558\uae30 \uc704\ud574 sigmoid layer\ub97c \ud1b5\ud574 input token\uc744 \uce74\ud53c\ud560\uc9c0\uc5d0 \ub300\ud55c \ud655\ub960 $D$\ub97c \ubf51\ub294 copy mechanism\uc744 \ub3c4\uc785<\/li>\n          <li>\uacb0\uacfc\uc801\uc73c\ub85c \ubaa8\ub378\uc758 output distribution\uc740 $D * 
\\text{input-token-distribution} + (1-D) * \\text{MLM-output-distribution}$\uc640 \uac19\uc740 \ud615\ud0dc\uc784<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/table5.png\" alt=\"table5\" \/><\/p>\n\n<ul>\n  <li>\uacb0\uacfc\ub294 \uc704\uc758 Table 5\uc640 \uac19\uc73c\uba70, ELECTRA\uac00 ELECTRA 15%\uc640 Replace MLM\ubcf4\ub2e4 \ud6e8\uc52c \uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc600\uace0 All-Tokens MLM\uc740 \uadf8\ub098\ub9c8 ELECTRA\uc5d0 \uac00\uae4c\uc6b4 \uc131\ub2a5\uc744 \ubcf4\uc774\uba70 BERT\uc640 ELECTRA\uc758 \uc131\ub2a5 \ucc28\uc774\ub97c \ub9ce\uc774 \uc904\uc600\uc74c<\/li>\n  <li>\uc804\ubc18\uc801\uc73c\ub85c \uacb0\uacfc\ub97c \ubd24\uc744 \ub54c, ELECTRA\uac00 \ud559\uc2b5 \ud6a8\uc728\ub3c4 \uad49\uc7a5\ud788 \uc88b\uace0 $[MASK]$ token\uc5d0 \ub300\ud55c pre-train fine-tune mismatch \ubb38\uc81c\ub3c4 \uc0c1\ub2f9\ud788 \uc644\ud654\uc2dc\ucf30\ub2e4\ub294 \uac83\uc744 \uc54c \uc218 \uc788\uc5c8\uc74c<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-13-electra\/figure4.png\" alt=\"figure4\" \/><\/p>\n\n<ul>\n  <li>Figure 4\ub97c \ud1b5\ud574 Hidden size\uac00 \uc791\uc544\uc9c8\uc218\ub85d BERT\uc640 ELECTRA\uc758 \uc131\ub2a5 \ucc28\uc774\ub294 \ucee4\uc9c4\ub2e4\ub294 \uc0ac\uc2e4\uc744 \uc54c \uc218 \uc788\uc73c\uba70(see left and center), ELECTRA\ub294 \ubaa8\ub378\uc774 \uc791\uc544\ub3c4 \uad49\uc7a5\ud788 \ube60\ub974\uac8c \uc218\ub834\ud55c\ub2e4\ub294 \uac83\uc744 \uc54c \uc218 \uc788\uc74c(see right)<\/li>\n  <li>\uacb0\ub860\uc801\uc73c\ub85c ELECTRA\uac00 BERT\ubcf4\ub2e4 parameter-efficient \ud558\ub2e4\uace0 \ubcfc \uc218 \uc788\uc74c<\/li>\n<\/ul>\n\n<h2 id=\"4-related-work\">4. Related work<\/h2>\n\n<ul>\n  <li>pass<\/li>\n<\/ul>\n\n<h2 id=\"5-conclusion\">5. 
Conclusion<\/h2>\n\n<ul>\n  <li>Language representation learning\uc744 \uc704\ud55c \uc0c8\ub85c\uc6b4 self-supervised task\uc778 replaced token detection\uc744 \uc81c\uc548\ud568<\/li>\n  <li>\uc774 \ubc29\ubc95\uc758 key idea\ub294 small generator network\uac00 \ub9cc\ub4e4\uc5b4 \ub0b8 high-quality negative sample\uc640 input token\uc744 \uad6c\ubcc4\ud558\ub3c4\ub85d text encoder\ub97c \ud559\uc2b5\uc2dc\ud0a4\ub294\ub370 \uc788\uc74c<\/li>\n  <li>Masked language modeling\uc5d0 \ube44\ud574, \uc81c\uc548\ud558\ub294 pre-training objective\ub294 \ud6e8\uc52c compute-efficient\ud558\uace0 downstream tasks\uc5d0 \ub300\ud55c \uacb0\uacfc \uc5ed\uc2dc \ub354 \uc88b\uc74c<\/li>\n  <li>\uc0c1\ub300\uc801\uc73c\ub85c \uc801\uc740 compute\ub97c \uc0ac\uc6a9\ud560 \ub54c \ub354 \ud6a8\uacfc\uc801\uc774\uba70, \uc774 \uc5f0\uad6c\ub97c \ud1b5\ud574\uc11c \uc5f0\uad6c\uc790\ub4e4\uc774 \uc801\uc740 computing resource\ub85c\ub3c4 pre-trained text encoder\uc5d0 \ub300\ud55c \ub9ce\uc740 \uc5f0\uad6c\/\uac1c\ubc1c\uc744 \ud560 \uc218 \uc788\uac8c \ub418\uae38 \ubc14\ub78c<\/li>\n  <li>Pre-training\uc5d0 \ub300\ud55c future work\ub4e4\uc774 absolute performance\ub9cc\ud07c compute usage\uc640 parameter counts\ub4f1\uc758 efficiency\ub97c \uace0\ub824\ud588\uc73c\uba74 \ud558\ub294 \ubc14\ub78c\uc774 \uc788\uc74c<\/li>\n<\/ul>\n","pubDate":"Mon, 13 Apr 2020 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/electra\/","guid":"https:\/\/roomylee.github.io\/electra\/","category":["electra","pre-trained-language-model","pre-training","masked-language-modeling","replaced-token-detection","generator-discriminator","blog"]},{"title":"Adaptive Input Representations for Neural Language Modeling (ICLR 2019)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1809.10853\">https:\/\/arxiv.org\/abs\/1809.10853<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Alexei Baevski &amp; Michael Auli<\/li>\n      <li>Facebook AI Research<\/li>\n    <\/ul>\n  <\/li>\n  
<li>Published at\n    <ul>\n      <li>ICLR 2019<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Language modeling\uc744 \uc704\ud55c adaptive input representations\uc744 \uc81c\uc548<\/li>\n  <li>Grave et al. (2017)\uc774 \uc81c\uc548\ud55c adaptive softmax\ub97c input representation\uc73c\ub85c \ud655\uc7a5\ud568<\/li>\n  <li>Input\uacfc output layer\ub97c \uc5b4\ub5bb\uac8c factorize\ud560\uc9c0, \uadf8\ub9ac\uace0 words, characters, sub-word units\uc744 \uc5b4\ub5bb\uac8c \ubaa8\ub378\ub9c1\ud560\uc9c0\uc5d0 \ub300\ud55c \uba87 \uac00\uc9c0 \uc120\ud0dd\uc9c0\uac00 \uc788\ub294\ub370, \uc774 \uc911 self-attentional architecture\uc5d0\uc11c \ub9ce\uc774 \uc4f0\ub294 \ubc29\ubc95\ub4e4\uc744 \ube44\uad50\ud574\ubd04<\/li>\n  <li>\uc2e4\ud5d8\uc744 \ud1b5\ud574 adaptive embedding\uc774 \ub9ce\uc774 \uc4f0\uc774\ub294 character input CNN\ubcf4\ub2e4 \ub354 \uc801\uc740 parameter\ub97c \uc0ac\uc6a9\ud558\uba74\uc11c\ub3c4 2\ubc30 \uc774\uc0c1 \ube60\ub974\ub2e4\ub294 \uacb0\uacfc\ub97c \uc5bb\uc5c8\uc74c<\/li>\n  <li>WikiText-103 benchmark\uc5d0\uc11c \uae30\uc874 SOTA\ub97c 10 \uc774\uc0c1 \uac1c\uc120\ud55c 18.7\uc758 perplexity\ub97c \uc5bb\uc5c8\uace0, Billion Word benchmark\uc5d0\uc11c\ub294 23.02\uc758 perplexity\ub97c \uae30\ub85d\ud568<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. 
Introduction<\/h2>\n\n<ul>\n  <li>The proposed <em>adaptive input embeddings (representations)<\/em> extend the adaptive softmax (Grave et al., 2017) to input word representations\n    <ul>\n      <li>Adaptive softmax (Grave et al., 2017): a variable capacity scheme for output word embeddings that assigns more parameters to high-frequency words and fewer to low-frequency (rare) ones<\/li>\n    <\/ul>\n  <\/li>\n  <li>Simply put, high-frequency words get more capacity and low-frequency words less, which can also mitigate overfitting on rare words<\/li>\n  <li>Applying adaptive input embeddings to the input and output layers reduced the parameter count by 23% while achieving higher accuracy than fixed-size embeddings<\/li>\n  <li>Moreover, weight tying the adaptive input representations with the output adaptive softmax reduces the parameters by as much as 61%<\/li>\n  <li>As mentioned in the <a href=\"#abstract\">Abstract<\/a>, this brings substantial performance gains<\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>pass<\/li>\n<\/ul>\n\n<h2 id=\"3-adaptive-input-representations\">3. 
Adaptive Input Representations<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>First, define a number of clusters that partition the frequency-ordered vocabulary $\\mathcal{V}$\n    <ul>\n      <li>$\\mathcal{V} = \\mathcal{V}_1 \\cup \\mathcal{V}_2 \\cup \\cdots \\cup \\mathcal{V}_n$ such that $\\mathcal{V}_i \\cap \\mathcal{V}_j = \\emptyset$ for all $i \\neq j$<\/li>\n      <li>$\\mathcal{V}_1$ is the set of most frequent words and $\\mathcal{V}_n$ the set of least frequent words<\/li>\n    <\/ul>\n  <\/li>\n  <li>To reduce capacity, the embedding dimension of each cluster is shrunk by a factor $k$\n    <ul>\n      <li>The embedding dimension of $\\mathcal{V}_1$ is $d$, and that of $\\mathcal{V}_n$ is $\\frac{d}{k^{n-1}}$<\/li>\n      <li>Following Grave et al. (2017), $k$ is usually set to 4<\/li>\n    <\/ul>\n  <\/li>\n  <li>Next, linear projections $W_1 \\in \\mathbb{R}^{d \\times d}, \\dots, W_n \\in \\mathbb{R}^{d\/k^{n-1} \\times d}$ map each cluster's embeddings to the same dimension $d$ so they can be concatenated (for convenient use as model input)\n    <ul>\n      <li>Figure 1 illustrates this process; $\\mathcal{V}_1$, which is already $d$-dimensional, is likewise projected to the same $d$ dimensions<\/li>\n    <\/ul>\n  <\/li>\n  <li>In summary, for the input words: 1) partition each word into a cluster by its frequency, 2) look up its embedding in that cluster, and 3) project everything to the same $d$ dimensions and concatenate in the original word order<\/li>\n<\/ul>\n\n<h4 id=\"weight-sharing\">Weight sharing<\/h4>\n\n<ul>\n  <li>An adaptive softmax is applied to the output layer; if the partitions of $\\mathcal{V}$, $d$, and $k$ are identical, weight tying (Inan et al., 2016; Press &amp; Wolf, 2017) can be applied<\/li>\n  <li>With tying, the parameter count drops further and performance also improves<\/li>\n  <li>On WikiText-103, sharing both the embeddings and the projections worked best; on Billion Word, sharing only the embeddings was best<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments-setup\">4. Experiments Setup<\/h2>\n\n<h3 id=\"41-model\">4.1. 
Model<\/h3>\n\n<ul>\n  <li>Uses a slightly modified Transformer (Vaswani et al., 2017) decoder<\/li>\n  <li>16 blocks are stacked, with layer normalization applied before the self-attention and FFNN sub-layers (originally it comes after them)<\/li>\n<\/ul>\n\n<h3 id=\"42-datasets\">4.2. Datasets<\/h3>\n\n<ul>\n  <li>WikiText-103: 100M tokens and a vocab of about 260K<\/li>\n  <li>Billion Word: 768M tokens and a vocab of about 800K<\/li>\n<\/ul>\n\n<h3 id=\"43-batching\">4.3. Batching<\/h3>\n\n<ul>\n  <li>pass<\/li>\n<\/ul>\n\n<h3 id=\"44-input-and-output-layer-hyperparameters\">4.4. Input and Output Layer Hyperparameters<\/h3>\n\n<h4 id=\"embedding-size\">Embedding size<\/h4>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/figure7.png\" alt=\"figure7\" \/><\/p>\n\n<ul>\n  <li>Fixed-size word input layers and softmax output layers usually use an embedding size of 512<\/li>\n  <li>The fixed input and adaptive softmax setting uses an embedding size of 256 on Billion Word and 64 on WikiText-103<\/li>\n  <li>See Figure 7 for the corresponding experiments<\/li>\n<\/ul>\n\n<h4 id=\"character-cnn\">Character CNN<\/h4>\n\n<ul>\n  <li>Uses the character input modeling approach of Kim et al. (2015)<\/li>\n<\/ul>\n\n<h4 id=\"adaptive-input-representations-and-adaptive-softmax\">Adaptive input representations and adaptive softmax<\/h4>\n\n<ul>\n  <li>For adaptive word inputs and adaptive softmax, an embedding size of $d=1024$ and a reducing factor of $k=4$ are used<\/li>\n  <li>Three clusters are used, so the clusters get dimensions of $d=1024$, $d=256$, and $d=64$, respectively\n    <ul>\n      <li>WikiText-103 sets the clusters $\\mathcal{V}_1, \\mathcal{V}_2, \\mathcal{V}_3$ to vocab sizes of 20K, 40K, and 200K<\/li>\n      <li>Billion Word sets them to 60K, 100K, and 640K<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h4 id=\"sub-word-models\">Sub-word models<\/h4>\n\n<ul>\n  <li>BPE is trained with a vocab size of 32K on each dataset<\/li>\n  <li>The input\/output embedding size is 1024, and word probabilities are computed as the product of their sub-word unit probabilities<\/li>\n<\/ul>\n\n<h4 id=\"optimization\">Optimization<\/h4>\n\n<ul>\n  <li>Uses Nesterov's accelerated gradient method (Sutskever et al., 2013)<\/li>\n  <li>WikiText-103 is trained for 286K steps<\/li>\n  <li>Billion Word is trained for 975K steps<\/li>\n<\/ul>\n\n<h2 id=\"5-experiments\">5. Experiments<\/h2>\n\n<h3 id=\"51-main-results\">5.1. Main Results<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/table1.png\" alt=\"table1\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/table2.png\" alt=\"table2\" \/><\/p>\n\n<h3 id=\"52-comparison-of-input-and-output-layer-factorization\">5.2. Comparison of Input and Output Layer Factorization<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/table3.png\" alt=\"table3\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/table2.png\" alt=\"table4\" \/><\/p>\n\n<h3 id=\"53-analysis\">5.3. 
Analysis<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/figure2.png\" alt=\"figure2\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/figure3.png\" alt=\"figure3\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/figure4.png\" alt=\"figure4\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/figure5.png\" alt=\"figure5\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/table5.png\" alt=\"table5\" \/><\/p>\n\n<h3 id=\"54-adaptive-softmax-vs-full-softmax\">5.4. Adaptive Softmax vs. Full Softmax<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-09-adaptive-input-representations\/table6.png\" alt=\"table6\" \/><\/p>\n\n<h2 id=\"6-conclusion\">6. Conclusion<\/h2>\n\n<ul>\n  <li>Adaptive input embeddings improve performance by varying the embedding size across input words while greatly reducing the number of model parameters<\/li>\n  <li>Parameter sharing via tying with the (output) adaptive softmax further reduces the parameter count, improves performance, and speeds up training<\/li>\n  <li>Comparison experiments against other input and output layer factorization methods show that adaptive input representations perform best<\/li>\n  <li>Future work includes applying this method to other tasks<\/li>\n<\/ul>\n","pubDate":"Thu, 09 Apr 2020 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/adaptive-input-representations\/","guid":"https:\/\/roomylee.github.io\/adaptive-input-representations\/","category":["adaptive-input-representation","language-modeling","blog"]},{"title":"Generalization through Memorization: Nearest Neighbor Language Models (ICLR 2020)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1911.00172\">https:\/\/arxiv.org\/abs\/1911.00172<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis<\/li>\n      <li>Stanford University &amp; Facebook<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>ICLR 2020<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Proposes $k$NN-LMs, which augment a pre-trained neural language model (LM) with a $k$-nearest neighbors ($k$NN) model<\/li>\n  <li>Nearest neighbors are computed by distance in the pre-trained LM embedding space over the text in the LM's training data<\/li>\n  <li>Applying this $k$NN-based augmentation to a WikiText-103 LM achieves a state-of-the-art perplexity of 15.79, a 2.9 point improvement, with no additional training<\/li>\n  <li>It scales efficiently to large training sets, and domain adaptation is also effective. 
Again, no additional training is required<\/li>\n  <li>Qualitatively, it helps considerably in predicting rare patterns such as factual knowledge<\/li>\n  <li>Together, these experimental results suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is effective for language modeling of long-tail patterns<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>LMs typically solve the following two subproblems\n    <ol>\n      <li>mapping a sentence prefix to a fixed-sized representation<\/li>\n      <li>predicting the next word using this representation<\/li>\n    <\/ol>\n  <\/li>\n  <li>Proposes a new language modeling approach under the hypothesis that \u201cthe representation learning problem is easier than the next-word prediction problem\u201d<\/li>\n  <li>Presents strong evidence, using the prefix embeddings of an existing LM, that LMs solve the first problem better<\/li>\n  <li>(the rest matches the Abstract)<\/li>\n<\/ul>\n\n<h2 id=\"2-nearest-neighbor-language-modeling\">2. 
Nearest Neighbor Language Modeling<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>An LM fundamentally predicts, given a <em>context<\/em> sequence of tokens $c_t = (w_1, \\dots, w_{t-1})$, the distribution $p(w_t \\mid c_t)$ over the next <em>target<\/em> token $w_t$<\/li>\n  <li>Simply put, the $k$NN-LM adds a nearest neighbors retrieval mechanism to a pre-trained LM, with no additional training<\/li>\n  <li><em>context<\/em>-<em>target<\/em> pairs are stored in a key-value datastore that is <strong>used at inference time<\/strong> (see Figure 1)<\/li>\n<\/ul>\n\n<h4 id=\"datastore\">Datastore<\/h4>\n\n<ul>\n  <li>$f(\\cdot)$ is a function that maps a context $c$ to a fixed-length vector representation using the pre-trained LM<\/li>\n  <li>For the $i$-th training example $(c_i, w_i) \\in \\mathcal{D}$, a key-value pair $(k_i, v_i)$ is created, where the key $k_i$ is the context representation $f(c_i)$ and the value $v_i$ is the target word $w_i$<\/li>\n  <li>The final datastore $(\\mathcal{K}, \\mathcal{V})$ is the set of all key-value pairs built from all training examples in $\\mathcal{D}$:<\/li>\n<\/ul>\n\n\\[(\\mathcal{K}, \\mathcal{V}) = \\{(f(c_i), w_i) \\mid (c_i, w_i) \\in \\mathcal{D}\\}\\]\n\n<h4 id=\"inference\">Inference<\/h4>\n\n<ul>\n  <li>Given an input context $x$, the model produces the next-word distribution $p_{\\text{LM}}(y \\mid x)$ and the context representation $f(x)$<\/li>\n  <li>The model then queries the datastore with $f(x)$ for its $k$-nearest neighbors $\\mathcal{N}$ (retrieval uses a distance function $d(\\cdot, \\cdot)$; the paper's experiments use $L^2$)<\/li>\n  <li>A distribution over the retrieved neighbors $\\mathcal{N}$ is then computed via a softmax over the negative distances, aggregating the softmax probabilities of targets that are retrieved multiple times<\/li>\n<\/ul>\n\n\\[p_{\\text{kNN}}(y|x) \\propto \\sum_{(k_i, v_i)\\in\\mathcal{N}} \\mathbb{1}_{y=v_i} \\exp{(-d(k_i, f(x)))}\\]\n\n<ul>\n  <li>Finally, the existing $p_{\\text{LM}}$ and $p_{\\text{kNN}}$ distributions are interpolated with a tuned parameter $\\lambda$ to form the final $k$NN-LM distribution:<\/li>\n<\/ul>\n\n\\[p(y|x) = \\lambda \\; p_{\\text{kNN}}(y|x) + (1-\\lambda) \\; p_{\\text{LM}}(y|x)\\]\n\n<h4 id=\"implementation\">Implementation<\/h4>\n\n<ul>\n  <li>The datastore must handle billions of examples<\/li>\n  <li>FAISS (Johnson et al., 2017) is used to search this large datastore<\/li>\n  <li>FAISS is a library for fast, memory-efficient nearest neighbor search in high-dimensional vector spaces<\/li>\n  <li>As shown later in the experiments, $L^2$ as the distance metric worked better than inner product<\/li>\n<\/ul>\n\n<h2 id=\"3-experimental-setup\">3. Experimental Setup<\/h2>\n\n<h4 id=\"data\">Data<\/h4>\n\n<ul>\n  <li><strong>WikiText-103<\/strong>: standard benchmark by Merity et al. (2017) for autoregressive language modeling; 103M training tokens, with dev and test sets of 250K tokens each<\/li>\n  <li><strong>Books<\/strong>: Toronto Books Corpus (Zhu et al., 2015), 0.7B tokens<\/li>\n  <li><strong>Wiki-3B<\/strong>: English Wikipedia. 
2.87B tokens<\/li>\n  <li><strong>Wiki-100M<\/strong>: random 100M token subset of <strong>Wiki-3B<\/strong><\/li>\n  <li>All datasets except WikiText-103 use BPE<\/li>\n<\/ul>\n\n<h4 id=\"model-architecture\">Model Architecture<\/h4>\n\n<ul>\n  <li>A Transformer (Vaswani et al., 2017) decoder is used as the LM<\/li>\n  <li>The exact architecture and optimization described by Baevski &amp; Auli (2019) are followed\n    <ul>\n      <li>Architecture: Transformer decoder with sinusoidal position embeddings and 16 stacked blocks (see Section 4.1 of that paper for details)<\/li>\n      <li>Optimization: Nesterov's accelerated gradient method (Sutskever et al., 2013) (see Section 4.5 of that paper for details)<\/li>\n    <\/ul>\n  <\/li>\n  <li>This model consists of 16 layers, each with 16 self-attention heads, 1024 dimensional hidden states, and 4096 dimensional feedforward layers, amounting to 247M trainable parameters.<\/li>\n  <li>Adaptive inputs (Baevski &amp; Auli, 2019) and an adaptive softmax (Grave et al., 2017b) with tied weights (Press &amp; Wolf, 2017) are applied\n    <ul>\n      <li>These are applied only to WikiText-103, not to the other datasets<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h4 id=\"evaluation\">Evaluation<\/h4>\n\n<ul>\n  <li>Trained to minimize the negative log-likelihood of the training corpus<\/li>\n  <li>Evaluated by perplexity (exponentiated negative log-likelihood) on held-out data<\/li>\n  <li>Up to 2560 tokens (in WikiText-103) are provided as extra prior context, and perplexity is scored on 512 tokens per test example. For the other datasets, 512 tokens of extra prior context are provided<\/li>\n<\/ul>\n\n<h4 id=\"knn-lm\">$k$NN-LM<\/h4>\n\n<ul>\n  <li>The key used in the $k$NN-LM datastore, i.e., the sentence prefix representation, is a 1024-dimensional vector: the hidden vector in the Transformer LM's final layer just before the feedforward network (after self-attention and layernorm; explained in detail in Section 5)<\/li>\n  <li>The trained LM performs a single forward pass over the training set, and the results serve as the datastore's keys (vectors) and values (next words)<\/li>\n  <li>During this forward pass, each target token is given at least 1536 tokens of prior context on WikiText-103 and 512 tokens on the other datasets<\/li>\n  <li>FAISS\n    <ul>\n      <li>The index is built by learning 4096 cluster centroids from 1M randomly sampled keys<\/li>\n      <li>For efficiency, keys (vectors) are quantized to 64 bytes, though WikiText-103 uses full precision<\/li>\n      <li>At inference, $k=1024$ neighbors are retrieved, and the search is restricted to at most 32 cluster centroids<\/li>\n      <li>The distance metric is $L^2$<\/li>\n      <li>The interpolation parameter $\\lambda$ is tuned on the validation set<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h4 id=\"computational-cost\">Computational Cost<\/h4>\n\n<ul>\n  <li>Although the $k$NN-LM can reuse an existing LM without additional training, storing the keys and values requires one forward pass over the training data, which costs roughly one epoch of training<\/li>\n  <li>Building the FAISS index from the stored keys takes about 2 hours on a single CPU<\/li>\n  <li>Finally, retrieving the 1024 nearest keys for the validation set takes about 25 minutes<\/li>\n  <li>The cost of building this large cache grows linearly with the number of examples, but it is easy to parallelize and requires no GPU-based training<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments\">4. Experiments<\/h2>\n\n<h3 id=\"41-using-the-training-data-as-the-datastore\">4.1. 
Using the Training Data as the Datastore<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/table1.png\" alt=\"table1\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/table2.png\" alt=\"table2\" \/><\/p>\n\n<ul>\n  <li>The first experiment builds the datastore directly from the data the LM was trained on<\/li>\n  <li>Table 1 shows that adding the nearest neighbor mechanism on WikiText-103 improves performance and sets a new state-of-the-art<\/li>\n  <li>Since encyclopedic Wikipedia text might be especially amenable to caching, the same experiment was run on the Books corpus, which also achieved SOTA performance (Table 2)<\/li>\n<\/ul>\n\n<h3 id=\"42-more-data-without-training\">4.2. More Data without Training<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/table3.png\" alt=\"table3\" \/><\/p>\n\n<ul>\n  <li>Tests whether building the datastore from a dataset different from the LM's training data gives the same kind of improvement\n    <ul>\n      <li>training LM: Wiki-100M<\/li>\n      <li>Building datastore: Wiki-3B (larger than the training set)<\/li>\n      <li>The $k$NN-LM in this setting is compared with a vanilla LM trained only on Wiki-3B<\/li>\n    <\/ul>\n  <\/li>\n  <li>Table 3 shows a large performance improvement<\/li>\n  <li>Naturally, the LM trained on Wiki-3B outperforms the one trained on Wiki-100M, and adding retrieval on top improves performance even further<\/li>\n  <li>These results suggest that learning representations on a smaller dataset and augmenting with a $k$NN-LM over a large corpus is better than simply growing the LM's training data<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/figure2.png\" alt=\"figure2\" \/><\/p>\n\n<ul>\n  <li>To see how the amount of data used for $k$NN retrieval affects performance, the Wiki-3B data was subsampled; the results are shown in Figure 2a<\/li>\n  <li>With only 1.6B tokens in the datastore, performance already beats the model fully trained on Wiki-3B, and since performance does not saturate even with all 3B tokens, larger datastores would likely help further<\/li>\n  <li>Figure 2b shows that as the datastore grows, the model relies more on the $k$NN component (larger $\\lambda$)<\/li>\n<\/ul>\n\n<h3 id=\"43-domain-adaptation\">4.3. Domain Adaptation<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/table4.png\" alt=\"table4\" \/><\/p>\n\n<ul>\n  <li>To demonstrate domain adaptation, an LM trained on Wiki-3B is tested (inference) on the Books corpus<\/li>\n  <li>Doing this directly degrades performance considerably<\/li>\n  <li>Adding a datastore built from the Books corpus recovers much of the performance<\/li>\n  <li>In other words, given only a corpus for the target domain, domain adaptation is possible without training a new LM on that corpus<\/li>\n<\/ul>\n\n<h2 id=\"5-tuning-nearest-neighbor-search\">5. 
Tuning Nearest Neighbor Search<\/h2>\n\n<h4 id=\"key-function\">Key Function<\/h4>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/figure3_and_table5.png\" alt=\"figure3-table5\" \/><\/p>\n\n<ul>\n  <li>Experiments on how to extract the representation of the context $c$ for retrieval<\/li>\n  <li>By default it is extracted from an intermediate state $f(c)$ of the LM<\/li>\n  <li>Each Transformer LM layer is structured as in Figure 3; Table 5 shows the performance when each sub-layer output is used as the representation<\/li>\n  <li>Using the vector after the layer norm and before the FFNN as the prefix representation gave the best $k$NN-LM performance<\/li>\n  <li>This suggests that the output of self-attention alone (before the FFNN) is the better representation, while the post-FFNN output is presumably more fitted to next-word prediction<\/li>\n  <li>The second-to-last layer was also tried, and it was slightly worse<\/li>\n<\/ul>\n\n<h4 id=\"number-of-neighbors-per-query\">Number of Neighbors per Query<\/h4>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/figure4_and_figure5.png\" alt=\"figure4-and-figure5\" \/><\/p>\n\n<ul>\n  <li>Experiments on the number of nearest neighbors $k$<\/li>\n  <li>Performance keeps improving as $k$ grows<\/li>\n  <li>Even a small $k$ of 8 is already SOTA<\/li>\n<\/ul>\n\n<h4 id=\"interpolation-parameter\">Interpolation parameter<\/h4>\n\n<ul>\n  <li>Experiments on the parameter $\\lambda$ that interpolates the LM distribution and the $k$NN search distribution<\/li>\n  <li>The results are shown in Figure 5<\/li>\n<\/ul>\n\n<h4 id=\"precision-of-similarity-function\">Precision of Similarity Function<\/h4>\n\n<ul>\n  <li>Using full precision instead of FAISS quantization improved performance (16.5 -&gt; 16.06)<\/li>\n<\/ul>\n\n<h2 id=\"6-analysis\">6. Analysis<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/figure6.png\" alt=\"figure6\" \/><\/p>\n\n<h4 id=\"qualitative-analysis\">Qualitative Analysis<\/h4>\n\n<ul>\n  <li>To understand why the $k$NN-LM helps, cases where $p_{\\text{kNN}}$ is better than $p_{\\text{LM}}$ were examined<\/li>\n  <li>Figure 6 and Appendix A show such examples<\/li>\n  <li>It turns out to be remarkably good at rare patterns such as factual knowledge, names, and patterns that appear nearly verbatim in the training set<\/li>\n  <li>These examples suggest that building similar representations and explicitly finding nearest neighbors is easier than implicitly memorizing the next word in model parameters<\/li>\n<\/ul>\n\n<h4 id=\"simple-vs-neural-representation\">Simple vs Neural Representation<\/h4>\n\n<p><img src=\"\/assets\/images\/blog\/2020-04-06-nearest-neighbor-language-models\/figure7_and_figure8.png\" alt=\"figure7-and-figure8\" \/><\/p>\n\n<ul>\n  
<li>The long-tail effect shows up mainly on rare $n$-gram patterns (names, etc.)<\/li>\n  <li>This raises a question: would simply using an $n$-gram LM instead of $k$NN give the same improvement?<\/li>\n  <li>The $n$-gram LM experiment in Figure 7 shows a slight improvement, but $k$NN was still better<\/li>\n<\/ul>\n\n<h4 id=\"implicit-vs-explicit-memory\">Implicit vs Explicit memory<\/h4>\n\n<ul>\n  <li>An experiment on whether neural network parameters that implicitly memorize the training dataset can replace an explicit datastore\n    <ul>\n      <li>Stage 1:\n        <ul>\n          <li>To check whether the model can fully memorize the training dataset, the authors remove dropout and let it overfit; as Figure 8 shows, the loss converges to 0, so fully memorizing the training set is possible<\/li>\n          <li>Being overfitted, however, this model's dev-set perplexity is 28.59, much worse than the 17.96 of the generalized LM trained with dropout<\/li>\n          <li>Still, this confirms that the Transformer has enough capacity to memorize the training set<\/li>\n        <\/ul>\n      <\/li>\n      
<li>Stage 2:\n        <ul>\n          <li>Now, instead of nearest neighbors, interpolate the original LM with the LM that memorized the training set (the memorizing LM)<\/li>\n        <\/ul>\n      <\/li>\n      <li>Results\n        <ul>\n          <li>This improves perplexity by only about 0.1 over the original LM alone, whereas the $k$NN-LM improves it by 1.9<\/li>\n          <li>This suggests that even when the training examples are fully memorized, the context representations do not generalize well enough<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>Why does the $k$NN-LM improve performance? (speculation)\n    <ul>\n      <li>The Transformer LM learns representations that are effective from a similarity point of view<\/li>\n      <li>The Transformer has enough capacity to memorize the training set, but simply memorizing everything is less effective for generalization.<\/li>\n      <li>The $k$NN-LM, in contrast, memorizes the training set in the datastore while the model learns an effective similarity function<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"7-related-work\">7. Related Work<\/h2>\n\n<ul>\n  <li>pass<\/li>\n<\/ul>\n\n<h2 id=\"8-conclusion-and-future-work\">8. 
Conclusion and Future Work<\/h2>\n\n<ul>\n  <li>Proposes $k$NN-LMs, which directly leverage training examples at test time and substantially improve over standard LMs<\/li>\n  <li>The approach can be applied to any neural LM<\/li>\n  <li>The authors believe it works because learning a similarity function between contexts is easier than predicting the next word given a context<\/li>\n  <li>As future work, they are considering how to learn the similarity function and how to shrink the datastore<\/li>\n<\/ul>\n","pubDate":"Mon, 06 Apr 2020 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/nearest-neighbor-language-models\/","guid":"https:\/\/roomylee.github.io\/nearest-neighbor-language-models\/","category":["nearest-neighbor-language-model","language-model","generalization","memorization","retrieval","faiss","blog"]},{"title":"Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (EMNLP 2019)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1908.10084\">https:\/\/arxiv.org\/abs\/1908.10084<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Nils Reimers and Iryna Gurevych<\/li>\n      <li>Technische Universitat Darmstadt<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>EMNLP 2019<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) achieve state-of-the-art performance on sentence-pair regression tasks such as semantic textual similarity (STS)<\/li>\n  
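The core efficiency idea in this abstract (encode each sentence once, then compare cached embeddings with cheap vector math instead of one cross-encoder inference per pair) can be sketched as follows. The `encode` function here is a random stand-in for a real sentence encoder such as SBERT; it exists only to show the search pattern.

```python
import numpy as np

# Stand-in encoder: a real system would use a trained sentence encoder;
# fixed random vectors are enough to demonstrate the search pattern.
rng = np.random.default_rng(0)

def encode(sentences):
    return rng.normal(size=(len(sentences), 8))

sentences = [f"sentence {i}" for i in range(1000)]

# One forward pass per sentence, done once and cached.
emb = encode(sentences)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize

# All pairwise cosine similarities become a single matrix multiply,
# instead of one pair-wise model inference per sentence pair.
sim = emb @ emb.T
np.fill_diagonal(sim, -np.inf)  # exclude trivial self-pairs
i, j = np.unravel_index(np.argmax(sim), sim.shape)
print(f"most similar pair: {i} and {j}")
```

With $n$ sentences this is $n$ encoder calls plus a matrix multiply, versus the roughly $n(n-1)/2$ cross-encoder inferences (about 50M for 10,000 sentences) discussed below.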
<li>However, these models have the drawback that both sentences of an input pair must be fed in together<\/li>\n  <li>Finding the most similar pair among 10,000 sentences would require about 50M inference computations (65 hours)<\/li>\n  <li>\n    <p>This makes the BERT architecture unsuitable for semantic similarity search<\/p>\n  <\/li>\n  <li>This paper proposes Sentence-BERT (SBERT), which recasts BERT as a siamese and triplet network<\/li>\n  <li>This network structure lets a sentence embedding represent the meaning of a sentence effectively, so similarity can easily be computed with cosine-similarity<\/li>\n  <li>With SBERT, what took BERT\/RoBERTa 65 hours above can be finished in about 5 seconds<\/li>\n  <li>The proposed SBERT\/SRoBERTa outperform other state-of-the-art sentence embedding methods on STS and other transfer tasks<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2020-02-20-sentence-bert\/figure1-and-2.png\" alt=\"figure1-and-2\" \/><\/p>\n\n<h2 id=\"3-model\">3. 
Model<\/h2>\n\n<h4 id=\"pooling-strategy\">Pooling Strategy<\/h4>\n\n<ol>\n  <li>CLS: Output of the CLS-token<\/li>\n  <li>MEAN: Mean of all output vectors<\/li>\n  <li>MAX: Max-over-time of the output vectors<\/li>\n<\/ol>\n\n<h4 id=\"objective-functions\">Objective Functions<\/h4>\n\n<ol>\n  <li>\n    <p>Classification Objective Function<\/p>\n\n\\[o = \\text{softmax}(W_t(u, v, |u-v|))\\]\n\n    <ul>\n      <li>$W_t \\in \\mathbb{R}^{3n \\times k}$<\/li>\n      <li>Optimizes the cross-entropy loss<\/li>\n      <li>Depicted in Figure 1<\/li>\n    <\/ul>\n  <\/li>\n  <li>\n    <p>Regression Objective Function<\/p>\n\n\\[o = \\sigma(\\text{cosine\\_similarity}(u, v))\\]\n\n    <ul>\n      <li>Cosine similarity between the two sentence embeddings $u$ and $v$<\/li>\n      <li>Optimizes the mean-squared-error loss<\/li>\n      <li>Depicted in Figure 2<\/li>\n    <\/ul>\n  <\/li>\n  <li>\n    <p>Triplet Objective Function<\/p>\n\n    <ul>\n      <li>Suppose we have an anchor sentence $a$, a positive sentence $p$, and a negative sentence $n$<\/li>\n      <li>The triplet loss pulls $a$ and $p$ closer together while pushing $a$ and $n$ farther apart<\/li>\n      <li>The following loss function is minimized<\/li>\n    <\/ul>\n\n\\[\\max{(||s_a - s_p|| - ||s_a - s_n|| + \\epsilon, 0)}\\]\n\n    <ul>\n      <li>Euclidean distance is used as the distance metric<\/li>\n      <li>$\\epsilon$ is set to 1<\/li>\n    <\/ul>\n  <\/li>\n<\/ol>\n\n<h3 id=\"31-training-details\">3.1. 
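The triplet objective above can be sketched directly from its formula; the 3-d vectors below are illustrative toy embeddings, not real SBERT outputs.

```python
import numpy as np

def triplet_loss(s_a, s_p, s_n, eps=1.0):
    """Triplet objective: penalize unless the anchor-negative distance
    exceeds the anchor-positive distance by at least the margin eps."""
    d_pos = np.linalg.norm(s_a - s_p)  # Euclidean, as in the paper's setup
    d_neg = np.linalg.norm(s_a - s_n)
    return max(d_pos - d_neg + eps, 0.0)

# Toy "sentence embeddings" (illustrative values only).
a = np.array([1.0, 0.0, 0.0])
p = np.array([0.9, 0.1, 0.0])   # close to the anchor
n = np.array([0.0, 1.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))    # 0.0: the margin is already satisfied
```

Swapping `p` and `n` makes the loss positive, which is what drives the anchor-positive pair together during training.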
Training Details<\/h3>\n\n<ul>\n  <li>Fine-tuned on the combination of the two datasets below\n    <ul>\n      <li><a href=\"https:\/\/nlp.stanford.edu\/projects\/snli\/\">SNLI<\/a> (Bowman et al., 2015)<\/li>\n      <li><a href=\"https:\/\/www.nyu.edu\/projects\/bowman\/multinli\/\">MNLI<\/a> (Williams et al., 2018)<\/li>\n      <li>Both classify the relation between a given sentence pair as one of <em>contradiction<\/em>, <em>entailment<\/em>, or <em>neutral<\/em><\/li>\n    <\/ul>\n  <\/li>\n  <li>SBERT is fine-tuned with a 3-way softmax-classifier objective function for one epoch<\/li>\n  <li>Batch size = 16<\/li>\n  <li>Adam optimizer<\/li>\n  <li>Learning rate = 2e-5<\/li>\n  <li>Linear learning rate warm-up over 10% of the training data<\/li>\n  <li>Pooling strategy = MEAN<\/li>\n<\/ul>\n\n<h2 id=\"4-evaluation---semantic-textual-similarity\">4. Evaluation - Semantic Textual Similarity<\/h2>\n\n<h3 id=\"41-unsupervised-sts\">4.1. Unsupervised STS<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-02-20-sentence-bert\/table1.png\" alt=\"table1\" \/><\/p>\n\n<ul>\n  <li>Shows the correlation between the gold labels and the cosine similarity of the sentence embeddings obtained from each model<\/li>\n  <li>None of the models above were ever trained on STS data<\/li>\n  <li>That is, sentence embeddings were extracted without any STS training and cosine similarity was computed directly<\/li>\n  <li>(Perhaps because they were trained on NLI data, SBERT\/SRoBERTa perform quite well)<\/li>\n<\/ul>\n\n<h3 id=\"42-supervised-sts\">4.2. 
Supervised STS<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2020-02-20-sentence-bert\/table2.png\" alt=\"table2\" \/><\/p>\n\n<ul>\n  <li>This time, the version fine-tuned on STS<\/li>\n  <li>Two setups for the experiment\n    <ol>\n      <li>Only training on STSb<\/li>\n      <li>First training on NLI, then training on STSb<\/li>\n    <\/ol>\n  <\/li>\n  <li>Training on NLI first was about 1-2 points better<\/li>\n  <li>The BERT cross-encoder improves by as much as 3-4 points when trained on NLI<\/li>\n  <li>There appears to be no large difference between BERT and RoBERTa<\/li>\n<\/ul>\n\n<h2 id=\"5-evaluation---senteval\">5. Evaluation - SentEval<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-02-20-sentence-bert\/table5.png\" alt=\"table5\" \/><\/p>\n\n<ul>\n  <li>SentEval (Conneau and Kiela, 2018)\n    <ul>\n      <li>MR: Sentiment prediction for movie review snippets on a five-star scale (Pang and Lee, 2005)<\/li>\n      <li>CR: Sentiment prediction of customer product reviews (Hu and Liu, 2004)<\/li>\n      <li>SUBJ: Subjectivity prediction of sentences from movie reviews and plot summaries (Pang and Lee, 2004)<\/li>\n      <li>MPQA: Phrase-level opinion polarity classification from newswire (Wiebe et al., 2005)<\/li>\n      <li>SST: Stanford Sentiment Treebank with binary labels (Socher et al., 2013)<\/li>\n      <li>TREC: Fine-grained question-type classification from TREC (Li and Roth, 2002)<\/li>\n      <li>MRPC: Microsoft Research Paraphrase Corpus from parallel news sources (Dolan et al., 2004)<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"6-ablation-study\">6. 
Ablation Study<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2020-02-20-sentence-bert\/table6.png\" alt=\"table6\" \/><\/p>\n","pubDate":"Thu, 20 Feb 2020 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/sentence-bert\/","guid":"https:\/\/roomylee.github.io\/sentence-bert\/","category":["sentence-bert","sbert","siamese-network","sentence-embedding","representation","blog"]},{"title":"Attentional Encoder Network for Targeted Sentiment Classification (arXiv 2019)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1902.09314\">https:\/\/arxiv.org\/abs\/1902.09314<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Youwei Song, Jiahai Wang, Tao Jiang, Zhiyue Liu, Yanghui Rao<\/li>\n      <li>Sun Yat-sen University<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>arXiv 2019<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Targeted sentiment classification (= aspect-based sentiment analysis) is the task of predicting the sentimental tendency toward a specific target<\/li>\n  <li>Most previous approaches are based on RNNs with attention<\/li>\n  <li>However, RNNs are hard to parallelize and, due to issues with backpropagation through time (BPTT), struggle to retain memory of long-term patterns<\/li>\n  <li>To solve these problems, we propose the Attentional Encoder Network (AEN). 
AEN avoids recurrence and instead adopts attention-based encoders to model the context and the target<\/li>\n  <li>We also raise the problem of label unreliability and introduce a label smoothing regularization technique for it<\/li>\n  <li>Applying these to pre-trained BERT achieves state-of-the-art performance, and experiments show the model's effectiveness and light weight<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Targeted sentiment classification is a fine-grained sentiment analysis task that determines the sentiment polarity of a specific opinion-target word appearing in a sentence<\/li>\n  <li>For example, in the sentence <em>\u201cI hated their service, but their food was great.\u201d<\/em>, <em>service<\/em> has negative and <em>food<\/em> has positive polarity<\/li>\n  <li>The target is usually an entity or an entity aspect<\/li>\n  <li>The first problem with previous work is its heavy reliance on RNNs, which means the problems of RNNs carry over<\/li>\n  <li>The second problem is that previous studies overlook the label unreliability issue: the <em>neutral<\/em> sentiment is a complicated sentimental state that makes model training difficult. 
As far as we know, we are the first in this field to raise this issue<\/li>\n  <li>We aim to solve these problems with an attentional encoder and label smoothing regularization<\/li>\n  <li>Using pre-trained BERT, we achieve state-of-the-art performance while being lighter than the best existing RNN models<\/li>\n  <li>Our main contributions are as follows:\n    <ol>\n      <li>We design an attentional encoder network to represent the semantic relations between the target and the context words<\/li>\n      <li>We are the first to raise the label unreliability issue, and we add a label smoothing regularization term to the loss function to address it<\/li>\n      <li>We apply pre-trained BERT to this task and modify it to perform better than the basic BERT model, ultimately obtaining state-of-the-art performance<\/li>\n      <li>We compare model sizes with other models to show the light weight of the proposed model<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>pass<\/li>\n<\/ul>\n\n<h2 id=\"3-proposed-methodology\">3. 
Proposed Methodology<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-attention-encoder-targeted-sentiment\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li><strong>w^c<\/strong>: context sequence<\/li>\n  <li><strong>w^t<\/strong>: target sequence (subsequence of <strong>w^c<\/strong>)<\/li>\n  <li>Figure 1 shows the proposed Attentional Encoder Network (AEN)<\/li>\n  <li>It consists of an embedding layer, an attentional encoder layer, a target-specific attention layer, and an output layer<\/li>\n  <li>GloVe and BERT embeddings are used in the embedding layer; the resulting models are called AEN-GloVe and AEN-BERT, respectively<\/li>\n<\/ul>\n\n<h3 id=\"31-embedding-layer\">3.1 Embedding Layer<\/h3>\n\n<h4 id=\"311-glove-embedding\">3.1.1 GloVe Embedding<\/h4>\n\n<ul>\n  <li>Uses pre-trained GloVe<\/li>\n<\/ul>\n\n<h4 id=\"312-bert-embedding\">3.1.2 BERT Embedding<\/h4>\n\n<ul>\n  <li>Uses pre-trained BERT<\/li>\n  <li>The context is formed as \u201c[CLS] + context + [SEP]\u201d and the target as \u201c[CLS] + target + [SEP]\u201d to obtain the output word vectors of each sequence<\/li>\n<\/ul>\n\n<h3 id=\"32-attentional-encoder-layer\">3.2 Attentional Encoder Layer<\/h3>\n\n<ul>\n  <li>Replaces the LSTM with a parallelizable attention-based encoder layer<\/li>\n  <li>Largely consists of two submodules: Multi-Head Attention (MHA) and Point-wise Convolution Transformation (PCT)<\/li>\n<\/ul>\n\n<h4 id=\"321-multi-head-attention\">3.2.1 Multi-Head Attention<\/h4>\n\n<ul>\n  <li>Borrows Multi-Head Attention (MHA) from the Transformer (Vaswani et al., 2017)<\/li>\n  <li>It is slightly modified into 1) Intra-MHA for modeling the context words themselves, and 2) Inter-MHA for modeling the target words with the context taken into account<\/li>\n  <li>(The \u201cmodification\u201d does not seem substantial: Intra-MHA is just self-attention, and Inter-MHA looks like ordinary MHA)<\/li>\n<\/ul>\n\n<h4 id=\"322-point-wise-convolution-transformation\">3.2.2 Point-wise Convolution Transformation<\/h4>\n\n<ul>\n  <li>This appears to be the FFNN (1D convolution) from the Transformer<\/li>\n  <li>Here, two of these convolutional layers are stacked<\/li>\n<\/ul>\n\n<h3 id=\"33-target-specific-attention-layer\">3.3 Target-specific Attention Layer<\/h3>\n\n<ul>\n  <li>Finally, MHA is applied once more over the hidden representations obtained from the context and the target<\/li>\n<\/ul>\n\n<h3 id=\"34-output-layer\">3.4 Output Layer<\/h3>\n\n<ul>\n  <li>The final output is built by average-pooling 1) the context representation, 2) the target representation, and 3) the target-specific context representation, and concatenating the results<\/li>\n  <li>A final classification layer on top of this yields the sentiment polarity distribution<\/li>\n<\/ul>\n\n<h3 id=\"35-regularization-and-model-training\">3.5 Regularization and Model Training<\/h3>\n\n<ul>\n  <li>Because the <em>neutral<\/em> sentiment is a very fuzzy sentimental state, training samples labeled <em>neutral<\/em> are hard to trust<\/li>\n  <li>So we add a Label Smoothing Regularization (LSR) term to the loss function<\/li>\n<\/ul>\n\n<p><img 
src=\"\/assets\/images\/blog\/2019-12-08-attention-encoder-targeted-sentiment\/eq151617.png\" alt=\"eq151617\" \/><\/p>\n\n<ul>\n  <li>\uc218\uc2dd\uc774\ub791 \uc124\uba85\uc774 \uc880 \uc774\ud574\ud558\uae30 \uc5b4\ub835\uae34\ud55c\ub370 \uc758\ubbf8\uc801\uc73c\ub85c \ubd24\uc744 \ub54c, \uae30\uc874 one-hot hard label\uacfc u(k)\ub97c \uac00\uc911\ud569\ud574\uc11c target ground truth label\uc744 \ub9cc\ub4e4\uaca0\ub2e4\ub294 \uac83\uc784<\/li>\n  <li>\uc608\ub97c \ub4e4\uc5b4, \uc5b4\ub5a4 \uc0d8\ud50c\uc774 [1, 0, 0]\uc758 ground truth label\uc744 \uac16\uace0 \uc788\ub2e4\uace0 \ud588\uc744 \ub54c, (1-eps) [1, 0, 0] + eps [1\/3, 1\/3, 1\/3]\uc744 \ud574\uc11c smoothing\ub41c label\uc744 \ub9cc\ub4e4\uc5b4\uc11c \ubaa8\ub378\uc774 \uc774\ub97c target\uc73c\ub85c \ud559\uc2b5\ud558\uac8c \ud558\uaca0\ub2e4\ub294 \uac83\uc784<\/li>\n  <li>(\uc704\uc758 \uc218\uc2dd \uc804\uac1c\uac00 \uc644\ubcbd\ud788 \uc774 \uc758\ubbf8\uc640 \ub3d9\uce58\uc778\uc9c0 \ubaa8\ub974\uaca0\uc74c. \ubb54\uac00 \uc218\uc2dd\uc774 \ud2c0\ub9b0 \ub290\ub08c)<\/li>\n  <li>L2-regularization term\ub3c4 \ucd94\uac00\ud568<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments\">4. 
Experiments<\/h2>\n\n<h3 id=\"41-dataset-and-experimental-settings\">4.1 Dataset and Experimental Settings<\/h3>\n\n<ul>\n  <li>SemEval 2014 Task 4 (LAP14, REST14) (Pontiki et al., 2014)<\/li>\n  <li>TWITTER (Dong et al., 2014)<\/li>\n  <li>Three sentiment polarities: <em>positive<\/em>, <em>negative<\/em>, <em>neutral<\/em><\/li>\n  <li>AEN-GloVe does not update the word embeddings during training, whereas AEN-BERT is fine-tuned<\/li>\n  <li>All weights use Glorot initialization, and the LSR eps is set to 0.2<\/li>\n  <li>The L2 coefficient is 1e-5 and the dropout rate is 0.1<\/li>\n  <li>Adam was used<\/li>\n<\/ul>\n\n<h3 id=\"42-model-comparisons\">4.2 Model Comparisons<\/h3>\n\n<ul>\n  <li>Non-RNN-based baselines:\n    <ul>\n      <li>Feature-based SVM (Kiritchenko et al., 2014)<\/li>\n      <li>Rec-NN (Dong et al., 2014)<\/li>\n      <li>MemNet (Tang et al., 2016b)<\/li>\n    <\/ul>\n  <\/li>\n  <li>RNN-based baselines:\n    <ul>\n      <li>TD-LSTM (Tang et al., 2016a)<\/li>\n      <li>ATAE-LSTM (Wang et al., 2016)<\/li>\n      <li>IAN (Ma et al., 2017)<\/li>\n      <li>RAM (Chen et al., 2017)<\/li>\n    <\/ul>\n  <\/li>\n  <li>AEN-GloVe ablations:\n    <ul>\n      <li>AEN-GloVe w\/o PCT<\/li>\n      <li>AEN-GloVe w\/o MHA<\/li>\n      <li>AEN-GloVe w\/o LSR<\/li>\n      <li>AEN-GloVe-BiLSTM (replaces the attentional encoder layer with two Bi-LSTMs)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Basic BERT-based model:\n    <ul>\n      <li>BERT-SPC (feeds the sequence \u201c[CLS] + context + [SEP] + target + [SEP]\u201d into the basic BERT model as a sentence-pair classification task)<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"43-main-results\">4.3 Main Results<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-attention-encoder-targeted-sentiment\/table2.png\" alt=\"table2\" 
\/><\/p>\n\n<ul>\n  <li>The BERT-based models, BERT-SPC and AEN-BERT, clearly performed far better than the other models<\/li>\n  <li>Among them, AEN-BERT outperformed BERT-SPC, which shows that BERT should not simply be used as-is but needs to be customized to the specific task<\/li>\n  <li>RAM performed better than the other RNN baselines but did not work well on TWITTER, which consists of short and ungrammatical text; the Bi-LSTM appears to handle datasets with these characteristics poorly<\/li>\n<\/ul>\n\n<h3 id=\"44-model-analysis\">4.4 Model Analysis<\/h3>\n\n<ul>\n  <li>The AEN-GloVe ablation results in Table 2 show that every component contributes to the performance<\/li>\n  <li>AEN-GloVe and AEN-GloVe-BiLSTM show roughly similar performance overall, but AEN-GloVe has the advantages of fewer parameters and parallelizability<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-attention-encoder-targeted-sentiment\/table3.png\" alt=\"table3\" \/><\/p>\n\n<ul>\n  <li>Comparing parameter counts and memory footprint, AEN comes out ahead<\/li>\n  <li>(It is unclear exactly what \u201cmemory\u201d refers to here)<\/li>\n<\/ul>\n\n<h2 id=\"5-conclusion\">5. Conclusion<\/h2>\n\n<ul>\n  <li>Proposes an attentional encoder network<\/li>\n  <li>Introduces an attention-based encoder and applies LSR for the label unreliability issue<\/li>\n  <li>Also applies pre-trained BERT to the ABSA task and achieves state-of-the-art performance<\/li>\n  <li>Experiments and analysis show the effectiveness and light weight of the proposed model<\/li>\n<\/ul>\n","pubDate":"Sun, 08 Dec 2019 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/attention-encoder-targeted-sentiment\/","guid":"https:\/\/roomylee.github.io\/attention-encoder-targeted-sentiment\/","category":["attention-encoder-network","aspect-based-sentiment-analysis","blog"]},{"title":"Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks (EMNLP 2019)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/www.aclweb.org\/anthology\/D19-1464\/\">https:\/\/www.aclweb.org\/anthology\/D19-1464\/<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Chen Zhang, Qiuchi Li, and Dawei Song<\/li>\n      <li>Beijing Institute of Technology<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>EMNLP 2019<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Attention and CNNs are widely used for the aspect-based sentiment classification task<\/li>\n  <li>However, such models fail to sufficiently account for relevant syntactical constraints and long-range word dependencies<\/li>\n  <li>As a result, a problem arises in which syntactically irrelevant contextual words 
are picked up as clues for judging the aspect sentiment<\/li>\n  <li>To solve this, a Graph Convolutional Network (GCN) is applied over the dependency tree, in order to properly exploit syntactical information and word dependencies<\/li>\n  <li>Concretely, a model called the aspect-specific GCN is proposed<\/li>\n  <li>Experiments on three benchmark datasets show state-of-the-art results, and show that the graph convolution structure resolves the issues with syntactical information and long-range word dependencies<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Aspect-based (aka aspect-level) sentiment classification is the task of predicting the sentiment polarity of aspect words appearing in a given sentence<\/li>\n  <li>For example, in the sentence <em>\u201cFrom the speed to the multi-touch gestures this operating system beats Windows easily.\u201d<\/em>, the aspect <em>\u201coperating system\u201d<\/em> has <em>positive<\/em> and <em>\u201cWindows\u201d<\/em> has <em>negative<\/em> polarity. 
The task is to predict these polarities<\/li>\n  <li>Limitation 1 of previous work\n    <ul>\n      <li>Attention-based RNN models have recently shown good performance, but they struggle to effectively capture the syntactical dependencies between the context words and the aspect in a sentence<\/li>\n      <li>In other words, current attention-mechanism-based models can wrongly attend to context words that are irrelevant to the given aspect\n        <ul>\n          <li>For example, in the sentence <em>\u201cIts size is ideal and the weight is acceptable\u201d<\/em>, attention-based models often treat <em>acceptable<\/em> as the descriptor of the aspect <em>size<\/em> (the correct one is <em>ideal<\/em>)<\/li>\n        <\/ul>\n      <\/li>\n      <li>To solve this, He et al. (2018) tried to impose a syntactical constraint on the attention weights, but did not fully exploit the syntactical structure<\/li>\n    <\/ul>\n  <\/li>\n  <li>Limitation 2 of previous work\n    <ul>\n      <li>Attention-based CNN models were introduced to handle multi-word aspect phrases\n        <ul>\n          <li>finding of Fan et al. 
(2018): the sentiment of an aspect is usually determined by key phrases rather than individual words<\/li>\n        <\/ul>\n      <\/li>\n      <li>However, sentiment is not necessarily depicted by several adjacent words. Therefore, CNNs, which use contiguous word sequences as multi-word features, are not an adequate way to model the finding above\n        <ul>\n          <li>For example, given the sentence <em>\u201cThe staff should be a bit more friendly.\u201d<\/em>, for the aspect <em>staff<\/em> a CNN takes <em>more friendly<\/em> as the key phrase and predicts positive, which is incorrect. The preceding <em>should be<\/em> flips the meaning, so the right call is negative<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>To resolve the two limitations above, we introduce a GCN built on the dependency tree. A GCN can model syntactically relevant words well and can exploit long-range multi-word relations and syntactical information. 
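As a concrete picture of what a "GCN over the dependency tree" consumes, here is a minimal NumPy sketch of turning a dependency parse into the adjacency matrix a GCN operates on; the function name and the toy edge list are ours, not from the paper:

```python
import numpy as np

def dependency_adjacency(n_words, edges, directed=False, self_loops=True):
    """Build the k x k adjacency matrix A that the graph convolution consumes.

    `edges` holds (head, dependent) index pairs from a dependency parse;
    the indices and the toy parse below are made up for illustration."""
    A = np.zeros((n_words, n_words))
    for head, dep in edges:
        A[head, dep] = 1.0
        if not directed:        # undirected variant: add the reverse edge too
            A[dep, head] = 1.0
    if self_loops:              # self-loops: ones on the diagonal of A
        np.fill_diagonal(A, 1.0)
    return A

# toy 5-word sentence with an invented parse
A = dependency_adjacency(5, [(1, 0), (1, 2), (3, 1), (3, 4)])
```

Any off-the-shelf dependency parser can supply the (head, dependent) pairs.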
Moreover, GCNs have not yet been properly applied in this area<\/li>\n  <li>Our contributions are as follows\n    <ul>\n      <li>We propose a method that exploits the syntactical dependency structure for aspect-based sentiment classification and resolve the long-range multi-word dependency issue<\/li>\n      <li>To this end, we propose a novel architecture called Aspect-specific GCN (ASGCN). To our knowledge, we are the first to apply GCNs in this area<\/li>\n      <li>Through experiments, we show the importance of leveraging syntactical information and long-range word dependencies, and that our model is strong in exactly these respects<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"2-background-graph-convolutional-network\">2. Background: Graph Convolutional Network<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>GCNs can be seen as an extension of conventional CNNs, which only look at local information. 
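A single graph convolution layer of the kind this background section describes can be sketched as follows; this is a minimal NumPy version of h_i = ReLU(sum_j A_ij (W g_j) + b), with variable names of our own choosing:

```python
import numpy as np

def graph_conv(A, g, W, b):
    """One graph convolutional layer: h_i = ReLU(sum_j A_ij (W g_j) + b).

    A: (k, k) adjacency matrix, g: (k, d_in) node states from the previous
    layer, W: (d_in, d_out), b: (d_out,). Names are ours, for illustration."""
    return np.maximum(A @ (g @ W) + b, 0.0)

rng = np.random.default_rng(0)
k, d = 5, 4
A = np.eye(k)                              # self-loops only, for the demo
h = graph_conv(A, rng.normal(size=(k, d)), rng.normal(size=(d, d)), np.zeros(d))
```

Stacking this layer L times is what lets information flow between nodes that are L hops apart in the graph.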
Rather than a convolution over merely adjacent words, it is a convolution based on structural information<\/li>\n  <li>Graph convolution first requires an adjacency matrix <strong>A<\/strong> (<em>k<\/em> x <em>k<\/em>) over the <em>k<\/em> nodes (corresponding to the number of words)<\/li>\n  <li>Graph convolution builds node representations as follows:<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/eq1.png\" alt=\"eq1\" \/><\/p>\n\n<ul>\n  <li>W^l is a linear transformation weight and b^l is a bias term. sigma is a non-linear function (ReLU)<\/li>\n  <li>In short, it is the conventional convolution layer made to operate on <strong>A<\/strong>. A conventional convolution amounts to an <strong>A<\/strong> whose connections link only words adjacent in the sentence<\/li>\n  <li>GCNs are usually built by stacking graph convolutional layers like the one above. Stacking layers lets structurally adjacent nodes influence each other, so structural information gets modeled<\/li>\n  <li>To exploit such structural information in a sentence, a dependency tree is typically used<\/li>\n<\/ul>\n\n<h2 id=\"3-aspect-specific-graph-convolutional-network\">3. 
Aspect-specific Graph Convolutional Network<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/figure2.png\" alt=\"figure2\" \/><\/p>\n\n<h3 id=\"31-embedding-and-bidirectional-lstm\">3.1 Embedding and Bidirectional LSTM<\/h3>\n\n<ul>\n  <li>Apply word embedding to the input sentence<\/li>\n  <li>Then run it through a Bi-LSTM and concatenate the outputs to form the final hidden representation<\/li>\n<\/ul>\n\n<h3 id=\"32-obtaining-aspect-oriented-features\">3.2 Obtaining Aspect-oriented Features<\/h3>\n\n<ul>\n  <li>Unlike generic sentiment classification, ABSA must classify sentiment from the perspective of the target aspect, so an aspect-oriented feature extraction strategy is required<\/li>\n  <li>In this work, to obtain aspect-oriented features we 1) apply multi-layer graph convolution over the syntactical dependency tree, and 2) put an aspect-specific masking layer on top<\/li>\n<\/ul>\n\n<h4 id=\"321-graph-convolution-over-dependency-trees\">3.2.1 Graph Convolution over Dependency Trees<\/h4>\n\n<ul>\n  <li>Before the graph convolution, we first need to build the dependency tree of the given sentence and obtain the adjacency matrix <strong>A<\/strong><\/li>\n  <li>Based on the dependency tree obtained this way, we propose two variants of ASGCN\n    <ol>\n      <li>ASGCN-DG\n        <ul>\n          <li>Converts the dependency tree into an undirected graph<\/li>\n          <li>This is how GCNs usually construct the graph<\/li>\n        <\/ul>\n      <\/li>\n      <li>ASGCN-DT\n        <ul>\n          <li>Keeps the edge directions as they are<\/li>\n          <li>Parent nodes are broadly influenced by their children nodes<\/li>\n        <\/ul>\n      <\/li>\n    <\/ol>\n  <\/li>\n  <li>We also apply the self-looping proposed by Kipf and Welling (2017). Self-looping adds a self edge to every node, i.e., it sets all diagonal entries of the adjacency matrix <strong>A<\/strong> to 1<\/li>\n  <li>Based on the final <strong>A<\/strong> obtained this way, each node\u2019s representation is updated through the graph convolution operation and normalization below<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/eq2.png\" alt=\"eq2\" \/><\/p>\n\n<ul>\n  <li>Equation 2 is the graph convolution operation, and Equation 3 normalizes by dividing by each node\u2019s degree in the graph<\/li>\n  <li><strong>g^(l-1)_j<\/strong> is the representation of the j-th word coming out of the previous GCN layer, and <strong>h^l_j<\/strong> is what the current GCN layer produces<\/li>\n  <li>d_i = sum^n_{j=1} (<strong>A<\/strong>_ij) is the degree of the i-th token<\/li>\n  <li><strong>W<\/strong> and <strong>b<\/strong> are trainable parameters<\/li>\n<\/ul>\n\n<h5 id=\"position-aware-transformation-\uc774\ud574-\uc798-\uc548\ub428\">Position-aware 
Transformation (not fully clear to me)<\/h5>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/eq456.png\" alt=\"eq456\" \/><\/p>\n\n<ul>\n  <li>A position-aware transformation is applied to <strong>h<\/strong>, the result of the graph convolution above. In Equation 4, the function F denotes this<\/li>\n  <li>F transforms each word based on its position relative to the aspect, so that influence decays as words get farther from the aspect (Equation 5)<\/li>\n  <li>The aspect itself is multiplied by 0, so it cannot contribute to the representation; the aspect representation is formed only from contextual words, following the dependency structure<\/li>\n<\/ul>\n\n<h4 id=\"322-aspect-specific-masking\">3.2.2 Aspect-specific Masking<\/h4>\n\n<ul>\n  <li>Aspect-specific masking is applied to the <strong>h^L<\/strong> obtained from the GCN<\/li>\n  <li>Simply put, all words except the aspect words are zero-masked<\/li>\n  <li>This yields <strong>H^L_mask<\/strong><\/li>\n<\/ul>\n\n<h3 id=\"33-aspect-aware-attention\">3.3 Aspect-aware Attention<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/eq89.png\" alt=\"eq89\" \/><\/p>\n\n<ul>\n  <li>The final representation is built via attention between <strong>h^L<\/strong>, the aspect representation from the GCN, and <strong>h^c<\/strong>, the Bi-LSTM output<\/li>\n  <li>Because of the earlier zero-masking, only the words at the aspect positions survive in <strong>h^L<\/strong><\/li>\n  <li>Attention is taken over the sums of dot-products between these aspect words and each word of the sentence (h^c_t)<\/li>\n  <li>The resulting alpha is used to take a weighted sum of <strong>h^c<\/strong>, giving the final representation <strong>r<\/strong><\/li>\n<\/ul>\n\n<h3 id=\"34-sentiment-classification\">3.4 Sentiment Classification<\/h3>\n\n<ul>\n  <li>A single linear layer on top of <strong>r<\/strong> performs the final classification<\/li>\n<\/ul>\n\n<h3 id=\"35-training\">3.5 Training<\/h3>\n\n<ul>\n  <li>Standard gradient descent<\/li>\n  <li>Cross-entropy loss with L2-regularization<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments\">4. 
Experiments<\/h2>\n\n<h3 id=\"41-datasets-and-experimental-settings\">4.1 Datasets and Experimental Settings<\/h3>\n\n<ul>\n  <li>Five datasets are used\n    <ul>\n      <li>TWITTER (Dong et al., 2014)<\/li>\n      <li>SemEval 2014 (LAP14, REST14), 2015 (REST15), 2016 (REST16) datasets<\/li>\n    <\/ul>\n  <\/li>\n  <li>As in prior work (Tang et al., 2016b), samples with conflicting polarity and sentences without aspects were removed<\/li>\n  <li>Label distribution per dataset:<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/table1.png\" alt=\"table1\" \/><\/p>\n\n<ul>\n  <li>GloVe and Adam are used<\/li>\n  <li>learning rate = 1e-3<\/li>\n  <li>coefficient of L2 regularization = 10^-5<\/li>\n  <li>batch size = 32<\/li>\n  <li># of GCN layers = 2<\/li>\n  <li>Reported results (Acc, F1) are averages over 3 runs with random initialization<\/li>\n<\/ul>\n\n<h3 id=\"42-models-for-comparison\">4.2 Models for Comparison<\/h3>\n\n<ul>\n  <li>SVM (Kiritchenko et al., 2014)<\/li>\n  <li>LSTM (Tang et al., 2016a)<\/li>\n  <li>MemNet (Tang et al., 2016b)<\/li>\n  <li>AOA (Huang et al., 2018)<\/li>\n  <li>IAN (Ma et al., 2017)<\/li>\n  <li>TNet-LF (Li et al., 2018)<\/li>\n  <li>ASCNN: our own variant of ASGCN with the GCN replaced by a CNN<\/li>\n<\/ul>\n\n<h3 id=\"43-results\">4.3 Results<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/table2.png\" alt=\"table2\" \/><\/p>\n\n<ul>\n  <li>The proposed ASGCN achieves SOTA or performance comparable to the baseline (TNet-LF)<\/li>\n  <li>ASGCN-DG outperformed DT by a fair margin on TWITTER, LAP14, REST15, and REST16<\/li>\n  
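The aspect-specific masking and retrieval attention from Sections 3.2.2 and 3.3 can be sketched in NumPy for a single toy sentence; the function name and tensor shapes are our own assumptions (the real model works on padded batches):

```python
import numpy as np

def aspect_attention(h_c, h_L, aspect_idx):
    """Aspect-specific masking plus retrieval attention (Secs. 3.2.2-3.3).

    h_c: (n, d) Bi-LSTM outputs, h_L: (n, d) last-GCN-layer outputs,
    aspect_idx: positions of the aspect words. Toy single-sentence sketch."""
    mask = np.zeros((h_L.shape[0], 1))
    mask[aspect_idx] = 1.0                 # zero out every non-aspect word
    h_mask = h_L * mask
    beta = (h_c @ h_mask.T).sum(axis=1)    # summed dot-products vs. aspect words
    alpha = np.exp(beta - beta.max())
    alpha /= alpha.sum()                   # softmax over context positions
    return alpha @ h_c                     # final representation r

rng = np.random.default_rng(1)
r = aspect_attention(rng.normal(size=(6, 8)), rng.normal(size=(6, 8)), [2, 3])
```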
<li>ASGCN outperformed ASCNN on every dataset except REST14, which suggests ASGCN is better at capturing long-range word dependencies<\/li>\n  <li>We suspect the sentences in REST14 and TWITTER are not so sensitive to syntactic information and are less grammatical, which may be why our model falls slightly short of the baseline there<\/li>\n<\/ul>\n\n<h3 id=\"44-ablation-study\">4.4 Ablation Study<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/table3.png\" alt=\"table3\" \/><\/p>\n\n<ul>\n  <li>w\/o pos: removal of position weights<\/li>\n  <li>w\/o mask: removal of aspect-specific masking<\/li>\n  <li>w\/o GCN: preserving position weights and aspect-specific masking, but without using GCN layers<\/li>\n<\/ul>\n\n<h3 id=\"45-case-study\">4.5 Case Study<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/table4.png\" alt=\"table4\" \/><\/p>\n\n<ul>\n  <li>Compared with the baseline models via attention visualization<\/li>\n  <li>First sample: <em>\u201cgreat food but the service was dreadful!\u201d<\/em>\n    <ul>\n      <li>A case where two aspects, <em>food<\/em> and <em>service<\/em>, appear in one sentence<\/li>\n    <\/ul>\n  <\/li>\n  <li>Second sample: <em>\u201cThe staff should be a bit more friendly.\u201d<\/em>\n    <ul>\n      <li>The aspect is <em>staff<\/em>, and <em>should<\/em> reverses the meaning of the sentence<\/li>\n    
<\/ul>\n  <\/li>\n  <li>Third sample: <em>\u201cDid not enjoy the new Windows 8 and touchscreen function.\u201d<\/em>\n    <ul>\n      <li>The aspect is <em>Windows 8<\/em>, and the negation <em>not<\/em> appears<\/li>\n    <\/ul>\n  <\/li>\n  <li>ASCNN, which focuses locally, attends well to <em>should<\/em> near <em>staff<\/em> and gets the second case right, but fails on the third case because it cannot handle longer-range word dependencies<\/li>\n  <li>ASGCN gets all of the cases above right<\/li>\n<\/ul>\n\n<h2 id=\"5-discussion\">5. Discussion<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-12-08-aspect-specific-gcn\/figure34.png\" alt=\"figure34\" \/><\/p>\n\n<h4 id=\"51-investigation-on-the-impact-of-gcn-layers\">5.1 Investigation on the Impact of GCN Layers<\/h4>\n\n<ul>\n  <li>Two GCN layers work best; performance degrades as more layers are added<\/li>\n<\/ul>\n\n<h4 id=\"52-investigation-on-the-effect-of-multiple-aspects\">5.2 Investigation on the Effect of Multiple Aspects<\/h4>\n\n<ul>\n  <li>Multiple aspects can appear in a single sentence<\/li>\n  <li>The dataset is split by the number of aspects, then trained and evaluated per split<\/li>\n  <li>Experiments covered 1 to 7 aspects; 8 or more were skipped because there were too few samples<\/li>\n  <li>From 3 or more aspects per sentence, performance fluctuates noticeably, which indicates low robustness in capturing multiple-aspect correlation<\/li>\n  <li>This part (multi-aspect dependency) needs improvement in future work<\/li>\n<\/ul>\n\n<h3 id=\"6-related-work\">6. Related Work<\/h3>\n\n<ul>\n  <li>Omitted<\/li>\n<\/ul>\n\n<h3 id=\"7-conclusions-and-future-work\">7. Conclusions and Future Work<\/h3>\n\n<ul>\n  <li>We revisited the challenges faced by current ABSA models and showed that GCNs are well suited to tackling them<\/li>\n  <li>We adapted the GCN to the ABSA problem and proposed a novel network, ASGCN<\/li>\n  <li>Experiments show that leveraging syntactical information and long-range word dependencies with the GCN brings overall performance gains<\/li>\n  <li>This work did not yet exploit the edge information of the dependency tree<\/li>\n  <li>We plan to design a graph neural network that uses this edge information<\/li>\n  <li>We are also considering the use of domain knowledge<\/li>\n  <li>Finally, even with multiple aspects, the model currently predicts one aspect at a time, repeatedly; in the future 
we are considering an extension of ASGCN that captures the dependencies among aspects and predicts the sentiments of multiple aspects simultaneously<\/li>\n<\/ul>\n","pubDate":"Sun, 08 Dec 2019 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/aspect-specific-gcn\/","guid":"https:\/\/roomylee.github.io\/aspect-specific-gcn\/","category":["aspect-based-sentiment-analysis","graph-convolutional-network","aspect-specific-graph-convolutional-network","asgcn","blog"]},{"title":"MoEL: Mixture of Empathetic Listeners (EMNLP 2019)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/www.aclweb.org\/anthology\/D19-1012\/\">https:\/\/www.aclweb.org\/anthology\/D19-1012\/<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, Pascale Fung<\/li>\n      <li>The Hong Kong University of Science and Technology<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>EMNLP 2019<\/li>\n    <\/ul>\n  <\/li>\n  <li>Key Points and My Comments\n    <ul>\n      <li>Prior work goes in two directions: one solves predicting the current utterance\u2019s emotional state together with decoding as a multi-task problem (EmpatheticDialogue), and the other generates utterances with a fixed emotion (PersonaChat)<\/li>\n      <li>These two approaches miss a few important points: 1) they learn implicitly which emotion to respond with, which limits interpretability and causes generic responses. 
Also, 2) in conditional generation we feed a specific emotion as input, but in fact we ourselves do not know whether that emotion is the appropriate one<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Previous empathetic dialogue system research mainly focused on generation conditioned on a given emotion<\/li>\n  <li>However, empathy requires not only generation but also understanding; utterances with the appropriate emotion should be generated based on that understanding<\/li>\n  <li>So we propose MoEL, a novel end-to-end approach to modeling empathy<\/li>\n  <li>Our model first captures the user\u2019s emotion and produces a distribution over emotions. Based on this, MoEL softly combines the output states of per-emotion Listeners to produce an appropriate empathetic response. Each Listener is specialized in the emotion it is responsible for<\/li>\n  <li>SOTA on the human evaluation of Empathetic dialogues (Rashkin et al., 2018)<\/li>\n  <li>Looking at each generated response, the model is interpretable<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Neural approaches to dialogue work quite well. 
However, being MLE-based, they produce generic and repetitive utterances<\/li>\n  <li>Commonsense understanding and consistent persona modeling help make chatbots engaging<\/li>\n  <li>Emotion understanding and empathy are also quite important, but so far they have received little attention<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-11-28-moel-empathetic-listeners\/table1.png\" alt=\"table1\" \/><\/p>\n\n<ul>\n  <li>Table 1 is a sample from the <em>empathetic-dialogues<\/em> dataset (Rashkin et al., 2018)<\/li>\n  <li>As in the example above, people respond appropriately according to the situation and the other party\u2019s emotion<\/li>\n  <li>Although empathy and emotional understanding matter this much, a dialogue agent that recognizes the appropriate emotion and responds accordingly is very hard to train<\/li>\n  <li>To tackle this, research so far has largely gone in 2 directions\n    <ol>\n      <li>Multi-task approach\n        <ul>\n          <li>Jointly learn, as multi-task, predicting the emotional state of the current user utterance and generating an appropriate response<\/li>\n        <\/ul>\n      <\/li>\n      <li>Conditional generation with certain fixed emotion\n        <ul>\n          <li>Generation conditioned on a specific emotion<\/li>\n        <\/ul>\n      <\/li>\n    <\/ol>\n  <\/li>\n  
<li>The two methods above generate empathetic and emotional responses well, but miss a few key points\n    <ol>\n      <li>They assume the model understands the emotion and implicitly learns the appropriate reply. But without an additional inductive bias, a single decoder is not interpretable and only produces generic replies<\/li>\n      <li>They assume a specific emotion is given as the condition at generation time. But we often do not know which emotion is appropriate for an empathetic response, so generating conditioned on an unclear emotion is odd<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-11-28-moel-empathetic-listeners\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>To resolve these problems, we propose the Mixture of Empathetic listeners (MoEL)\n    <ul>\n      <li>Similar to Rashkin et al. (2018), the dialogue context is encoded into an emotional state over n emotions<\/li>\n      <li>The difference is in decoding: instead of a single decoder, there are n decoders, one for each of the n emotions. 
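The decoder mixture at the heart of this design, one decoder per emotion whose outputs are softly combined by the predicted emotion distribution, can be sketched as follows; the variable names are ours:

```python
import numpy as np

def combine_listeners(p, shared, listeners):
    """Soft mixture of listener outputs: V_M = V_0 + sum_i p_i * V_i.

    p: (n,) emotion distribution, shared: (t, d) shared-listener states,
    listeners: (n, t, d) per-emotion listener states. Names are ours."""
    return shared + np.tensordot(p, listeners, axes=1)

rng = np.random.default_rng(2)
n, t, d = 4, 5, 8
p = np.full(n, 1.0 / n)                    # uniform distribution for the demo
V_M = combine_listeners(p, rng.normal(size=(t, d)), rng.normal(size=(n, t, d)))
```

When p is sharply peaked on one emotion, the mixture effectively selects that emotion's listener, which is what makes the generation process inspectable.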
We call these decoders <em>listeners<\/em><\/li>\n      <li>The listeners are trained together with a Meta-listener, which softly combines each listener\u2019s output based on the distribution from emotion classification<\/li>\n      <li>This lets the model learn explicitly how to pick an appropriate reaction based on its understanding of the context of emotion (this seems to be about interpretability)<\/li>\n    <\/ul>\n  <\/li>\n  <li>For evaluation, comparison tests against competitive baselines were run, along with human evaluation<\/li>\n  <li>The model achieves SOTA; the analysis shows that MoEL effectively attends to the right listener, and that the model learns how to respond appropriately to each emotion. For these reasons we consider it a more interpretable generative process<\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<h2 id=\"3-mixture-of-empathetic-listeners\">3. 
Mixture of Empathetic Listeners<\/h2>\n\n<ul>\n  <li>dialog context C={U1, S1, U2, S2, \u2026, Ut}<\/li>\n  <li>speaker emotion state at each utterance Emo = {e1, e2, \u2026, et}<\/li>\n  <li>The model\u2019s goal is to predict the speaker\u2019s last emotional state e_t and generate an appropriate empathetic response S_t<\/li>\n  <li>As in Figure 1, MoEL consists of 3 main components\n    <ol>\n      <li>Emotion tracker: encodes the context C and predicts a distribution over the user\u2019s emotions<\/li>\n      <li>Emotion-aware listeners: all listeners operate independently and each produces its own representation based on the distribution<\/li>\n      <li>Meta listener: takes the weighted sum of all listener representations and generates the final response from it<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<h3 id=\"31-embedding\">3.1. Embedding<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-11-28-moel-empathetic-listeners\/figure2.png\" alt=\"figure2\" \/><\/p>\n\n<ul>\n  <li>The context embedding E^C is the sum of the word embedding E^W, positional embedding E^P, and dialogue state embedding E^D. The dialogue state embedding appears to be a turn embedding for the speaker (Wolf et al., 2019)\n    <ul>\n      <li>E^C(C) = E^W(C) + E^P(C) + E^D(C)<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"32-emtion-tracker\">3.2. 
Emotion Tracker<\/h3>\n\n<ul>\n  <li>A Transformer encoder (TRS_Enc) is used as the base<\/li>\n  <li>All utterances in the context are concatenated in order<\/li>\n  <li>Instead of BERT\u2019s CLS token, a QRY token is created and prepended to the sequence<\/li>\n  <li>The context representation obtained from the Transformer encoder is:\n    <ul>\n      <li>H = TRS_Enc(E^C([QRY; C])), where [;] denotes concatenation<\/li>\n    <\/ul>\n  <\/li>\n  <li>The final QRY representation is H_0, the representation of the first token of H<\/li>\n  <li>This is used to build the emotion distribution (how it is built comes in the next section)<\/li>\n<\/ul>\n\n<h3 id=\"33-emotion-aware-listeners\">3.3. Emotion Aware Listeners<\/h3>\n\n<ul>\n  <li>The emotion-aware listeners consist of one shared listener and n independent per-emotion listeners\n    <ul>\n      <li>shared listener: learns information shared across all emotions<\/li>\n      <li>n listeners: parameterized Transformer decoders (TRS_Dec). 
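The step that turns the tracker's H_0 into an emotion distribution, dot-products against one learned key per emotion followed by a softmax, can be sketched as follows; the function name, key matrix, and dimensions are our own assumptions:

```python
import numpy as np

def emotion_distribution(h0, keys):
    """Dot-product key addressing: p = softmax(keys @ H_0).

    h0: (d,) the QRY representation H_0; keys: (n, d), one randomly
    initialized key per emotion. A toy sketch with our own names."""
    scores = keys @ h0
    p = np.exp(scores - scores.max())      # numerically stable softmax
    return p / p.sum()

rng = np.random.default_rng(3)
p = emotion_distribution(rng.normal(size=16), rng.normal(size=(32, 16)))
```

The 32 keys here mirror the dataset's 32 emotion labels; in training, a cross-entropy loss on p supervises this distribution.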
\ud2b9\uc815 \uac10\uc815\uc5d0 \ub300\ud55c \uc801\uc808\ud55c react\ub97c \ud559\uc2b5<\/li>\n    <\/ul>\n  <\/li>\n  <li>n\uac1c\uc758 listener \uac01\uac01\uc5d0\ub294 user emotion distribution\uc744 \uae30\ubc18\uc73c\ub85c \uc11c\ub85c \ub2e4\ub978 weight\ub97c \uc8fc\uace0, shared listener\ub294 \uace0\uc815\ub41c 1\uc758 weight\ub97c \uc8fc\uc5b4\uc11c \ubaa8\ub4e0 \uac10\uc815\uc5d0 \ub300\ud55c \uc77c\ubc18\uc801\uc778 \ud559\uc2b5\uc744 \ud558\uac8c \ud568<\/li>\n  <li>user emotion distribution\uc740 Key-Value Memory Network (Miler et al., 2016)\uc744 \ud1b5\ud574\uc11c \ub9cc\ub4ec.\n    <ul>\n      <li>\uac01 \uac10\uc815\uc5d0 \ub300\ud55c (key, value) \uc30d\uc744 \ub9cc\ub4ec. key\ub294 random init, value\ub294 TRS_Dec\uc758 \uc544\uc6c3\ud48b<\/li>\n      <li>\uc55e\uc5d0\uc11c \uad6c\ud55c QRY\uc758 representation\uc778 H_0\ub97c \ubaa8\ub4e0 key\uc5d0 \ub300\ud574\uc11c dot-product\ub97c \ud574\uc11c softmax \ucde8\ud568<\/li>\n      <li>\uc774\ub807\uac8c \uc5bb\uc5b4\uc9c4 \ud655\ub960 \ubd84\ud3ec\ub97c \uac10\uc815\uc5d0 \ub300\ud55c distribution p \ub77c\uace0 \ud568<\/li>\n      <li>p\uc5d0 \ub300\ud574 emotion classification\uc5d0 \ub300\ud55c cross entropy loss\ub97c \uc801\uc6a9\ud568<\/li>\n      <li>\ub610\ud55c \ubaa8\ub4e0 \uac10\uc815\uc5d0 \ub300\ud55c TRS_Dec\ub97c p \ubd84\ud3ec\ub85c weighted-sum\ud558\uace0, shared listener\ub294 \uadf8\ub0e5 \ub354\ud574\uc90c.\n        <ul>\n          <li>V_M = V_0 + sum{p_i * V_i}<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"34-meta-listener\">3.4. 
Meta Listener<\/h3>\n\n<ul>\n  <li>Meta listener\ub294 \ub610\ub2e4\ub978 transformer decoder\uc774\uba70, \uc55e\uc5d0\uc11c encoder\uac00 \ub9cc\ub4e0 H\uc640 emotion-aware listeners\uac00 \ub9cc\ub4e0 V_M\uc5d0 \ub300\ud574\uc11c decoding\ud558\uc5ec \ucd5c\uc885 response\ub97c \uc0dd\uc131\ud568<\/li>\n  <li>\uc774\ub807\uac8c \uc0dd\uc131\ub41c \ud1a0\ud070(\ub2e8\uc5b4)\ub4e4\uc5d0 \ub300\ud574\uc11c MLE \uae30\ubc18\uc73c\ub85c \ud559\uc2b5<\/li>\n  <li>\ucd5c\uc885 loss\ub294 \uc55e\uc11c \uad6c\ud55c emotion classification loss\uc640 generation loss\uc758 \ud569\uc774\uba70, \uac01\uac01 \ud558\uc774\ud37c \ud30c\ub77c\ubbf8\ud130 alpha, beta\uac00 \uc0c1\uc218\ub85c \ubd99\uc74c<\/li>\n<\/ul>\n\n<h2 id=\"experiment\">Experiment<\/h2>\n\n<h3 id=\"41-dataset\">4.1. Dataset<\/h3>\n\n<ul>\n  <li>Empathetic dialogues (Rashkin et al., 2018) dataset\uc744 \uc0ac\uc6a9\n    <ul>\n      <li>25k one-to-one open-domain conversation grounded in emotional situations<\/li>\n      <li>32 emotion labels<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"42-training\">4.2. Training<\/h3>\n\n<ul>\n  <li>Adam, GloVe \uc0ac\uc6a9. \ub098\uba38\uc9c0 \ud30c\ub77c\ubbf8\ud130\ub294 \ub2e4 random initialization<\/li>\n<\/ul>\n\n<h3 id=\"43-baseline\">4.3. Baseline<\/h3>\n\n<ul>\n  <li>Transformer (TRS): \uadf8\ub0e5 MLE\ub85c \ud559\uc2b5\ud55c standard Transformer (Vaswani et al., 2017)<\/li>\n  <li>Multitask Transformer (Multi-TRS): emotion\uc5d0 \ub300\ud55c additional supervised information\uc744 \uac19\uc774 multitask\ub85c \ud559\uc2b5\ud558\ub294 Transformer (Rashkin et al., 2018)<\/li>\n<\/ul>\n\n<h3 id=\"44-hyperparameter\">4.4. 
Hyperparameter<\/h3>\n\n<ul>\n  <li>word embedding size: 300<\/li>\n  <li>hidden size: 300<\/li>\n  <li>2 self-attention layers made up of 2 attention heads with embedding dimension 40<\/li>\n  <li>positionwise feedforward with 1D conv layer with 50 filters of width 3<\/li>\n  <li>batch size: 16<\/li>\n<\/ul>\n\n<h3 id=\"45-evaluation-metrics\">4.5. Evaluation Metrics<\/h3>\n\n<ul>\n  <li>BLEU<\/li>\n  <li>Human Ratings<\/li>\n  <li>Human A\/B Test<\/li>\n<\/ul>\n\n<h2 id=\"5-results\">5. Results<\/h2>\n\n<ul>\n  <li>Emotion detection<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-11-28-moel-empathetic-listeners\/figure3.png\" alt=\"figure3\" \/><\/p>\n\n<ul>\n  <li>Response evaluation<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-11-28-moel-empathetic-listeners\/table2.png\" alt=\"table2\" \/><\/p>\n\n<h2 id=\"6-analysis\">6. Analysis<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-11-28-moel-empathetic-listeners\/table4.png\" alt=\"table4\" \/><\/p>\n\n<p><img src=\"\/assets\/images\/blog\/2019-11-28-moel-empathetic-listeners\/figure4.png\" alt=\"figure4\" \/><\/p>\n\n<h2 id=\"7-conclusion--future-work\">7. Conclusion &amp; Future Work<\/h2>\n\n<p>In this paper, we propose a novel way to generate empathetic dialogue responses by using Mixture of Empathetic Listeners (MoEL). Differently from previous works, our model understand the user feelings and responds accordingly by learning specific listeners for each emotion. We benchmark our model in empathetic-dialogues dataset (Rashkin et al., 2018), which is a multiturn open-domain conversation corpus grounded on emotional situations. Our experimental results show that MoEL is able to achieve competitive performance in the task with the advantage of being more interpretable than other conventional models. 
Finally, we show that our model is able to automatically select the correct emotional decoder and effectively generate an empathetic response.\nOne of the possible extensions of this work would be incorporating it with Persona (Zhang et al., 2018a) and task-oriented dialogue systems (Gao et al., 2018; Madotto et al., 2018; Wu et al., 2019, 2017, 2018a; Reddy et al., 2018; Raghu et al., 2019). Having a persona would allow the system to have more consistent and personalized responses, and combining open-domain conversations with task-oriented dialogue systems would equip the system with more engaging conversational capabilities, hence resulting in a more versatile dialogue system.<\/p>\n","pubDate":"Thu, 28 Nov 2019 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/moel-empathetic-listeners\/","guid":"https:\/\/roomylee.github.io\/moel-empathetic-listeners\/","category":["empathetic-listener","generation","moel","blog"]},{"title":"Gunrock: Building A Human-Like Social Bot By Leveraging Large Scale Real User Data (Alexa Prize 2018)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/pdfs.semanticscholar.org\/b402\/b85ad45e3ac51f1da8ee718373082ce24f47.pdf\">https:\/\/pdfs.semanticscholar.org\/b402\/b85ad45e3ac51f1da8ee718373082ce24f47.pdf<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Chun-Yen Chen, Dian Yu, Weiming Wen, Yi Mang Yang, Jiaping Zhang, Mingyang Zhou, Kevin Jesse, Austin Chau, Antara Bhowmick, Shreenath Iyer, Giritheja Sreenivasulu, Runxiang Cheng, Ashwin Bhandare, Zhou Yu<\/li>\n      <li>University of California<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>Alexa Prize 2018<\/li>\n    <\/ul>\n  <\/li>\n  <li>Key Points and My Comments\n    <ul>\n      <li>Differences from XiaoIce\n        <ul>\n          <li>The notion of dialog state is not clearly defined<\/li>\n          <li>Accordingly, policies and actions do not really feel defined per state<\/li>\n          <li>Gunrock seems to look at the intent in the information the NLU passes along and hand the turn to a per-topic submodule<\/li>\n          <li>In other words, it can be seen as an integrated system equipped with conversation skill sets (modules) for a finite set of topics<\/li>\n          <li>By contrast, XiaoIce uses the information from the NLU to bring in a suitable topic and then retrieves and generates responses through a variety of approaches<\/li>\n          <li>This is probably because Gunrock aims to talk in depth about specific topics, whereas XiaoIce really does seem able to handle open chitchat<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Developed a context-aware hierarchical dialog manager that can handle a wide range of user behaviors, such as topic switching and question answering<\/li>\n  <li>Designed a robust three-step NLU module<\/li>\n  <li>Aims to improve human-likeness through speech synthesis<\/li>\n  <li>Won the Amazon Alexa Prize 2018<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. 
Introduction<\/h2>\n\n<ul>\n  <li>Collected an average of 500 conversations (sessions) per day over the 45-day evaluation period<\/li>\n  <li>Collected a total of 480K dialog turns over the whole development period<\/li>\n  <li>Gunrock, the open-domain social bot proposed in the paper, aims to mimic natural human-human conversation by covering a broad and varied range of social topics, deeply enough to hold an in-depth conversation about specific or popular subjects<\/li>\n  <li>NLU: NER, Intent(Dialog Act), Sentiment, Coreference<\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>Uses data such as Cornell Movie Dialogs and Reddit<\/li>\n<\/ul>\n\n<h2 id=\"3-architecture\">3. 
Architecture<\/h2>\n\n<h3 id=\"31-system-overview\">3.1 System Overview<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-17-gunrock\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>The NLU consists of 12 components<\/li>\n  <li>Broadly it runs as 1) segmentation of multiple sentences, 2) noun phrase extraction, and 3) analysis of those results by several NLP components<\/li>\n  <li>The DM includes an intent classifier, which analyzes the intent and routes the turn to a topic dialog module such as movies, sport, or animal<\/li>\n  <li>The NLG manages templates for the output<\/li>\n<\/ul>\n\n<h3 id=\"32-automatic-speech-recognition\">3.2 Automatic Speech Recognition<\/h3>\n\n<ul>\n  <li>Skipping the speech side<\/li>\n<\/ul>\n\n<h3 id=\"33-natural-language-understanding\">3.3 Natural Language Understanding<\/h3>\n\n<h4 id=\"331-sentence-segmentation\">3.3.1 Sentence Segmentation<\/h4>\n\n<ul>\n  <li>\n    <p>People speaking in long, complex utterances must be handled as well.<\/p>\n  <\/li>\n  <li>Manually labeled breaking points for about 20K utterances out of the 300K conversations in the Cornell Movie-Quotes Corpus<\/li>\n  <li>Example\n    <ul>\n      <li>\u201cAlexa that is cool what do you think of the Avengers\u201d<\/li>\n      <li>\u201cAlexa &lt;BRK&gt; that is cool &lt;BRK&gt; what do you think of the Avengers &lt;BRK&gt;\u201d<\/li>\n    <\/ul>\n  <\/li>\n  <li>Model\n    <ul>\n      <li>fastText word embedding<\/li>\n      <li>Attention-Seq2Seq\n        <ul>\n          <li>2-layer Bi-LSTM Encoder<\/li>\n          <li>2-layer RNN Decoder<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>Trained for about 30 epochs, reaching 96% test accuracy<\/li>\n<\/ul>\n\n<h4 id=\"332-noun-phrase-extraction\">3.3.2 Noun Phrase Extraction<\/h4>\n\n<ul>\n  <li>Uses the Stanford CoreNLP constituency parser<\/li>\n  <li>Removes useless stopwords (e.g. it, all)<\/li>\n  <li>As future work, they are considering using a dependency parser<\/li>\n<\/ul>\n\n<h4 id=\"333-entity-recognition\">3.3.3 Entity Recognition<\/h4>\n\n<ul>\n  <li>Existing NER tools (Stanford CoreNLP, spaCy) are highly dependent on capitalization<\/li>\n  <li>They also struggle to capture general entities<\/li>\n  <li>So instead of off-the-shelf NER, entities are identified by looking up the extracted noun phrases in a knowledge graph<\/li>\n  <li>Noun phrases are searched against the Google Knowledge Graph, the Microsoft Concept Graph, and so on, and recognized as entities when the confidence is high<\/li>\n  <li>For example, searching \u201ctomb raider\u201d returns the label \u201cvideo game series\u201d with high confidence, so \u201ctomb raider\u201d is tagged as \u201cvideo game series\u201d<\/li>\n<\/ul>\n\n<h4 id=\"334-coreference-resolution\">3.3.4 Coreference Resolution<\/h4>\n\n<ul>\n  <li>Can be understood as roughly pronoun resolution<\/li>\n  <li>Existing models (CoreNLP, NeuralCoref) were not trained on dialog data, so they do not work well<\/li>\n  <li>So they apparently labeled new data and retrained<\/li>\n<\/ul>\n\n<h4 id=\"335-asr-correction\">3.3.5 ASR Correction<\/h4>\n\n<ul>\n  <li>Skipped<\/li>\n<\/ul>\n\n<h4 id=\"336-dialog-act-prediction\">3.3.6 Dialog Act Prediction<\/h4>\n\n<ul>\n  <li>Since no annotated dialog data was available, the Switchboard Dialog Act Corpus (SwDA) is used<\/li>\n  <li>Uses a 2-layer Bi-LSTM, a 2-layer CNN, etc. (w\/ fastText)<\/li>\n  <li>Both perform similarly, at around 85%<\/li>\n  <li>They say experiments with ELMo and a recurrent convolutional NN are ongoing<\/li>\n<\/ul>\n\n<h4 id=\"337-topic-expansion\">3.3.7 Topic Expansion<\/h4>\n\n<ul>\n  <li>Uses ConceptNet as the knowledge graph<\/li>\n  <li>Expands outward from the entity (topic) currently under discussion so the conversation can keep going<\/li>\n  <li>For example, if the current topic is cars, the conversation can be expanded to \u201cVolvo\u201d, which is retrievable from the knowledge graph<\/li>\n  <li>(Shouldn't this be handled on the DM side rather than in the NLU?)<\/li>\n<\/ul>\n\n<h3 id=\"34-dialog-management\">3.4 Dialog Management<\/h3>\n\n<ul>\n  <li>A 2-level hierarchical dialog manager manages the conversation<\/li>\n  <li>At the high level, an appropriate topic dialog module is selected based on the NLU output<\/li>\n  <li>At the low level, the selected topic dialog module is executed<\/li>\n<\/ul>\n\n<h4 id=\"341-high-level-system-dialog-management\">3.4.1 High-Level System Dialog Management<\/h4>\n\n<ul>\n  <li>Intent Classifier\n    <ul>\n      <li>Uses the Common Alexa Prize Chats (CAPC) dataset (built from the 2017 competition data)<\/li>\n      <li>System requests (\u201cplay music\u201d, \u201cturn on the lights\u201d) are handled separately<\/li>\n      <li>Along with basic intent detection, it captures the topic (topic intent) based on the KG and similar resources<\/li>\n    <\/ul>\n  <\/li>\n  <li>Dialog Module Selector\n    <ul>\n      <li>Selects a dialog module via the topic intent found by the intent classifier<\/li>\n      <li>For example, when the utterance \u201clet\u2019s talk about movies\u201d comes in, the intent classifier picks up \u201cmovie\u201d and the selector chooses the movie module<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h4 id=\"342-low-level-dialog-management\">3.4.2 Low Level Dialog Management<\/h4>\n\n<ul>\n  <li>Topic Dialog Modules (11 of them)\n    <ul>\n      <li>Animal: finds responses using the Reddit API<\/li>\n      <li>Movie and Book: uses the TMDB API, Goodreads API, IMDB, etc.<\/li>\n      <li>Music: talks about the artists in Spotify\u2019s Million Playlist dataset<\/li>\n      <li>Sport: up-to-date information matters; uses Reddit, Twitter, Moments, News, etc.<\/li>\n      <li>Game: IGDB<\/li>\n      <li>Psychology and Philosophy: TED talks<\/li>\n      <li>Holiday, Travel, Technology and Science, News, Retrieval<\/li>\n    <\/ul>\n  <\/li>\n  <li>The point is not to generate utterances from scratch but to \u201cfind\u201d interesting information on the topic<\/li>\n  <li>The selected information is handed to the NLG, which generates utterances from templates<\/li>\n<\/ul>\n\n<h3 id=\"35-knowledge-base\">3.5 Knowledge Base<\/h3>\n\n<ul>\n  <li>Split into a KB responsible for fact content and a KB responsible for opinion content<\/li>\n<\/ul>\n\n<h3 id=\"36-natural-language-generation-nlg\">3.6 Natural Language Generation (NLG)<\/h3>\n\n<ul>\n  <li>Generates from templates so as not to repeat itself and to add variety<\/li>\n<\/ul>\n\n<h2 id=\"4-an-example-dialog\">4. An Example Dialog<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-17-gunrock\/table1.png\" alt=\"table1\" \/><\/p>\n\n<ul>\n  <li>Builds utterances around facts, experiences, opinions, etc., taking the user's interests into account so the user stays engaged<\/li>\n  <li>Acknowledgement also seems to play an important role in making the bot feel human<\/li>\n<\/ul>\n\n<h2 id=\"5-results-ans-analysis\">5. 
Results and Analysis<\/h2>\n\n<ul>\n  <li>Performance kept improving over the competition period; the three-step NLU, the hierarchical topic transition management, and speech synthesis are credited with the biggest impact<\/li>\n<\/ul>\n\n<h3 id=\"51-module-performance-analysis\">5.1 Module Performance Analysis<\/h3>\n\n<h4 id=\"511-topic-level-analysis\">5.1.1 Topic level analysis<\/h4>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-17-gunrock\/table2.png\" alt=\"table2\" \/><\/p>\n\n<ul>\n  <li>Table 2 shows the number of turns and the average score per topic dialog module<\/li>\n  <li>MOVIE was discussed most widely during the competition, and ANIMAL scored best<\/li>\n<\/ul>\n\n<h4 id=\"512-lexical-analysis\">5.1.2 Lexical analysis<\/h4>\n\n<ul>\n  <li>An unsupervised analysis of the entities (topics) people want to talk about<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-17-gunrock\/figure5.png\" alt=\"figure5\" \/><\/p>\n\n<ul>\n  <li>Figure 5 is a qualitative look at the correlation between the entities that appeared in conversations and the conversation scores<\/li>\n  <li>The x-axis is the frequency of the entity, and the y-axis is a metric relating the word to the score<\/li>\n  <li>People want to talk a lot about technology, games, and animals, yet tend to give low scores; words such as \u201crobots\u201d, \u201cundertake\u201d, and \u201cdog\u201d show this<\/li>\n  <li>They also found that controversial topics correlate with low scores; \u201creligion\u201d, \u201cgossip\u201d, \u201cpoop\u201d, and the like fall into this group<\/li>\n  <li>And people want to talk about recent events<\/li>\n<\/ul>\n\n<h3 id=\"52-dialog-strategy-effectiveness\">5.2 Dialog Strategy Effectiveness<\/h3>\n\n<h4 id=\"521-acknowledgement-with-knowledge-graph-reasoning\">5.2.1 Acknowledgement with Knowledge Graph reasoning<\/h4>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-17-gunrock\/figure6.png\" alt=\"figure6\" \/><\/p>\n\n<ul>\n  <li>Grounding is a very effective strategy in non-task-oriented dialog systems<\/li>\n  <li>Acknowledging (showing that the bot knows the subject) is very important (see Figure 6)<\/li>\n  <li>For example, in the book module, if the user says they prefer fiction to non-fiction, the bot replies \u201cOh, I\u2019ve read the Harry Potter series\u201d<\/li>\n  <li>This requires knowing that Harry Potter is a representative work of fiction<\/li>\n  <li>Hence the need to make proper use of a knowledge graph<\/li>\n<\/ul>\n\n<h3 id=\"53-system-latency\">5.3 System Latency<\/h3>\n\n<ul>\n  <li>Skipped<\/li>\n<\/ul>\n\n<h2 id=\"6-visualization-tool\">6. Visualization Tool<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-17-gunrock\/figure8.png\" alt=\"figure8\" \/><\/p>\n\n<h2 id=\"7-conclusion\">7. Conclusion<\/h2>\n\n<ul>\n  <li>Same content as the Introduction<\/li>\n<\/ul>\n\n<h2 id=\"8-future-work\">8. Future Work<\/h2>\n\n<ul>\n  <li>Because of the competition schedule, some parts could not be experimented on and analyzed rigorously<\/li>\n  <li>After the competition they plan to try things like A\/B tests<\/li>\n  <li>They also want to improve several parts\n    <ul>\n      <li>To give each user a unique conversational experience, they will improve topic selection based on the user's gender, personality, and interests<\/li>\n      <li>They are also considering building a system that recommends upcoming events or content of interest<\/li>\n    <\/ul>\n  <\/li>\n  <li>They are considering introducing reinforcement learning for a better dialog policy<\/li>\n  <li>They will also combine social chitchat with task-oriented dialog, since users sometimes ask with a specific goal in mind, such as a restaurant recommendation<\/li>\n  <li>They will also build models capable of data-driven opinion answering and of debating a topic<\/li>\n  <li>Finally, they are looking into online learning so the system can learn automatically from conversations with users<\/li>\n<\/ul>\n","pubDate":"Tue, 17 Sep 2019 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/gunrock\/","guid":"https:\/\/roomylee.github.io\/gunrock\/","category":["dialog-system","alexa-prize","blog"]},{"title":"Rasa: Open Source Language Understanding and Dialogue Management (ConvAI 2017)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1712.05181\">https:\/\/arxiv.org\/abs\/1712.05181<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Tom Bocklisch, Joey Faulkner, Nick Pawlowski, and Alan Nichol<\/li>\n      <li>Rasa<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>arXiv 2017<\/li>\n      <li>NIPS 2017 Conversational AI Workshop<\/li>\n    <\/ul>\n  <\/li>\n  <li>Key Points and My Comments\n    <ul>\n      <li>A task-oriented system, again<\/li>\n      <li>Feels more like a system description than a paper<\/li>\n      <li>What they pursue seems to be not top quality so much as letting anyone use it easily and conveniently<\/li>\n      <li>Accordingly, the structure and modules are simple and naive<\/li>\n      <li>You can use it just by preparing your data in the right format<\/li>\n      <li>Rasa NLU mainly does NER and intent classification, yielding structured data as the result (no vector representation)\n        <ul>\n          <li>Both NER and intent can be domain-customized by the user; in that case the corresponding module is retrained\n            <ul>\n              <li>When building a taxi chatbot, you can define an intent such as \u201cbooking_taxi\u201d<\/li>\n            <\/ul>\n          <\/li>\n        <\/ul>\n      <\/li>\n      <li>Doing dialogue handling (DM) with a state machine, as in the old days, is no longer the way to go; it is far too complex\n        <ul>\n          <li>Reinforcement learning (RL) needs a lot of data, and defining the policy is hard; for non-experts to use the framework easily and conveniently, an even simpler structure is needed<\/li>\n          <li>The state is in slot-value form, as in other task-oriented chatbots<\/li>\n          <li>The action is decided by the model based on the slots, previous utterances, and results of previous actions<\/li>\n          <li>The action in turn affects the state. Actions are expressed as a \u201clist of events\u201d such as \u201cSlotSet\u201d, \u201cAllSlotsReset\u201d, and \u201cRestarted\u201d<\/li>\n        <\/ul>\n      <\/li>\n      <li>The NLG is not well explained; other Rasa materials show that it receives structured data from the DM (Rasa Core) and converts it into unstructured text data\n        <ul>\n          <li>By default Rasa ships with built-in template-based NLG, and this too can be customized to use an external NLG by attaching an external HTTP server<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-04-rasa\/figure_flow.png\" alt=\"figure_flow\" \/><\/p>\n\n<ul>\n  <li>Reference\n    <ul>\n      <li><a href=\"https:\/\/www.youtube.com\/watch?v=zpdLFR3sWZ4\">https:\/\/www.youtube.com\/watch?v=zpdLFR3sWZ4<\/a><\/li>\n      <li><a href=\"https:\/\/medium.com\/@BhashkarKunal\/conversational-ai-chatbot-using-rasa-nlu-rasa-core-how-dialogue-handling-with-rasa-core-can-use-331e7024f733\">https:\/\/medium.com\/@BhashkarKunal\/conversational-ai-chatbot-using-rasa-nlu-rasa-core-how-dialogue-handling-with-rasa-core-can-use-331e7024f733<\/a><\/li>\n      <li><a href=\"https:\/\/github.com\/RasaHQ\/rasa-workshop-pydata-berlin\">https:\/\/github.com\/RasaHQ\/rasa-workshop-pydata-berlin<\/a><\/li>\n      <li><a href=\"https:\/\/www.slideshare.net\/JustinaPetraityt\/deprecating-the-state-machine-building-conversational-ai-with-the-rasa-stack\">https:\/\/www.slideshare.net\/JustinaPetraityt\/deprecating-the-state-machine-building-conversational-ai-with-the-rasa-stack<\/a><\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  
<li>Introduces Rasa NLU and Rasa Core, open-source Python libraries for building conversational systems<\/li>\n  <li>Designed so that anyone, even without software-development experience, can easily build machine-learning-based dialogue management and language understanding modules<\/li>\n  <li>Technically it does not seem all that groundbreaking; the individual modules are not especially sophisticated or polished either.<\/li>\n  <li><strong>It looks like a task-oriented chatbot framework designed squarely for its stated goal: letting non-experts build bots easily<\/strong><\/li>\n  <li>Code is on GitHub: <a href=\"https:\/\/github.com\/RasaHQ\/\">https:\/\/github.com\/RasaHQ\/<\/a><\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>The authors developed Rasa NLU and Rasa Core (a dialogue manager)<\/li>\n  <li>Usable even by non-experts<\/li>\n  <li>Rasa is already used by thousands of developers worldwide<\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>Rasa draws inspiration from several sources: scikit-learn, Keras<\/li>\n  <li>Uses fastText, GloVe, etc.<\/li>\n<\/ul>\n\n<h2 id=\"3-description-of-the-code\">3. Description of the Code<\/h2>\n\n<ul>\n  <li>Rasa\u2019s architecture is modular, making it easy to integrate with other systems<\/li>\n  <li>Implemented in Python, with an HTTP API available<\/li>\n<\/ul>\n\n<h3 id=\"31-architecture\">3.1 Architecture<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-09-04-rasa\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>The dialogue state is stored in the Tracker object shown in Figure 1<\/li>\n  <li>There is one Tracker per conversation session<\/li>\n  <li>The Tracker stores slots and keeps a log of every event that occurred within the conversation tied to that state<\/li>\n  <li>The dialogue state can be reconstructed by replaying all events<\/li>\n  <li>In Figure 1, step 1 is handled by Rasa NLU and every other step by Rasa Core<\/li>\n  <li>Figure 1 walkthrough\n    <ol>\n      <li>Interpreter (Rasa NLU): extracts intents, entities, and other structured information<\/li>\n      <li>Tracker: maintains the dialogue state and receives new messages<\/li>\n      <li>Policy: receives the current state from the Tracker and selects the next action<\/li>\n      <li>Action: executes the action selected by the Policy and passes a log of that action back to the Tracker<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<h3 id=\"32-actions\">3.2 Actions<\/h3>\n\n<ul>\n  <li>Frames the DM problem as a classification problem<\/li>\n  <li>Rasa Core predicts the single most appropriate action from a predefined list of actions<\/li>\n  <li>An action can be something as simple as \u201csend a message to the user\u201d or \u201cexecute a function\u201d<\/li>\n  <li>When an action runs, it uses relevant information from the dialogue history, such as slots, previous utterances, and previous actions<\/li>\n  <li>An action produces events to execute and hands them to the tracker. Events include \u201cSlotSet\u201d, \u201cAllSlotsReset\u201d, and \u201cRestarted\u201d<\/li>\n<\/ul>\n\n<h3 id=\"33-natural-language-understanding\">3.3 Natural Language Understanding<\/h3>\n\n<ul>\n  <li>This part covers the Rasa NLU module<\/li>\n  <li>To catch two birds with one stone, customizability and ease of use, it provides pre-defined pipelines that work well for most use cases<\/li>\n  <li>For example, the \u201cspacy_sklearn\u201d pipeline processes text as follows\n    <ul>\n      <li>Tokenization and POS tagging with spaCy<\/li>\n      <li>Looking up a GloVe embedding for each token and building the final representation<\/li>\n      <li>Training an SVM with scikit-learn<\/li>\n    <\/ul>\n  <\/li>\n  <li>The \u201cner_crf\u201d component likewise uses tokenization + POS features<\/li>\n<\/ul>\n\n<h3 id=\"34-policies\">3.4 Policies<\/h3>\n\n<ul>\n  <li>Responsible for selecting the next action to execute based on the information the tracker provides<\/li>\n  <li>There is a featurizer here that builds a vector representation of the current dialogue state<\/li>\n  <li>The featurizer concatenates features representing the following\n    <ul>\n      <li>What the previous action was<\/li>\n      <li>The intent and entities of the most recent user message<\/li>\n      <li>Which slots are currently defined (meaning, it seems, slots whose values have been found)<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"4-usage\">4. Usage<\/h2>\n\n<h3 id=\"41-training-data-formats\">4.1 Training Data Formats<\/h3>\n\n<ul>\n  <li>\n    <p>Rasa NLU: supports json and md formats<\/p>\n\n    <p><img src=\"\/assets\/images\/blog\/2019-09-04-rasa\/figure_nlu.png\" alt=\"figure_nlu\" \/><\/p>\n\n    <ul>\n      <li>NLU is trained on samples consisting of a single sentence, an intent, and a list of entities<\/li>\n    <\/ul>\n  <\/li>\n  <li>\n    <p>Rasa Core (DM): supports the md format<\/p>\n\n    <p><img src=\"\/assets\/images\/blog\/2019-09-04-rasa\/figure_dm.png\" alt=\"figure_dm\" \/><\/p>\n\n    <ul>\n      <li>A dialogue takes the form shown above<\/li>\n      <li>The first line is the hash code of the conversation<\/li>\n      <li>The body is a sequence of events\n        <ul>\n          <li>Here an event is something like inform{\u201clocation\u201d: \u201crome\u201d\u2026}<\/li>\n          <li>inform can be thought of as a dialogue act<\/li>\n        <\/ul>\n      <\/li>\n      <li>System actions are events as well; the lines prefixed with a dash (-) above correspond to them<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"42-machine-teaching\">4.2 Machine Teaching<\/h3>\n\n<ul>\n  <li>Rasa Core supports something called machine teaching alongside supervised learning<\/li>\n  <li>Machine teaching means the developer corrects the actions the system produces<\/li>\n  
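<li>The correction loop can be sketched minimally in Python (a hypothetical standalone sketch, not Rasa's actual API; predict_actions and the action names are assumptions):

```python
# Minimal machine-teaching loop: the policy proposes candidate actions,
# the developer either accepts the top prediction or picks a correction,
# and the corrected turn is kept as new training data.

def predict_actions(history):
    # Hypothetical stand-in for a trained dialogue policy: returns
    # candidate next actions sorted by predicted probability.
    scores = {'utter_greet': 0.7, 'utter_goodbye': 0.2, 'action_search': 0.1}
    return sorted(scores.items(), key=lambda kv: -kv[1])

def teach(history, choose):
    candidates = predict_actions(history)
    picked, _ = candidates[choose(candidates)]  # index 0 = accept top prediction
    return history + [picked]                   # corrected turn becomes training data

# The developer rejects the top prediction and picks the second candidate.
history = teach(['greet'], choose=lambda cands: 1)
```

The corrected history can then be exported as a new training story.<\/li>
  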
<li>\n    <p>This is a practical approach for generating training data and exploring the space of plausible conversations<\/p>\n\n    <p><img src=\"\/assets\/images\/blog\/2019-09-04-rasa\/figure_teach.png\" alt=\"figure_teach\" \/><\/p>\n  <\/li>\n  <li>The user looks at a prompt like the one above and handles it<\/li>\n  <li>That is, given the bot\u2019s actions and the user\u2019s utterances and intents, you judge whether the action the bot chose was appropriate<\/li>\n  <li>If the user picks option 2 above, the other actions are presented along with the bot\u2019s predicted probabilities, and the user must choose which of these is appropriate<\/li>\n  <li>It seems data is generated this way and then used for training<\/li>\n<\/ul>\n\n<h2 id=\"5-demonstration\">5. Demonstration<\/h2>\n\n<ul>\n  <li>To show the usefulness of Rasa Core, the bAbI dataset is used<\/li>\n  <li>A simple slot-filling problem<\/li>\n  <li>Since there are multiple ways to obtain the same information (a slot value), the problem can be seen as inherently non-linear<\/li>\n  <li>Accuracy or precision may therefore not be appropriate metrics for evaluating a dialogue policy<\/li>\n  <li>In experiments, for slot values already obtained, the system does not ask the question (action) for acquiring that information again (its probability is scored low)<\/li>\n<\/ul>\n\n<h2 id=\"6-outlook\">6. 
Outlook<\/h2>\n\n<ul>\n  <li>Rasa NLU and Core are great<\/li>\n  <li>Hoping for many contributions<\/li>\n<\/ul>\n","pubDate":"Wed, 04 Sep 2019 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/rasa\/","guid":"https:\/\/roomylee.github.io\/rasa\/","category":["dialog-system","rasa","blog"]},{"title":"Towards Universal Dialogue State Tracking (EMNLP 2018)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1810.09587\">https:\/\/arxiv.org\/abs\/1810.09587<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Liliang Ren, Kaige Xie, Lu Chen and Kai Yu<\/li>\n      <li>Shanghai Jiao Tong University<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>EMNLP 2018 (Oral)<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Dialogue State Tracking (DST) is a crucial part of a dialogue system: <strong>the problem of estimating, at every dialogue turn, the likelihood of the user\u2019s goal<\/strong><\/li>\n  <li>However, most recent approaches face the following limitations and difficulties due to the wide range of dialogue domains\n    <ul>\n      <li>They do not work well when the ontology\u2019s slot values change dynamically<\/li>\n      <li>Model parameter size grows in proportion to the number of slots<\/li>\n      <li>They rely on hand-crafted lexicon features<\/li>\n    <\/ul>\n  <\/li>\n  <li>To solve these problems, <strong>we propose StateNet, a Universal Dialogue State Tracker<\/strong><\/li>\n  <li>StateNet\u2019s features and contributions are as follows\n    <ul>\n      <li>Because parameters are shared across all slots, model size is independent of the number of slots<\/li>\n      <li>It uses pre-trained word vectors instead of lexicon features (explicit semantic dictionaries)<\/li>\n      <li>It achieves state-of-the-art performance on DSTC and WOZ, the representative datasets in DST<\/li>\n      <li>The experimental results also show that it overcomes the limitations mentioned above<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>A task-oriented dialogue system consists of three broad modules: input, output, and control<\/li>\n  <li>The control module is often called dialogue management and has two missions: dialogue state tracking and decision making\n    <ul>\n      <li>At every dialogue turn, the state tracker maintains the system\u2019s internal state based on the information received from the input module<\/li>\n      <li>To move the conversation forward, the machine decides its action based on this dialogue state, following the dialogue policy<\/li>\n    <\/ul>\n  <\/li>\n  <li>The dialogue state encodes the machine\u2019s understanding of the entire conversation<\/li>\n  <li>Traditionally this state has three components: 1) the user\u2019s goal, 2) the user\u2019s action, 3) the dialogue history. Among these, <strong>the user\u2019s goal is the most important<\/strong>. <strong>The user\u2019s goal is represented as slot-value pairs<\/strong><\/li>\n  <li>\n    <p>In this paper we focus only on tracking the user\u2019s goal<\/p>\n  <\/li>\n  <li>Various approaches have been proposed for DST: rule-based models, generative statistical models, discriminative statistical models, etc. The SOTA is deep-learning based<\/li>\n  <li>These approaches have several limitations\n    <ul>\n      <li>They can only use the ontology of a specific domain; other domains are unusable\n        <ul>\n          <li>For example, to switch from a tourism domain to a restaurant domain, the ontology must be swapped out<\/li>\n        <\/ul>\n      <\/li>\n      <li>A separate model per slot: a model is needed for every slot\n        <ul>\n          <li>So as slots increase, the total model parameters grow proportionally<\/li>\n        <\/ul>\n      <\/li>\n      <li>They use features based on a semantic dictionary\n        <ul>\n          <li>Building such dictionaries for the slots and values of a large-scale domain is very hard<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>To solve these problems, the authors propose StateNet, a universal dialogue state tracker<\/li>\n  <li>For each state slot, StateNet generates a fixed-length representation of the dialogue history<\/li>\n  <li>It then makes its decision via the vector distance between the state and candidate representations; the candidates can change dynamically<\/li>\n  <li>StateNet needs three kinds of data\n    <ul>\n      <li>User utterances<\/li>\n      <li>Machine act information<\/li>\n      <li>The names (literals) of slots and values<\/li>\n    <\/ul>\n  <\/li>\n  <li>StateNet shares all parameters across all slots, so knowledge can transfer between slots and the parameter count shrinks as well<\/li>\n<\/ul>\n\n<h2 id=\"2-statenet-a-universal-dialogue-state-tracker\">2. 
StateNet: A Universal Dialogue State Tracker<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-08-27-universal-dialogue-state-tracking\/figure1.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>At each dialogue turn, StateNet takes the following as input\n    <ul>\n      <li>Multiple n-gram <strong>user utterance<\/strong> representations $r_u^n$<\/li>\n      <li>m-gram <strong>machine act<\/strong> representation $r_a^m$<\/li>\n      <li><strong>value<\/strong> set $V_s$<\/li>\n      <li>word vector of <strong>slot<\/strong> $s$<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>The goal is to find the appropriate value $v \\in V_s$ for slot $s$!<\/strong><\/li>\n  <li>StateNet applies an LSTM to track the internal dialogue state<\/li>\n  <li>At every dialogue turn, StateNet outputs, for each slot, a probability distribution over the value set $V_s$\n    <ul>\n      <li>$p_s = \\text{StateNet}(r_u^n, r_a^m, s, V_s)$<\/li>\n    <\/ul>\n  <\/li>\n  <li>The overall model architecture is shown in Figure 1<\/li>\n<\/ul>\n\n<h3 id=\"21-user-utterance-representation\">2.1 User Utterance Representation<\/h3>\n\n<ul>\n  <li>Obtain a representation for each word of the utterance<\/li>\n  <li>Build n-gram representations: the concatenation of n word representations<\/li>\n  <li>The sum of the n-gram representations is the final user utterance representation $r_u^n$<\/li>\n<\/ul>\n\n<h3 id=\"22-multi-scale-receptors-layer\">2.2 Multi-scale Receptors Layer<\/h3>\n\n<p><img src=\"\/assets\/images\/blog\/2019-08-27-universal-dialogue-state-tracking\/figure2.png\" alt=\"figure2\" \/><\/p>\n\n<ul>\n  <li>Since a k-gram representation (1 &lt;= k &lt;= n) is the concatenation of k word vectors, its shape differs with k. To align the vector dimensions, $c$ linear layers are attached<\/li>\n  <li>$\\hat{r}_u^k = \\text{concat}_{j=1}^{c}(W_k^j r_u^k + b_k^j)$<\/li>\n  <li>$W_k^j$ is the j-th linear layer for the k-gram. All output dimensions are the same, $N_c$ (the dimensions must match)<\/li>\n  <li>That is, every k-gram is projected $c$ times into $N_c$ dimensions. $c$ seems to be something like a notion of channels<\/li>\n  <li>Finally, the outputs of the $c$ linear layers are all concatenated to obtain the k-gram representation of the user utterance, $\\hat{r}_u^k \\in \\mathbb{R}^{N_c \\times c}$ (Ba et al., 2016)<\/li>\n  <li>All k-gram representations are summed and passed through layer normalization, ReLU, and a linear layer (projecting to $N_c$) to obtain the final user feature vector $f_u \\in \\mathbb{R}^{N_c}$<\/li>\n  <li>$f_u = \\text{Linear}(\\text{ReLU}(\\text{LayerNorm}(\\sum_{k=1}^{n} \\hat{r}_u^k)))$<\/li>\n<\/ul>\n\n<h3 id=\"23-machine-act-representation\">2.3 Machine Act Representation<\/h3>\n\n<ul>\n  <li>Not entirely sure what a machine act is<\/li>\n  <li>In any case, a vocab is built from the machine acts in the given dataset, and the machine act feature vector $f_a \\in \\mathbb{R}^{N_c}$ is computed as below<\/li>\n  <li>$f_a = \\text{ReLU}(\\text{Linear}(r_a^m))$<\/li>\n<\/ul>\n\n<h3 id=\"24-slot-information-decoding\">2.4 Slot Information Decoding<\/h3>\n\n<ul>\n  <li>Slots are things like area and food, usually a single word or a short phrase<\/li>\n  <li>The idea is to use the word vector of the slot\u2019s name (area, food) itself<\/li>\n  <li>Compute the slot representation $s$; for multi-word slots (phrases), it is the sum of the word vectors<\/li>\n  <li>To combine with the user and machine features above, a linear layer of size $2N_c$ is applied<\/li>\n  <li>\n    <p>$f_s = \\text{ReLU}(\\text{Linear}(s))$<\/p>\n  <\/li>\n  <li>The user, machine, and slot features computed so far are all combined into the turn-level feature vector $i_s$<\/li>\n  <li>$i_s = f_s \\otimes (f_u \\oplus f_a)$<\/li>\n  <li>The user and machine features are concatenated ($2N_c$ dimensions), and the result is point-wise multiplied with the slot feature<\/li>\n  <li>The turn-level feature vector obtained this way tends to amplify large-magnitude signals (not sure what this means\u2026 isn\u2019t that obvious?)<\/li>\n<\/ul>\n\n<h3 id=\"25-fixed-length-value-prediction\">2.5 Fixed-length Value Prediction<\/h3>\n\n<ul>\n  <li>At each turn, the turn-level feature vector $i_s$ is fed through an LSTM<\/li>\n  <li>This finally yields the fixed-length value prediction vector $o_s \\in \\mathbb{R}^{N_w}$<\/li>\n  <li>$o_s = \\text{ReLU}(\\text{Linear}(\\text{LSTM}(i_s, q_{t-1})))$<\/li>\n  <li>This will be used to compute similarity with value representations and pick the most appropriate value<\/li>\n  <li>In the end, all the steps so far can be seen as building a representation $o_s$ of the slot that takes the dialogue situation (utterance, machine features) and context (the LSTM) into account<\/li>\n<\/ul>\n\n<h3 id=\"26-2-norm-distance\">2.6 2-Norm Distance<\/h3>\n\n<ul>\n  <li>This part computes the similarity between the slot representation above and the values $v_i$ that make up $V_s$<\/li>\n  <li>Since the desired value may be absent, a \u201cnone\u201d value is added<\/li>\n  <li>$p_s(v_i) = \\text{Softmax}(-\\lVert o_s - v_i \\rVert_2)$, where the $v_i$ inside the norm is the representation vector of value $v_i$<\/li>\n  <li>The formula seems slightly off; it should probably be something like $p_s(V_s)$. In any case, it means computing the distance to every value word and taking a softmax over them<\/li>\n  <li>Updated with cross-entropy loss<\/li>\n<\/ul>\n\n<p>To sum up, StateNet only needs user utterances plus semantic slots and values that can be expressed with word vectors. The word vectors are pre-trained and not fine-tuned. Because StateNet only has to satisfy these conditions, any number of new slots or values can be added. This is why it is called a \u201cuniversal\u201d dialogue state tracker.<\/p>\n\n<h2 id=\"3-experiments\">3. 
Experiments<\/h2>\n\n<ul>\n  <li>Performance evaluated on DSTC2 and WOZ 2.0<\/li>\n  <li>Slot = {food, pricerange, area}<\/li>\n  <li>Three variations of the proposed model\n    <ul>\n      <li>StateNet: no parameter sharing across slots; three models for the three slots. No pre-trained model for initialization either<\/li>\n      <li>StateNet_PS: parameters shared across slots; a single model predicts all three slots from the same dialogue information. One third the parameter size of StateNet<\/li>\n      <li>StateNet_PSI: parameter sharing plus initialization from a pre-trained model. Here, pre-training means training the model on a single slot only; the multi-slot model is initialized with the weights of whichever single-slot model had the best validation performance. (The paper gives an example with food, but the explanation is odd)<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-08-27-universal-dialogue-state-tracking\/table1.png\" alt=\"table1\" \/><\/p>\n\n<ul>\n  <li>Achieves SOTA, beating even the models that use lexical features<\/li>\n<\/ul>\n\n<p><img src=\"\/assets\/images\/blog\/2019-08-27-universal-dialogue-state-tracking\/table2.png\" alt=\"table2\" \/><\/p>\n\n<ul>\n  <li>Pre-training on food gave the best performance<\/li>\n  <li>The conjecture is that the food slot is the hardest problem, so training on it first contributed to raising overall performance; it can be thought of as boosting on the weakest slot<\/li>\n<\/ul>\n","pubDate":"Fri, 23 Aug 2019 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/universal-dialogue-state-tracking\/","guid":"https:\/\/roomylee.github.io\/universal-dialogue-state-tracking\/","category":["dialogue-state-tracking","state-net","blog"]},{"title":"The Design and Implementation of XiaoIce, an Empathetic Social Chatbot (arXiv 2018)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1812.08989\">https:\/\/arxiv.org\/abs\/1812.08989<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum<\/li>\n      <li>Microsoft &amp; Microsoft Research<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>arXiv 2018<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>A description of the development of the XiaoIce system<\/li>\n  <li>Considers both Intelligent Quotient (IQ) and Emotional Quotient (EQ)<\/li>\n  <li>Key components of the system architecture:\n    <ol>\n      <li>Dialogue Manager<\/li>\n      <li>Core Chat<\/li>\n      <li>Dialogue Skills<\/li>\n      
<li>Empathetic Computing Module<\/li>\n    <\/ol>\n  <\/li>\n  <li>CPS = 23<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Early dialogue systems such as Eliza, Parry, and ALICE were designed to imitate humans in text in order to pass the Turing Test<\/li>\n  <li>They showed fairly decent results, but were limited in that they relied on hand-crafted rules and only worked well in specific conversational settings<\/li>\n  <li>Recently, thanks to large amounts of dialogue data and innovations in machine learning, promising results have been coming out of academia and industry<\/li>\n  <li>This paper introduces <strong>the design and implementation of XiaoIce<\/strong>, Microsoft\u2019s social chatbot<\/li>\n  <li>XiaoIce is in service in five countries (China, Japan, the US, India, and Indonesia; as Rinna in Japan) on more than 40 platforms (WeChat, QQ, Weibo, Facebook Messenger, LINE)<\/li>\n  <li>As an open-domain chatbot, XiaoIce differs from Siri, Alexa, Google Assistant, etc. in that it can form long-term relationships with people<\/li>\n  <li><em>The example in Figure 1, where XiaoIce and a person build and develop a relationship over about two months, is truly fascinating<\/em><\/li>\n<\/ul>\n\n<h2 id=\"2-design-principle\">2. Design Principle<\/h2>\n\n<ul>\n  <li>For a social chatbot, EQ capabilities for responding to human emotions matter greatly, not just IQ capabilities for handling specific tasks well<\/li>\n  <li>Integrating IQ and EQ is the core system design of XiaoIce<\/li>\n  <li>XiaoIce has its own unique personality<\/li>\n<\/ul>\n\n<h3 id=\"21-iq--eq--personality\">2.1 IQ + EQ + Personality<\/h3>\n\n<ul>\n  <li><strong>IQ<\/strong>\n    <ul>\n      <li>IQ capabilities include: 1) knowledge and memory modeling, 2) image and natural language understanding, 3) reasoning, 4) generation, and 5) prediction<\/li>\n      <li>These are the foundation for developing Dialogue Skills and are essential for a social chatbot to satisfy people\u2019s needs and solve tasks<\/li>\n      <li>Over the past five years, XiaoIce has developed 230 Dialogue Skills for question answering, recommendation, and more<\/li>\n      <li>The most important skill is \u201cCore Chat\u201d, which enables long, broad conversations with people on many topics (covered later)<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>EQ<\/strong>\n    <ul>\n      <li>There are two main components: Empathy and Social Skills<\/li>\n      <li>Empathy\n        <ul>\n          <li>The capacity to understand and empathize with another person\u2019s experience (putting oneself in their shoes)<\/li>\n          <li>Detecting the user\u2019s emotions from the conversation, sensing how they change over time, and understanding the user\u2019s emotional needs are abilities a social chatbot must have<\/li>\n          <li>This requires solving the following problems: 1) query understanding, 2) user profiling, 3) emotion detection, 4) sentiment recognition, 5) dynamic mood tracking<\/li>\n        <\/ul>\n      <\/li>\n      <li>Social Skills\n        <ul>\n          <li>Since users have diverse backgrounds and interests, a social chatbot must give personalized responses<\/li>\n          <li>These responses must be emotionally appropriate as well as suited to the user\u2019s interests<\/li>\n        <\/ul>\n      <\/li>\n      <li>Figure 2 illustrates XiaoIce\u2019s EQ capabilities well: it not only gives socially appropriate replies (humor, comfort), but also steers toward a new topic when there is nothing left to say<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Personality<\/strong>\n    <ul>\n      <li>Defined as a set of behavioral, cognitive, and emotional patterns<\/li>\n      <li>A social chatbot must show a consistent personality to build a long-lasting, trusting relationship with the user<\/li>\n      <li>XiaoIce\u2019s persona is designed as an 18-year-old girl who is reliable, sympathetic, affectionate, and has a sense of humor<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"22-social-chatbot-metric-cps\">2.2 Social Chatbot Metric: CPS<\/h3>\n\n<ul>\n  <li>Chitchat performance is usually evaluated with the Turing Test, but this cannot tell how long people keep conversing or whether they connect emotionally<\/li>\n  <li>So the authors define Conversation-turns Per Session (CPS) as the evaluation metric for social chatbots<\/li>\n<\/ul>\n\n<h3 id=\"23-social-chat-as-hierarchical-decision-making\">2.3 Social Chat as Hierarchical Decision-Making<\/h3>\n\n<ul>\n  <li>To achieve the design goals above, the authors view the social-chatbot problem as a decision-making process<\/li>\n  <li>The decision-making process is made hierarchical\n    <ul>\n      <li>top-level process\n        <ul>\n          
<li>\uc804\uccb4\uc801\uc778 \ub300\ud654\ub97c \uad00\ub9ac\ud558\uace0 \ub300\ud654 \ubaa8\ub4dc\uc758 \ud0c0\uc785 \ubcc4 Skill\uc758 \uc120\ud0dd\uc5d0 \ub300\ud55c decision<\/li>\n          <li>ex) chatting casually, question answering, ticket booking<\/li>\n        <\/ul>\n      <\/li>\n      <li>low-level process\n        <ul>\n          <li>\uc120\ud0dd\ud55c \uc2a4\ud0ac\uc744 \ucee8\ud2b8\ub864\ud558\uace0, \ub300\ud654 \uc0dd\uc131\uc774\ub098 \ud14c\uc2a4\ud06c \ud574\uacb0\uc744 \uc704\ud55c \uc561\uc158 \uc120\ud0dd\uc5d0 \ub300\ud55c decision<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc774\ub7f0 Decision-Making Process\ub294 \uc218\ud559\uc801\uc73c\ub85c Markov Decision Process (MDP)\ub85c \ubcfc \uc218 \uc788\uc74c<\/li>\n  <li>\uc18c\uc15c \ucc57\ubd07\uc740 \uc0ac\ub78c\uacfc interacting\ud558\ub294 \ud658\uacbd\uc5d0 \ub193\uc778 \uac83\uc774\uba70, \uc774\ub7f0 \ud658\uacbd\uc5d0\uc11c \ud0d0\uc0c9\uc744 \ud568<\/li>\n  <li>\uac01 \ud134\uc5d0 \ub300\ud574\uc11c \ud604\uc7ac \ub300\ud654 \uc0c1\ud0dc(state), \uc120\ud0dd\ud560 \uc218 \uc788\ub294 \uc2a4\ud0ac(option) \ud639\uc740 \ub300\ub2f5(primary action)\uc744 Hierarchical Dialogue Policy\uc5d0 \ub530\ub77c\uc11c \uc120\ud0dd\ud558\uace0 \uc0c8\ub85c\uc6b4 state\ub97c \ud0d0\uc0c9\ud558\uac8c \ub418\ub294 \uac83. \ub9c8\uce58 MDP\ucc98\ub7fc<\/li>\n  <li>\ub610\ud55c \uc0c8\ub85c\uc6b4 state\uc5d0 \ub300\ud574\uc11c \uc720\uc800\uc758 \ubc18\uc751\uc774\ub77c\ub294 \ubcf4\uc0c1(reward)\uc744 \ubc1b\uac8c \ub428. 
\ub300\ud654\uac00 \uc885\ub8cc\ub420 \ub54c\uae4c\uc9c0 \uc774\ub7f0 \uc2f8\uc774\ud074\uc744 \ub3c4\ub294 \uac83\uc784<\/li>\n  <li>XiaoIce\ub294 Dialogue Manager\uac00 \uc774\ub7f0 \uac01 \ud134\uc5d0 \ub530\ub978 dialogue state tracking\uacfc policy\uc5d0 \ub530\ub978 response \uc120\ud0dd \ub4f1\uc744 \ub2f4\ub2f9\ud568<\/li>\n  <li>XiaoIce\ub294 CPS\ub97c \ub298\ub9ac\uae30\uc704\ud574, \uc704\uc640 \uac19\uc774 \ubc18\ubcf5\uc801\uc73c\ub85c trial-and-error\ub97c \ubc18\ubcf5\ud568. \uadf8\ub9ac\uace0 \ud56d\uc0c1 exploration\uacfc exploitation\uc758 tradeoff\uc5d0 \ub300\ud55c \ubc38\ub7f0\uc2a4\ub97c \uc798 \ub9de\ucd94\ub824\uace0 \ud568\n    <ul>\n      <li>exploit\ub780 \uc774\ubbf8 \uc54c\uace0 \uc788\ub294 \uac83\uc744 \uc798 \ud65c\uc6a9\ud558\ub294 \uac83<\/li>\n      <li>explore\ub780 \uae30\uc874 \uc720\uc800\ub294 \ub354 \uae4a\uac8c \uad00\uacc4\ub97c \ub9fa\uace0 \uc0c8\ub85c\uc6b4 \uc720\uc800\ub294 \ub04c\uc5b4\ub4e4\uc77c \uc218 \uc788\uac8c \ubaa8\ub974\ub294 \uac83(new skills and dialogue policies)\uc5d0 \ub300\ud574\uc11c \uc2dc\ub3c4\ud574\ubcf4\ub294 \uac83<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"3-system-architecture\">3. 
System Architecture<\/h2>\n\n<p><img src="\/assets\/images\/blog\/2019-08-27-xiaoice\/figure1.png" alt="figure1" \/><\/p>\n\n<ul>\n  <li><strong>User Experience Layer<\/strong>\n    <ul>\n      <li>Connects XiaoIce to platforms such as WeChat; it supports two modes\n        <ul>\n          <li>Full-Duplex\n            <ul>\n              <li>Voice-based conversation in which both sides speak simultaneously, as on a phone call<\/li>\n            <\/ul>\n          <\/li>\n          <li>Taking turns\n            <ul>\n              <li>Message-based conversation in which the two sides alternate, as in text chat<\/li>\n            <\/ul>\n          <\/li>\n        <\/ul>\n      <\/li>\n      <li>Includes components for speech recognition, speech synthesis, image understanding, text normalization, and so on<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Conversation Engine Layer<\/strong>\n    <ul>\n      <li>Consists of the following four modules, covered in detail in Chapter 4\n        <ul>\n          <li>Dialogue Manager\n            <ul>\n              <li>Tracks the dialogue state (Dialogue State Tracker)<\/li>\n              <li>Uses the dialogue policy to decide whether Core Chat or a Dialogue Skill should handle the turn<\/li>\n            <\/ul>\n          <\/li>\n          <li>Empathetic Computing Module\n            <ul>\n              <li>Understands the content aspect: topic<\/li>\n              <li>Understands the emotional aspect: emotion, intent, opinion on topic, user's background &amp; general interests<\/li>\n              <li>Implements XiaoIce's EQ; its IQ lives in Core Chat and the Skills<\/li>\n            <\/ul>\n          <\/li>\n          <li>Core Chat<\/li>\n          <li>Dialogue Skills<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Data Layer<\/strong>\n    <ul>\n      <li>Consists of several databases\n        <ul>\n          <li>Conversational Data (text-text pairs, text-image pairs)<\/li>\n          <li>Non-conversational Data (text)<\/li>\n          <li>Knowledge Graph for Core Chat and Skills<\/li>\n          <li>XiaoIce Profile<\/li>\n          <li>All Registered User Profiles<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id="4-implementation-of-conversation-engine">4. 
Implementation of Conversation Engine<\/h2>\n\n<h3 id="41-dialogue-manager">4.1 Dialogue Manager<\/h3>\n\n<ol>\n  <li><strong>Global State Tracker<\/strong>: tracks the current dialogue state $s$\n    <ul>\n      <li>Managed through a working memory<\/li>\n      <li><strong>The memory is empty at the start of a session; as the conversation proceeds, the user's utterance and XiaoIce's response at each turn are stored in memory, together with the entities and empathy labels that the Empathetic Computing Module (introduced in Section 4.2) extracts from the text<\/strong><\/li>\n      <li>The information in the working memory is encoded as the dialogue state vector $s$<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Dialogue Policy<\/strong>: selects the action $a$ to take in the current dialogue state $s$ according to the policy function $\pi$\n    <ul>\n      <li>As explained in Section 2.3, XiaoIce uses a hierarchical policy\n        <ul>\n          <li>Top-level policy: manages the overall conversation by selecting either Core Chat or a Skill at each turn according to the dialogue state<\/li>\n          <li>Set of low-level policies: one policy per skill<\/li>\n        <\/ul>\n      <\/li>\n      <li>XiaoIce iterates the trial-and-error process based on user feedback as follows\n        <ul>\n          <li>if the user's input == text\n            <ul>\n              <li>Core Chat handles it<\/li>\n              <li>The Topic Manager (introduced in Section 4.1.3) manages Core Chat by deciding 1) whether to switch to a new topic, or 2) whether to switch from General Chat to a specific Domain Chat when user interest is detected (as introduced later, Core Chat consists of General Chat and Domain Chat)<\/li>\n            <\/ul>\n          <\/li>\n          <li>else if the user's input == image or video\n            <ul>\n              <li>The Image Commenting Skill handles it<\/li>\n            <\/ul>\n          <\/li>\n          <li>Skills of Task Completion, Deep Engagement, and Content Creation\n            <ul>\n              <li>Triggered by specific user inputs and conversation context<\/li>\n              <li>For example,\n                <ul>\n                  <li>When a food photo arrives, the Food Recognition and Recommendation Skill is activated<\/li>\n                  <li>When strongly negative sentiments are detected, the Comforting Skill is activated<\/li>\n                  <li>When a specific command such as "XiaoIce, what is the weather today" arrives, the Weather Skill is activated<\/li>\n                  <li>If several Skills are triggered at once, one is selected based on confidence score, pre-defined priority, and session context<\/li>\n                  <li>To keep the conversation smooth, frequent skill switching is avoided: the current Skill keeps running until a new Skill is activated<\/li>\n                <\/ul>\n              <\/li>\n            <\/ul>\n          <\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Topic Manager<\/strong>:<\/li>\n<\/ol>\n\n<ul>\n  <li>Consists of 1) a classifier that decides at each turn whether to switch topics, and 2) a Topic Retrieval Engine that recommends new topics<\/li>\n  <li>Topic switching is triggered when XiaoIce has insufficient knowledge of the topic or the user is getting bored, typically in the following situations\n    <ul>\n      <li>When Core Chat fails to generate an appropriate response and falls back to an editorial response (covered in Section 4.3)<\/li>\n      <li>When the generated response merely repeats the user's input or adds no new information<\/li>\n      <li>When the user's input is bland (ex. "OK", "I see", "go on")<\/li>\n    <\/ul>\n  <\/li>\n  <li>The topic dataset is built by collecting popular topics and their related comments and discussions from high-quality internet forums such as Instagram and China's douban.com<\/li>\n  <li>When the topic-switching trigger fires, the current dialogue state is used as a query to retrieve candidate topics, and a new topic is selected by a machine-learned boosted tree ranker that uses the following features\n    <ul>\n      <li>Contextual relevance: how related the topic is to the current conversation context while still being new and not yet discussed<\/li>\n      <li>Freshness: how fresh and valid the topic is with respect to time, especially for news<\/li>\n      <li>Personal interests: how well the topic matches the user profile<\/li>\n      <li>Popularity: how much attention the topic is getting on the internet or among XiaoIce users<\/li>\n      <li>Acceptance rate: how often XiaoIce users have accepted the topic in the past<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id="42-empathetic-computing">4.2 Empathetic Computing<\/h3>\n\n<ul>\n  <li>Empathetic Computing implements XiaoIce's EQ and models the emotional side of the conversation. The modeling routine is as follows\n    <ol>\n      <li>The user's input query $Q$ arrives<\/li>\n      <li>$Q$ is rewritten to $Q_C$, taking the conversation context $C$ into account<\/li>\n      <li>The user's emotion and state within the conversation are encoded as the query empathy vector $e_Q$<\/li>\n      <li>The emotional aspects of the response $R$ are specified as the response empathy vector $e_R$<\/li>\n    <\/ol>\n  <\/li>\n  <li><strong>The final output of Empathetic Computing is expressed as the dialogue state vector $s=(Q_C, C, e_Q, e_R)$<\/strong><\/li>\n  <li>This is fed as input to the Dialogue Policy for skill selection and to Core Chat for interpersonal response generation<\/li>\n  <li>The Empathetic Computing Module consists of the following three components\n    <ul>\n      <li><strong>Contextual Query Understanding (CQU)<\/strong>\n        <ul>\n          <li>CQU rewrites $Q$ into $Q_C$ using the conversation context $C$; this is step 2 of the routine above. The rewriting works as follows\n            <ul>\n              <li>Named Entity Identification: labels every entity mentioned in $Q$, links it to entities stored in the state tracker's working memory, and stores new entities in the working memory<\/li>\n              <li>Co-reference Resolution: replaces every pronoun with its corresponding entity name<\/li>\n              <li>Sentence Completion: when $Q$ is an incomplete sentence, completes it using the context $C$<\/li>\n            <\/ul>\n          <\/li>\n          <li>The example in Figure 5, where the pronoun "him" is replaced with "Ashin" and "that" with "The Time Machine", makes this process easier to understand<\/li>\n          <li>The resulting contextual query $Q_C$ is used by Core Chat to generate responses (introduced in Section 4.3)<\/li>\n        <\/ul>\n      <\/li>\n      <li><strong>User Understanding<\/strong>\n        <ul>\n          <li>The component that builds the query empathy vector $e_Q$ from the $Q_C$ produced by CQU and the context $C$; this is step 3 of the routine above<\/li>\n          <li>$e_Q$ is a list of key-value pairs representing the user's intent, emotion, topic, opinion, persona, and so on, as in Figure 5 (b) and (c)<\/li>\n          <li>These key-value pairs are produced by the following machine-learning classifiers\n            <ul>\n              <li>Topic Detection: labels whether the user is staying on the same topic or bringing up a new one<\/li>\n              <li>Intent Detection: which dialogue act $Q_C$ performs (e.g., greet, request, inform, etc.)<\/li>\n              <li>Sentiment Analysis: the user's emotion (e.g., happy, sad, angry, neutral) and how it changes during the conversation (e.g., from happy to sad)<\/li>\n              <li>Opinion Detection: the user's reaction to the topic (e.g., positive, negative, neutral)<\/li>\n              <li>If the user's ID is known, the user's persona, based on the user profile (gender, age, interests, occupation, personality, etc.), is also included in $e_Q$<\/li>\n            <\/ul>\n          <\/li>\n        <\/ul>\n      <\/li>\n      <li><strong>Interpersonal Response Generation<\/strong>\n        <ul>\n          <li>This component generates the response empathy vector $e_R$, which specifies the emotional aspects of the response to be generated as well as XiaoIce's persona<\/li>\n          <li>Several heuristics are used to derive $e_R$ from $e_Q$<\/li>\n          <li>For persona fields, a profile predefined for XiaoIce is used<\/li>\n          <li>How responses are generated from $e_Q$ and $e_R$ is introduced in the next section<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id="43-core-chat">4.3 Core Chat<\/h3>\n\n<p><img src="\/assets\/images\/blog\/2019-08-27-xiaoice\/figure6.png" alt="figure6" \/><\/p>\n\n<ul>\n  <li>Core Chat is one of XiaoIce's most important components. 
Like Empathetic Computing, it takes text as input and produces an interpersonal response as output<\/li>\n  <li>Core Chat is divided into a General Chat part and a Domain Chat part\n    <ul>\n      <li>General Chat: covers a broad range of topics for open-domain conversation<\/li>\n      <li>Domain Chat: handles specific domains (music, movies, celebrities) for deeper conversation<\/li>\n    <\/ul>\n  <\/li>\n  <li>Although General Chat and Domain Chat use different datasets, they share the same engine, so only General Chat is described in detail below<\/li>\n  <li>General Chat is a data-driven response generation system<\/li>\n  <li>It takes the dialogue state $s=(Q_C, C, e_Q, e_R)$ as input, then 1) generates candidate responses and 2) ranks them to output the final response $R$<\/li>\n  <li>\n    <p>1) Candidate responses are either retrieved from databases built from conversations and text, or generated by a neural generative model. Specifically, the following three generators produce the candidates<\/p>\n\n    <ul>\n      <li><strong>Retrieval-Based Generator using Paired Data<\/strong>\n        <ul>\n          <li>The paired dataset consists of query-response pairs, collected in two ways\n            <ul>\n              <li>Human-human conversations collected from the internet (SNS, forums, news comments, etc.)<\/li>\n              <li>Human-machine conversations since XiaoIce's launch in 2014: more than 30B pairs collected by 2018. Today, 70% of XiaoIce's responses are retrieved from its past conversations with people<\/li>\n            <\/ul>\n          <\/li>\n          <li>To control the quality of the dataset (especially the internet-sourced data), the Empathetic Computing Module converts each query-response pair into a tuple $(Q_C, R, e_Q, e_R)$<\/li>\n          <li>These tuples are filtered so that only empathetic responses that fit XiaoIce's persona survive<\/li>\n          <li>Pairs containing personal information, dirty jokes, inappropriate content, spelling mistakes, and the like are also removed<\/li>\n          <li>The filtered pairs are indexed with Lucene for efficient retrieval<\/li>\n          <li>At runtime, the input query $Q_C$ (in the state $s$) is used to run keyword search and machine-learned-representation-based semantic search over the paired dataset, yielding up to 400 candidate responses<\/li>\n          <li>The quality of the retrieved candidates is high, but coverage is low for new or rarely occurring topics that are absent from the dataset<\/li>\n          <li>To raise this coverage, the Neural Response Generator introduced below is used<\/li>\n        <\/ul>\n      <\/li>\n      <li><strong>Neural Response Generator<\/strong>\n        <ul>\n          <li>Unlike the retrieval-based generator, it can learn to generate a response on any topic, even one never seen in human conversation data. It is trained on the paired dataset<\/li>\n          <li>The neural generator is robust and has high coverage, whereas the retrieval-based generator produces high-quality responses for popular topics, so the two play complementary roles<\/li>\n          <li>A great deal of research is ongoing here, and this component will ultimately be central to improving social chatbot performance<\/li>\n          <li><strong>XiaoIce's Neural Response Generator is based on seq2seq<\/strong><\/li>\n          <li>The seq2seq model works as follows (see Figure 6)\n            <ol>\n              <li>Feed the user's input query $Q_C$ into the source RNN (encoder)<\/li>\n              <li>Combine the empathy vectors $e_Q$ and $e_R$ for the user's query and XiaoIce's response into a single interactive representation $v=\sigma{(W_Q^T e_Q + W_R^T e_R)}$<\/li>\n              <li>Using the source RNN's context vector and the interactive representation, the target RNN (decoder) generates the response $R$ one word at a time<\/li>\n            <\/ol>\n          <\/li>\n          <li>Finally, beam search is used to generate up to 20 candidates<\/li>\n          <li>Because generation is conditioned on empathy information, it can produce consistent responses. 
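The fusion in step 2, $v=\sigma(W_Q^T e_Q + W_R^T e_R)$, can be sketched numerically as below; the dimensions, weight matrices, and vector values are made-up placeholders for illustration, not XiaoIce's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def interactive_representation(e_q, e_r, w_q, w_r):
    """Fuse query/response empathy vectors: v = sigmoid(W_Q^T e_Q + W_R^T e_R)."""
    z = w_q.T @ e_q + w_r.T @ e_r
    return 1.0 / (1.0 + np.exp(-z))  # element-wise sigmoid

d_e, d_v = 8, 4              # toy empathy / fused dimensions (illustrative)
e_q = rng.normal(size=d_e)   # query empathy vector e_Q
e_r = rng.normal(size=d_e)   # response empathy vector e_R
w_q = rng.normal(size=(d_e, d_v))
w_r = rng.normal(size=(d_e, d_v))

v = interactive_representation(e_q, e_r, w_q, w_r)
print(v.shape)  # (4,)
```

In the paper's design, this $v$ is passed to the decoder alongside the encoder's context vector, so every generated word is conditioned on the same empathy information.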
In Figure 7, the model without grounding gives a different answer each time it is asked its age, while the grounded model answers consistently<\/li>\n        <\/ul>\n      <\/li>\n      <li><strong>Retrieval-Based Generator using Unpaired Data<\/strong>\n        <ul>\n          <li>Beyond the paired conversational data used by the two generators above, there is a much larger amount of high-quality text data (unpaired &amp; non-conversational)<\/li>\n          <li>The unpaired dataset used for XiaoIce consists of sentences collected from public lectures and from quotes in news articles and reports<\/li>\n          <li>Since the speaker of each sentence is known, the empathy vector $e_R$ used by the two previous generators can still be computed, so these sentences can be treated as candidate responses $R$ in the same way<\/li>\n          <li>The data filtering pipeline is likewise applied to $(R, e_R)$, similarly to the paired data (for paired data, the tuple also includes information about the query)<\/li>\n          <li>Like the paired data, the unpaired data is indexed with Lucene. Unlike the paired case, however, query expansion is performed at runtime, e.g., adding topics to $Q_C$<\/li>\n        <\/ul>\n\n        <p><img src=\"\/assets\/images\/blog\/2019-08-27-xiaoice\/figure8.png\" alt=\"figure8\" \/><\/p>\n\n        <ul>\n          <li>A Knowledge Graph (KG) is used for this query expansion. The KG is Microsoft's Satori, which consists of head-relation-tail triplets (h, r, t)<\/li>\n          <li>Figure 8 above shows the process of generating candidate responses using the XiaoIce KG and the unpaired dataset. The process can be described in three steps:\n            <ol>\n              <li>Extract topics from the user query $Q_C$ (e.g., the topic \u201cBeijing\u201d from \u201ctell me about Beijing\u201d)<\/li>\n              <li>From the KG, retrieve up to 20 topics most related to the extracted topic (e.g., \u201cBadaling Great Wall\u201d and \u201cBeijing snacks\u201d, which are strongly related to \u201cBeijing\u201d). These topics are then sorted by relevance with a boosted tree ranker trained on manually labeled data<\/li>\n              <li>Combine the topics from $Q_C$ and the KG into a new query, and use it to retrieve up to 400 candidate response sentences from the unpaired dataset<\/li>\n            <\/ol>\n          <\/li>\n          <li>Although the candidate responses drawn from the unpaired dataset may be of lower quality than the paired ones, they cover a much wider range of topics<\/li>\n          <li>Compared to the Neural Generator, the unpaired candidates are longer and contain better content<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>2) Candidate responses are ranked with the following method\n    <ul>\n      <li><strong>Response Candidate Ranker<\/strong>\n        <ul>\n          <li>The candidate responses obtained from the three generators above are ranked with a boosted tree ranker<\/li>\n          <li>The final response is selected at random from the candidates whose ranking scores exceed a threshold<\/li>\n          
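The final selection step just described (a random pick among the candidates whose boosted-tree scores clear a threshold) can be sketched as follows; the function name, threshold value, and empty-set fallback are my assumptions, not details from the paper:

```python
import random

def select_response(candidates, scores, threshold=1.0):
    """Randomly pick one response among the candidates scoring above threshold."""
    top = [c for c, s in zip(candidates, scores) if s >= threshold]
    if not top:  # assumed fallback: return the single best-scoring candidate
        return max(zip(candidates, scores), key=lambda p: p[1])[0]
    return random.choice(top)
```

The random choice among top candidates trades a little precision for response diversity across sessions.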
<li>Given the dialogue state $s=(Q_C, C, e_Q, e_R)$ and each candidate response $R'$, the ranker computes a ranking score from the following four features:\n            <ul>\n              <li>Local cohesion features: similarity between $Q_C$ and $R'$. A good response should be semantically related to the query<\/li>\n              <li>Global coherence features: similarity between $(Q_C, C)$ and $R'$. A good response should be semantically coherent with the query and the context<\/li>\n              <li>Empathy matching features: similarity between $e_{R}$ and $e_{R'}$. A good response should be an empathetic response that fits XiaoIce's persona<\/li>\n              <li>Retrieval matching features: similarity between $Q_C$ and the query of the retrieved query-response pair, at the word level (BM25, TF-IDF) and the semantic level (sentence similarity)<\/li>\n            <\/ul>\n          <\/li>\n          <li>The ranker is trained on dialogue-state-response pairs $(s, R)$ labeled with ratings on a 3-level scale; the data looks like Figure 9\n            <ul>\n              <li>0: the response is not empathetic or not very relevant to the query. It is likely to lead to the termination of the conversation.<\/li>\n              <li>1: the response is acceptable and relevant to the query. It is likely to help keep the conversation going.<\/li>\n              <li>2: this is an empathetic, interpersonal response that makes users feel delightful and excited. It is likely to drive the conversation.<\/li>\n            <\/ul>\n          <\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Editorial Response<\/strong>\n    <ul>\n      <li>pass<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"44-dialogue-skills\">4.4 Dialogue Skills<\/h3>\n\n<ul>\n  <li>\n    <p>There are 230 Dialogue Skills, briefly described under four categories: 1) image commenting, 2) content creation, 3) deep engagement, 4) task completion<\/p>\n  <\/li>\n  <li><strong>Image Commenting<\/strong>\n    <ul>\n      <li>Images play an increasingly large role in social chats<\/li>\n      <li>Beyond object recognition and image captioning, the system generates emotional comments reflecting persona, attitude, and location. In this respect it is a different problem from traditional vision tasks; Figure 11 gives examples of how it differs<\/li>\n      <li>The model architecture is similar to Core Chat, except that it takes images or video as input<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Content Creation<\/strong>\n    <ul>\n      <li><em>Is this something like playful entertainment content? It is hard to get a clear sense of it<\/em><\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Deep Engagement<\/strong>\n    <ul>\n      <li>A Deep Engagement skill targets specific topics or settings to satisfy a user's specific emotional or intellectual needs, building a long-term relationship with the user<\/li>\n    <\/ul>\n\n    <p><img src=\"\/assets\/images\/blog\/2019-08-27-xiaoice\/figure17.png\" alt=\"figure17\" \/><\/p>\n\n    <ul>\n      <li>The skills in this category can be grouped along two dimensions: (IQ-EQ) and (1-on-1 vs. Group). See Figure 17<\/li>\n      <li>The (IQ-EQ) axis ranges from unfolding conversations about concrete topics related to XiaoIce's interests, experiences, and knowledge, to meeting people's emotional needs through approaches such as emotional comfort<\/li>\n      <li>The (1-on-1 vs. Group) axis captures how private versus communal the conversation is<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Task Completion<\/strong>\n    <ul>\n      <li>Similar to a personal assistant: handling tasks such as weather, device control, music, and news<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"5-xiaoice-in-the-wild\">5. XiaoIce in the Wild<\/h2>\n\n<p><img src=\"\/assets\/images\/blog\/2019-08-27-xiaoice\/figure19.png\" alt=\"figure19\" \/><\/p>\n\n<ul>\n  <li><strong>Core Chat (Conversation Engine)<\/strong>\n    <ul>\n      <li>The neural generator was introduced in the 5th generation, greatly improving the coverage and diversity of XiaoIce's responses.<\/li>\n      <li>The Empathetic Computing Module also plays a large role; in particular, integrating the empathy models in the 6th generation strengthened XiaoIce's emotional connection with people<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>User Experience<\/strong>\n    <ul>\n      <li>Full duplex has been supported since the 5th generation, enabling more natural communication and greatly increasing conversation session length. A clear differentiator from other social chatbots and assistants<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>New Skills<\/strong>\n    <ul>\n      <li>pass<\/li>\n    <\/ul>\n  <\/li>\n  <li><strong>Platform<\/strong>\n    <ul>\n      <li>pass<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"6-conclusions\">6. Conclusions<\/h2>\n\n<ul>\n  <li>\u201cMeaningful conversation and happiness go together\u201d<\/li>\n  <li>Going forward, social chatbots will serve as partners for meaningful conversation<\/li>\n  <li>Future Works\n    <ul>\n      <li><strong>Towards a unified modeling framework<\/strong>: the modules currently operate through MDP-based hierarchical decision making; these need to be unified into a single framework<\/li>\n      <li><strong>Towards goal-oriented, grounded conversations<\/strong>: <em>I don't fully understand the goal concept here<\/em>. The aim is to fully ground a wider range of real-world conversations<\/li>\n      <li><strong>Towards a proactive personal assistant<\/strong>: XiaoIce can recognize users' interests and intents more accurately, which is where commercial value arises. For example, a coupon skill hands out a coupon once a user's need is detected. 
User feedback logs confirm that such recommendations are in fact well received<\/li>\n      <li><strong>Towards human-level intelligence<\/strong>\n        <ul>\n          <li>Technical progress<\/li>\n        <\/ul>\n      <\/li>\n      <li><strong>Towards an ethical social chatbot<\/strong><\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n","pubDate":"Thu, 22 Aug 2019 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/xiaoice\/","guid":"https:\/\/roomylee.github.io\/xiaoice\/","category":["dialog-system","xiaoice","blog"]},{"title":"Recurrent Convolutional Neural Networks for Text Classification (AAAI 2015)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI15\/paper\/view\/9745\">https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI15\/paper\/view\/9745<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Siwei Lai (Chinese Academy of Sciences)<\/li>\n      <li>Liheng Xu (Chinese Academy of Sciences)<\/li>\n      <li>Kang Liu (Chinese Academy of Sciences)<\/li>\n      <li>Jun Zhao (Chinese Academy of Sciences)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>AAAI 2015<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Traditional text classifiers depend on human-designed features such as dictionaries, knowledge bases, and tree kernels<\/li>\n  <li>We propose a recurrent convolutional neural network for text classification that does not depend on such features<\/li>\n  <li>The recurrent structure captures contextual information as far as possible when learning word representations<\/li>\n  <li>This introduces considerably less noise than traditional window-based neural networks<\/li>\n  <li>We also employ a max-pooling layer that automatically judges which words play key roles in text classification<\/li>\n  <li>We experiment on four commonly used datasets<\/li>\n  <li>The results surpass the state of the art on several datasets, particularly document-level ones<\/li>\n<\/ul>\n\n<h2 id=\"introduction\">Introduction<\/h2>\n\n<ul>\n  <li>Feature representation is the core of text classification; traditionally, bag-of-words features such as n-grams have been the main choice<\/li>\n  <li>Traditional methods can ignore word order and contextual information<\/li>\n  <li>For example, when analyzing <em>\u201cBank\u201d<\/em> (a unigram) in the sentence <em>\u201cA sunset stroll along the South Bank affords an array of stunning vantage points.\u201d<\/em>, we cannot tell whether it means a financial institution or the land beside a river -&gt; a word-sense ambiguity problem<\/li>\n  <li>Moreover, without knowledge of London, the capitalized <em>\u201cSouth Bank\u201d<\/em> (a bigram) may be mistaken for a financial institution (South Bank is an area on the south side of the Thames in London) -&gt; a named-entity interpretation problem<\/li>\n  <li>If we look as far as <em>\u201cstroll along the South Bank\u201d<\/em> (a 5-gram), we can easily tell the meanings apart<\/li>\n  <li>More complex features such as high-order n-grams thus capture contextual information well, but introduce data sparsity<\/li>\n<\/ul>\n\n<h2 id=\"related-work\">Related Work<\/h2>\n\n<ul>\n  <li>Traditionally, text classification research has centered on three themes: feature engineering, feature selection, and different types of machine learning algorithms<\/li>\n  <li>For feature engineering, bag-of-words features were widely used, along with more complex features such as POS tags and tree kernels<\/li>\n  <li>Feature selection aims to improve performance by removing noisy features; stop-word removal belongs here<\/li>\n  <li>Machine learning algorithms include logistic regression, naive Bayes, and SVMs<\/li>\n  <li>All of these methods, however, suffer from the data sparsity problem<\/li>\n  <li>Deep-learning-based representation learning tries to solve this sparsity problem<\/li>\n  <li>Such neural representations are called word embeddings<\/li>\n<\/ul>\n\n<h2 id=\"model\">Model<\/h2>\n\n<ul>\n  <li>The input is a document D, a sequence of words; the output is the classification result<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/39083820-2051fce8-45a6-11e8-884f-04910f73788b.png\" alt=\"arch\" \/><\/p>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/39083821-208023f2-45a6-11e8-8fb4-53ab6f1b8d45.png\" alt=\"eq\" \/><\/p>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/39083822-20c8505a-45a6-11e8-8d14-aa94e606dac9.png\" alt=\"dis\" \/><\/p>\n\n<h3 id=\"word-representation-learning\">Word Representation Learning<\/h3>\n\n<ul>\n  <li>Each word is represented as x_i = [cl(w_i); e(w_i); cr(w_i)], the concatenation of the left context vector, the word embedding, and the right context vector<\/li>\n  <li>The context vectors, shown in the network figure above, propagate the left and right context recurrently, RNN-style<\/li>\n  <li>This contrasts with a CNN, which uses only the context inside a window, and should reflect the meaning of a word more precisely<\/li>\n  <li>We do not use a convolutional layer directly, but a recurrent structure that can play the role of one<\/li>\n  <li>It is effectively a new architecture mixing the two, neither purely convolutional nor purely recurrent<\/li>\n  <li>The resulting x vector can be viewed as the word representation<\/li>\n<\/ul>\n\n<h3 id=\"text-representation-learning\">Text Representation Learning<\/h3>\n\n<ul>\n  <li>This is then multiplied by another weight matrix and passed through a nonlinear function to obtain the y(2) vectors<\/li>\n  <li>Element-wise max-pooling over the y(2) vectors yields the final y(3) vector, preserving the dimensionality<\/li>\n  <li>The resulting y(3) can be viewed as the text representation<\/li>\n  <li>This converts texts of varying lengths into a fixed-length vector<\/li>\n  <li>Average pooling is a poor fit; max-pooling is what finds the most important latent semantic factors hidden in the document (text). 
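The representation pipeline described above (left/right recurrent context vectors, concatenation into x_i, a nonlinear projection to y(2), and element-wise max-pooling to y(3)) can be sketched in numpy; the toy dimensions, random weights, and tanh activations are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_emb, d_ctx, d_hid = 6, 4, 3, 5            # toy sizes (assumptions)
E = rng.standard_normal((T, d_emb))            # word embeddings e(w_i)
Wl = rng.standard_normal((d_ctx, d_ctx)); Wsl = rng.standard_normal((d_ctx, d_emb))
Wr = rng.standard_normal((d_ctx, d_ctx)); Wsr = rng.standard_normal((d_ctx, d_emb))

# left/right context vectors, propagated recurrently across the sentence
cl = np.zeros((T, d_ctx))
for i in range(1, T):
    cl[i] = np.tanh(Wl @ cl[i - 1] + Wsl @ E[i - 1])
cr = np.zeros((T, d_ctx))
for i in range(T - 2, -1, -1):
    cr[i] = np.tanh(Wr @ cr[i + 1] + Wsr @ E[i + 1])

X = np.concatenate([cl, E, cr], axis=1)        # x_i = [cl(w_i); e(w_i); cr(w_i)]
W2 = rng.standard_normal((d_hid, X.shape[1]))
Y2 = np.tanh(X @ W2.T)                         # per-word latent vectors y(2)
y3 = Y2.max(axis=0)                            # element-wise max-pooling -> y(3)
```

Note how y3 has a fixed dimension d_hid regardless of the sentence length T, which is exactly what makes variable-length texts comparable.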
Averaging tends to smear the features together<\/li>\n  <li>A fully connected layer and a softmax are applied on top of y(3) for the final classification<\/li>\n<\/ul>\n\n<h2 id=\"experiments\">Experiments<\/h2>\n\n<ul>\n  <li>Experiments are run on four datasets: 20Newsgroups, Fudan set, ACL Anthology Network, and Stanford Sentiment Treebank (SST)<\/li>\n  <li>The baselines are traditional bag-of-words methods and neural-network-based models<\/li>\n  <li>Overall, the neural models outperform the traditional methods<\/li>\n  <li>Convolution-based neural networks perform better on the SST dataset, indicating that convolution is more effective at extracting discriminative features<\/li>\n  <li>Since CNNs include a max-pooling step, the important features are presumably selected there<\/li>\n  <li>Recursive models have O(n^2) time complexity, larger than our model's O(n). In practice the recursive models take 3-5 hours while ours takes only a few minutes<\/li>\n<\/ul>\n\n<h3 id=\"contextual-information\">Contextual Information<\/h3>\n\n<ul>\n  <li>CNNs capture contextual information through window-based convolution, while our proposed RCNN uses a recurrent structure<\/li>\n  <li>CNNs are therefore strongly affected by the window size, and the proposed RCNN also performed better<\/li>\n<\/ul>\n\n<h3 id=\"learned-keywords\">Learned Keywords<\/h3>\n\n<ul>\n  <li>We compare our RCNN with the RNTN model<\/li>\n  <li>For the RCNN, we extract the words selected by max-pooling -&gt; this seems a rather good way to inspect what is learned<\/li>\n  <li>Table 3 shows the keywords of the texts extracted as tri-grams<\/li>\n  <li>Whereas the RNTN shows typical phrases, the RCNN shows one key word together with the words on either side of it<\/li>\n  <li>This shows that words such as worth, sweetest, and wonderful play an important role in predicting positive, and awfully, bad, and boring in predicting negative<\/li>\n<\/ul>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<ul>\n  <li>We propose a recurrent convolutional neural network for text classification<\/li>\n  <li>Our model captures contextual information through the recurrent structure and the representation of the text through max-pooling<\/li>\n  <li>It outperforms both CNNs and RecursiveNNs<\/li>\n<\/ul>\n","pubDate":"Sat, 21 Apr 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/rcnn-text-classification\/","guid":"https:\/\/roomylee.github.io\/rcnn-text-classification\/","category":["text-classification","recurrent-convolutional-neural-network","rcnn","blog"]},{"title":"Character-Aware Neural Language Models (AAAI 2016)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1508.06615\">https:\/\/arxiv.org\/abs\/1508.06615<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Yoon Kim (Harvard University)<\/li>\n      <li>Yacine Jernite (New York University)<\/li>\n      <li>David Sontag (New York University)<\/li>\n      <li>Alexander M. Rush (Harvard University)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>AAAI 2016<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Proposes a neural language model that relies only on character-level input<\/li>\n  <li>Predictions (output) are still made at the word level<\/li>\n  <li>A CNN and a highway network are applied over characters, and their output feeds an LSTM-based RNN language model (RNN-LM) that makes the final word-level prediction<\/li>\n  <li>Naturally, it is considerably better than the previous state of the art<\/li>\n  <li>Results are especially strong on morphologically rich languages<\/li>\n<\/ul>\n\n<h2 id=\"introduction\">Introduction<\/h2>\n\n<ul>\n  <li>Language modeling is one of the most fundamental tasks in AI and NLP<\/li>\n  <li>A language model can be viewed as a probability model over sequences of words: given a sequence, it estimates the word that comes next<\/li>\n  <li>In the past, n-gram models were mainly used, but their performance suffered badly from data sparsity<\/li>\n  <li>Neural language models (NLMs) solve the n-gram data sparsity problem through word embeddings<\/li>\n  <li>Although Mikolov's NLM surpassed count-based n-gram language models, it still could not capture subword information (e.g. morphemes)<\/li>\n  <li>For example, without prior knowledge it cannot embed the fact that eventful, eventfully, uneventful, and uneventfully are closely related<\/li>\n  <li>The rarer a word is (in the training data), the more likely its embedding is poor, which degrades performance<\/li>\n  <li>The problem is even more severe for morphologically rich languages, or when words are used dynamically as on social media<\/li>\n  <li>We propose a language model that exploits subword information through a character-level CNN and an RNN-LM<\/li>\n  <li>Unlike previous work, it requires no morphological tagging (like POS) or preprocessing step, and does not use word embeddings as input<\/li>\n  <li>It therefore uses far fewer parameters, and nevertheless performs on par with or better than the current state of the art<\/li>\n<\/ul>\n\n<h2 id=\"model\">Model<\/h2>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/37864158-2e1ecc06-2fae-11e8-8230-cf83cce548a7.png\" alt=\"model\" \/><\/p>\n\n<ul>\n  <li>The figure above shows the overall flow: the model receives the word <em>absurdity<\/em> as input and combines it with the previous history (as represented by the hidden state) to predict the next word, <em>is<\/em><\/li>\n  <li>The first layer vectorizes the word using a lookup table of character embeddings (of dimension four)<\/li>\n  <li>Convolution is applied over the embedding vectors<\/li>\n  <li>In the figure, the blue parts use 3 filters of width 2, 4 filters of width 3, and 5 filters of width 4. (It is unclear why different numbers of filters were used)<\/li>\n  <li>To obtain a fixed-dimensional representation of the word, max-over-time pooling (= max-pooling) is applied to the convolution outputs (feature maps)<\/li>\n  <li>That is, the result up to this point can be viewed as the representation of the word<\/li>\n  <li>This word representation (vector) is then fed into a highway network. 
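The character convolution and max-over-time pooling steps above can be sketched as follows; the random weights, the toy word length, and the tanh activation are assumptions, while the filter widths and counts follow the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
L, d_char = 9, 4                               # word length (assumed), char embedding dim four
C = rng.standard_normal((L, d_char))           # character embeddings for one word

def conv_maxpool(C, width, n_filters, rng):
    """Apply n_filters of the given width over the char sequence, then max-over-time."""
    H = rng.standard_normal((n_filters, width * C.shape[1]))      # filter bank
    windows = np.stack([C[i:i + width].ravel()                    # sliding windows
                        for i in range(C.shape[0] - width + 1)])
    feats = np.tanh(windows @ H.T)             # feature map: (L - width + 1, n_filters)
    return feats.max(axis=0)                   # max-over-time: one value per filter

# widths/filter counts from the figure: 3 of width 2, 4 of width 3, 5 of width 4
word_vec = np.concatenate([conv_maxpool(C, w, n, rng)
                           for w, n in [(2, 3), (3, 4), (4, 5)]])
```

The concatenated word_vec has one entry per filter (3 + 4 + 5 = 12 here), fixed regardless of how many characters the word contains.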
I am not yet sure what a highway network is<\/li>\n  <li>The output of the highway network goes into the final component, a multi-layer LSTM, and a softmax over its output predicts the next word, <em>is<\/em>. A word embedding lookup table is used at this stage<\/li>\n<\/ul>\n\n<h3 id=\"highway-network\">Highway Network<\/h3>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/38014035-2c1e1fb4-32a2-11e8-8120-3f300f305816.png\" alt=\"eq\" \/><\/p>\n\n<ul>\n  <li>The first equation is the entire computation of the highway network<\/li>\n  <li>Intuitively, it learns to take the better of the original input and the input passed once through a fully connected layer, mixing them with ratios t and (1-t)<\/li>\n  <li>The ratio t is itself computed through a fully connected layer, which can be written as the second equation<\/li>\n  <li>Since t is a ratio it must lie between 0 and 1, so a sigmoid function is used<\/li>\n  <li>The fully connected weight matrices (W_T, W_H) are all square matrices. 
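The two equations just described can be written out directly; this is a minimal sketch with random square weights and zero biases as assumptions, and tanh standing in for the nonlinearity g:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway(y, W_T, b_T, W_H, b_H):
    """z = t * g(W_H y + b_H) + (1 - t) * y, with gate t = sigmoid(W_T y + b_T)."""
    t = sigmoid(W_T @ y + b_T)                 # transform gate in (0, 1)
    return t * np.tanh(W_H @ y + b_H) + (1.0 - t) * y  # carry the input with ratio 1 - t

rng = np.random.default_rng(0)
d = 6                                          # square matrices keep the dimension fixed
y = rng.standard_normal(d)
z = highway(y, rng.standard_normal((d, d)), np.zeros(d),
            rng.standard_normal((d, d)), np.zeros(d))
```

When the gate saturates near 0 the input passes through unchanged, which is what lets gradients flow through many stacked layers.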
That is, naturally, the input\/output dimensions are preserved<\/li>\n<\/ul>\n\n<h2 id=\"experimental-setup\">Experimental Setup<\/h2>\n\n<ul>\n  <li>Perplexity (PPL) is used as the evaluation metric<\/li>\n  <li>PPL takes the NLL (negative log likelihood) loss, divides it by the sequence length T, and exponentiates the result<\/li>\n<\/ul>\n\n<h2 id=\"discussion\">Discussion<\/h2>\n\n<h3 id=\"learned-word-representation\">Learned Word Representation<\/h3>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/38018490-fc207f42-32af-11e8-86cb-089808a89e1a.png\" alt=\"table\" \/><\/p>\n\n<ul>\n  <li>This compares and analyzes the word representations at several layers of the model. See Table 6<\/li>\n  <li>In Table 6, before the highway layer the nearest neighbors of <em>you<\/em> are words close in edit distance, such as <em>your<\/em>, <em>young<\/em>, <em>four<\/em>, <em>youth<\/em><\/li>\n  <li>This suggests that surface form (similar spelling) has been learned<\/li>\n  <li>After the highway layer, <em>we<\/em>, which is distant in spelling, is selected as a neighbor of <em>you<\/em>. 
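The PPL definition given in the experimental setup (exponentiated NLL averaged over the T target words) can be written directly; the function name is mine:

```python
import numpy as np

def perplexity(probs):
    """PPL = exp(NLL / T) for the probabilities the model assigns to the T target words."""
    probs = np.asarray(probs, dtype=float)
    nll = -np.log(probs).sum()                 # negative log likelihood of the sequence
    return float(np.exp(nll / len(probs)))
```

As a sanity check, a model that assigns uniform probability 1/k to every target word has perplexity exactly k.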
The same goes for <em>though<\/em> being picked as a neighbor of <em>while<\/em><\/li>\n  <li>This suggests the highway layer learned semantic features that cannot be inferred from spelling alone<\/li>\n  <li>There are still mistakes, however, such as <em>hhs<\/em> being picked as a neighbor of <em>his<\/em>; even granting the small training set, this is a limitation of the approach<\/li>\n  <li>Notably, the model produces surprising results on Out-of-Vocabulary (OOV) words<\/li>\n  <li>Both before and after the highway layer, the neighbors found for words like <em>computer-aided<\/em> and <em>misinformed<\/em> share the same part of speech (POS)<\/li>\n  <li>Since <em>look<\/em> is found as a neighbor of <em>looooook<\/em>, the model might also be applied to text normalization in noisy domains with misspellings and neologisms<\/li>\n<\/ul>\n\n<h3 id=\"highway-layers\">Highway Layers<\/h3>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/38020507-60f1302e-32b5-11e8-8da4-42102789e009.png\" alt=\"table78\" \/><\/p>\n\n<ul>\n  <li>Table 7 shows performance as the number of highway layers varies. 
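The numbers in Tables 7 and 8 are PPL values; per the Experimental Setup, PPL is the exponential of the total NLL divided by the sequence length T. A minimal sketch:

```python
import math

def perplexity(nll_total, T):
    # PPL = exp(NLL / T): total negative log-likelihood over the sequence,
    # divided by its length T, then exponentiated.
    return math.exp(nll_total / T)
```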
Lower numbers are better<\/li>\n  <li>One MLP Layer seems to mean using a plain MLP instead of a highway layer<\/li>\n  <li>The authors cannot prove it, but anecdotally they observed the following while experimenting\n    <ul>\n      <li>Having one or two layers matters; stacking more brings little further gain, though this likely depends on the dataset size<\/li>\n      <li>Stacking convolutional layers before max-pooling does not help much<\/li>\n      <li>For models that use word embeddings, the highway layer brings no performance gain<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"effect-of-corpusvocab-sizes\">Effect of Corpus\/Vocab Sizes<\/h3>\n\n<ul>\n  <li>Table 8 shows, for each corpus size and vocabulary size, the percentage by which PPL drops (i.e. performance improves) when switching from the word-level model to the character-level model. T is the corpus size and |V| is the vocabulary size<\/li>\n  <li>To vary the vocabulary size, only the most frequent k words are kept and the rest are replaced with an <unk> token. 
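The truncation just described (keep the k most frequent words, map everything else to the unknown token) can be sketched as:

```python
from collections import Counter

def truncate_vocab(tokens, k):
    # Keep only the k most frequent words; every other token becomes
    # "<unk>", so the resulting vocabulary size is k.
    keep = {w for w, _ in Counter(tokens).most_common(k)}
    return [w if w in keep else "<unk>" for w in tokens]
```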
In other words, the vocabulary size becomes k<\/unk><\/li>\n  <li>The cell for T=1m, |V|=100k is empty because that corpus is too small to contain 100k distinct words<\/li>\n  <li>The table shows that the larger the vocabulary and the smaller the corpus, the bigger the gain from the character-level model<\/li>\n  <li>Intuitively, a small corpus is not enough to learn good word representations, so the word-level model suffers; the character-level model, looking at the characters themselves, degrades less even with little training data.<\/li>\n  <li>Also, with a larger vocabulary the character-level model can exploit morphological patterns tied to part of speech (ed, ing, ly); the word-level model benefits too, but the larger number of candidate words may hurt its prediction performance<\/li>\n  <li>In any case, the character-level model is better in every setting (every entry in the table is a positive percentage)<\/li>\n<\/ul>\n\n<h2 
id=\"conclusion\">Conclusion<\/h2>\n\n<ul>\n  <li>We propose a neural language model over character-level input<\/li>\n  <li>Despite having fewer parameters, it matches or outperforms previous models<\/li>\n  <li>Using a CharCNN plus highway layers, it demonstrates the possibility of word embeddings that overcome the limitations of word2vec<\/li>\n  <li>All NLP tasks require sequential processing of words as input.<\/li>\n  <li>Therefore, interesting results can be expected from applying our architecture to the inputs of other models, such as Encoder-Decoder models<\/li>\n<\/ul>\n","pubDate":"Thu, 29 Mar 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/character-aware-lm\/","guid":"https:\/\/roomylee.github.io\/character-aware-lm\/","category":["character-level","language-model","blog"]},{"title":"FaceNet: A Unified Embedding for Face Recognition and Clustering (CVPR 2015)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/arxiv.org\/abs\/1503.03832\">https:\/\/arxiv.org\/abs\/1503.03832<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Florian Schroff (Google)<\/li>\n      <li>Dmitry Kalenichenko (Google)<\/li>\n      <li>James Philbin (Google)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>CVPR 2015<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Implementing face verification and recognition efficiently at scale presents serious challenges to current approaches.\n    <ul>\n      <li>\u27a4 Efficiently implementing face 
verification and recognition at scale is a considerably challenging task for current approaches.<\/li>\n    <\/ul>\n  <\/li>\n  <li>In this paper, we present <em>FaceNet<\/em>, which directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.\n    <ul>\n      <li>\u27a4 In this paper we introduce <em>FaceNet<\/em>, which learns to map face images into a Euclidean space where the distances between mapped points represent face similarity.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Our method uses a deep CNN trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches.\n    <ul>\n      <li>\u27a4 Rather than using an intermediate bottleneck layer as in previous work, we train a CNN to directly optimize the embedding itself.<\/li>\n    <\/ul>\n  <\/li>\n  <li>On the widely used <em>Labeled Faces in the Wild (LFW)<\/em> dataset and <em>YouTube Faces<\/em> DB, our system achieves an accuracy better than the previous best published result.\n    <ul>\n      <li>\u27a4 On the <em>LFW<\/em> dataset and the <em>YouTube Faces<\/em> DB, our system achieves better accuracy than the previous best published results.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. 
Introduction<\/h2>\n\n<ul>\n  <li>In this paper we present a unified system for \u27a4 we propose a unified system for the following tasks:\n    <ol>\n      <li>face verification (is this the same person) \u27a4 are these the same person?<\/li>\n      <li>face recognition (who is this person) \u27a4 who is this person?<\/li>\n      <li>face clustering (find common people among these faces) \u27a4 find (group) the faces belonging to the same person<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/35256631-0fd0f62a-0038-11e8-85e4-67dd005ab981.png\" alt=\"facenet\" class=\"center\" \/><\/p>\n\n<ul>\n  <li>The network (CNN) is trained such that the squared L2 distances in the embedding space directly correspond to face similarity: faces of the same person have small distances and faces of distinct people have large distances.\n    <ul>\n      <li>\u27a4 The network is trained so that squared L2 distances in the embedding space correspond to face similarity. 
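A sketch of how such an embedding would be consumed for verification; the threshold value and function name here are illustrative, not from the paper:

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=1.1):
    # Verification reduces to thresholding the squared L2 distance
    # between two embeddings: small distance = same person.
    d = float(np.sum((emb_a - emb_b) ** 2))
    return d < threshold
```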
Faces of the same person have small distances and faces of different people have large distances.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Once this embedding has been produced, \u27a4 once the embedding has been learned,\n    <ol>\n      <li>face verification simply involves thresholding the distance between the two embeddings\n        <ul>\n          <li>\u27a4 face verification can be solved by thresholding the distance.<\/li>\n        <\/ul>\n      <\/li>\n      <li>face recognition becomes a k-NN classification problem\n        <ul>\n          <li>\u27a4 face recognition can be solved as a k-NN classification problem.<\/li>\n        <\/ul>\n      <\/li>\n      <li>face clustering can be achieved using off-the-shelf techniques such as k-means or agglomerative clustering\n        <ul>\n          <li>\u27a4 face clustering can be solved with off-the-shelf techniques such as k-means.<\/li>\n        <\/ul>\n      <\/li>\n    <\/ol>\n  <\/li>\n  <li>Previous face recognition approaches based on deep networks take an intermediate bottleneck layer as a representation.\n    <ul>\n      <li>\u27a4 Previous deep-network-based face recognition work used an intermediate bottleneck layer as the representation (embedding).<\/li>\n    <\/ul>\n  <\/li>\n  <li>The downsides of this approach are its indirectness and its inefficiency. 
\u27a4 The downsides of this approach are its indirectness and its inefficiency.\n    <ul>\n      <li>one has to hope that the bottleneck representation generalizes well to new faces\n        <ul>\n          <li>\u27a4 One has to hope the bottleneck representation generalizes well to new faces (there is a significant chance it fails to recognize them).<\/li>\n        <\/ul>\n      <\/li>\n      <li>representation size per face is usually very large (1000s of dimensions)\n        <ul>\n          <li>\u27a4 The representation size per face is usually very large.<\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n  <li>In contrast, <em>FaceNet<\/em> directly trains its output to be a compact 128-D embedding using a triplet-based loss function based on LMNN.\n    <ul>\n      <li>\u27a4 In contrast, <em>FaceNet<\/em> is trained so that its output is directly a 128-D embedding, using a triplet-based loss function from LMNN.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Our triplets consist of two matching face thumbnails and a non-matching face thumbnail and the loss aims to separate the positive pair from the negative by a distance margin.\n    <ul>\n      <li>\u27a4 A triplet consists of two matching faces and one non-matching face, and the loss aims to push the positive pair away from the negative by a distance margin.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Choosing which triplets to use turns out to be very important for achieving good performance and, inspired by curriculum 
learning, we present a novel online negative exemplar mining strategy which ensures consistently increasing difficulty of triplets as the network trains.\n    <ul>\n      <li>\u27a4 Choosing triplets well is very important for good performance. Inspired by curriculum learning, we present a novel online negative exemplar mining strategy that keeps increasing the difficulty of the triplets as the network trains.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-method\">3. Method<\/h2>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/35470350-3d688b74-038b-11e8-992b-2ed6db6f6a5b.png\" alt=\"architecture_triplet\" \/><\/p>\n\n<ul>\n  <li>We employ the triplet loss that directly reflects what we want to achieve in face verification, recognition and clustering.\n    <ul>\n      <li>\u27a4 We employ the triplet loss, which directly reflects what we want to achieve in face verification, recognition, and clustering.<\/li>\n    <\/ul>\n  <\/li>\n  <li>we strive for an embedding f(x), from an image x into a feature space <img src=\"https:\/\/latex.codecogs.com\/svg.latex?R^d\" \/>, such that the squared distance between all faces, independent of imaging conditions, of the same identity is small, whereas the squared distance between a pair of face images from different identities is large.\n    <ul>\n      <li>\u27a4 We want to obtain an embedding function f(x). 
It maps an image x into the feature space <img src=\"https:\/\/latex.codecogs.com\/svg.latex?R^d\" \/>, embedding pairs of the same identity with small distances and pairs of different identities with large distances.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"31-triplet-loss\">3.1. Triplet Loss<\/h3>\n\n<ul>\n  <li>Here we want to ensure that an image <img src=\"https:\/\/latex.codecogs.com\/svg.latex?x_i^a\" \/> (anchor) of a specific person is closer to all other images <img src=\"https:\/\/latex.codecogs.com\/svg.latex?x_i^p\" \/> (positive) of the same person than it is to any image <img src=\"https:\/\/latex.codecogs.com\/svg.latex?x_i^n\" \/> (negative) of any other person. This is visualized in Figure 3.\n    <ul>\n      <li>\u27a4 Taking a specific person as the anchor, images of the same person (positives) should be closer to the anchor than images of other people (negatives). Figure 3 visualizes this.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/latex.codecogs.com\/svg.latex?||f(x_i^a)-f(x_i^p)||_2^2+\\alpha&lt;||f(x_i^a)-f(x_i^n)||_2^2\" \/><\/p>\n\n<ul>\n  <li>where \u03b1 is a margin that is enforced between positive and negative pairs\n    <ul>\n      <li>\u27a4 \u03b1 is the margin enforced between positive and negative pairs. 
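Folding this margin constraint into a per-triplet hinge loss gives the following minimal sketch (α = 0.2 as in the paper; the embeddings are assumed already computed by f, and the function name is mine):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    # [ ||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + alpha ]_+ , with [x]_+ = max(0, x)
    d_ap = float(np.sum((f_a - f_p) ** 2))
    d_an = float(np.sum((f_a - f_n) ** 2))
    return max(0.0, d_ap - d_an + alpha)
```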
That is, for a given anchor, the negative should be at least \u03b1 farther away than the positive.<\/li>\n    <\/ul>\n  <\/li>\n  <li>The loss that is being minimized is then<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/latex.codecogs.com\/svg.latex?L=\\sum_i^N{[||f(x_i^a)-f(x_i^p)||_2^2-||f(x_i^a)-f(x_i^n)||_2^2+\\alpha]_+}\" \/><\/p>\n\n<ul>\n  <li><img src=\"https:\/\/latex.codecogs.com\/svg.latex?[x]_+=max(0,x)\" \/><\/li>\n  <li><img src=\"https:\/\/latex.codecogs.com\/svg.latex?||f(x)||_2=1\" \/><\/li>\n  <li><img src=\"https:\/\/latex.codecogs.com\/svg.latex?\\alpha=0.2\" \/><\/li>\n<\/ul>\n\n<h3 id=\"32-triplet-selection\">3.2. Triplet Selection<\/h3>\n\n<ul>\n  <li>Given <img src=\"https:\/\/latex.codecogs.com\/svg.latex?x_i^a\" \/>, we want to select an <img src=\"https:\/\/latex.codecogs.com\/svg.latex?x_i^p\" \/> (hard positive) such that <img src=\"https:\/\/latex.codecogs.com\/svg.latex?argmax_{x_i^p}{||f(x_i^a)-f(x_i^p)||_2^2}\" \/> and similarly <img src=\"https:\/\/latex.codecogs.com\/svg.latex?x_i^n\" \/> (hard negative) such that <img src=\"https:\/\/latex.codecogs.com\/svg.latex?argmin_{x_i^n}{||f(x_i^a)-f(x_i^n)||_2^2}\" \/>.\n    <ul>\n      <li>\u27a4 Given an anchor, we want the hard positive (the farthest positive) and the hard negative (the closest negative).<\/li>\n    <\/ul>\n  <\/li>\n  <li>But, it is infeasible to compute the argmin and argmax across the whole training set.\n    <ul>\n      <li>\u27a4 However, computing the argmin and argmax over the whole training set is practically infeasible.<\/li>\n    <\/ul>\n  <\/li>\n  <li>we use large mini-batches in the order of a few thousand exemplars and only compute the argmin and argmax within a 
mini-batch.\n    <ul>\n      <li>\u27a4 So we use mini-batches and compute the argmin and argmax only over the few thousand exemplars in a batch.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Instead of picking the hardest positive, we use all anchor-positive pairs in a mini-batch while still selecting the hard negatives.\n    <ul>\n      <li>\u27a4 Instead of picking the hardest positive, all positives are used for training, while negatives are still selected as hard negatives.<\/li>\n    <\/ul>\n  <\/li>\n  <li>but we found in practice that the all anchor-positive method was more stable and converged slightly faster at the beginning of training.\n    <ul>\n      <li>\u27a4 We found empirically that using all positives rather than only hard positives was more stable and converged slightly faster early in training.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Selecting the hardest negatives can in practice lead to bad local minima early on in training, specifically it can result in a collapsed model (i.e. 
f(x) = 0).\n    <ul>\n      <li>\u27a4 Selecting the hardest negatives risks bad local minima early in training, and can even collapse the model.<\/li>\n    <\/ul>\n  <\/li>\n  <li>In order to mitigate this, it helps to select <img src=\"https:\/\/latex.codecogs.com\/svg.latex?x_i^n\" \/> such that\n    <ul>\n      <li>\u27a4 To mitigate this, negatives are selected according to the inequality below.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/latex.codecogs.com\/svg.latex?||f(x_i^a)-f(x_i^p)||_2^2&lt;||f(x_i^a)-f(x_i^n)||_2^2\" \/><\/p>\n\n<ul>\n  <li>We call these negative exemplars semi-hard, as they are further away from the anchor than the positive exemplar, but still hard because the squared distance is close to the anchor-positive distance.\n    <ul>\n      <li>\u27a4 Negatives selected this way are called semi-hard: they are farther from the anchor than the positive, but still hard because their distance is close to the anchor-positive distance.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"33-deep-convolutional-networks\">3.3. 
Deep Convolutional Networks<\/h3>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/35526429-b1cba772-056a-11e8-95c8-837904c3e981.png\" alt=\"convnet\" \/><\/p>\n","pubDate":"Thu, 29 Mar 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/facenet\/","guid":"https:\/\/roomylee.github.io\/facenet\/","category":["face-net","embedding","face-verification","face-recognition","face-clustering","blog"]},{"title":"Distant supervision for relation extraction without labeled data (ACL 2009)","description":"<ul>\n  <li>\n    <p>Paper Link: <a href=\"https:\/\/web.stanford.edu\/~jurafsky\/mintz.pdf\">https:\/\/web.stanford.edu\/~jurafsky\/mintz.pdf<\/a><\/p>\n  <\/li>\n  <li>Author\n    <ul>\n      <li>Mike Mintz (Stanford University)<\/li>\n      <li>Steven Bills (Stanford University)<\/li>\n      <li>Rion Snow (Stanford University)<\/li>\n      <li>Dan Jurafsky (Stanford University)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>ACL 2009<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>In relation extraction, supervised learning based on the ACE dataset uses a small hand-labeled corpus<\/li>\n  <li>We propose a new method that requires no labeled data<\/li>\n  <li>Building on Freebase (a semantic database of publicly available relations), we propose an approach called distant supervision<\/li>\n  <li>Our distant supervision algorithm combines the advantages of supervised IE (combining 400K noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain)<\/li>\n<\/ul>\n\n<h2 
id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>There are three learning paradigms<\/li>\n<\/ul>\n\n<ol>\n  <li>Supervised approach\n    <ul>\n      <li>The supervised approach requires a hand-labeled corpus of entities and relations<\/li>\n      <li>However, labeled training data is very expensive to produce, and training on a domain-specific corpus can bias the classifier<\/li>\n    <\/ul>\n  <\/li>\n  <li>Unsupervised approach\n    <ul>\n      <li>Second, the unsupervised approach extracts the strings of words between entities, clusters and simplifies those strings to define relation classes, and then generates dataset instances from them<\/li>\n      <li>It can build very large datasets, but the relations defined this way are hard to map onto the relation classes required by a particular knowledge base<\/li>\n    <\/ul>\n  <\/li>\n  <li>Bootstrap approach\n    <ul>\n      <li>Third and last, the bootstrap approach uses a small number of seed instances (or patterns)<\/li>\n      <li>From the seeds it builds new patterns over a large corpus, finds new instances with those patterns, builds newer patterns from those, and repeats; 
this approach has low precision and suffers from semantic drift<\/li>\n      <li>We propose a new paradigm called distant supervision that combines the advantages of the three methods above<\/li>\n      <li>Distant supervision uses Freebase, a large semantic database<\/li>\n      <li>One of the key ideas of distant supervision is that whenever a sentence contains an entity pair with a known Freebase relation, the entity pair in that sentence is assumed to express the same relation as in Freebase<\/li>\n      <li>Because it builds on a (knowledge) database rather than labeled text, it is less prone to overfitting to a domain<\/li>\n    <\/ul>\n  <\/li>\n<\/ol>\n\n<h2 id=\"2-previous-work\">2. Previous work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-freebase\">3. Freebase<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"4-architecture\">4. 
Architecture<\/h2>\n\n<ul>\n  <li>training step\n    <ul>\n      <li>Run NER over every instance sentence<\/li>\n      <li>If a sentence contains two entities and those entities stand in some relation in Freebase, extract features from the sentence and add them to that relation's feature vector<\/li>\n      <li>The features for a (relation, entity1, entity2) tuple from many sentences are all combined into a richer feature vector<\/li>\n    <\/ul>\n  <\/li>\n  <li>testing step\n    <ul>\n      <li>Run NER in the same way<\/li>\n      <li>This time, every sentence in which an entity pair appears is treated as a potential relation instance<\/li>\n      <li>Whenever an entity pair appears in a sentence, extract features and add them to that entity pair's feature vector<\/li>\n      <li>For example, if an entity pair appears in 10 sentences of the test dataset and 3 features are extracted from each sentence, the entity pair ends up with 30 associated features<\/li>\n      <li>When the regression classifier predicts the relation of the entity pair appearing in each of the 10 sentences, it uses the features from all 10 sentences together<\/li>\n    <\/ul>\n  <\/li>\n  <li>Example 1\n    <ul>\n      <li>Consider the 
<em>location-contains<\/em> relation in Freebase<\/li>\n      <li>Also consider the pair instances <em>&lt;Virginia, Richmond&gt;<\/em> and <em>&lt;France, Nantes&gt;<\/em>, which hold this relation<\/li>\n      <li>Given sentences such as <em>\u2018Richmond, the capital of Virginia\u2019<\/em> or <em>\u2018Henry\u2019s Edict of Nantes helped the Protestants of France\u2019<\/em>, features must be extracted from them<\/li>\n      <li>Some sentences are very useful, like the Richmond sentence (the first), while others are of little use, like the Nantes sentence (the second)<\/li>\n      <li>At test time, if we happen to encounter the sentence <em>\u2018Vienna, the capital of Austria\u2019<\/em>, one or more of its features will match those of the Richmond sentence<\/li>\n      <li>This provides evidence that <em>&lt;Austria, Vienna&gt;<\/em> belongs to the <em>location-contains<\/em> relation<\/li>\n    <\/ul>\n  <\/li>\n  <li>One of the biggest advantages of our architecture is that it can combine information obtained from many different sentences expressing the same relation<\/li>\n  <li>Example 2\n    <ul>\n      <li>Suppose we have the entity pair <em>&lt;Steven Spielberg, Saving Private Ryan&gt;<\/em><\/li>\n      <li>The two sentences below express the <em>film-director<\/em> 
relation\n        <ul>\n          <li>[Steven Spielberg]\u2019s film [Saving Private Ryan] is loosely based on the brothers\u2019 story.<\/li>\n          <li>Allison co-produced the Academy Award-winning [Saving Private Ryan], directed by [Steven Spielberg] \u2026<\/li>\n        <\/ul>\n      <\/li>\n      <li>The first sentence can serve as a feature for the <em>film-director<\/em> relation, but it could equally serve as a feature for the <em>film-writer<\/em> or <em>film-producer<\/em> relation<\/li>\n      <li>The second sentence could likewise be read as the <em>CEO<\/em> relation (consider \u2018Robert Mueller directed the FBI\u2019)<\/li>\n    <\/ul>\n  <\/li>\n  <li>In this way, combining information from many sentences can even surface new meanings<\/li>\n<\/ul>\n\n<h2 id=\"5-features\">5. Features<\/h2>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36368184-74c54eca-1599-11e8-812b-06513589d786.png\" alt=\"figure\" \/><\/p>\n\n<h3 id=\"5-1-lexical-features\">5-1. 
Lexical features<\/h3>\n\n<ul>\n  <li>Lexical features describe information about the words surrounding the two entities\n    <ul>\n      <li>the sequence of words between the two entities<\/li>\n      <li>the part-of-speech tags of those words<\/li>\n      <li>which entity comes first (their order)<\/li>\n      <li>the k words to the left of entity 1 and their POS tags<\/li>\n      <li>the k words to the right of entity 2 and their POS tags<\/li>\n      <li>k is one of {0, 1, 2}<\/li>\n    <\/ul>\n  <\/li>\n  <li>POS tagging is done with a maximum entropy tagger based on Penn Treebank tags<\/li>\n  <li>Penn Treebank is a project that annotates sentences with syntactic\/semantic tree structures, and the POS tag set used there is adopted here<\/li>\n<\/ul>\n\n<h3 id=\"5-2-syntactic-features\">5-2. Syntactic features<\/h3>\n\n<ul>\n  <li>Using a dependency parse tree, the tags along the dependency path between the two entities are used as features<\/li>\n  <li>Dependencies for a window of k words on either side are used as features in a similar way<\/li>\n  <li>See Figure 1 above<\/li>\n<\/ul>\n\n<h3 id=\"5-3-named-entity-tag-features\">5-3. Named entity tag features<\/h3>\n\n<ul>\n  <li>The named entity tags of the two entities are used as features<\/li>\n<\/ul>\n\n<h3 id=\"5-4-feature-conjunction\">5-4. Feature conjunction<\/h3>\n\n<ul>\n  <li>The features above are used in conjunction with one another<\/li>\n  <li>Table 3 above makes this easy to understand<\/li>\n<\/ul>\n\n<h2 id=\"6-implementation\">6. 
Implementation<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"7-evaluation\">7. Evaluation<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"8-discussion\">8. Discussion<\/h2>\n\n<ul>\n  <li>The distant supervision algorithm can extract patterns with high precision for a large number of relations<\/li>\n  <li>The discussion of the experiments and results of Sections 6 and 7 is omitted here<\/li>\n  <li>Syntactic features are clearly useful for distant supervision<\/li>\n  <li>The following future work is suggested\n    <ul>\n      <li>chunk-based syntactic features could reduce the overhead of full parsing<\/li>\n      <li>coreference resolution could bring performance gains<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n","pubDate":"Thu, 29 Mar 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/distant-supervision-relation-extraction\/","guid":"https:\/\/roomylee.github.io\/distant-supervision-relation-extraction\/","category":["distant-supervision","relation-extraction","blog"]},{"title":"Literature mining of host\u2013pathogen interactions: comparing feature-based supervised learning and language-based approaches (Bioinformatics 2012)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22285561\">https:\/\/www.ncbi.nlm.nih.gov\/pubmed\/22285561<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Thanh Thieu (University of Missouri)<\/li>\n      <li>Sneha Joshi (University of Missouri)<\/li>\n      <li>Samantha Warren (University of Missouri)<\/li>\n      <li>Dmitry Korkin (University of Missouri)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>Bioinformatics 2012<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 
id=\"abstract\">Abstract<\/h2>\n\n<h3 id=\"motivation\">Motivation<\/h3>\n\n<ul>\n  <li>\uc804\uc5fc\ubcd1(infectious disease)\uc5d0\uc11c \uc219\uc8fc(host)\uc640 \ubcd1\uc6d0\uade0(pathogen) \uac04\uc758 \uad00\uacc4(host-pathogen interations; HPIs)\uac00 \uc0c1\ub2f9\ud788 \uc911\uc694\ud558\ub2e4.<\/li>\n  <li>\ud0c0\ucf13\ud558\ub294 \uc9c8\ubcd1 \ub610\ub294 \uc219\uc8fc \uc720\uae30\uccb4(host organism)\uc5d0 \ub300\ud55c \ub2e4\uc591\ud55c \ub370\uc774\ud130 \ubca0\uc774\uc2a4\uac00 \uc874\uc7ac\ud558\uace0 HPI\ub294 \uc5ec\ub7ec \ub370\uc774\ud130 \ubca0\uc774\uc2a4\uc5d0 \uac78\uccd0\uc11c \ub098\ud0c0\ub09c\ub2e4.<\/li>\n  <li>Biomedical literature \ub85c\ubd80\ud130 \uc790\ub3d9\uc73c\ub85c HPI\ub97c \ucd94\ucd9c\ud558\ub294 \ubc29\ubc95\uc740 \uc774\ub7f0 \ub370\uc774\ud130 \ubca0\uc774\uc2a4(repository)\ub97c \ub9cc\ub4dc\ub294\ub370 \ub9e4\uc6b0 \uc911\uc694\ud558\ub2e4.<\/li>\n<\/ul>\n\n<h3 id=\"results\">Results<\/h3>\n\n<ul>\n  <li>\uc6b0\ub9ac\ub294 2\uac00\uc9c0 \uc0c8\ub85c\uc6b4 approach\ub97c \uc81c\uc548\n    <ol>\n      <li>PubMed\uc5d0 \uc788\ub294 title \ub610\ub294 abstract\uac00 HPI data\ub97c \ud3ec\ud568\ud558\ub294\uc9c0 \uc5ec\ubd80\ub97c \ucc3e\uc544\ub0b4\ub294 \uac83<\/li>\n      <li>\uc720\uae30\uccb4\uc640 \ub2e8\ubc31\uc9c8 \uac04\uc758 \uc0c1\ud638\uc791\uc6a9 \uc815\ubcf4\ub97c \ucd94\ucd9c\ud574\ub0b4\ub294 \uac83<\/li>\n    <\/ol>\n  <\/li>\n  <li>\uccab \ubc88\uc9f8 approach\ub294 SVM\uc744 \uc774\uc6a9\ud55c feature-based supervised learning \ubc29\ubc95\uc774\ub2e4.<\/li>\n  <li>\uac01 \ubb38\uc7a5\uc73c\ub85c\ubd80\ud130 host\/pathogen organism\uc758 \uc774\ub984, \ub2e8\ubc31\uc9c8(protein), \uc720\uc804\uc790(gene), HPI\ub97c \ub098\ud0c0\ub0b4\ub294 \ud0a4\uc6cc\ub4dc, protein-protein interaction (PPI) \ub4f1\uc744 feature\ub97c \ucd94\ucd9c\ud558\uc5ec SVM\uc744 \ud559\uc2b5\uc2dc\ud0a8\ub2e4.<\/li>\n  <li>\ub450 \ubc88\uc9f8\ub294 language-based \ubc29\ubc95\uc778\ub370, link grammar parser\uc640 training 
examples.<\/li>\n  <li>Training and testing were carried out on a manually built HPI dataset.<\/li>\n  <li>On the classification task, it outperformed existing PPI approaches in terms of accuracy and recall.<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Various approaches for extracting HPIs have been proposed, but no fully automatic system has yet been built for extracting molecular HPI data from the biomedical literature.<\/li>\n  <li>Methods for biomolecular information can be broadly divided into the following three categories.\n    <ol>\n      <li>identifying proteins or genes in text<\/li>\n      <li>functionally annotating proteins and genes based on the literature<\/li>\n      <li>extracting relational information between biological molecules (e.g. 
protein-RNA or gene)<\/li>\n    <\/ol>\n  <\/li>\n  <li>The relations found by the third kind of method range from simple co-occurrence of genes and proteins in text, through PPI detection, to the identification of signaling networks and metabolic pathways.<\/li>\n  <li>PPIs have been studied far more than HPIs.<\/li>\n  <li>The most basic way to find PPIs is to assume a PPI exists whenever proteins or genes co-occur in the same sentence.<\/li>\n  <li>A more advanced method is pattern matching, which captures semantic structure.<\/li>\n  <li>Such patterns are built manually or generated automatically, e.g. via dynamic programming.<\/li>\n  <li>Besides patterns, another line of work uses feature-based machine learning.<\/li>\n  <li>The state of the art finds related word pairs using features based on dictionary rules such as link grammars and context-free grammars.<\/li>\n  <li>Recent subtasks for PPI mining are as follows.\n    <ol>\n      <li>classification of PPI-related documents<\/li>\n      <li>identification of sentences containing a PPI<\/li>\n      <li>identification of interacting protein pairs<\/li>\n    <\/ol>\n  <\/li>\n  <li>Machine learning based methods are being 
tried on the subtasks above.<\/li>\n  <li>We propose two approaches for HPI text mining from the titles and abstracts of PubMed publications.<\/li>\n  <li>The first is an SVM-based, feature-based supervised learning approach<\/li>\n  <li>The second is a language-based approach using a link grammar<\/li>\n<\/ul>\n\n<h2 id=\"2-methods\">2. Methods<\/h2>\n\n<ul>\n  <li>\n    <p>We defined the following three subtasks.<\/p>\n\n    <ol>\n      <li>Task 1: given an expanded abstract, i.e. the combined title and abstract of a biomedical publication, decide whether it is <em>HPI-relevant<\/em>. <em>HPI-relevant<\/em> means that HPI information appears, such as a host-pathogen interaction and the names involved.<\/li>\n      <li>For an expanded abstract containing HPI information, i.e. text that is <em>HPI-relevant<\/em>, determine exactly which sentences the HPI information appears in.<\/li>\n      <li>Find the specific host and pathogen pair participating in an interaction in the <em>HPI-relevant<\/em> abstract.<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<h3 id=\"21-feature-based-approach\">2.1 Feature-based approach<\/h3>\n\n<ul>\n  <li>The feature-based approach consists of five steps.\n    <ol>\n      <li>Find and tag the proteins\/genes and (host) organisms in the abstract 
(NER)<\/li>\n      <li>Generate a feature vector from the abstract.<\/li>\n      <li>Manually label whether the abstract is <em>HPI-relevant<\/em>, then run supervised learning with the feature vector as input and that label as output, so that the classifier predicts whether an abstract is <em>HPI-relevant<\/em>.<\/li>\n      <li>If it is <em>HPI-relevant<\/em>, (1) find which sentences in the abstract carry HPI information, (2) report how certain this information is, and (3) extract the protein\/gene and (host) organism information from those sentences.<\/li>\n      <li>Evaluate the trained system on the test set.<\/li>\n    <\/ol>\n  <\/li>\n  <li>Text preprocessing\n    <ul>\n      <li>First, the text was split into sentences<\/li>\n      <li>The periods in abbreviations such as \u2018i.e.\u2019, \u2018e.g.\u2019, and \u2018vs.\u2019 were replaced with spaces, and the text was then split on the remaining periods.<\/li>\n      <li>An NER tool called NLProt was used for entity tagging. 
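The abbreviation-aware sentence splitting described above can be sketched as follows; the abbreviation list here is a toy assumption rather than the paper's exact set:

```python
import re

# Hypothetical abbreviation list; the paper's exact set is not specified.
ABBREVIATIONS = ["i.e.", "e.g.", "vs."]

def split_sentences(text):
    """Blank out abbreviation periods, then split on the remaining periods."""
    for abbr in ABBREVIATIONS:
        # Replace the abbreviation's periods with spaces so they do not
        # trigger a sentence break.
        text = text.replace(abbr, abbr.replace(".", " "))
    # Split on a period followed by whitespace; drop empty fragments.
    return [s.strip() for s in re.split(r"\.\s+", text) if s.strip()]
```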
A description of this tool is omitted here<\/li>\n    <\/ul>\n  <\/li>\n  <li>Support vector machines\n    <ul>\n      <li>Put simply, the problem is to convert an abstract into an N-dimensional feature vector and, taking this as input, binary-classify (y={-1,1}) whether it contains HPI information.<\/li>\n      <li>We used an SVM, a kind of supervised learning model, for this.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Feature vectors\n    <ul>\n      <li>Each abstract is converted into a 12-dimensional feature vector.<\/li>\n      <li>Most of the features depend on keywords related to the HPI topic.<\/li>\n      <li>Features x1 and x2 indicate whether a host and a pathogen appear in the abstract. They are determined from the NER results.<\/li>\n      <li>Features x3 and x4 are the numbers of times the tagged host and pathogen appear.<\/li>\n      <li>Feature x5 is a binary feature indicating the presence of a PPI keyword.<\/li>\n      <li>Features x6 and x7 capture statistics on the number of PPI (or rather HPI?) keywords: respectively, the percentage of interaction keywords among all words, and the number of sentences containing an interaction keyword out of the total number of sentences. 
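A minimal sketch of how features x1 through x7 could be computed, assuming a toy interaction-keyword dictionary and NER mention lists (the paper's actual dictionaries and NER output format are not reproduced here):

```python
# Hypothetical interaction-keyword dictionary (the real one is much larger).
INTERACTION_KEYWORDS = {"binds", "interacts", "targets"}

def abstract_features(sentences, host_mentions, pathogen_mentions):
    """Compute a sketch of features x1..x7 for one abstract."""
    words = [w.lower() for s in sentences for w in s.split()]
    n_words = len(words) or 1
    inter_hits = [w for w in words if w in INTERACTION_KEYWORDS]
    x1 = 1 if host_mentions else 0             # host present (from NER)
    x2 = 1 if pathogen_mentions else 0         # pathogen present (from NER)
    x3 = len(host_mentions)                    # host mention count
    x4 = len(pathogen_mentions)                # pathogen mention count
    x5 = 1 if inter_hits else 0                # any interaction keyword
    x6 = 100.0 * len(inter_hits) / n_words     # % words that are keywords
    x7 = sum(1 for s in sentences              # sentences with a keyword
             if any(w.lower() in INTERACTION_KEYWORDS for w in s.split()))
    return [x1, x2, x3, x4, x5, x6, x7]
```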
Interaction keywords are defined by a dictionary.<\/li>\n      <li>Feature x8 is the typicality of each keyword, where the typicality of a keyword is the number of abstracts containing it.<\/li>\n      <li>Feature x9 is the number of experimental keywords in the abstract. Experimental keywords are defined by a dictionary.<\/li>\n      <li>Feature x10 is defined as the percentage of negative keywords among all words in the abstract.<\/li>\n      <li>Feature x11 indicates whether a negative keyword applies to the HPI information in the abstract, defined as the number of words between an interaction keyword and a negation keyword within a sentence.<\/li>\n      <li>Feature x12 concerns HPI-specific keywords, defined as the ratio (a percentage) of such keywords to the total number of words in the abstract.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Supervised training and classification using SVM\n    <ul>\n      <li>The SVM is trained to decide whether an abstract is <em>HPI-relevant<\/em>.<\/li>\n      <li>The trained SVM is then used in two passes.<\/li>\n      <li>First, it is used to decide whether an abstract is <em>HPI-relevant<\/em>; second, for a relevant 
abstract, it is used to decide which sentences are likely to contain the most <em>HPI-relevant<\/em> data.<\/li>\n      <li>In the second pass, a feature vector is generated for each sentence and fed to the SVM as input.<\/li>\n      <li>The accuracy of an SVM usually depends on the number of parameters available for tuning.<\/li>\n      <li>We used two parameters, C and gamma.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Handling information uncertainty\n    <ul>\n      <li>Among the sentences carrying HPI data in an abstract classified by the SVM, if even one sentence has an uncertainty keyword before an interaction keyword, the information is judged uncertain and excluded from the results.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"22-language-based-approach\">2.2 Language-based approach<\/h3>\n\n<ul>\n  <li>The second approach is based on a language formalism, specifically something called a link grammar.<\/li>\n  <li>Our approach is similar to language-based systems that extract PPI information, but new modules must be added to the pipeline.<\/li>\n  <li>Method organization\n    <ul>\n      <li>The HPI mining pipeline consists of eight steps.\n        <ol>\n          <li>text preprocessing<\/li>\n          <li>entity (host, 
pathogen) tagging<\/li>\n          <li>grammar parsing (dependency structure)<\/li>\n          <li>anaphora resolution (resolving pronouns)<\/li>\n          <li>syntactic extraction (splitting complex sentences into simple ones)<\/li>\n          <li>role matching (deciding semantic roles)<\/li>\n          <li>interaction keyword tagging<\/li>\n          <li>HPI information extraction<\/li>\n        <\/ol>\n      <\/li>\n      <li>Unlike the feature-based approach, the language-based approach addresses Tasks 2 and 3 (finding sentences containing an HPI, and finding the host-pathogen pair and interacting proteins\/genes in a sentence) more directly.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Entity tagging\n    <ul>\n      <li>Entity tagging is done in more detail than in the feature-based approach<\/li>\n      <li>It goes through the three steps below\n        <ol>\n          <li>protein\/gene tagging with NLProt (the method used in the feature-based approach)<\/li>\n          <li>host\/pathogen organism dictionary-based matching<\/li>\n          <li>post-processing<\/li>\n        <\/ol>\n      <\/li>\n      <li>NLProt is applied, and synonyms are grouped using UniProt. 
These are steps 1 and 2<\/li>\n      <li>Synonyms were handled using NCBI Taxonomy IDs.<\/li>\n      <li>In the post-processing step, mutual context information was used to improve detection accuracy.<\/li>\n      <li>Our system used the phrase structure provided by the link grammar to (1) find additional host\/pathogen information absent from the dictionary and (2) reassign proteins\/genes to the correct organism.<\/li>\n      <li>The following patterns were applied to that structure.\n        <ol>\n          <li>Organism name + protein name (e.g. \u2018Arabidopsis RIN4 protein\u2019)<\/li>\n          <li>Protein name + preposition + organism name (e.g. 
\u2018RXLX of human\u2019)<\/li>\n        <\/ol>\n      <\/li>\n      <li>For example, for \u2018Arabidopsis RIN4 protein\u2019, NLProt alone would call RIN4 a pathogenic organism while dictionary matching would call it a host organism; post-processing resolves this by handling the phrase with pattern 1, treating Arabidopsis as the organism to which the host protein RIN4 belongs.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Link grammar parsing\n    <ul>\n      <li>A link grammar is a context-free grammar that encodes dependencies.<\/li>\n      <li>It links related word pairs based on rules.<\/li>\n      <li>An open source link grammar parser was used.<\/li>\n      <li>BioLG, a customization for biomedical text, and RelEx, an English-language semantic dependency relationship extractor, were also used as additional components.<\/li>\n    <\/ul>\n  <\/li>\n  <li>A three-layer entity framework (BioLG?)\n    <ul>\n      <li>The three-layer entity framework was built for the entity tagging module. 
See Fig 3<\/li>\n      <li>The bottom layer consists of the set of real entities defined by UniProt and NCBI Taxonomy IDs.<\/li>\n      <li>The middle layer consists of the set of all sentences in the abstract.<\/li>\n      <li>In this middle layer, each textual entity is mapped to a unique real entity.<\/li>\n      <li>The top layer consists of the best link grammar parse selected for each sentence.<\/li>\n      <li>A single sentence can yield multiple link grammar parses, and one or more link grammar nodes are connected to a single textual entity.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Anaphora resolution (RelEx?)\n    <ul>\n      <li>This module determines what pronouns refer to.<\/li>\n      <li>Anaphora resolution is very important because interactions are often expressed across multiple sentences using pronouns.<\/li>\n      <li>An anaphora resolution module called RelEx was used.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Syntactic extraction\n    <ul>\n      <li>Complex sentences, such as compound sentences, are common, and our system's syntactic extractor module recovers simple sentences from them.<\/li>\n      <li>The system splits \u2018The Pseudomonas syringae type III effector protein avirulence 
protein B (AvrB) is delivered into plant cells, where it targets the Arabidopsis RIN4 protein\u2019 into two simple sentences: \u2018The Pseudomonas syringae type III effector protein avirulence protein B (AvrB) is delivered into plant cells\u2019 and \u2018The Pseudomonas syringae type III effector protein avirulence protein B (AvrB) targets the Arabidopsis RIN4 protein\u2019.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Interaction keyword tagging\n    <ul>\n      <li>To tag interaction keywords, stemming is performed using WordNet.<\/li>\n      <li>A stem dictionary is built from a manually created interaction keyword dictionary.<\/li>\n      <li>Stemming is then carried out using this dictionary.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Role type matching\n    <ul>\n      <li>This module determines the role of each syntactic component, such as subject, verb, object, and modifying phrase.<\/li>\n      <li>Roles are typed by how many of host entity, pathogen entity, and interaction keyword they contain: one is elementary, two is partial, three is complete. 
(The reason for this split is unclear to me\u2026)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Interaction extraction\n    <ul>\n      <li>Host-pathogen interactions are extracted using the following patterns.\n        <ol>\n          <li>A + interaction verb + B<\/li>\n          <li>Interaction noun + \u2018between\u2019 + A + \u2018and\u2019 + B<\/li>\n          <li>Interaction noun + \u2018of\u2019 + A + \u2018by\u2019 + B<\/li>\n        <\/ol>\n      <\/li>\n      <li>Each element can be viewed as a syntactic component (subject, verb, object, etc.).<\/li>\n    <\/ul>\n  <\/li>\n  <li>Uncertainty analysis\n    <ul>\n      <li>Uncertain information is filtered out using dictionaries of negation keywords (\u2018does not\u2019, \u2018cannot\u2019) and uncertainty keywords (\u2018possibly\u2019, \u2018may\u2019).<\/li>\n    <\/ul>\n  <\/li>\n  <li>Interaction normalization\n    <ul>\n      <li>This step finds HPIs that appear redundantly across multiple sentences.<\/li>\n      <li>Hosts\/pathogens\/proteins\/genes that appear as the same tuple are treated as duplicates.<\/li>\n      <li>Since protein\/gene and organism names were already normalized to real entities in the entity tagging step, duplicates are checked at the real-entity level.<\/li>\n      <li>When an HPI appears in multiple sentences, if even one mention is negative the HPI is treated as negative, and likewise for uncertainty. 
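The normalization rule above (any negative mention makes the HPI negative, likewise for uncertainty, otherwise certain) can be sketched as follows; the mention schema and field names are hypothetical, not taken from the paper:

```python
from collections import defaultdict

def normalize_interactions(mentions):
    """Group sentence-level HPI mentions by their normalized real-entity
    tuple; an HPI is negative/uncertain if any of its mentions is."""
    grouped = defaultdict(list)
    for m in mentions:
        # Key on the normalized host/pathogen/protein tuple (hypothetical schema).
        grouped[(m["host"], m["pathogen"], m["protein"])].append(m)
    results = {}
    for key, ms in grouped.items():
        if any(m.get("negative") for m in ms):
            label = "negative"
        elif any(m.get("uncertain") for m in ms):
            label = "uncertain"
        else:
            label = "certain"
        results[key] = label
    return results
```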
Only when neither condition holds is it considered a certain HPI.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"23-assessment\">2.3 Assessment<\/h3>\n\n<ul>\n  <li>Evaluation was done in two ways: one compares the approaches against each other, and the other uses a naive protocol based on the PPI state of the art.<\/li>\n  <li>Naive protocol\n    <ul>\n      <li>This method extracts HPI sentences using PPIs. It seems to be an extraction method rather than an evaluation method.<\/li>\n      <li>(1) Check whether the abstract contains PPI information, and (2) if a PPI involves at least one host keyword and one pathogen keyword, consider the abstract to contain an HPI.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Assessment of approaches\n    <ul>\n      <li>The performance of the three approaches in total, i.e. the two introduced above plus the naive approach, is evaluated as follows.\n        <ul>\n          <li>Task 1 : binary classification of whether an abstract is HPI-relevant\n            <ul>\n              <li>Five metrics are used: accuracy, precision, recall, F-score, and AUC (area under the ROC curve). 
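A minimal sketch of four of the Task 1 metrics for predictions in {-1, 1} (AUC is omitted, since it requires ranked decision scores rather than hard labels):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-score for labels in {-1, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```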
AUC is computed only for the feature-based method.<\/li>\n              <li>For the feature-based approach, the metrics are computed in three ways; first, 262 of the 350 abstracts (75%) are used for training and the rest for testing.<\/li>\n              <li>The first way evaluates on the test set split off above.<\/li>\n              <li>The second way runs 10-fold CV on the training set.<\/li>\n              <li>The third and last way combines train and test and runs leave-one-out CV.<\/li>\n              <li>The language-based and naive approaches are likewise evaluated on the test set and compared with the feature-based one.<\/li>\n            <\/ul>\n          <\/li>\n          <li>Task 2 : which sentences in an HPI-relevant abstract carry HPI-relevant information\n            <ul>\n              <li>Sentences were annotated with two kinds of labels.<\/li>\n              <li>A sentence is annotated complete if both the host and the pathogen appear in it, and partial if this information is spread over several sentences.<\/li>\n              <li>The following two evaluation measures are used.\n                <ol>\n                  <li>for prediction accuracy, 
HPI-relevant\ub77c\uace0 \ucd94\ucd9c\ub41c \ubb38\uc7a5 \uc911 tp sentence\uc758 \ube44\uc728<\/li>\n                  <li>prediction coverage\ub97c \ubcf4\uae30 \uc704\ud55c positive\uc73c\ub85c annotation\ub41c \ubb38\uc7a5 \uc911 \uc2e4\uc81c\ub85c \uadf8\ub807\uac8c \uc608\uce21\ub41c \ube44\uc728<\/li>\n                <\/ol>\n              <\/li>\n              <li>\ub610\ud55c \uc704\uc758 \ub450 \uc9c0\ud45c\ub97c \ub2e4\uc74c 4\uac00\uc9c0 set\uc5d0 \ub300\ud574\uc11c \ud3c9\uac00\ud55c\ub2e4.\n                <ol>\n                  <li>language-based model\uc5d0 \uc758\ud574\uc11c both organism\uc774 \ucd94\ucd9c\ub41c abstract\uc5d0 \uc788\ub294 complete sentence\ub4e4\uc758 \uc9d1\ud569<\/li>\n                  <li>language-based model\uc5d0 \uc758\ud574\uc11c both organism\uc774 \ucd94\ucd9c\ub41c abstract\uc5d0 \uc788\ub294 partial sentence\ub4e4\uc758 \uc9d1\ud569<\/li>\n                  <li>language-based model\uc5d0 \uc758\ud574\uc11c both protein\/gene\uc774 \ucd94\ucd9c\ub41c abstract\uc5d0 \uc788\ub294 complete sentence\ub4e4\uc758 \uc9d1\ud569<\/li>\n                  <li>language-based model\uc5d0 \uc758\ud574\uc11c both protein\/gene\uc774 \ucd94\ucd9c\ub41c abstract\uc5d0 \uc788\ub294 partial sentence\ub4e4\uc758 \uc9d1\ud569<\/li>\n                <\/ol>\n              <\/li>\n              <li>\uadf8\ub798\uc11c \ucd5c\uc885\uc801\uc73c\ub85c 8\uac1c\uc758 measure\uac00 \uac01 approach\ub9c8\ub2e4 \ub098\uc628\ub2e4.<\/li>\n            <\/ul>\n          <\/li>\n          <li>Task 3 : \ubb38\uc7a5\uc5d0\uc11c host-pathogen pair interaction \ub9de\ucd94\uae30\n            <ul>\n              <li>host, pathogen pair\ub97c \ucc3e\uc558\ub294\uc9c0, organism\uc744 \ucc3e\uc558\ub294\uc9c0\uc5d0 \ub300\ud55c precision, recall, f-score\ub97c \ubcf8\ub2e4.<\/li>\n              <li>\ub530\ub77c\uc11c 6\uac00\uc9c0 \uc9c0\ud45c\uac00 \ub098\uc624\uace0, protein\/gene\uacfc organism \uc911 \ud558\ub098 \uc774\uc0c1\ub9cc \ub098\ud0c0\ub098\ub294 
\uacbd\uc6b0\uc5d0 \ub300\ud574 \ub610 6\uac00\uc9c0 \uc9c0\ud45c\ub97c \ubcf8\ub2e4.<\/li>\n            <\/ul>\n          <\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"3-results\">3. Results<\/h2>\n\n<h3 id=\"31-data-collection\">3.1 Data collection<\/h3>\n\n<ul>\n  <li>MEDLINE\/PubMed database\uc5d0\uc11c \ub370\uc774\ud130\ub97c \uc218\uc9d1\ud588\ub2e4.<\/li>\n  <li>data\ub294 29 host\uc640 77 pathogen organism \uc0ac\uc774\uc5d0 PPI \uc815\ubcf4\ub97c \ud3ec\ud568\ud55c 175 positive abstract\uc640 HPI \uc815\ubcf4\uac00 \uc5c6\ub294 175 negative abstract\uc73c\ub85c \uad6c\uc131\ub418\uc5b4 \uc788\ub2e4.<\/li>\n  <li>positive set\uc740 \ucd5c\uc18c \ud558\ub098\uc758 host\uc640 \ud558\ub098\uc758 pathogen\uc744 \ud3ec\ud568\ud558\ub294 abstract\ub85c \uad6c\uc131\ub41c\ub2e4.<\/li>\n  <li>\ub610\ud55c, \uac01 abstract\ub294 host \uc774\ub984, pathogen \uc774\ub984, host protein\/gene, pathogen protein\/gene, certain\/uncertain HPI\uc5d0 \ub300\ud574\uc11c annotation\ub41c\ub2e4.<\/li>\n  <li>negative set\ub3c4 positive\uacfc \uc720\uc0ac\ud558\uac8c \ub9cc\ub4e0\ub2e4.<\/li>\n<\/ul>\n\n<h3 id=\"32-evaluation-of-feature-based-language-based-and-naive-approaches\">3.2 Evaluation of feature-based, language-based and naive approaches<\/h3>\n\n<ul>\n  <li>precision, recall, f-score, AUC \ub4f1\uc744 \ubd04<\/li>\n<\/ul>\n","pubDate":"Sat, 03 Mar 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/literature-mining-host-pathogen-interaction\/","guid":"https:\/\/roomylee.github.io\/literature-mining-host-pathogen-interaction\/","category":["literature-mining","host-pathogen-interaction","feature-based-supervised-learning","language-based-supervised-learning","blog"]},{"title":"Incorporating Relation Paths in Neural Relation Extraction (EMNLP 2017)","description":"<ul>\n  <li>Paper Link: &lt;#  <a href=\"http:\/\/aclweb.org\/anthology\/D17-1186\">[pdf]<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Wenyuan Zeng (Tsinghua 
University)<\/li>\n      <li>Yankai Lin (Tsinghua University)<\/li>\n      <li>Zhiyuan Liu (Tsinghua University)<\/li>\n      <li>Maosong Sun (Tsinghua University)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>EMNLP 2017<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Distantly supervised relation extraction is widely used to find new relation facts in plain text.<\/li>\n  <li>Current methods for predicting the relation between a target entity pair rely only on the sentences that contain the entity pair itself.<\/li>\n  <li>In fact, there are very many sentences that contain only one entity of the target pair; they provide very useful information, but they have not yet been exploited for relation extraction.<\/li>\n  <li>To address this issue,\n    <ul>\n      <li>we build inference chains between the two target entities via intermediate entities,<\/li>\n      <li>and propose a path-based neural relation extraction model to encode the relational semantics both from the sentences themselves and from the inference chains.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Experiments on real-world datasets show that\n    <ul>\n      <li>our model makes full use of sentences in which only one target entity appears,<\/li>\n      <li>and achieves significant improvements over the relation extraction baselines.<\/li>\n    <\/ul>\n  <\/li>\n  <li>The source code is released on GitHub: <a href=\"https:\/\/github.com\/thunlp\/PathNRE\">https:\/\/github.com\/thunlp\/PathNRE<\/a><\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Knowledge Bases (KBs) provide effective structured data about real-world facts and are used as an important resource in NLP applications such as Web search and QA.<\/li>\n  <li>Widely used KBs such as Freebase, DBpedia, and YAGO contain multi-relational data represented as triples.<\/li>\n  <li>Compared with the ever-growing facts of the real world, existing KBs are far from complete.<\/li>\n  <li>Recently, petabytes of natural-language text covering many different structure types have become available, an important resource for automatically finding unknown relational facts.<\/li>\n  <li>Hence, RE is defined as the task of extracting structured information from plain text.<\/li>\n  <li>Most supervised RE systems lack sufficient labeled data, and manual annotation is time-consuming and labor-intensive.<\/li>\n  <li>To address this, distant supervision emerged, which uses KBs to automatically generate training data from plain text; on top of it, there have been efforts to build neural models.<\/li>\n  <li>These models share one important drawback: they learn only from sentences in which both target entities appear.<\/li>\n  <li><strong>However, sentences in which only one entity appears also provide useful information and help build inference chains.<\/strong><\/li>\n  <li><strong>For example, given the two sentences \u201c<em>h<\/em> is the father of <em>e<\/em>\u201d and \u201c<em>e<\/em> is the father of <em>t<\/em>\u201d, we can infer that <em>h<\/em> is the grandfather of <em>t<\/em>.<\/strong><\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36712183-e6782d6c-1bca-11e8-8a8d-6c8d4ff5d1f9.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>In this work, we propose a path-based neural relation extraction model built on relation paths such as the one in Figure 1.\n    <ul>\n      <li>First, we embed the semantics of sentences using a CNN.<\/li>\n      <li>Then we build a relation path encoder that measures the probability of each relation given an inference chain.<\/li>\n      <li>Finally, to predict the relation, we combine the two sources of information above: direct sentences and relation paths.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Evaluated on a real-world dataset, the model performs considerably better than the baselines.<\/li>\n  <li>By also using sentences in which only one entity appears, our model is more robust and keeps working well as noisy instances increase.<\/li>\n  <li>This is the first work to perform neural relation extraction with relation paths over plain text.<\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-our-method\">3. 
Our Method<\/h2>\n\n<ul>\n  <li>Given three inputs [the target entity pair, the sentences containing the entity pair, and the relation paths], our model measures the confidence of each relation for that entity pair.<\/li>\n  <li>In this section we introduce the three parts of the model:\n    <ol>\n      <li><strong>Text Encoder<\/strong>: given a sentence containing the target entity pair, embed the sentence into a semantic space with a CNN and compute the probability of each relation.<\/li>\n      <li><strong>Relation Path Encoder<\/strong>: given a relation path between the two target entities, compute the probability of each relation conditioned on that path.<\/li>\n      <li><strong>Joint Model<\/strong>: integrate the information from 1. direct sentences and 2. relation paths to predict the confidence of each relation class.<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36712296-7e91bfd2-1bcb-11e8-951d-bf64868ad400.png\" alt=\"figure2\" \/><\/p>\n\n<h3 id=\"31-text-encoder\">3.1. Text Encoder<\/h3>\n\n<ul>\n  <li>A standard CNN-for-RE model.<\/li>\n  <li>word and position embeddings -&gt; text convolution -&gt; max pooling -&gt; tanh -&gt; FC -&gt; softmax<\/li>\n  <li>Multi-instance learning is applied to the output.<\/li>\n  <li>Multi-instance learning here means training only on the sentence with the highest probability for each relation (the sentence that expresses the relation most clearly). The earlier paper this builds on seems to have used the notion of a bag, but no such mention is made here.<\/li>\n<\/ul>\n\n<h3 id=\"32-relation-path-encoder\">3.2. Relation Path Encoder<\/h3>\n\n<ul>\n  <li>We use the Relation Path Encoder to embed the inference information of relation paths.<\/li>\n  <li>The Relation Path Encoder measures the probability of each relation given a relation path.<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36721730-170dc6ca-1bef-11e8-912a-84e24f2fcfe3.png\" alt=\"eq\" \/><\/p>\n\n<ul>\n  <li>For example,\n    <ul>\n      <li>let a path p1 between the pair (h, t) be {(h, e), (e, t)}, corresponding to relations rA and rB,<\/li>\n      <li>and let each of (h, e) and (e, t) appear in at least one sentence.<\/li>\n      <li>Our model then computes the probability with Equation 8 above.<\/li>\n      <li>To explain Equation 8, first look at Equation 9.\n        <ul>\n          <li>Equation 9 measures the similarity between a specific relation and the relation obtained by chaining rA and rB.<\/li>\n          <li>The chained relation is not an actual class but the sum of rA and rB.<\/li>\n          <li>All relation classes, including rA and rB, have distributed rather than binary vector representations, so the sum of rA and rB above is an element-wise vector sum.<\/li>\n          <li>The minus sign is attached so that the smaller the L1 norm (the closer to 0), the more similar ri and (rA+rB) are.<\/li>\n        <\/ul>\n      <\/li>\n      <li>Equation 8 therefore amounts to taking a softmax over this similarity (Equation 9) and converting it into a probability.<\/li>\n      <li>Accordingly, the meaning of p(r|rA,rB) in Equation 8 and the expression on its right-hand side are consistent with each other.<\/li>\n      <li>Since the minus sign was attached in Equation 9, the higher the similarity (the smaller Equation 9), the larger p(r|rA,rB) becomes.<\/li>\n    <\/ul>\n  
<\/li>\n  <li>In the example above, it is assumed that if ri in Equation 9 is semantically similar to the relation path pi: (h \u2013rA-&gt; e \u2013rB-&gt; t), then the embedding of ri and the embedding of (rA+rB) are very close.<\/li>\n  <li>That is, if rA=father and rB=mother, the relation class vector embeddings seem to be trained so that father vector + mother vector = spouse vector, or so it appears.<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36728622-b7bea584-1c04-11e8-9026-c617d3d98846.png\" alt=\"eq2\" \/><\/p>\n\n<ul>\n  <li>As a result, the relation-path score function is defined as in Equation 10.<\/li>\n  <li>E(h,rA,e) denotes the probability that entities h and e hold the relation rA in the sentences where both appear.<\/li>\n  <li>In practice there are multiple relation paths between two entities, so Equation 11 is used to obtain the relation path with the highest probability.<\/li>\n<\/ul>\n\n<h3 id=\"33-joint-model\">3.3. Joint Model<\/h3>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36729969-36f87c9a-1c09-11e8-845a-dc6c720d885f.png\" alt=\"eq3\" \/><\/p>\n\n<ul>\n  <li>The final global score function joining the direct-sentence score and the relation-path score is Equation 12 above.<\/li>\n  <li>Here \u03b1 = (1 - E(h,r,t|S)) * \u03b2, where \u03b2 is a constant that adjusts the relative weight between direct sentences and relation paths.<\/li>\n  <li>\u03b1 is written as a function of E(h,r,t|S) because, if E(h,r,t|S) is large enough, a sufficiently reliable prediction can be made without attending to the extra information in the relation paths.<\/li>\n  <li>One advantage of the joint model is that it can mitigate error propagation.<\/li>\n  <li>It seems to simply mean that direct sentences and relation paths appropriately compensate for each other's uncertainty (which is obvious, so it is unclear why it is singled out as an advantage).<\/li>\n<\/ul>\n\n<h3 id=\"34-optimization-and-implementation-details\">3.4. Optimization and Implementation Details<\/h3>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36730716-c583fa96-1c0b-11e8-8142-938365e6b185.png\" alt=\"eq4\" \/><\/p>\n\n<ul>\n  <li>The equation above (Equation 13) is the objective function (loss function).<\/li>\n  <li>The objective function is maximized (optimized) using stochastic gradient descent.<\/li>\n  <li>Only W_E is initialized with a skip-gram model; all other parameters are random.<\/li>\n  <li>Dropout is applied to the output layer.<\/li>\n  <li>The relation path structures are extracted and stored before training; how the extraction was done is not mentioned.<\/li>\n<\/ul>\n\n<h2 id=\"4-dataset\">4. Dataset<\/h2>\n\n<ul>\n  <li>New York Times corpus<\/li>\n  <li>ACE is considered unsuitable for training because it has too little data.<\/li>\n<\/ul>\n\n<h2 id=\"5-experiments\">5. Experiments<\/h2>\n\n<ul>\n  <li>Precision\/Recall curves, Precision@N (P@N), and F1 score are used as metrics.<\/li>\n  <li>Two baselines are used: the CNN model from Zeng et al.'s 2014 paper and the CNN from their 2015 paper (PCNN, multi-instance learning).<\/li>\n<\/ul>\n\n<h2 id=\"6-conclusion-and-future-work\">6. 
Conclusion and Future Work<\/h2>\n\n<ul>\n  <li>Performed neural relation extraction with relation paths encoded.<\/li>\n  <li>Used sentences containing both entities of a pair, or even just one of them, which makes the model more robust to noisy data.<\/li>\n  <li>Significant performance improvement over previous baselines.<\/li>\n  <li>As future work: (1) combine relation paths from plain text and from KBs, and (2) encode more complex correlations between relation paths with something like an RNN (this work used 2-step paths, but multi-step paths could be used).<\/li>\n<\/ul>\n","pubDate":"Tue, 27 Feb 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/relation-path-neural-relation-extraction\/","guid":"https:\/\/roomylee.github.io\/relation-path-neural-relation-extraction\/","category":["relation-path","neural-relation-extraction","relation-extraction","blog"]},{"title":"Relation Extraction with Multi-instance Multi-label Convolutional Neural Networks (COLING 2016)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/pdfs.semanticscholar.org\/8731\/369a707046f3f8dd463d1fd107de31d40a24.pdf\">https:\/\/pdfs.semanticscholar.org\/8731\/369a707046f3f8dd463d1fd107de31d40a24.pdf<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Xiaotian Jiang (University of Chinese Academy of Sciences)<\/li>\n      <li>Quan Wang (University of Chinese Academy of Sciences)<\/li>\n      <li>Peng Li (University of Chinese Academy of Sciences)<\/li>\n      <li>Bin Wang (University of Chinese Academy of Sciences)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>COLING 2016<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Distant supervision is an effective method for automatically generating labeled data for relation extraction.<\/li>\n  <li>Traditional approaches rely heavily on handcrafted features, and the errors in those features propagate directly.<\/li>\n  <li>So recently, methods have been proposed that use neural networks to extract features automatically for relation classification.<\/li>\n  <li>However, these methods adopt the traditional expressed-at-least-once assumption and cannot exploit information that spans multiple sentences.<\/li>\n  <li>They also ignore the fact that multiple relations can hold for the same entity pair.<\/li>\n  <li>In this paper, we propose a multi-instance multi-label CNN for distantly supervised RE.<\/li>\n  <li>First, we relax the expressed-at-least-once assumption and enable information sharing across different sentences through cross-sentence max-pooling.<\/li>\n  <li>Overlapping relations are handled through multi-label learning.<\/li>\n  <li>It is considerably better than the state of the art.<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. 
Introduction<\/h2>\n\n<ul>\n  <li>Relation extraction\uc740 binary relation\uc744 plain text\ub85c \ubd80\ud130 \ucd94\ucd9c\ud558\ub294 task\ub77c\uace0 \uc815\uc758\ud568<\/li>\n  <li>Supervised method\ub4e4\uc774 \ub192\uc740 \uc131\ub2a5\uc73c\ub85c \uc778\ud574 \ub110\ub9ac \uc0ac\uc6a9\ub428\n    <ul>\n      <li>\ud558\uc9c0\ub9cc human annotation\uc774 \ud544\uc694\ud558\uace0 \uc774\ub97c \ub9cc\ub4e4\uae30 \uc704\ud574 \uc2dc\uac04\uc774 \ub9ce\uc774 \uc18c\ubaa8\ub41c\ub2e4\ub294 \ub2e8\uc810\uc774 \uc788\uc74c<\/li>\n      <li>\uadf8\ub798\uc11c knowledge base \uae30\ubc18\uc73c\ub85c \uc790\ub3d9\uc73c\ub85c labeled data\ub97c \uc0dd\uc131\ud574\uc8fc\ub294 distant supervision\uc774\ub77c\ub294 \uae30\ubc95\uc774 \ub4f1\uc7a5\ud568<\/li>\n    <\/ul>\n  <\/li>\n  <li>supervised RE\ub294 POS tags, dependency path, named entity tags \ub4f1\uc758 lexical and syntactic feature\uac00 \uc0ac\uc6a9\ub428<\/li>\n  <li>\ud558\uc9c0\ub9cc \uc774\ub7f0 feature\ub4e4\uc740 NLP algorithm(tool)\uc744 \uc0ac\uc6a9\ud558\uace0 \uc788\uace0 \uc774 \ub54c\ubb38\uc5d0 error\ub97c \uac16\uac8c \ub428<\/li>\n  <li>\uc774\ub7f0 error\ub294 \uae34 \ubb38\uc7a5\uc77c\uc218\ub85d \ub354 \uc2ec\uac01\ud55c \ubb38\uc81c\ub97c \ucd08\ub798\ud558\ub294\ub370, \ubd88\ud589\ud558\uac8c\ub3c4 \uc774\ub7f0 \uae34 \ubb38\uc7a5\ub4e4\uc774 \ucf54\ud37c\uc2a4\uc758 \ub300\ubd80\ubd84\uc744 \ucc28\uc9c0\ud558\uace0 \uc788\uc74c<\/li>\n  <li>\ubb38\uc81c\uac00 \uc788\ub294 feature\ub97c \uc0ac\uc6a9\ud558\ub294 distant supervision \ubc29\ubc95\uc740 error\ub97c \uc804\ud30c\uc2dc\ud0a4\uace0 \uc131\ub2a5 \uc800\ud558\uc758 \uc8fc\ubc94\uc774 \ub428<\/li>\n  <li>\uadf8\ub798\uc11c \ucd5c\uadfc\uc5d0\ub294 \uc790\ub3d9\uc73c\ub85c feature\ub97c \ucd94\ucd9c\ud558\ub294 deep nearal network \uae30\ubc18\uc758 \uc5f0\uad6c\uac00 \ub9ce\uc774 \uc9c4\ud589\ub418\uace0 \uc788\uc74c<\/li>\n  <li>\ud2b9\ud788 piecewise convolutional neural network (PCNN)\uc774 \uc88b\uc740 \uacb0\uacfc\ub97c 
\ub0b4\uace0 distant supervised relation extraction\uc5d0\uc11c \uc0c1\ub2f9\ud55c \ud5a5\uc0c1\uc744 \ubcf4\uc600\uc73c\ub098 \uc544\uc9c1 (\uc544\ub798\uc758) \uba87\uac00\uc9c0 \uacb0\uc810\uc744 \uac00\uc9c0\uace0 \uc788\uc74c<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36642226-b7ebadd0-1a7f-11e8-8b01-a43aab6473d4.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>\uccab\uc9f8, PCNN\uc740 labeled data\ub97c \uc0dd\uc131\ud574\ub0b4\ub294\ub370 expressed-at-least-once assumption\uc744 \uc0ac\uc6a9\ud55c\ub2e4\ub294 \uac83\uc784\n    <ul>\n      <li>expressed-at-least-once assumption\uc774\ub780, \ub450 entity\uac00 \uc5b4\ub5a4 relation\uc744 \uac16\uace0 \uc788\uc744 \ub54c, \uc5b4\ub5a4 \ubb38\uc7a5\uc5d0\uc11c \ub450 entity\uac00 \ub4f1\uc7a5\ud558\uba74 \uc774\ub294 \uadf8 relation\uc744 \uac16\uace0 \uc788\ub2e4\uace0 \ubcf4\ub294 \uac83\uc784<\/li>\n      <li>\uadf8\ub7ec\ub098 \uc774 \uac00\uc815\uc740 \ub108\ubb34\ub098\ub3c4 \uac15\ub825\ud558\uace0, \ud55c \ubb38\uc7a5\uc744 \uc120\ud0dd\ud558\ub294 \uac83\uc740 \ub2e4\ub978 \ubb38\uc7a5\ub4e4\ub85c\ubd80\ud130 \uc5bb\uc744 \uc218 \uc788\ub294 \ub9ce\uc740 \uc815\ubcf4\ub97c \uc783\uc740\ub2e4\uace0 \ubd04<\/li>\n      <li>\uc2e4\uc81c\ub85c knowledge base relation\uc5d0 \ub098\ud0c0\ub09c \ub450 entity\uac00 \uc8fc\uc5b4\uc84c\uc744 \ub54c, \uadf8 relation\uc744 \uc815\ud655\ud788 \ud45c\ud604\ud558\ub294 \ud558\ub098\uc758 \ubb38\uc7a5\uc744 training text\ub85c \ubd80\ud130 \ucc3e\ub294 \uac83\uc740 \uc0c1\ub2f9\ud788 \uc5b4\ub824\uc6c0<\/li>\n      <li>Figure 1\uc744 \ubcf4\uba74 <em>Thailand<\/em> \uc640 <em>Bangkok<\/em> \uc758 entity pair\ub97c \ud3ec\ud568\ud55c \uc138 \ubb38\uc7a5\uc774 \uc788\uc74c<\/li>\n      <li>\uc774 \ubb38\uc7a5\ub4e4 \uc911 \uc5b4\ub290 \uac83 \ud558\ub098 <em>\/location\/country\/capital<\/em> relation\uc744 \ub098\ud0c0\ub0b4\uc9c0\ub294 \uc54a\uc74c<\/li>\n      <li>\ud558\uc9c0\ub9cc \uc774 \uc138 \ubb38\uc7a5\uc744 
\uc885\ud569\uc801\uc73c\ub85c \ubd24\uc744 \ub54c\ub294 <em>\/location\/country\/capital<\/em> relation\uc744 \uc720\ucd94\ud574\ubcfc \uc218 \uc788\ub2e4\ub294 \uac83\uc784<\/li>\n    <\/ul>\n  <\/li>\n  <li>\ub458\uc9f8, PCNN\uc740 single-label learning problem\uc73c\ub85c distantly supervised RE\ub97c \ub2e4\ub8e8\uace0 \uc788\uc74c\n    <ul>\n      <li>\uc120\ud0dd\ub41c \ub450 entity pair\ub294 \uc5ec\ub7ec \uac1c\uc758 relation\uc744 \uac16\uace0 \uc788\ub354\ub77c\ub3c4 \ubc18\ub4dc\uc2dc \ud558\ub098\uc758 relation label\uc744 \uac16\ub294\ub2e4\uace0 \ubd04<\/li>\n      <li>New York Times 2007 corpus\uc758 \uc57d 18%\uac00 overlapping relation\uc744 \uac16\ub294 \ubb38\uc7a5\uc778 \uac83\uc73c\ub85c \ub098\ud0c0\ub0a8<\/li>\n      <li>\ub530\ub77c\uc11c single-label learning\uc740 \ubb38\uc81c\uac00 \uc788\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc774 \ub17c\ubb38\uc5d0\uc11c \uc6b0\ub9ac\ub294 \uc704\uc758 \ubb38\uc81c\ub4e4\uc744 \ud574\uacb0\ud558\uae30 \uc704\ud574 multi-instance multi-label convolutional neural network (MIMLCNN)\ub97c \uc81c\uc548\ud568<\/li>\n  <li>\uccab \ubc88\uc9f8 \ubb38\uc81c\uc5d0 \ub300\ud574\uc11c\ub294 \uae30\uc874\uc758 expressed-at-least-once assumption \ub300\uc2e0\uc5d0 <em>\u201ca relation holding between two entities can be either expressed explicitly or inferred implicitly from all sentences that mention these two entities\u201d -&gt; \ub450 entity\uc758 relation\uc740 \ubd84\uba85\ud558\uac8c \ud45c\ud604\ub420 \uc218\ub3c4 \uc788\uace0 (\uc774\uac8c \uae30\uc874 \uac00\uc815\uc778\ub4ef), \uc5b8\uae09\ub41c \ubaa8\ub4e0 \ubb38\uc7a5\ub4e4\ub85c\ubd80\ud130 \ud568\ucd95\uc801\uc73c\ub85c \ucd94\ub860\ub420 \uc218\ub3c4 \uc788\ub2e4 (Figure 1\uc758 \ub0b4\uc6a9)<\/em> \ub77c\ub294 \ubcf4\ub2e4 \uc644\ud654\ub41c \uac00\uc815\uc744 \ub458 \uac83\uc784\n    <ul>\n      <li>\uad6c\uccb4\uc801\uc73c\ub85c\ub294 \uc544\ub798\uc758 \uc21c\uc11c\ub300\ub85c \ud558\uba74 \ub428<\/li>\n      <li>\uac01 \ubb38\uc7a5 
\ubcc4\ub85c convolution\uc744 \uc2dc\ucf1c\uc11c feature\ub97c \ucd94\ucd9c\ud568<\/li>\n      <li>\uc0c8\ub85c \uc81c\uc548\ud558\ub294 cross-sentence max-pooling\uc774\ub77c\ub294 \uac83\uc744 \ud1b5\ud574 \ub2e4\ub978 \ubb38\uc7a5\uc5d0 \uac78\uccd0 \ub098\ud0c0\ub098\ub294 feature\ub97c \ucd94\ucd9c\ud568<\/li>\n      <li>\uadf8\ub9ac\uace0 most significant feature(max?)\ub97c \ud569\uccd0\uc11c \uac01 entity pair\uc758 vector representation\uc73c\ub85c \ub9cc\ub4ec<\/li>\n      <li>\uc774 vector representation\uc740 \ub2e4\ub978 \ubb38\uc7a5\ub4e4\uc758 feature\ub97c \ud3ec\ud568\ud558\uae30 \ub54c\ubb38\uc5d0 \uac00\uc815\uc5d0\uc11c \ub9d0\ud588\ub358 \uc5ec\ub7ec \ubb38\uc7a5\ub4e4\uc758 \uc815\ubcf4\ub97c \ubaa8\ub450 \uc0ac\uc6a9\ud558\ub294 \uc148\uc774 \ub428<\/li>\n    <\/ul>\n  <\/li>\n  <li>\ub450 \ubc88\uc9f8 \ubb38\uc81c\uc5d0 \ub300\ud574\uc11c\ub294 \ub2e4\uc591\ud55c multi-label loss function\uc744 \ub9cc\ub4e4\uc5b4\uc11c overlapping relation\uc744 \ucc98\ub9ac\ud560 \uc218 \uc787\ub3c4\ub85d \ud568\n    <ul>\n      <li>\uc544\ub798\uc758 section 3\uc5d0 \ub098\uc624\ub294 Figure 2\ub97c \ucc38\uace0\ud558\uba74 \uc804\uccb4\uc801\uc778 \uad6c\uc870\ub97c \uc54c \uc218 \uc788\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc6b0\ub9ac \ub17c\ubb38\uc758 \uba54\uc778 contribution\uc740 \uc544\ub798\uc640 \uac19\uc74c\n    <ol>\n      <li>expressed-at-least-once assumption\uc744 \uc644\ud654\uc2dc\ucf30\uace0 \uc5ec\ub7ec \ubb38\uc7a5\ub4e4\uc774 \uc11c\ub85c \uc815\ubcf4\ub97c \uacf5\uc720\ud560 \uc218 \uc788\uac8c \ud558\ub294 \ub354\uc6b1 \ud604\uc2e4\uc801\uc778 \uac00\uc815\uc744 \uc81c\uc548\ud568<\/li>\n      <li>multi-label\uc744 \ub2e4\ub8f0 \uc218 \uc788\ub294 multi-instance multi-label convolutinoal neural nerwork (MIMLCNN)\uc744 \uc81c\uc548\ud568<\/li>\n      <li>\uc6b0\ub9ac\ub294 real-world dataset\uc73c\ub85c \uc6b0\ub9ac\uc758 approach\ub97c \ud3c9\uac00\ud588\uc73c\uba70, state-of-the-art \ubcf4\ub2e4 \uc0c1\ub2f9\ud55c 
\ud5a5\uc0c1\uc744 \ubcf4\uc600\uc74c<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-our-approach\">3. Our Approach<\/h2>\n\n<ul>\n  <li>\uacf5\ud1b5\uc758 entity pair(e1, e2)\ub97c \uac16\ub294 sentence\uac00 input\uc73c\ub85c \ub4e4\uc5b4\uac00\uace0, knowledge base\uc5d0\uc11c \uc815\uc758\ub41c relation(class)\uac00 output\uc73c\ub85c \ub098\uc624\uac8c \ub428. Figure 2 \ucc38\uace0<\/li>\n  <li>\uc6b0\ub9ac\uc758 approach\ub294 3\uac00\uc9c0 key step\uc744 \uac16\ub294\ub370, (1) sentence-level feature extraction, (2) cross-sentence max-pooling, (3) multi-label relation modeling \uc784. Figure 2 \ucc38\uace0<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36655746-ebb035cc-1b07-11e8-87fb-798eee7f5a22.png\" alt=\"figure2\" \/><\/p>\n\n<h3 id=\"31-sentence-level-feature-extraction\">3.1 Sentence-level Feature Extraction<\/h3>\n\n<ul>\n  <li>sentence-level feature extraction\uc740 \ubb38\uc7a5\uc73c\ub85c\ubd80\ud130 vector feature\ub97c \ub9cc\ub4e4\uc5b4\ub0b4\ub294 \ub2e8\uacc4\uc784<\/li>\n  <li>\uae30\uc874\uc5d0 \uc81c\uc548\ub418\uc5c8\ub358 text cnn, cnn for RE, piecewise max pooling \ub4f1\uc744 \uadf8\ub300\ub85c \uc0ac\uc6a9\ud558\uace0 \uc788\uc74c<\/li>\n  <li>\ucd5c\uc885\uc801\uc73c\ub85c sentence-level feature extraction \uacb0\uacfc\ub97c <strong>sentence representation (vector)<\/strong>\uc774\ub77c\uace0 \ud568<\/li>\n  <li>Figure 3 \ucc38\uace0<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36655747-ebd89f6c-1b07-11e8-9246-5d6617c33a52.png\" alt=\"figure3\" \/><\/p>\n\n<h3 id=\"32-cross-sentence-max-pooling\">3.2 Cross-sentence Max-pooling<\/h3>\n\n<ul>\n  <li>\ub450 entity\uc758 relation\uc744 \ud55c \ubb38\uc7a5\uc774 \uc544\ub2cc \uc5ec\ub7ec \uac1c\uc758 \ubb38\uc7a5\uc758 \uc815\ubcf4\ub97c \uc774\uc6a9\ud574\uc11c \uc608\uce21\ud558\uaca0\ub2e4\ub294 \uac83\uc774 \uc774 
\ub17c\ubb38\uc5d0\uc11c \uc8fc\uc7a5\ud558\ub294 \ubc14\uc784<\/li>\n  <li>\uadf8\ub798\uc11c \uc544\ub798\uc640 \uac19\uc740 \uac00\uc815\uc744 \ub460. \uc774\ub294 \uc774\uc804 PCNN\uc5d0\uc11c \uc0ac\uc6a9\ud55c \uac00\uc815\ubcf4\ub2e4 \uc644\ud654\ub41c \uac83\uc784<\/li>\n  <li>Assumption: <em>A relation holding between two entities can be either expressed explicitly or inferred implicitly from all sentences that mention these two entities.<\/em>\n    <ul>\n      <li>\uc774 \uac00\uc815\uc758 \ubcf8\uc9c8\uc5d0 \uc758\ud558\uba74, \uc6b0\ub9ac\ub294 sentence-level relation extraction\uc744 \uac74\ub108 \ub6f0\uace0 entity-pair-level\uc5d0\uc11c prediction\uc744 \uc9c1\uc811\uc801\uc73c\ub85c \ud558\uac8c \ub428<\/li>\n      <li>\uc774\ub294 \ub354\uc6b1 downstream application\uc744 \uc5fc\ub824\ud558\uace0 \ub354 \uadfc\uac70\ub97c \ud1b5\ud569\ud558\ub294\ub370 \uc774\ub4dd\uc784<\/li>\n      <li>\ubb54 \uc18c\ub9b0\uc9c0\ub97c \ubaa8\ub974\uaca0\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc6b0\ub9ac\ub294 \uc774 \uac00\uc815\uc758 \uc7a5\uc810\uc744 \uac00\uc9c0\ub294 cross-sentence max-pooling\uc774\ub77c\ub294 \ubc29\ubc95\uc744 \uc81c\uc548\ud558\ub294 \ubc14\uc784\n    <ul>\n      <li>entity pair\uac00 \uc5b8\uae09\ub41c m\uac1c\uc758 \ubb38\uc7a5\uc774 \uc788\ub2e4\uace0 \ud558\uc790<\/li>\n      <li>\uac01 \ubb38\uc7a5\uc740 \uc55e\uc758 sentence-level feature extraction \uacfc\uc815\uc744 \uac70\uccd0\uc11c sentence representation (vector)\ub85c \ubcc0\ud658\ub428<\/li>\n      <li>\uc989, sentence representation vector\uac00 m\uac1c\uac00 \uc0dd\uae30\uac8c \ub418\ub294 \uac83\uc774\uace0, \uc774 m\uac1c\uc758 \ubca1\ud130\ub4e4\uc758 \uac01 \uc6d0\uc18c\uc5d0 \ub300\ud574\uc11c \ucd5c\ub300\uac12\uc744 \ucd94\ucd9c\ud558\uc5ec \ud558\ub098\uc758 \ubca1\ud130\ub85c pooling\ud558\ub294 \uac83\uc774 <strong>cross-sentence max-pooling<\/strong>\uc784. 
\uac01 \uc6d0\uc18c \ubcc4\ub85c m\uac1c \uc911\uc5d0 \ucd5c\ub300\ub97c \ubf51\uc544\ub0b4\ub294 \uac83\uc784<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc774 \uc791\uc5c5\uc744 \ud558\uba74 \uba87 \uac00\uc9c0 \uc774\uc775\uc774 \uc788\uc74c<\/li>\n<\/ul>\n\n<ol>\n  <li>\uac01 \ubb38\uc7a5\uc73c\ub85c\ubd80\ud130 feature\ub97c \ud1b5\ud569\ud558\uac8c \ub418\uace0 entity-pair-level relation extraction\uc744 \uc9c0\uc6d0(support)\ud568<\/li>\n  <li>\ub2e4\ub978 \ubb38\uc7a5\ub4e4\ub85c\ubd80\ud130 relation \uc608\uce21\uc5d0 \ub300\ud55c \uadfc\uac70\ub97c \ubaa8\uc744 \uc218 \uc788\uc74c<\/li>\n  <li>Zeng et al. (2015)\uc774 \ud558\ub098\uc758 \ubb38\uc7a5\uc529 \ud559\uc2b5\ud55c \uac83\uc5d0 \ube44\ud574 \uc6b0\ub9ac\ub294 \uac00\ub2a5\ud55c \ubaa8\ub4e0 \ubb38\uc7a5\uc73c\ub85c\ubd80\ud130\uc758 \uc815\ubcf4\ub97c \ubaa8\ub450 \uc0ac\uc6a9\ud558\uc5ec \ud559\uc2b5\ud55c\ub2e4\ub294 \uc7a5\uc810\uc774 \uc788\uc74c<\/li>\n<\/ol>\n\n<ul>\n  <li>mean-pooling \uac19\uc740 \ubc29\ubc95\uc744 \uc4f0\uc9c0 \uc54a\uace0 max-pooling\uc744 \uc0ac\uc6a9\ud55c \uc774\uc720\ub294 \ub2e4\uc74c\uacfc \uac19\uc74c\n    <ul>\n      <li>entity-pair-level relation extraction\uc5d0 \uc788\uc5b4\uc11c \uc5ec\ub7ec\ubc88 \ub4f1\uc7a5\ud55c feature\ub294 \ub354 \ub9ce\uc740 \ucd94\uac00 \uc815\ubcf4\ub97c \uc8fc\uc9c0 \uc54a\ub294\ub2e4\uace0 \uc0dd\uac01\ud568<\/li>\n      <li>\uc989, \ud55c \ubc88\uc529\ub9cc \ub098\ud0c0\ub09c \uad6c\ubd84 \uac00\ub2a5\ud55c(\uc11c\ub85c \ub2e4\ub978) signal\ub9cc\uc73c\ub85c\ub3c4 relation extraction\uc5d0 \uc788\uc5b4\uc11c \ucda9\ubd84\ud568<\/li>\n      <li>\uac01 feature\uc758 maximum activation level\ub9cc\uc744 \uc5ec\ub7ec \ubb38\uc7a5\uc5d0 \uac78\uccd0\uc11c \ud558\ub098\uc529 \ubf51\ub294 cross-sentence max-pooling \ubc29\ubc95\uc774 \ubc14\ub85c \uc774\ub7f0 \uc0dd\uac01\uc744 \uad6c\ud604\ud55c \uac83\uc784<\/li>\n      <li>\ubc18\uba74, mean-pooling \uac19\uc740 \uacbd\uc6b0\uc5d0\ub294 \uc5ec\ub7ec\ubc88 \uc5b8\uae09\ub41c 
entity-pair\uc5d0 \ub300\ud574\uc11c predictive feature\uac00 \ud76c\uc11d\ub420 \uc218 \uc788\uc74c<\/li>\n      <li>\uc704\uc758 \uc8fc\uc7a5\uc740 \ub4a4\uc5d0 \uc2e4\ud5d8\uc744 \ubcf4\uba74 \ub354\uc6b1 \uc798 \uc54c \uc218 \uc788\uc74c<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"33-multi-label-relation-modeling\">3.3 Multi-label Relation Modeling<\/h3>\n\n<ul>\n  <li>\uae30\uc874\uc758 multi-instance learning\uc744 \uc801\uc6a9\ud55c neural network \ubc29\ubc95\ub4e4\uc740 \uc5b4\ub5a4 entity pair\uac00 \uc5ec\ub7ec \uac1c\uc758 relation\uc744 \uac16\uace0 \uc788\uc5b4\ub3c4 single label\ub85c \ud559\uc2b5\ud568<\/li>\n  <li>\uc6b0\uc120 pooling\uc744 \uac70\uccd0\uc11c \ub098\uc628 \ubca1\ud130\uc5d0 FC\ub97c \ubd99\uc774\uace0 \uadf8 \uacb0\uacfc\uac12\uc5d0 sigmoid\ub97c \ucde8\ud568<\/li>\n  <li>binary label vector\uc778 y\ub294 relation\uc774 \uc788\uc73c\uba74 1, \uc5c6\uc73c\uba74 0\uc73c\ub85c \ud45c\uae30 \ub418\uace0 \ubcf5\uc218\uc758 relation\uc774 \uc788\uc73c\uba74 1\uc774 \uc5ec\ub7ec \uac1c\uc778 vector\uac00 \ub418\ub294 \uac83\uc784<\/li>\n  <li>\uc774 \ubc29\ubc95\uc73c\ub85c \ud558\uba74 \uc544\ubb34 relation\uc774 \uc5c6\ub294 NA \ucf00\uc774\uc2a4\uc5d0 \ub300\ud574\uc11c all-zero vector\ub97c \uc0ac\uc6a9\ud558\uba74 \ub418\uae30\uc5d0 \ud45c\ud604\uc774 \uc790\uc5f0\uc2a4\ub7ec\uc6cc\uc9d0. 
\uae30\uc874\uc5d0\ub294 NA \ud074\ub798\uc2a4\uc5d0 \ud574\ub2f9\ud558\ub294 index \ud558\ub098\ub97c \ub354 \ub9cc\ub4e4\uc5b4\uc11c one-hot vector\ub85c \uad6c\uc131\ud568<\/li>\n  <li>relation \uac04\uc5d0 dependency\uac00 \uac78\ub9ac\ub294 \uacbd\uc6b0\ub3c4 \uc788\uc74c<\/li>\n  <li>(A, capital, B)\uc640 (A, contains, B)\ub294 \uac70\uc758 \ubd99\uc5b4\ub2e4\ub2d0 \uac83\uc778\ub370, \uc6b0\ub9ac\uc758 \ubaa8\ub378\uc740 \ubaa8\ub4e0 relation label\uc5d0 \ub300\ud574\uc11c shared entity-pair-level representation\uc744 \uc0ac\uc6a9\ud558\ubbc0\ub85c \ucc98\ub9ac\ud560 \uc218 \uc788\uc74c<\/li>\n  <li>multi-label modeling\uc744 \uc704\ud574\uc11c \uc544\ub798\uc758 \ub450 loss function\uc744 \ub9cc\ub4ec<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36669255-9866812a-1b37-11e8-9d3a-84c042316502.png\" alt=\"formula\" \/><\/p>\n\n<ul>\n  <li>\uc704\uc758 \ub450 loss function\uc744 \uc0ac\uc6a9\ud558\uace0 \uc774\uc5d0 \ub300\ud55c \uc2e4\ud5d8\uc744 \ub4a4\uc5d0 \ud560 \uac83\uc784<\/li>\n  <li>\uc6b0\ub9ac\uc758 \ubaa8\ub378\uc740 end-to-end\ub85c \ud559\uc2b5\ud558\uace0 Adadelta\ub97c optimizer\ub85c \uc0ac\uc6a9\ud558\uba70 dropout\uc774 \uc801\uc6a9\ub418\uc5b4 \uc788\uc74c<\/li>\n  <li>\ucd5c\uc885\uc801\uc73c\ub85c prediction vector\uc5d0\uc11c \uac01 \uc6d0\uc18c \uac12(\ud655\ub960)\uc774 0.5 \ucd08\uacfc\ud560 \uacbd\uc6b0, \ud574\ub2f9 label\uc744 1\ub85c \ucc98\ub9ac\ud558\uc5ec output\uc744 \ub0b4\ubcf4\ub0c4<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments\">4. Experiments<\/h2>\n\n<ul>\n  <li>NYT10\uc744 \uc774\uc6a9\ud574\uc11c \ud3c9\uac00\ub97c \uc9c4\ud589<\/li>\n  <li>precision-recall curve\uc640 P@N metric(\ud655\uc2e4\ud55c \uc21c\uc73c\ub85c \uc0c1\uc704 N\uac1c\uc5d0 \ub300\ud55c precision)\uc744 \uc0ac\uc6a9\ud568<\/li>\n<\/ul>\n\n<h2 id=\"5-conclusion\">5. 
Conclusion<\/h2>\n\n<ul>\n  <li>\uc6b0\ub9ac\ub294 Distant supervision with multi-instance multi-label learning\uc744 \uc81c\uc548\ud568<\/li>\n  <li>expressed-at-least-once assumption\uc744 \uc644\ud654\uc2dc\ud0a4\uace0, \uc5ec\ub7ec \ubb38\uc7a5\uc5d0 \uac78\uccd0 \ub098\ud0c0\ub098\ub294 \uc815\ubcf4\ub97c cross-sentence max-pooling\uc744 \ud1b5\ud574 \ubaa8\ub450 \uc0ac\uc6a9\ud560 \uc218 \uc788\ub3c4\ub85d \ud558\uc600\uc73c\uba70, multiple relation\uc744 \uac16\ub294 entity pair\uc5d0 \ub300\ud55c modeling\uc744 \ud574\ubd04<\/li>\n  <li>future work\ub85c\ub294 loss function\uc774 \uc131\ub2a5\uc5d0 \ubbf8\uce58\ub294 \uc601\ud5a5\uc5d0 \ub300\ud55c \ubd84\uc11d\uc774\ub098 human evaluation\uc744 \ud574\ubcf4\uba74\uc11c \uc2e4\ud5d8\uc744 \ubcf4\ub2e4 \ud48d\ubd80\ud558\uac8c \ub9cc\ub4e4\uc5b4\ubcfc \uc218 \uc788\uc744 \uac83 \ub4f1\uc774 \uc788\uc74c<\/li>\n<\/ul>\n","pubDate":"Tue, 27 Feb 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/relation-extraction-multi-instance-multi-label-cnn\/","guid":"https:\/\/roomylee.github.io\/relation-extraction-multi-instance-multi-label-cnn\/","category":["relation-extraction","multi-instance","multi-label","convolutional-neural-network","cnn","blog"]},{"title":"Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks (EMNLP 2015)","description":"<ul>\n  <li>Paper Link: <a href=\"http:\/\/www.emnlp2015.org\/proceedings\/EMNLP\/pdf\/EMNLP203.pdf\">http:\/\/www.emnlp2015.org\/proceedings\/EMNLP\/pdf\/EMNLP203.pdf<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Daojian Zeng (Chinese Academy of Sciences)<\/li>\n      <li>Kang Liu (Chinese Academy of Sciences)<\/li>\n      <li>Yubo Chen (Chinese Academy of Sciences)<\/li>\n      <li>Jun Zhao (Chinese Academy of Sciences)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>EMNLP 2015<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>Distant supervision\uc5d0\ub294 
2\uac00\uc9c0 \ubb38\uc81c\uc810\uc774 \uc788\uc74c\n    <ul>\n      <li>\uccab \ubc88\uc9f8 \ubb38\uc81c\uc810\uc740 distant supervision\uc5d0\uc11c \uc774\ubbf8 \uc874\uc7ac\ud558\ub294 knowledge base\ub97c \uc774\uc6a9\ud558\ub294\ub370 \uc788\uc74c<\/li>\n      <li>knowledge base\ub294 \ud734\ub9ac\uc2a4\ud2f1\ud558\uac8c \ub9cc\ub4e4\uc5b4\uc9c0\uace0 \uc774\ub97c \uc774\uc6a9\ud574\uc11c labeled data\ub97c \ub9cc\ub4e4\uc5b4\uc11c \ud559\uc2b5\uc744 \ud558\ub294\ub370, \ud734\ub9ac\uc2a4\ud2f1\uc740 \uc2e4\ud328\ud560 \uc218 \uc788\uace0 \uadf8\ub807\uac8c \ub9cc\ub4e0 knowledge base\uc640 \uadf8 \uacb0\uacfc\ubb3c\uc778 labeled data \uc5ed\uc2dc \ubb38\uc81c(wrong label problem)\uac00 \uc788\uc744 \uc218 \uc788\ub2e4\ub294 \uac83\uc784<\/li>\n      <li>\ub450 \ubc88\uc9f8\ub294 \uc774\uc804 \uc5f0\uad6c\uc758 \ud655\ub960\uc801 \ubaa8\ub378\uc740 \ub300\uac1c ad hoc feature\ub97c \uc774\uc6a9\ud558\ub294\ub370 \uc774 feature\ub97c \ucd94\ucd9c\ud558\ub294 \uacfc\uc815 \uc790\uccb4\uc5d0 noise\uac00 \uc788\uace0 \uc774 noise\uac00 \uc131\ub2a5 \uc800\ud558\ub97c \uac00\uc838\uc628\ub2e4\ub294 \uac83\uc784<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc6b0\ub9ac\ub294 \uc704\uc758 2\uac00\uc9c0 \ubb38\uc81c\uc810\uc744 \ub2e4\ub8e8\uae30 \uc704\ud574, multi-instance learning\uc744 \uc774\uc6a9\ud55c Piecewise Convolutional Neural Networks (PCNNs) \ub77c\ub294 \uc0c8\ub85c\uc6b4 \ubaa8\ub378\uc744 \uc81c\uc548\ud558\ub294 \uac70\uc784\n    <ul>\n      <li>\uccab \ubc88\uc9f8 \ubb38\uc81c\ub97c \ud574\uacb0\ud558\uae30 \uc704\ud574, distant supervised relation extraction\uc744 label\uc758 \ubd88\ud655\uc2e4\uc131\uc744 \uace0\ub824\ud55c multi-instance \ubb38\uc81c\ub85c \ubcf4\uace0 \ucc98\ub9ac\ud568<\/li>\n      <li>\ub450 \ubc88\uc9f8 \ubb38\uc81c\ub97c \ud574\uacb0\ud558\uae30 \uc704\ud574\uc11c\ub294 feature engineering\uc744 \ud558\uc9c0 \uc54a\ub294 \ub300\uc2e0, piecewise max pooling\uc744 \uc801\uc6a9\ud55c convolutional architecture\ub97c 
\uc0ac\uc6a9\ud574\uc11c \uad00\ub828\uc788\ub294 feature\ub97c \uc790\ub3d9\uc73c\ub85c \ud559\uc2b5\uc2dc\ud0b4<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uba87\uba87 \ub2e4\ub978 \ubc29\ubc95\ub4e4\ubcf4\ub2e4 \ud6a8\uc728\uc801\uc774\uace0 \uc88b\uc740 \uc131\ub2a5\uc744 \ubcf4\uc784<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Relation extraction\uc5d0\uc11c \ud558\ub098\uc758 \ub3c4\uc804 \uacfc\uc81c\ub294 training examples\uc744 \ub9cc\ub4dc\ub294 \uac83\uc784<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36376357-2a9912f6-15b6-11e8-9830-b8e9c1b57033.png\" alt=\"figure1\" \/><\/p>\n\n<ul>\n  <li>Distant supervision\uc774 \uc774\ub97c \ud574\uacb0\ud558\ub294 \ud558\ub098\uc758 \ubc29\ubc95\uc774 \ub420 \uc218 \uc788\uc74c<\/li>\n  <li>Distant supervision\uc740 \uc5b4\ub5a4 \ub450 entity\uac00 \uc774\ubbf8 \uc54c\ub824\uc9c4 knowledge base\uc5d0\uc11c relation\uc744 \uac00\uc9c0\uace0 \uc788\uc744 \ub54c, \uc774 \ub450 entity\uac00 \ub4f1\uc7a5\ud55c \ubaa8\ub4e0 \ubb38\uc7a5\uc5d0 \ub300\ud574\uc11c \ub3d9\uc77c\ud55c relation\uc744 \uac00\uc9c4\ub2e4\uace0 \uac00\uc815\ud558\uace0 data\ub97c \ub9cc\ub4e4\uc5b4\ub0b4\ub294 \uac83\uc784 (Figure 1 \ucc38\uace0)<\/li>\n  <li>\ud558\uc9c0\ub9cc \uc774 \ubc29\ubc95\uc740 2\uac00\uc9c0 \uc911\uc694\ud55c \uacb0\uc810\uc744 \uac00\uc9c0\uace0 \uc788\uc74c<\/li>\n  <li>\uccab \ubc88\uc9f8\ub294, distant supervision\uc758 \uac00\uc815\uc774 \ub108\ubb34 \uac15\ub825\ud558\uace0 \uc798\ubabb\ub41c label\uc744 \ub9cc\ub4e4\uc5b4\ub0b8\ub2e4\ub294 \uac83\uc784\n    <ul>\n      <li>\uc989, \ub450 entity\uac00 \uc5b8\uae09\ub41c \ubb38\uc7a5\uc774 \ubc18\ub4dc\uc2dc knowledge base\uc5d0\uc11c \ub098\ud0c0\ub0b4\ub294 relation\uc744 \ud3ec\ud568\ud55c\ub2e4\uace0 \ubcfc \uc218 \uc5c6\ub2e4\ub294 \uac83\uc784<\/li>\n      <li>Figure 1\uc758 2 \ubc88\uc9f8 \ubb38\uc7a5\uc5d0\uc11c \ub450 entity\ub294 \u201c\/company\/founders\u201d\uc758 relation\uc744 
\uac16\ub294\ub2e4\uace0 \ubcf4\uae30 \uc5b4\ub835\uace0 \uc774\ub7f0 noisy data \ub54c\ubb38\uc5d0 \uc131\ub2a5 \uc800\ud558\uac00 \uc77c\uc5b4\ub098\ub294 \uac83\uc784<\/li>\n    <\/ul>\n  <\/li>\n  <li>\ub450 \ubc88\uc9f8\ub294, distant supervision\uc73c\ub85c \ub370\uc774\ud130\ub97c \uc5bb\uc744 \ub54c \uc815\uad50\ud558\uac8c \ub514\uc790\uc778 \ub41c feature\ub97c \uac00\uc9c0\uace0 model\uc5d0 \uc801\uc6a9\uc2dc\ud0a8\ub2e4\ub294 \uac83\uc784\n    <ul>\n      <li>\ubb38\uc81c\uc758 \ub300\ud45c\uc801 \uc6d0\uc778\uc740 \uc774\ubbf8 \uc874\uc7ac\ud558\ub294 NLP tool\uc744 \uc0ac\uc6a9\ud558\ub294\ub370 \uc788\uc74c<\/li>\n      <li>\ubd88\uac00\ud53c\ud558\uc9c0\ub9cc NLP tool\uc744 \uc0ac\uc6a9\ud558\uba74\uc11c \uadf8\uc5d0 \ub0b4\uc81c\ub41c error\uac00 feature\uc5d0 \uc804\ub2ec\uc774 \ub418\ub294 \uc148\uc784<\/li>\n      <li>\uc2ec\uc9c0\uc5b4\ub294 \ubb38\uc7a5\uc758 \uae38\uc774\uac00 \uae38\uc5b4\uc9c8\uc218\ub85d \ub354\ub354\uc6b1 \uc131\ub2a5\uc774 \ub5a8\uc5b4\uc9c0\uae30\uc5d0 \ub2e8\uc21c\ud788 \uc5d0\ub7ec\uac00 \uc874\uc7ac\ud558\ub294 \uac83\uc774 \uc544\ub2c8\uace0 \uc810\uc810 \ub354 \uc2ec\uac01\ud574\uc9c4\ub2e4\uace0 \ubd10\uc57c \ud568. 
\uc65c\ub0d0\ud558\uba74 \uc131\ub2a5\uc774 \ub5a8\uc5b4\uc9c0\ub294 \uae34 \ubb38\uc7a5\uc774 \ucf54\ud37c\uc2a4\uc758 \uc808\ubc18 \uc815\ub3c4\ub97c \ucc28\uc9c0\ud558\uae30 \ub54c\ubb38\uc784<\/li>\n    <\/ul>\n  <\/li>\n  <li>\uc704\uc758 \ub450 \ubb38\uc81c\uc810\uc744 \ud574\uacb0\ud558\uae30 \uc704\ud574 multi-instance learning\uc744 \uc774\uc6a9\ud55c PCNNs \ubaa8\ub378\uc744 \uc81c\uc548\ud558\ub294 \ubc14\uc784<\/li>\n  <li>\uccab \ubc88\uc9f8 \ubb38\uc81c(wrong label)\ub97c multi-instance learning\uc73c\ub85c \ud574\uacb0\ud560 \uac70\uc784<\/li>\n  <li>multi-instance learning\n    <ul>\n      <li>training set\uc744 \ub9ce\uc740 bag\uc73c\ub85c \uad6c\uc131\ud558\uace0 \uac01 bag\uc5d0\ub294 \ub9ce\uc740 instance\uac00 \ub4e4\uc5b4\uc788\ub2e4\uace0 \ubd04<\/li>\n      <li>bag\uc758 label\uc740 \uc54c\uace0 \uc788\uc9c0\ub9cc(known), bag\ub97c \uad6c\uc131\ud558\ub294 \uac01 instance\uc758 label\uc740 \ubaa8\ub984(unknown)<\/li>\n      <li>\uc6b0\ub9ac\ub294 bag-level\uc758 objective function\uc744 \ud559\uc2b5\uc2dc\ud0ac \uac83\uc774\uace0, \uc774\ub807\uac8c \ud558\uba74 instance label\uc758 \ubd88\ud655\uc2e4\uc131\uc744 \uace0\ub824\ud560 \uc218 \uc788\uace0 \uc774\ub294 wrong label problem\uc744 \uc644\ud654\uc2dc\ud0ac \uc218 \uc788\uc74c<\/li>\n    <\/ul>\n  <\/li>\n  <li>\ub450 \ubc88\uc9f8 \ubb38\uc81c(NLP tool)\ub294 \ubcf5\uc7a1\ud55c NLP \uc804\ucc98\ub9ac \uacfc\uc815\uc744 \uc5c6\uc774 convolutional architecture\uac00 \uc790\ub3d9\uc73c\ub85c \uad00\ub828\ub41c feature\ub97c \ud559\uc2b5\ud558\ub3c4\ub85d \ud558\uc5ec \ud574\uacb0\ud568. \uc774\ub294 Zeng et al. 
(2014)\uc758 \ub17c\ubb38\uc744 \ubcf4\uba74 \ub428<\/li>\n  <li>\uc6b0\ub9ac\uc758 \uc81c\uc548\uc740 single max pooling\uc744 \uc0ac\uc6a9\ud55c Zeng et al.\uc758 \uc81c\uc548\uc5d0 \ub300\ud55c \ud655\uc7a5\ud310\uc774\ub77c\uace0 \ubcf4\uba74 \ub428<\/li>\n  <li>\uc6b0\ub9ac\ub294 single max pooling\uc774 \uc544\ub2c8\ub77c, \ubb38\uc7a5 \uad6c\uc870\uc801\uc778 feature\ub3c4 \uc7a1\uc544\ub0b4\uae30 \uc704\ud574 \uc8fc\uc5b4\uc9c4 \ub450 entity\uc5d0 \uc758\ud574\uc11c \ub9cc\ub4e4\uc5b4\uc9c0\ub294 3\uac1c\uc758 segment\uc5d0 \ub300\ud574 piecewise max pooling\uc774\ub77c\ub294 \uac83\uc744 \ud560 \uac83\uc784<\/li>\n  <li>\uc6b0\ub9ac paper\uc758 contribution\uc744 \uc694\uc57d\ud558\uba74 \uc544\ub798\uc640 \uac19\uc74c\n    <ul>\n      <li>hand-designed feature \uc5c6\uc774 distant supervised relation extraction\uc744 \uc704\ud55c \ubc29\ubc95\uc744 \ud0d0\uad6c\ud558\uc600\uace0, \ubcf5\uc7a1\ud55c NLP \uc804\ucc98\ub9ac \uc5c6\uc774 feature\ub97c \ud559\uc2b5\ud560 \uc218 \uc788\ub294 PCNNs\uc744 \uc81c\uc548\ud568<\/li>\n      <li>Wrong label problem\uc744 \ud574\uacb0\ud558\uae30 \uc704\ud574 multi-instance learning\uc73c\ub85c PCNNs\uc744 \ud559\uc2b5\uc2dc\ucf1c distant supervised relation extraction\uc744 \ud558\ub294 \ud574\uacb0\ucc45\uc744 \uc81c\uc2dc\ud568<\/li>\n      <li>\ubb38\uc7a5\uc5d0\uc11c \ub450 entity \uac04\uc758 \uad6c\uc870\uc801 \uc815\ubcf4(structure information)\ub97c \uc7a1\uc544\ub0b4\uae30 \uc704\ud574, piecewise max pooling\uc774\ub77c\ub294 \ubc29\ubc95\uc744 \uc81c\uc548\ud568<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-methodology\">3. 
Methodology<\/h2>\n\n<ul>\n  <li><em>3.1 Vector Representation<\/em>, <em>3.2 Convolution<\/em>, <em>3.4 Softmax Output<\/em> \uc740 skip\u2026<\/li>\n<\/ul>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36379736-111bd004-15c3-11e8-9b8b-9424dabb2d86.png\" alt=\"network\" \/><\/p>\n\n<h3 id=\"33-piecewise-max-pooling\">3.3 Piecewise Max Pooling<\/h3>\n\n<ul>\n  <li>single max pooling\uc740 relation extraction\uc744 \ud558\uae30\uc5d0 \ubd88\ucda9\ubd84\ud568<\/li>\n  <li>hidden layer\uc758 size\uac00 \uae09\uaca9\ud558\uace0 \uac70\uce60\uac8c \uc904\uc5b4\ub4e4\uc5b4\uc11c \uace0\uc6b4 feature\ub97c \uc5bb\uae30 \uc5b4\ub824\uc6c0 <br \/>\u27a4 \uc804\uccb4 \ubb38\uc7a5\uc744 \ud55c\uc21c\uac04\uc5d0 \ud558\ub098\uc758 \uac12\uc73c\ub85c \ubb49\ub6b1\uadf8\ub9ac\uae30 \ub54c\ubb38\uc5d0 feature\uac00 \ubb49\uac1c\uc9c4\ub2e4\ub294 \uc758\ubbf8\uc778\ub4ef<\/li>\n  <li>\uadf8\ub9ac\uace0 \ub450 entity\uc5d0 \ub300\ud55c structural information\uc744 \uc7a1\uc544\ub0b4\uae30 \uc5b4\ub835\ub2e4\ub294 \ub2e8\uc810\uc774 \uc788\uc74c<\/li>\n  <li>piecewise max pooling\uc740 \uc704\uc758 \ub2e8\uc810\uc744 \ubcf4\uc644\ud558\ub294 \ubc29\ubc95\uc73c\ub85c\uc11c, \ub450 entity\ub97c \uae30\uc900\uc73c\ub85c \ubb38\uc7a5\uc744 3\uac1c\uc758 segment\ub85c \ub098\ub208 \ub4a4 \uac01 segment \ubcc4\ub85c max pooling\uc744 \ud574\uc11c 3\ucc28\uc6d0\uc758 \ubca1\ud130\ub97c \uc5bb\uc744 \uc218 \uc788\uc74c<\/li>\n  <li>\uc774\ub807\uac8c \ucd94\ucd9c\ub41c 3\ucc28\uc6d0 \ubca1\ud130\ub4e4\uc744 \ucb49 \uc774\uc5b4 \ubd99\uc774\uace0(concatenate) non-linear activation function\uc744 \uac70\uccd0 \ub2e4\uc74c layer\uc778 softmax output layer\ub85c \uac12\uc744 \ubcf4\ub0c4. 
\uc5ec\uae30\uc11c\ub294 tanh\ub97c \uc0ac\uc6a9\ud568<\/li>\n<\/ul>\n\n<h3 id=\"35-multi-instance-learning\">3.5 Multi-instance Learning<\/h3>\n\n<ul>\n  <li>wrong label problem\uc744 \ud574\uacb0\ud558\uae30 \uc704\ud574\uc11c multi-instance learning for PCNNs\uc744 \uc0ac\uc6a9\ud568<\/li>\n  <li>\uc544\ub798\uc758 \uc124\uba85\uc744 \ud2c0\ub9b4 \uc218 \uc788\uc74c.<\/li>\n  <li>bag\ub77c\ub294 \uac1c\ub150\uc774 \ub4f1\uc7a5\ud558\ub294\ub370, \uc774\ub294 instance\ub4e4\uc758 \uc9d1\ud569\uc774\ub77c\uace0 \ubcf4\uba74 \ub428<\/li>\n  <li>bag\ub294 \ucd1d T\uac1c\uac00 \uc788\uace0 target label\uc758 \uac1c\uc218\uc640 \uac19\uc740 \uac83\uc73c\ub85c \ubcf4\uc784<\/li>\n  <li>training step, \ud558\ub098\uc758 batch\uc5d0\uc11c \ubaa8\ub4e0 instance\ub294 T\uac1c\uc758 bag\uc5d0 random\ud558\uac8c \ud560\ub2f9\uc774 \ub428<\/li>\n  <li>batch\ub97c \uad6c\uc131\ud558\ub294 \ubaa8\ub4e0 instance\ub4e4\uc740 network\ub97c \uac70\uccd0\uc11c output(probability by softmax)\uc744 \uad6c\ud568<\/li>\n  <li>\uac01 i\ubc88\uc9f8 bag\uc5d0\uc11c i\ubc88\uc9f8 label\uc758 output(\ud655\ub960)\uc774 \uac00\uc7a5 \ud070 instance\ub97c \ud574\ub2f9 bag\uc758 \ub300\ud45c\uac12\uc73c\ub85c \ud568<\/li>\n  <li>\ubaa8\ub4e0 bag\uc758 \ub300\ud45c\uac12(output=\ud655\ub960)\uc5d0 \ub300\ud574 cross-entropy\ub97c \uad6c\ud574\uc11c \ub124\ud2b8\uc6cc\ud06c parameter\ub97c \uc5c5\ub370\uc774\ud2b8 \ud568<\/li>\n  <li>\uc774\ub807\uac8c \ud558\uba74 \ud655\uc2e4\ud55c instance\uc640 relation\uc5d0 \ub300\ud574\uc11c\ub9cc \ud559\uc2b5\uc744 \uc9c4\ud589\ud558\uac8c \ub418\uc5b4 wrong label problem\uc744 \uc644\ud654\uc2dc\ud0ac \uc218 \uc788\ub2e4\uace0 \ud568<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments\">4. Experiments<\/h2>\n\n<ul>\n  <li>Dataset\n    <ul>\n      <li>Riedel et al. 
(2010) \uc5d0\uc11c \ub9cc\ub4e0 \uac83\uc744 \uc0ac\uc6a9\ud568<\/li>\n      <li>Freebase relation(NYT corpus \uae30\ubc18)\uc73c\ub85c \ub9cc\ub4e4\uc5b4\uc9c4 dataset\uc784<\/li>\n      <li>2005-2006\uc744 training data, 2007\uc744 testing data\ub85c \uc0ac\uc6a9\ud568<\/li>\n    <\/ul>\n  <\/li>\n  <li>Evaluation Metrics<\/li>\n  <li>Mintz et al. (2009) \uc5d0\uc11c \uc0ac\uc6a9\ud55c held-out evaluation\uacfc manual evaluation\uc744 \uc0ac\uc6a9\ud568<\/li>\n  <li>\uc2e4\ud5d8\uc5d0 \ub300\ud55c precision\/recall curve \ub4f1\uc744 \ubd04<\/li>\n  <li>\ub098\uba38\uc9c0\ub294 skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"5-conclusion\">5. Conclusion<\/h2>\n","pubDate":"Tue, 20 Feb 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/piecewise-cnn-diatant-supervision-for-relation-extraction\/","guid":"https:\/\/roomylee.github.io\/piecewise-cnn-diatant-supervision-for-relation-extraction\/","category":["piecewise-cnn","relatin-extraction","distant-supervision","blog"]},{"title":"Relation Extraction: Perspective from Convolutional Neural Networks (NAACL 2015)","description":"<ul>\n  <li>Paper Link: <a href=\"http:\/\/www.cs.nyu.edu\/~thien\/pubs\/vector15.pdf\">http:\/\/www.cs.nyu.edu\/~thien\/pubs\/vector15.pdf<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Thien Huu Nguyen (New York University)<\/li>\n      <li>Ralph Grishman (New York University)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>NAACL 2015<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>In relation extraction, the traditional approach with complicated feature engineering has errors, and this leads to errors in relation detection and classification.\n    <ul>\n      <li>\u27a4 \uad00\uacc4\ucd94\ucd9c\uc5d0\uc11c \uc804\ud1b5\uc801\uc778 \ubc29\ubc95\uc740 \ubcf5\uc7a1\ud55c feature engineering\uc744 \ud558\uae30\uc5d0 \uc5d0\ub7ec\uac00 \ub9ce\uace0, \uc774 \ub54c\ubb38\uc5d0 detection\uacfc classification\uc5d0\uc11c \ub9ce\uc740 
\ubb38\uc81c\ub97c \uc57c\uae30\ud55c\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Advantage of our model: multiple window sizes for filters.\n    <ul>\n      <li>\u27a4 \ud544\ud130\uc758 \uc708\ub3c4\uc6b0 \uc0ac\uc774\uc988\ub97c \uc5ec\ub7ec\uac00\uc9c0\ub85c \ud560 \uc218 \uc788\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>using pre-trained word embeddings as initializer.\n    <ul>\n      <li>\u27a4 pre-trained \ub41c \uc6cc\ub4dc \uc784\ubca0\ub529 \ubaa8\ub378\uc744 \uc0ac\uc6a9\ud558\uace0 \uc788\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Relation Extraction task can be divided into two steps: 1) Detecting and 2) Classifying\n    <ul>\n      <li>\u27a4 \uad00\uacc4\ucd94\ucd9c \ubb38\uc81c\ub294 \uad00\uacc4\ub97c \ucc3e\uc544\ub0b4\ub294 \ub2e8\uacc4\uc640 \uc774\ub97c \ubd84\ub958\ud558\ub294 \ub2e8\uacc4, \ub450 \uac00\uc9c0\ub85c \ub098\ub20c \uc218 \uc788\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>Difference between Relation Classification and Relation Extraction \u27a4 RC\uc640 RE\uc758 \ucc28\uc774\uc810\n    <ul>\n      <li>In classification, non-relation examples in the dataset are comparable to the other examples, so they can be treated as a usual relation class like the <em>Other<\/em> class. (balanced)<\/li>\n      <li>\u27a4 RC\uc5d0\uc11c\ub294 non-relation\uc778 \ub370\uc774\ud130\uc758 \uc591\uc774 relation\uc778 \ub370\uc774\ud130 \uc591\uacfc \ube44\uc2b7\ud558\ub2e4.(balanced) \uadf8\ub798\uc11c \ubcf4\ud1b5\uc758 \ud074\ub798\uc2a4\ucc98\ub7fc \uc774\ub97c \ub2e4\ub8f0 \uc218 \uc788\ub2e4.<\/li>\n      <li>In Extraction, non-relation examples far exceed the others. (unbalanced) So it is more challenging but more practical than relation classification.<\/li>\n      <li>\u27a4 RE\uc5d0\uc11c\ub294 non-relation\uc778 \ub370\uc774\ud130\uc758 \uc591\uc774 \ub9e4\uc6b0 \ub9ce\ub2e4.(unbalanced) \uadf8\ub798\uc11c \ub354 \uc5b4\ub835\uace0 \uc4f8\ubaa8\uac00 \uc788\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  
<li>In the last decade, relation extraction has been dominated by two methods: the feature-based method and the kernel-based method.\n    <ul>\n      <li>\u27a4 \uc9c0\uae08\uae4c\uc9c0 relation extraction\uc758 \ud574\ubc95\uc73c\ub85c feature-based\uc640 kernel-based \ub450 \uac00\uc9c0 \ubc29\ubc95\uc774 \uc9c0\ubc30\uc801\uc774\uc5c8\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>The common characteristic of these methods is the leverage of a large body of linguistic analysis and knowledge resources to transform relation mentions into some rich representation to be used by some statistical classifier such as SVM, MaxEnt.\n    <ul>\n      <li>\u27a4 \ub450 \ubc29\ubc95\uc758 \uacf5\ud1b5\uc801\uc778 \ud2b9\uc9d5\uc740 relation mention\uc744 \ud1b5\uacc4\uc801 \ubd84\ub958\uae30(SVM, MaxEnt)\uc5d0\uc11c \uc0ac\uc6a9\ud560 \uc218 \uc788\ub294 \ud48d\ubd80\ud55c \ud45c\ud604\uc73c\ub85c \ubcc0\ud658\ud558\uae30 \uc704\ud574\uc11c <em>\uc5b8\uc5b4 \ubd84\uc11d \ubc0f \uc9c0\uc2dd \uc790\uc6d0\uc758 \ub9ce\uc740 \ubd80\ubd84\uc744 \ud65c\uc6a9<\/em>\ud55c\ub2e4\ub294 \uac83\uc774\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>So these models depend on a supervised NLP toolkit and suffer from a performance loss when they are applied to out-of-domain data.\n    <ul>\n      <li>\u27a4 \uadf8\ub798\uc11c \uc774 \ubaa8\ub378\ub4e4\uc740 \ud559\uc2b5\ub41c NLP toolkit\uc5d0 \uc758\uc874\uc801\uc774\uace0 out-of-domain \ub370\uc774\ud130\uc5d0 \ub300\ud574\uc11c \ud37c\ud3ec\uba3c\uc2a4 \uc18c\uc2e4\uc774 \uc788\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>We target an independent RE system that both avoids complicated feature engineering and minimizes the reliance on the supervised NLP modules.\n    <ul>\n      <li>\u27a4 \uc6b0\ub9ac\ub294 \ubcf5\uc7a1\ud55c feature engineering\uacfc NLP module\uc758 \uc758\uc874\uc131\uc744 \ucd5c\uc18c\ud654\ud55c \ub3c5\ub9bd\uc801\uc778 RE \uc2dc\uc2a4\ud15c\uc744 \ubaa9\ud45c\ub85c \ud558\uc600\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n  <li>There are 
two recent works on CNNs for relation classification (Liu et al., 2013) and (Zeng et al., 2014); however, work on CNNs for relation extraction does not exist yet.\n    <ul>\n      <li>\u27a4 CNN\uc744 \ud1b5\ud55c RC\uc5d0 \ub300\ud574\uc11c\ub294 \ucd5c\uadfc Liu\uc640 Zeng\uc758 \uc5f0\uad6c\uac00 \ubc1c\ud45c\ub41c \ubc14 \uc788\uc9c0\ub9cc, RE\ub294 \uc544\uc9c1 \ubc1c\ud45c\ub41c \uac8c \uc5c6\ub2e4.<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-convolutional-neural-network-for-relation-extraction\">3. Convolutional Neural Network for Relation Extraction<\/h2>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36367668-1a3dbc00-1597-11e8-8a8c-74b6607567f6.png\" alt=\"network\" \/><\/p>\n\n<ul>\n  <li>There are 4 main layers\n    <ol>\n      <li>the lookup tables to encode words in sentences by real-valued vectors<\/li>\n      <li>the convolutional layer to recognize n-grams<\/li>\n      <li>the pooling layer to determine the most relevant features<\/li>\n      <li>the logistic regression layer (FC with softmax at the end) to perform classification<\/li>\n    <\/ol>\n  <\/li>\n<\/ul>\n\n<h3 id=\"3-1-word-representation\">3-1. 
Word Representation<\/h3>\n\n<ul>\n  <li>CNN\uc740 fixed length input\uc774\uc5b4\uc57c\ub9cc \ud568<\/li>\n  <li>fixed length\ub294 relation\uc744 \uac16\ub294 \ub450 entity \uc0ac\uc774\uc758 \ucd5c\ub300 \uac70\ub9ac\ub85c \ud558\uc600\uace0, \uc774 length\ubcf4\ub2e4 \uae34 \ubb38\uc7a5\uc740 \uc790\ub974\uace0 \uc9e7\uc740 \ubb38\uc7a5\uc740 special token\uc73c\ub85c padding \uc2dc\ud0b4<\/li>\n  <li>\ubb38\uc7a5\uc744 \uc774\ub8e8\ub294 \uac01 \ub2e8\uc5b4 \ud1a0\ud070\uc740 \uc784\ubca0\ub529 lookup table(random or pre-trained)\uc5d0 \uae30\ubc18\ud558\uc5ec \ubca1\ud130\ub85c \ubcc0\ud658\uc2dc\ud0b4<\/li>\n  <li>\ub9c8\ud0b9\ub41c \ub450 entity\uc758 position\uc744 \uc784\ubca0\ub529\ud558\uae30 \uc704\ud574 \uac01 entity\uc640 \ubaa8\ub4e0 \ub2e8\uc5b4\uc5d0 \ub300\ud55c \uc0c1\ub300\uc801 \uac70\ub9ac \uac12(i-i1, i-i2)\uc744 \uad6c\ud568<\/li>\n  <li>\uc774\ub97c \uac00\uc9c0\uace0 random initialize\ub41c real-value vector(d1, d2)\ub85c \ubcc0\ud658\ud560 \uc218 \uc788\ub294 lookup table\uc744 \ub9cc\ub4ec<\/li>\n  <li>\ub530\ub77c\uc11c \uc0c1\ub300\uc801 \uac70\ub9ac \uac12\uc758 \ubc94\uc704\ub294 -n+1\ubd80\ud130 n-1\uae4c\uc9c0\uc774\uace0, position embedding lookup table\uc740 (2n-1) \u00d7 m_d\uc758 \uc0ac\uc774\uc988\ub97c \uac16\uace0 \uc774\ub54c m_d\ub294 hyperparameter\uc778 position embedding size\ub97c \uc758\ubbf8\ud568<\/li>\n  <li>\ucd5c\uc885\uc801\uc73c\ub85c \ubb38\uc7a5\uc758 \uac01 \ub2e8\uc5b4 \ud1a0\ud070\uc740 word embedding vector, position embedding vector d1\uacfc d2\ub97c \uc774\uc5b4 \ubd99\uc778(concatenate) vector\ub97c \uac16\uac8c \ub428<\/li>\n  <li>\ub530\ub77c\uc11c \ud558\ub098\uc758 \ub2e8\uc5b4 \ud1a0\ud070\uc740 (word embedding size + 2*position embedding size)\uc758 \ucc28\uc6d0\uc744 \uac16\ub294 \ubca1\ud130\uac00 \ub428<\/li>\n<\/ul>\n\n<h3 id=\"3-2--3-convolution--pooling\">3-2 &amp; 3. 
Convolution &amp; Pooling<\/h3>\n\n<ul>\n  <li>Filters of various sizes are convolved over the word sequence (the sentence), using a bias and a non-linear activation function<\/li>\n  <li>Max pooling is applied to the convolved values (vectors); depending on the filter size used for convolution, the vector to be max-pooled has length (fixed length - filter size + 1)<\/li>\n<\/ul>\n\n<h3 id=\"3-4-regularization-and-classification\">3-4. Regularization and Classification<\/h3>\n\n<ul>\n  <li>dropout<\/li>\n  <li>FC<\/li>\n  <li>softmax<\/li>\n  <li>l2 norm<\/li>\n  <li>AdaDelta<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments\">4. Experiments<\/h2>\n\n<ul>\n  <li>Dataset\n    <ul>\n      <li>SemEval 2010 Task 8 dataset for relation classification<\/li>\n      <li>ACE 2005 dataset for relation extraction<\/li>\n    <\/ul>\n  <\/li>\n  <li>static &amp; non-static\n    <ul>\n      <li>static means the word &amp; position embedding vectors are not trained; non-static is the opposite<\/li>\n      <li>Performance is evaluated for three settings: random init + non-static, pre-trained + non-static, and pre-trained + static<\/li>\n    <\/ul>\n  <\/li>\n  <li>tanh for non-linear activation function<\/li>\n  <li>150 filters for each window size<\/li>\n  <li>word embedding size = 300 (using GoogleNews word2vec)<\/li>\n  <li>position embedding size = 50<\/li>\n  <li>dropout keep prob = 0.5<\/li>\n  <li>batch size = 50<\/li>\n  <li>hyperparameter of l2 = 3<\/li>\n<\/ul>\n\n<h2 id=\"5-conclusion\">5. 
Conclusion<\/h2>\n\n<ul>\n  <li>The proposed CNN works well even on an unbalanced corpus, while minimizing the use of external supervised NLP toolkits for features<\/li>\n  <li>The contributions to the relation classification &amp; extraction tasks lie in the following aspects\n    <ul>\n      <li>multiple window sizes<\/li>\n      <li>position embedding<\/li>\n      <li>pre-trained word embedding for init in a non-static architecture<\/li>\n    <\/ul>\n  <\/li>\n  <li>As future work, the authors suggest devising additional feature-extraction methods for CNN-based relation extraction and tackling relation extraction with neural networks other than CNNs<\/li>\n<\/ul>\n","pubDate":"Mon, 19 Feb 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/relation-extraction-perspective-cnn\/","guid":"https:\/\/roomylee.github.io\/relation-extraction-perspective-cnn\/","category":["relation-extraction","cnn","analysis","blog"]},{"title":"Relation Classification via Convolutional Deep Neural Network (COLING 2014)","description":"<ul>\n  <li>Paper Link: <a href=\"http:\/\/www.aclweb.org\/anthology\/C14-1220\">http:\/\/www.aclweb.org\/anthology\/C14-1220<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>Daojian Zeng (Chinese Academy of Sciences)<\/li>\n      <li>Kang Liu (Chinese Academy of Sciences)<\/li>\n      <li>Siwei Lai (Chinese Academy of Sciences)<\/li>\n      <li>Guangyou Zhou (Chinese Academy of Sciences)<\/li>\n      <li>Jun Zhao (Chinese Academy of Sciences)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>COLING 2014<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>The state of the art in this field is statistical machine learning, whose performance hinges on the quality of the extracted features<\/li>\n  <li>A deep CNN is used to extract lexical &amp; sentence level features<\/li>\n  <li>No complicated preprocessing is required<\/li>\n  <li>First, word tokens are converted to vectors via a word embedding lookup table<\/li>\n  <li>Lexical level features are extracted according to the given nouns<\/li>\n  <li>Sentence level features are extracted during CNN training<\/li>\n  <li>The two extracted feature sets are fed to a softmax classifier to predict the relation between the two marked nouns (entities)<\/li>\n  <li>Performance is considerably better than the state of the art<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. 
Introduction<\/h2>\n\n<ul>\n  <li>Most approaches use supervised learning<\/li>\n  <li>Supervised learning is broadly divided into feature-based methods and kernel-based methods<\/li>\n  <li>Feature-based methods use things like bag-of-words models; kernel-based methods use things like dependency parse trees<\/li>\n  <li>These methods are effective, but the models themselves have too much influence, and the features or kernels are usually derived from pre-existing NLP systems -&gt; which seems to mean they depend on external NLP systems<\/li>\n  <li>This paper uses a deep CNN instead<\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-methodology\">3. Methodology<\/h2>\n\n<p><img src=\"https:\/\/user-images.githubusercontent.com\/15166794\/36367647-f98a3236-1596-11e8-973c-f27d3c89c073.png\" alt=\"network\" \/><\/p>\n\n<h3 id=\"3-1-the-neural-network-architecture\">3-1. 
The Neural Network Architecture<\/h3>\n\n<ul>\n  <li>A neural network architecture is used<\/li>\n  <li>It consists of three components: Word Representation, Feature Extraction, and Output<\/li>\n  <li>The system requires no complicated preprocessing; the input is simply a sentence with two marked nouns<\/li>\n  <li>Word tokens are embedded into vectors based on a lookup table<\/li>\n  <li>Lexical features and sentence features are extracted separately and concatenated to form the final feature vector<\/li>\n  <li>This final feature vector is passed through a softmax classifier to produce the output (prediction)<\/li>\n<\/ul>\n\n<h3 id=\"3-2-word-representation\">3-2. Word Representation<\/h3>\n\n<ul>\n  <li>Trained word embedding vectors work better than random initialization, so the trained embeddings of Turian et al. (2010) are used<\/li>\n<\/ul>\n\n<h3 id=\"3-3-lexical-level-features\">3-3. Lexical Level Features<\/h3>\n\n<ul>\n  <li>Traditional lexical level features mainly include the nouns themselves, the entity pair, and the word sequence between the entities, all of which depend on NLP tools<\/li>\n  <li>Instead of these, word-embedding-based features are used here<\/li>\n  <li>The marked nouns, their surrounding context tokens (words), and, as in MVRNN (Socher et al. 2012), WordNet hypernyms are used step by step as follows\n    <ul>\n      <li>L1 = Noun 1<\/li>\n      <li>L2 = Noun 2<\/li>\n      <li>L3 = Left and right tokens of Noun 1<\/li>\n      <li>L4 = Left and right tokens of Noun 2<\/li>\n      <li>L5 = WordNet hypernyms of nouns<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<h3 id=\"3-4-sentence-level-features\">3-4. Sentence Level Features<\/h3>\n\n<ul>\n  <li>The word vectors from 3-2 represent word similarity well but fall short on long-distance features and semantic compositionality within a sentence<\/li>\n  <li>Therefore a max-pooled convolutional neural network is used, which can extract sentence level features automatically<\/li>\n  <li>Figure 2 shows the CNN model for sentence level feature extraction; it plugs into the sentence level features part of Figure 1, which depicts the overall architecture<\/li>\n  <li>In the end, non-linear sentence level features are obtained<\/li>\n<\/ul>\n\n<h3 id=\"3-4-1-word-features\">3-4-1. 
Word Features<\/h3>\n\n<ul>\n  <li>According to the distributional hypothesis (Harris, 1954), words that appear in the same context tend to have similar meanings<\/li>\n  <li>To capture this property, the vector representation of each word is combined with those of its surrounding context words<\/li>\n  <li>Given a sentence such as <em>[People]_0 have_1 been_2 moving_3 back_4 into_5 [downtown]_6<\/em>, embedding each word yields (x0, x1, x2, \u2026 , x5, x6)<\/li>\n  <li>Grouping this vector list with its neighbors using window_size = 3 yields ([x_start, x0, x1], [x0, x1, x2], \u2026 , [x4, x5, x6], [x5, x6, x_end]), which is used as WF (Word Features)<\/li>\n<\/ul>\n\n<h3 id=\"3-4-2-position-features\">3-4-2. Position Features<\/h3>\n\n<ul>\n  <li>Structure features (e.g., the shortest dependency path between nominals) have previously been used for relation classification, but WF cannot capture such features<\/li>\n  <li>To extract Position Features (PF), relative distances are used. 
For example, in the sentence from 3-4-1 above, moving has relative distances (3, -3) to the marked words (people, downtown)<\/li>\n  <li>Embedding vectors d1 and d2 (their size is a hyperparameter) are obtained for the relative distance values, giving PF = [d1, d2]<\/li>\n  <li>The resulting position features are combined with the word features, and the [WF, PF] vector is passed to the convolution component<\/li>\n<\/ul>\n\n<h2 id=\"4-datasets-and-evaluation-metrics\">4. Datasets and Evaluation metrics<\/h2>\n\n<ul>\n  <li>SemEval 2010 Task 8 dataset<\/li>\n  <li>macro average F1 score<\/li>\n<\/ul>\n","pubDate":"Mon, 19 Feb 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/cnn-relation-classification\/","guid":"https:\/\/roomylee.github.io\/cnn-relation-classification\/","category":["convolutional-neural-network","relation-extraction","relation-classification","blog"]},{"title":"Convolution Neural Network for Relation Extraction (ADMA 2013)","description":"<ul>\n  <li>Paper Link: <a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-642-53917-6_21\">https:\/\/link.springer.com\/chapter\/10.1007\/978-3-642-53917-6_21<\/a><\/li>\n  <li>Author\n    <ul>\n      <li>ChunYang Liu (National Computer Network Center of China)<\/li>\n      <li>WenBo Sun (Beijing University)<\/li>\n      <li>WenHan Chao (Beijing University)<\/li>\n      <li>WanXiang Che (Harbin Institute of Technology)<\/li>\n    <\/ul>\n  <\/li>\n  <li>Published at\n    <ul>\n      <li>ADMA 2013<\/li>\n    <\/ul>\n  <\/li>\n<\/ul>\n\n<hr \/>\n\n<h2 id=\"abstract\">Abstract<\/h2>\n\n<ul>\n  <li>A CNN is used<\/li>\n  <li>A new coding (embedding) method based on a synonym dictionary is proposed<\/li>\n  <li>About 9% better than the existing (tree kernel based) approach<\/li>\n  <li>The ACE 2005 dataset is used<\/li>\n  <li>Experiments with hypernyms are also conducted<\/li>\n<\/ul>\n\n<h2 id=\"1-introduction\">1. Introduction<\/h2>\n\n<ul>\n  <li>Kernel-based models require parsing, which has very high complexity<\/li>\n  <li>They also require manual feature engineering by hand<\/li>\n  <li>In previous work the embedding method failed to capture semantic meaning; here a method called synonym coding is used that does capture it<\/li>\n<\/ul>\n\n<h2 id=\"2-related-work\">2. Related Work<\/h2>\n\n<ul>\n  <li>skip\u2026<\/li>\n<\/ul>\n\n<h2 id=\"3-convolution-network-architecture\">3. Convolution Network Architecture<\/h2>\n\n<ul>\n  <li>Instead of one-hot encoding, an embedding method based on a synonym dictionary is used<\/li>\n  <li>The corresponding synonym-embedding index vector is fed in as input<\/li>\n  <li>It is then converted to vectors through a lookup table layer<\/li>\n  <li>The output is produced via CNN -&gt; FC -&gt; softmax<\/li>\n<\/ul>\n\n<h2 id=\"4-experiments\">4. 
Experiments<\/h2>\n\n<ul>\n  <li>Performance comes out quite well<\/li>\n  <li>Various experiments were run, but it is unclear exactly which data was used, and judging from other papers the experimental results seem somewhat questionable<\/li>\n<\/ul>\n","pubDate":"Mon, 12 Feb 2018 00:00:00 +0000","link":"https:\/\/roomylee.github.io\/cnn-relation-extraction\/","guid":"https:\/\/roomylee.github.io\/cnn-relation-extraction\/","category":["convolutional-neural-network","cnn","relation-extraction","blog"]}]}}
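
The CNN pipeline that the summaries above share — word embeddings concatenated with relative-position embeddings, narrow convolution with multiple window sizes, per-filter max pooling, and a softmax classifier — can be sketched in plain NumPy as below. This is a minimal illustration only: every size, name, and initialization here is an assumption for the sketch, not a value taken from the papers (the papers use, e.g., 300-dim word2vec embeddings and 150 filters per window size).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters (assumptions for this sketch, not the papers' values).
VOCAB, WORD_DIM = 100, 8        # word embedding lookup table
MAX_DIST, POS_DIM = 10, 3       # relative-distance (position) embeddings
FIXED_LEN = 7                   # sentences truncated / padded to this length
WINDOW_SIZES = (2, 3)           # multiple convolution window sizes
N_FILTERS, N_CLASSES = 6, 4

word_table = rng.normal(0, 0.1, (VOCAB, WORD_DIM))
pos_table = rng.normal(0, 0.1, (2 * MAX_DIST + 1, POS_DIM))  # distances -MAX_DIST..MAX_DIST
TOK_DIM = WORD_DIM + 2 * POS_DIM
filters = {w: (rng.normal(0, 0.1, (w * TOK_DIM, N_FILTERS)), np.zeros(N_FILTERS))
           for w in WINDOW_SIZES}
W_out = rng.normal(0, 0.1, (len(WINDOW_SIZES) * N_FILTERS, N_CLASSES))

def represent(token_ids, e1, e2):
    """Token vector = [word emb ; emb of distance to entity 1 ; emb of distance to entity 2]."""
    idx = np.arange(FIXED_LEN)
    d1 = np.clip(idx - e1, -MAX_DIST, MAX_DIST) + MAX_DIST   # shift into table range
    d2 = np.clip(idx - e2, -MAX_DIST, MAX_DIST) + MAX_DIST
    return np.concatenate([word_table[token_ids], pos_table[d1], pos_table[d2]], axis=1)

def forward(token_ids, e1, e2):
    X = represent(np.asarray(token_ids), e1, e2)             # (FIXED_LEN, TOK_DIM)
    pooled = []
    for w, (Wf, b) in filters.items():
        # Narrow convolution: FIXED_LEN - w + 1 windows of w consecutive tokens each.
        windows = np.stack([X[i:i + w].ravel() for i in range(FIXED_LEN - w + 1)])
        conv = np.tanh(windows @ Wf + b)                     # (L - w + 1, N_FILTERS)
        pooled.append(conv.max(axis=0))                      # max pooling per filter
    feats = np.concatenate(pooled)                           # final sentence feature vector
    logits = feats @ W_out
    p = np.exp(logits - logits.max())
    return p / p.sum()                                       # softmax class distribution

# Toy sentence of 7 token ids; entities marked at positions 0 and 6.
probs = forward([1, 5, 9, 2, 7, 3, 0], e1=0, e2=6)
print(probs.shape)  # -> (4,)
```

Note how the (fixed length - filter size + 1) count of convolution windows and the (word embedding size + 2 * position embedding size) token dimension from the summaries fall directly out of the shapes above.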