DialogueDiscourseParser

Dialogue Discourse Parser For Meeting Summarization

This code is borrowed from shizhouxing/DialogueDiscourseParsing. We first train the Deep Sequential Model and then use the model to parse our own meetings.

Requirements

* Python                 2.7
* tensorflow             1.3.0
* tensorflow-gpu         1.3.0
* tensorflow-tensorboard 0.1.8

Data Format

Firstly, you should convert each meeting into the required data format. Note that relations are used for metric calculation, which is not useful for discourse parsing.

[
  {
    "edus": [
      {
        "text": "this is the kick-off meeting for our project .",
        "speaker": "B"
      },
      {
        "text": "and this is just what we're gonna be doing over the next twenty five minutes .",
        "speaker": "A"
      },
      ...
    ],
    "id": "xxxxx",
    "relations": [
      {
        "y": 1,
        "x": 0,
        "type": "Explanation"
      }
    ]
  }
]

Pre-trained Model

You can download the pre-trained model here. Then put it under project dir, DialogueDiscourseParser/model/*

Train from scratch

You can follow the instructions from shizhouxing/DialogueDiscourseParsing to train the parser. Simply run python main.py --is_train, we only modify the vocab_size to 2,500.

Parsing

Choose dataset, AMI or ICSI.
- In run_test.py, change file_list.
- In main.py, change data to AMI or ICSI.
  - tf.flags.DEFINE_string("data", "AMI", "data")
- In main.py, change is_train to False.
  - tf.flags.DEFINE_boolean("is_train", False, "train model")
- In utils.py, in function preview_data, change output dir, and you should first create it.
  - wf = codecs.open("./icsi_res/"+test_data.split(".")[0]+".txt","w","utf-8")
- In data dir, there should be two files train.json, test.json (useless)
Start parsing.
- Run python run_test.py
You can get results from target dir.

For your own data

You can first copy train.json and test.json under the AMI directory to your own data directory without any modification, results in yourdata/train.json and yourdata/test.json since these two files are only for code running rather than discourse parsing.
Then, for your meetings, each meeting is converted into a JSON file, namely, meetingX.json under the yourdata directory.
Finally, you may have a data directory yourdata including meeting1.json, meeting2.json, ..., train.json, and test.json.
Please note that train.json, and test.json are directly copied from AMI directory.

Name		Name	Last commit message	Last commit date
parent directory ..
AMI		AMI
ICSI		ICSI
STAC		STAC
libs		libs
.gitattributes		.gitattributes
.gitignore		.gitignore
Agent.py		Agent.py
Model.py		Model.py
NonStructured_Encoder.py		NonStructured_Encoder.py
README.md		README.md
README_orig.md		README_orig.md
Structured_Encoder.py		Structured_Encoder.py
data_pre.py		data_pre.py
list.ami		list.ami
list.icsi		list.icsi
main.py		main.py
run_test.py		run_test.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Dialogue Discourse Parser For Meeting Summarization

Requirements

Data Format

Pre-trained Model

Train from scratch

Parsing

For your own data

FilesExpand file tree

DialogueDiscourseParser

Directory actions

More options

Directory actions

More options

Latest commit

History

DialogueDiscourseParser

Folders and files

parent directory

README.md

Dialogue Discourse Parser For Meeting Summarization

Requirements

Data Format

Pre-trained Model

Train from scratch

Parsing

For your own data