This repository was archived by the owner on Feb 23, 2024. It is now read-only.

google-research-datasets/impakt

ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

This dataset contains semantic parsing annotations for 2489 sentences from shopping web pages in the C4 corpus, corresponding to annotations of 3719 expressed implication relationships and 6117 typed and summarized attributes.

The data is released under the CC BY 4.0 license.

More details can be found in an upcoming paper.

The dataset is in JSON Lines format, where each line is a JSON object with the following schema (shown here as an example record):

{
  "snippet": "",
  "provenance": {
    "url": "https://pleasanthearthfireplacedoors.com/gas-logs/vented-vs-vent-free-gas-logs-which-one-to-get/",
    "timestamp": "2019-04-22T20:53:43Z",
    "span_start": 360,
    "span_end": 435
  },
  "category": "Home & Garden > Fireplaces",
  "classification": "Yes",
  "attributes": [
    {
      "attribute": "ambiance",
      "summary": "ambiance"
    },
    {
      "attribute": "flickering fire",
      "summary": "flickering fire"
    }
  ],
  "atomic_attributes": {
    "ambiance": [
      {
        "attribute": "ambiance",
        "summary": "ambiance",
        "attribute_type": "use case"
      }
    ],
    "flickering fire": [
      {
        "attribute": "flickering fire",
        "summary": "flickering fire",
        "attribute_type": "features"
      }
    ]
  },
  "implications": [
    {
      "antecedent": "flickering fire",
      "consequent": "ambiance"
    }
  ]
}
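As a minimal sketch of how records of this shape might be consumed, the Python below parses a JSON Lines stream and pulls out the implication pairs and the attribute types. The helper names (`read_records`, `implication_pairs`, `attribute_types`) are hypothetical and not part of the dataset release. The sample line is a trimmed copy of the example record above. The dataset's file name is not stated here, so the sketch reads from an in-memory stream instead of a path.

```python
import io
import json
from typing import Dict, Iterator, List, TextIO, Tuple


def read_records(lines: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def implication_pairs(record: dict) -> List[Tuple[str, str]]:
    """Return (antecedent, consequent) pairs from the "implications" field."""
    return [
        (imp["antecedent"], imp["consequent"])
        for imp in record.get("implications", [])
    ]


def attribute_types(record: dict) -> Dict[str, List[str]]:
    """Map each key of "atomic_attributes" to the types of its atomic parts."""
    return {
        name: [atom["attribute_type"] for atom in atoms]
        for name, atoms in record.get("atomic_attributes", {}).items()
    }


# Trimmed copy of the example record above, serialized as one JSON line.
SAMPLE_LINE = json.dumps({
    "snippet": "",
    "category": "Home & Garden > Fireplaces",
    "classification": "Yes",
    "atomic_attributes": {
        "ambiance": [
            {"attribute": "ambiance", "summary": "ambiance",
             "attribute_type": "use case"}
        ],
        "flickering fire": [
            {"attribute": "flickering fire", "summary": "flickering fire",
             "attribute_type": "features"}
        ],
    },
    "implications": [
        {"antecedent": "flickering fire", "consequent": "ambiance"}
    ],
})

records = list(read_records(io.StringIO(SAMPLE_LINE + "\n")))
for record in records:
    print(implication_pairs(record))
    print(attribute_types(record))
```

To read the real file, pass an open file handle to `read_records` in place of the `io.StringIO` object.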

The provenance information lets users join onto the C4 corpus to find the snippet that was annotated. The C4 corpus can be constructed using the scripts at the original dataset listing, or a pre-constructed version can be used, such as the one created by AI2 and hosted by HuggingFace.
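The join described above could be sketched as follows. Several things here are assumptions rather than documented behavior: that C4 documents are available as dictionaries with `url`, `timestamp`, and `text` fields (as in the pre-constructed releases), that the pair (`url`, `timestamp`) identifies a document, and that `span_start` and `span_end` are character offsets into `text` with an exclusive end. The function names and the toy document are hypothetical. Check the recovered text against a few records before relying on these conventions.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed join key: (url, timestamp) of a C4 document.
Key = Tuple[str, str]


def build_index(c4_docs: Iterable[dict]) -> Dict[Key, str]:
    """Index C4 documents by (url, timestamp) so records can be joined."""
    return {(doc["url"], doc["timestamp"]): doc["text"] for doc in c4_docs}


def recover_snippet(record: dict, index: Dict[Key, str]) -> Optional[str]:
    """Slice the annotated span out of the matching C4 document.

    Returns None when no document matches the record's provenance.
    Assumes character offsets with an exclusive end (not confirmed).
    """
    prov = record["provenance"]
    text = index.get((prov["url"], prov["timestamp"]))
    if text is None:
        return None
    return text[prov["span_start"]:prov["span_end"]]


# Toy stand-in for a C4 document; the URL and text are made up.
toy_text = "Intro. A flickering fire adds ambiance. Outro."
toy_sentence = "A flickering fire adds ambiance."
toy_start = toy_text.index(toy_sentence)

toy_docs = [{
    "url": "https://example.com/page",
    "timestamp": "2019-04-22T20:53:43Z",
    "text": toy_text,
}]
toy_record = {
    "snippet": "",
    "provenance": {
        "url": "https://example.com/page",
        "timestamp": "2019-04-22T20:53:43Z",
        "span_start": toy_start,
        "span_end": toy_start + len(toy_sentence),
    },
}

index = build_index(toy_docs)
print(recover_snippet(toy_record, index))
```

For the full corpus, an in-memory dictionary over all of C4 is unlikely to be practical. One option is to first collect the set of URLs that appear in the dataset and index only the matching C4 documents while streaming through the corpus.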

About

The ImPaKT (Implication Parsing and Knowledge exTraction) dataset contains semantic parsing annotations for 2489 sentences from shopping web pages in the C4 corpus, corresponding to annotations of 3719 expressed implication relationships and 6117 typed and summarized attributes.
