SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles
SciHyp is a dataset designed to help the scientific community discover and detect hypotheses across scientific literature. It describes how hypotheses are formulated and structured in scientific articles, which makes it a resource for researchers across scientific disciplines.
This repository contains the models and datasets described in our paper, "SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles".
Annotated dataset and models
- data: All information about the SciHyp data is here, including the SPARQL endpoint, example queries, and the data from the different stages. See the dataset README for details.
  - 🚨 Important Update: We have updated the ontology and RDF data files. Please refer to data/README.md for detailed information about these updates.
- crowdsourcing: This directory contains the crowdsourcing interface, potato, including the instructions and further details. analysis_intermediate contains the notebooks used to compute the intermediate steps.
  - annotation guidelines: The crowdsourcing task instructions (Annotation Guidelines).
- models: This directory contains all the scripts used to train or fine-tune the models. The classification directory holds the data used to train the models, split into train/test/dev, and classification/model/ holds the bin files of the pre-trained models as well as the code. weak_labeling contains the scripts used to prepare the ensemble model and the necessary intermediate input data. This could be useful for setting up the pipeline described in the paper.
- evaluation: This directory contains all the sampled evaluation data and outcomes from two versions of in-context learning models (backed by GPT-4), and a notebook that reproduces the analysis reported in the paper.
Note: Some of the intermediate data files will be available upon request.
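The data entry above mentions a SPARQL endpoint and example queries. As a rough illustration of how such an endpoint could be queried from Python, here is a minimal sketch using only the standard library. The endpoint URL below is a hypothetical placeholder (the real one is documented with the data), the helper names are ours, and the query is deliberately generic so that it makes no assumption about the SciHyp ontology. It follows the standard SPARQL 1.1 Protocol (GET with a `query` parameter) and the SPARQL JSON results format.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the endpoint URL documented with the data.
ENDPOINT = "https://example.org/sparql"

# Generic query, independent of the SciHyp ontology: list the classes used in
# the graph together with their instance counts.
QUERY = """
SELECT ?cls (COUNT(?s) AS ?n)
WHERE { ?s a ?cls }
GROUP BY ?cls
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results.

    Simplification: assumes the endpoint URL has no query string of its own.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into one dict per result row."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query and return the parsed rows (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return parse_bindings(resp.read().decode("utf-8"))


# Usage, once ENDPOINT points at a real SPARQL service:
# for row in run_query(ENDPOINT, QUERY):
#     print(row["cls"], row["n"])
```

The example queries shipped in the data directory can be passed to `run_query` in place of `QUERY`; all values come back as strings, so numeric results such as counts need an explicit conversion.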
Citation
If you would like to use the SciHyp dataset or models in your project, please cite our paper.
@InProceedings{10.1007/978-3-031-77847-6_8,
author="Vasu, Rosni
and Sarasua, Cristina
and Bernstein, Abraham",
title="SciHyp: A Fine-Grained Dataset Describing Hypotheses and Their Components from Scientific Articles",
booktitle="The Semantic Web -- ISWC 2024",
year="2025",
publisher="Springer Nature Switzerland",
address="Cham",
pages="134--152",
isbn="978-3-031-77847-6"
}
Contact
If you have any questions or feedback about the SciHyp dataset or models, please contact us at rosni[at]ifi.uzh.ch.
Thank you for your interest in SciHyp!