SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles

SciHyp is a dataset designed to help the scientific community discover and detect hypotheses across scientific literature. It describes how hypotheses are formulated and structured in scientific articles, down to their individual components, and is intended for researchers across scientific disciplines.

This repository contains the models and datasets described in our paper, "SciHyp: A Fine-grained Dataset Describing Hypotheses and Their Components from Scientific Articles".

Annotated dataset and models

  • data: Everything about the SciHyp data, including the SPARQL endpoint, example queries, and the data from the different stages. See data/README.md for details about the dataset.

    • 🚨 Important Update: We have updated the ontology and RDF data files. Please refer to data/README.md for detailed information about these updates.
  • crowdsourcing: Contains the crowdsourcing interface (potato), including the instructions and further details. The analysis_intermediate directory contains the notebooks used to compute the intermediate steps.

  • models: Contains all the scripts used to train or fine-tune the models. The classification directory holds the training data, split into train/test/dev, and classification/model/ holds the bin files of the pre-trained models along with the code. The weak_labeling directory contains the scripts used to prepare the ensemble model and the intermediate input data it needs. These are useful for reproducing the pipeline described in the paper.

  • evaluation: Contains all the sampled evaluation data, the outcomes from two versions of in-context learning models (backed by GPT-4), and a notebook that reproduces the analysis reported in the paper.
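
As a rough illustration of how the SPARQL endpoint mentioned under data could be queried, here is a minimal sketch that uses only the Python standard library and the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter). The endpoint URL below is a placeholder, not the real SciHyp endpoint, and the helper names `build_request` and `run_query` are hypothetical. The query is deliberately generic and assumes nothing about the SciHyp ontology; the actual endpoint and example queries are documented in data/README.md.

```python
import json
import urllib.parse
import urllib.request

# Placeholder (assumption): replace with the endpoint given in data/README.md.
ENDPOINT = "https://example.org/sparql"

# Generic query (assumption): lists a few triples without relying on
# any SciHyp-specific class or property names.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the result bindings as a list of dicts."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


# Example usage (requires network access and a real endpoint):
# for row in run_query(ENDPOINT, QUERY):
#     print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

Keeping request construction separate from the network call makes it possible to inspect the encoded URL before sending anything. Some endpoints expect POST requests or a different result format, so adjust accordingly.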

Note: Some of the intermediate data files are not included in this repository and are available upon request.

Citation

If you would like to use the SciHyp dataset or models in your project, please cite our paper.

@InProceedings{10.1007/978-3-031-77847-6_8,
author="Vasu, Rosni
and Sarasua, Cristina
and Bernstein, Abraham",
title="SciHyp: A Fine-Grained Dataset Describing Hypotheses and Their Components from Scientific Articles",
booktitle="The Semantic Web -- ISWC 2024",
year="2025",
publisher="Springer Nature Switzerland",
address="Cham",
pages="134--152",
isbn="978-3-031-77847-6"
}

Contact

If you have any questions or feedback about the SciHyp dataset or models, please feel free to contact us at rosni[at]ifi.uzh.ch

Thank you for your interest in SciHyp!