Downloads

Translations of Simple English Wikipedia Articles into Typed Lambda Calculus

The text files below are annotated with cued-association sentence processing (CASP) markup, which includes associations for anaphoric inheritance (-n and -m tags) and quantifier scope (-s, -t and -u tags).

Files:

  • (v0.3) syntactic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
    File: Wikisem tranche C1 categorial grammar
  • (v0.3) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
    File: Wikisem tranche C1 logic
  • (v0.3) syntactic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the second 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
    File: Wikisem tranche C2 categorial grammar
  • (v0.3) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the second 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
    File: Wikisem tranche C2 logic


The version 0.2 annotation files below must be converted by the user into large lambda calculus text files (over 100 MB each) using the modelblocks software package.

After installing modelblocks, go to the modelblocks-release directory and create the workspace directory by running:

make

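Then, from the modelblocks-release/workspace directory, download the tranche C2 annotation file, rename it, and build the lambda calculus expressions: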

curl -O https://linguistics.osu.edu/sites/default/files/2021-06/wikisemc2.casp_.toktrees_0.txt
mv wikisemc2.casp_.toktrees{_0.txt,}
make wikisemc2.casp_.discexprs
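After the make step finishes, a quick sanity check can confirm that the output file was actually produced; this page notes the results run to over 100 MB each. The helper below is a minimal sketch, and its name `report_size` is hypothetical (not part of modelblocks). It treats 1 MB as 1,000,000 bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of modelblocks): print a file's size in
# megabytes (1 MB = 1,000,000 bytes, rounded down), or fail with status 1
# if the file is missing or empty.
report_size() (
  f=$1
  if [ ! -s "$f" ]; then
    echo "$f: missing or empty" >&2
    exit 1
  fi
  # Some platforms pad wc's byte count with spaces; shell arithmetic ignores them.
  bytes=$(wc -c < "$f")
  echo "$f: $((bytes / 1000000)) MB"
)
```

For example, `report_size wikisemc2.casp_.discexprs` run from the workspace directory should print a size well above 100 MB if the build completed.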

If you have trouble running modelblocks, you can build the files manually:

cat wikisemc2.casp_.toktrees | perl ../resource-linetrees/scripts/editabletrees2linetrees.pl > wikisemc2.casp_.senttrees
cat wikisemc2.casp_.senttrees | sed 's/\^g//g' | python2 ../resource-gcg/scripts/senttrees2discgraphs.py -e > wikisemc2.casp_.discgraphs
mkdir -p ../config
echo '-DNDEBUG -O3' > ../config/user-cflags.txt
mkdir -p bin
g++ -I../resource-rvtl -Wall `cat ../config/user-cflags.txt` -g ../resource-linetrees/src/indent.cpp -o bin/indent -lm
cat wikisemc2.casp_.discgraphs | python2 ../resource-gcg/scripts/discgraphs2discexprs.py | bin/indent > wikisemc2.casp_.discexprs
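The manual steps above can be collected into one reusable function so they stop at the first failure instead of silently writing a truncated file. This is a sketch under stated assumptions, not part of modelblocks: the function name `build_discexprs` and its single `<base>` argument (for example `wikisemc2.casp_`) are hypothetical, and it assumes it is run from modelblocks-release/workspace with `<base>.toktrees` already present. The commands themselves are the same ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of modelblocks) around the manual build steps.
# Run from modelblocks-release/workspace; expects <base>.toktrees to exist,
# e.g. base=wikisemc2.casp_ for wikisemc2.casp_.toktrees.
build_discexprs() (
  # The parentheses run the body in a subshell, so `exit` and `set`
  # affect only this function, not the calling shell.
  if [ $# -ne 1 ]; then
    echo "usage: build_discexprs <base>  (e.g. wikisemc2.casp_)" >&2
    exit 2
  fi
  base=$1
  if [ ! -f "$base.toktrees" ]; then
    echo "missing input file: $base.toktrees" >&2
    exit 1
  fi
  # Stop at the first failing command, including failures inside pipelines.
  set -euo pipefail

  perl ../resource-linetrees/scripts/editabletrees2linetrees.pl \
    < "$base.toktrees" > "$base.senttrees"
  sed 's/\^g//g' "$base.senttrees" \
    | python2 ../resource-gcg/scripts/senttrees2discgraphs.py -e \
    > "$base.discgraphs"

  # Compile the indent tool with the same flags as the manual steps.
  mkdir -p ../config bin
  echo '-DNDEBUG -O3' > ../config/user-cflags.txt
  # The flags are left unquoted on purpose so they split into separate words.
  g++ -I../resource-rvtl -Wall $(cat ../config/user-cflags.txt) -g \
    ../resource-linetrees/src/indent.cpp -o bin/indent -lm

  python2 ../resource-gcg/scripts/discgraphs2discexprs.py \
    < "$base.discgraphs" | bin/indent > "$base.discexprs"
  echo "wrote $base.discexprs"
)
```

After sourcing the file, `build_discexprs wikisemc2.casp_` would reproduce the tranche C2 build. The argument and input checks run before `set -euo pipefail` so that usage errors return distinct exit codes (2 for bad usage, 1 for a missing input file).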

Files:

  • (v0.2) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
  • (v0.2) semantic annotations for 6-sentence beginnings of Simple English Wikipedia articles corresponding to the second 128 most common words used in a 2014 dump of Simple English Wikipedia that are also titles of articles.
  • (v0.2) semantic annotations for 3-sentence beginnings of the first 279 articles in a 2014 dump of Simple English Wikipedia which are not redundant with tranches C1 or C2.