Downloads

Translations of Simple English Wikipedia Articles into Typed Lambda Calculus

The text files below are annotated with cued-association sentence processing (CASP) markup, including associations for anaphoric inheritance (-n and -m tags) and quantifier scope (-s, -t, and -u tags).

These annotation files can be translated into large lambda calculus text files (over 100 MB each) using the modelblocks software package.

After installing modelblocks, go to the modelblocks-release directory and create the workspace directory by running:

make

Then, from the modelblocks-release/workspace directory, download the annotation file, rename it, and build the lambda calculus expressions:

curl -O https://linguistics.osu.edu/sites/default/files/2021-06/wikisemc2.casp_.toktrees_0.txt
mv wikisemc2.casp_.toktrees_0.txt wikisemc2.casp_.toktrees
make wikisemc2.casp_.discexprs
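
Because curl -O saves whatever the server returns, a failed download can leave an HTML error page under the expected file name. The following is a minimal sketch of a sanity check to run before the make step. It assumes (this is not confirmed by the source) that a valid annotation file never begins with the character `<`; the function name `looks_like_download_error` is hypothetical.

```shell
# Hypothetical helper, not part of modelblocks.
# Succeeds (exit status 0) if FILE is missing, empty, or starts with '<',
# which is assumed here to indicate an HTML error page.
looks_like_download_error() {
    file=$1
    [ -s "$file" ] || return 0      # missing or zero-length file
    first=$(head -c 1 "$file")      # first byte of the file
    [ "$first" = "<" ]
}

# Usage, from the workspace directory:
#   if looks_like_download_error wikisemc2.casp_.toktrees; then
#       echo "download looks incomplete; fetch it again" >&2
#   fi
```

This is only a heuristic: it catches an empty file or an obvious HTML response, not a truncated download.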

If you have trouble running modelblocks, you can build the files manually:

cat wikisemc2.casp_.toktrees  |  perl ../resource-linetrees/scripts/editabletrees2linetrees.pl  >  wikisemc2.casp_.senttrees
cat wikisemc2.casp_.senttrees  |  sed 's/\^g//g' | python2 ../resource-gcg/scripts/senttrees2discgraphs.py -e  >  wikisemc2.casp_.discgraphs
mkdir -p ../config
echo '-DNDEBUG -O3' > ../config/user-cflags.txt
mkdir -p bin
g++ -I../resource-rvtl -Wall $(cat ../config/user-cflags.txt) -g ../resource-linetrees/src/indent.cpp -o bin/indent -lm
cat wikisemc2.casp_.discgraphs  |  python2 ../resource-gcg/scripts/discgraphs2discexprs.py  |  bin/indent  >  wikisemc2.casp_.discexprs
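
Each manual step writes its result through a shell redirect, so the output file is created even when the script feeding it fails, and the next step will read an empty file without complaint. The following is a minimal sketch of a check to run after the steps above, using only the file names that appear there; `check_stage_outputs` is a hypothetical helper, and it detects only missing or empty files, not partially written ones.

```shell
# Hypothetical helper, not part of modelblocks.
# Reports every file argument that is missing or empty.
# Returns 0 only if all files exist with non-zero size.
check_stage_outputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}

# Usage, from the workspace directory:
#   check_stage_outputs wikisemc2.casp_.senttrees \
#                       wikisemc2.casp_.discgraphs \
#                       wikisemc2.casp_.discexprs
```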


Files:
  • (v0.2) Semantic annotations for the 6-sentence beginnings of Simple English Wikipedia articles whose titles are among the 128 most common words in a 2014 dump of Simple English Wikipedia.
  • (v0.2) Semantic annotations for the 6-sentence beginnings of Simple English Wikipedia articles whose titles are among the next 128 most common words in a 2014 dump of Simple English Wikipedia.
  • (v0.2) Semantic annotations for the 3-sentence beginnings of the first 279 articles in a 2014 dump of Simple English Wikipedia that are not redundant with tranches C1 or C2.