Tuesday, October 20, 2009

WordNet / VerbNet via Python NLTK

WordNet as thesaurus
>>> from nltk.corpus import wordnet as wn
>>> set(s.name().split('.')[0] for s in wn.synsets('trade', pos=wn.VERB))
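
To see what that one-liner does without loading WordNet, here is a minimal sketch of the same filtering on plain strings. The sample names are hand-written in the `lemma.pos.nn` format that synset names use; they are illustrative, not actual WordNet output, and `verb_heads` is a hypothetical helper name.

```python
def verb_heads(synset_names):
    """Collect the head word of every verb synset name."""
    heads = set()
    for name in synset_names:
        # 'trade.v.01' -> ('trade', 'v', '01'); rsplit keeps dots inside the lemma intact
        lemma, pos, _sense = name.rsplit(".", 2)
        if pos == "v":
            heads.add(lemma)
    return heads


# Hand-written sample names (assumption: not real query results)
sample = ["trade.n.01", "craft.n.03", "trade.v.01", "swap.v.01", "deal.v.06"]
print(sorted(verb_heads(sample)))
```

Splitting on the part-of-speech field is a little stricter than searching for ".v." anywhere in the name, which is why the sketch unpacks the three fields.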

VerbNet
>>> from nltk.corpus import verbnet
>>> verbnet.classids('drink')
['eat-39.1-2']
>>> v = verbnet.vnclass('39.1-2')
>>> [t.attrib['type'] for t in v.findall('THEMROLES/THEMROLE/SELRESTRS/SELRESTR')]
['comestible', 'solid']
>>> [t.attrib['type'] for t in v.findall('THEMROLES/THEMROLE')]
['Patient']
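
The object returned by vnclass() is queried above with ElementTree-style paths, so the two lookups can be combined to pair each thematic role with its selectional restrictions. A rough sketch, run against a hand-written, simplified XML fragment standing in for the real class file (the fragment and the `role_restrictions` helper are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for the eat-39.1-2 class element
# (assumption: the real VerbNet file contains much more than this).
SAMPLE = """
<VNCLASS ID="eat-39.1-2">
  <THEMROLES>
    <THEMROLE type="Patient">
      <SELRESTRS>
        <SELRESTR Value="+" type="comestible"/>
        <SELRESTR Value="-" type="solid"/>
      </SELRESTRS>
    </THEMROLE>
  </THEMROLES>
</VNCLASS>
"""


def role_restrictions(vnclass):
    """Map each thematic role to a list of (sign, restriction) pairs."""
    roles = {}
    for role in vnclass.findall("THEMROLES/THEMROLE"):
        roles[role.attrib["type"]] = [
            (r.attrib["Value"], r.attrib["type"])
            for r in role.findall("SELRESTRS/SELRESTR")
        ]
    return roles


v = ET.fromstring(SAMPLE.strip())
print(role_restrictions(v))
```

Keeping the sign next to the restriction matters here: in this fragment the Patient of "drink" is comestible but not solid.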

Clustering
http://docs.huihoo.com/nltk/0.9.5/api/nltk.cluster-module.html
import numpy
from nltk import cluster

vectors = [numpy.array(f) for f in [[3, 3], [1, 2], [4, 2], [4, 0]]]

# initialise the clusterer (will also assign the vectors to clusters)
clusterer = cluster.KMeansClusterer(2, cluster.euclidean_distance)
clusterer.cluster(vectors, True)

# classify a new vector
print(clusterer.classify(numpy.array([3, 3])))
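
For reference, a rough pure-Python sketch of what a k-means clusterer does with these four vectors: assign each vector to its nearest mean, recompute the means, repeat until nothing changes. This is not NLTK's implementation; the starting means are fixed by hand (an assumption, to keep the run deterministic) and the function names are made up for the example.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(vector, means):
    """Index of the mean closest to the vector."""
    return min(range(len(means)), key=lambda i: euclidean(vector, means[i]))


def kmeans(vectors, initial_means, max_iter=100):
    """Alternate assignment and mean update until the means stop moving."""
    means = [list(m) for m in initial_means]
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for v in vectors:
            clusters[nearest(v, means)].append(v)
        # An empty cluster keeps its previous mean
        new_means = [
            [sum(col) / float(len(c)) for col in zip(*c)] if c else m
            for c, m in zip(clusters, means)
        ]
        if new_means == means:
            break
        means = new_means
    return means


vectors = [[3, 3], [1, 2], [4, 2], [4, 0]]
means = kmeans(vectors, [[1, 2], [4, 2]])  # hand-picked starting means
assignments = [nearest(v, means) for v in vectors]
print(assignments)
print(nearest([3, 3], means))  # same role as classify() above
```

With different starting means the result can differ, so cluster numbers from a randomly initialised run need not match these.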

Named Entities
http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html#named_entity_detection_index_term
>>> import nltk
>>> sent = nltk.corpus.treebank.tagged_sents()[22]
>>> print(nltk.ne_chunk(sent))
(S
  The/DT
  (GPE U.S./NNP)
  is/VBZ
  one/CD
  ...
  according/VBG
  to/TO
  (PERSON Brooke/NNP T./NNP Mossman/NNP)
  ...)
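
To get from that printout to a plain list of entities, one quick option is to pull the labelled chunks out of the printed text with a regular expression. A minimal sketch, assuming flat (non-nested) chunks as in the output above; the string below is an abbreviated copy of that output and `entities` is a hypothetical helper:

```python
import re

# Abbreviated copy of the printed chunk tree shown above
printed = """(S
  The/DT
  (GPE U.S./NNP)
  is/VBZ
  one/CD
  according/VBG
  to/TO
  (PERSON Brooke/NNP T./NNP Mossman/NNP))"""

# Innermost "(LABEL word/TAG ...)" groups only; the outer (S ...) never matches
# because its body contains further parentheses.
CHUNK = re.compile(r"\((\w+)\s+([^()]+)\)")


def entities(text):
    """Return (label, entity text) pairs found in a printed chunk tree."""
    found = []
    for label, body in CHUNK.findall(text):
        # Drop the POS tag after the last slash of each token
        words = [tok.rsplit("/", 1)[0] for tok in body.split()]
        found.append((label, " ".join(words)))
    return found


print(entities(printed))
```

Working on the tree object directly is cleaner when it is at hand; the string approach is only convenient when all that is left is saved output.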

MontyLemmatiser - Strips inflectional morphology, i.e. reduces verbs to their infinitive form and nouns to their singular form, e.g. eating -> eat
http://web.media.mit.edu/~hugo/montylingua/
>>> from monty.MontyLemmatiser import MontyLemmatiser
>>> lemmatized = MontyLemmatiser().lemmatise_word('eating')
