
Synonymiser

Use a database to programmatically synonymise words and phrases. Developed from work associated with http://txt.1bpm.net/Cicerone/

Requirements

  • Python (developed with 2.7)
  • PostgreSQL
  • Peewee for Python

Installation

The source is provided without a setup.py or any other installer. The idea is that it can be used as-is or copied directly into other Python projects.

  • Load the gzipped SQL file in the sql directory into your PostgreSQL instance.
  • In the synonymiser directory, rename config.dist.py to config.py and edit the values it contains to reflect your PostgreSQL installation.

Usage

Standalone

synonymiser.py can be called directly and has a number of command line options. It can handle a single word or a phrase given as an argument, or take input piped to stdin. Some of the following options only make sense for synonymising a single word:

  • -h , --help : show help
  • -o , --offensive : include words marked as offensive in the database (they are excluded by default)
  • -l LIMIT , --limit LIMIT : limit the number of synonyms returned to LIMIT. Only relevant when providing a single word.
  • -s SORTING , --sorting SORTING : sort the list of synonyms, available options are random, alpha and none. Only relevant when providing a single word.
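As an illustration of how the options above fit together, here is a minimal sketch of an equivalent option parser built with argparse. It is not the parser used by synonymiser.py: the build_parser name, the positional text argument and the default values for --limit and --sorting are assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="synonymiser.py",
        description="Synonymise a word or phrase.",
    )
    # Assumption: the word or phrase arrives as free positional arguments;
    # an empty list would mean "read from stdin instead".
    parser.add_argument("text", nargs="*", help="word or phrase to synonymise")
    parser.add_argument(
        "-o", "--offensive", action="store_true",
        help="include words marked as offensive",
    )
    # Assumption: defaults chosen to match get_synonyms(limit=1, RANDOM).
    parser.add_argument(
        "-l", "--limit", type=int, default=1,
        help="limit the number of synonyms returned",
    )
    parser.add_argument(
        "-s", "--sorting", choices=["random", "alpha", "none"],
        default="random", help="how to sort the list of synonyms",
    )
    return parser


# Parse an explicit argument list rather than sys.argv, so the sketch
# can be run anywhere.
args = build_parser().parse_args(["-l", "5", "-s", "alpha", "happy"])
print(args.limit)
print(args.sorting)
print(args.text)
```

Restricting --sorting with choices means an unsupported value is rejected by the parser itself, before any database query is made.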

Imported to Python project

The synonymiser directory can be used as a subpackage in your Python project, or you can copy just the three files config.py, synonymiser.py and db.py. The functions intended to be exposed are:

  • synonymise(line, offensives=False) : synonymise a line of text, replacing each word with a randomly selected synonym. Words marked as offensive are only selected if offensives is True. Returns a string.
  • get_synonyms(word, limit=1, sorting=SORTING.RANDOM, offensives=False) : get synonyms for word, limited to the number specified by limit. sorting can be SORTING.RANDOM, SORTING.ALPHA or SORTING.NONE. Words marked as offensive are only selected if offensives is True. Returns a list.
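To show the calling conventions of the two functions described above without needing a database, here is a self-contained stand-in that mimics their documented signatures using an in-memory dictionary. Everything in it is hypothetical: the _THESAURUS data, the string values of the SORTING constants and the choice to leave words without synonyms unchanged are assumptions for the sketch, not the behaviour of the real module, which queries PostgreSQL through Peewee.

```python
import random


class SORTING(object):
    """Stand-in for the module's sorting constants (values are assumed)."""
    RANDOM = "random"
    ALPHA = "alpha"
    NONE = "none"


# Hypothetical data standing in for the database: word -> [(synonym, offensive)].
_THESAURUS = {
    "happy": [("joyful", False), ("content", False), ("cheerful", False)],
    "big": [("large", False), ("huge", False)],
    "bad": [("poor", False), ("rudeword", True)],
}


def get_synonyms(word, limit=1, sorting=SORTING.RANDOM, offensives=False):
    """Return up to `limit` synonyms for `word` as a list."""
    candidates = [
        synonym
        for synonym, offensive in _THESAURUS.get(word.lower(), [])
        if offensives or not offensive
    ]
    if sorting == SORTING.RANDOM:
        random.shuffle(candidates)
    elif sorting == SORTING.ALPHA:
        candidates.sort()
    # SORTING.NONE keeps the stored order.
    return candidates[:limit]


def synonymise(line, offensives=False):
    """Replace each word in `line` with a randomly selected synonym."""
    replaced = []
    for word in line.split():
        found = get_synonyms(word, limit=1, offensives=offensives)
        # Assumption: a word with no known synonym is kept as it is.
        replaced.append(found[0] if found else word)
    return " ".join(replaced)


print(get_synonyms("happy", limit=2, sorting=SORTING.ALPHA))
print(synonymise("big happy"))
```

With the real package, the same calls would be made after importing synonymise, get_synonyms and SORTING from the synonymiser module, with config.py pointing at a populated database.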