Prosodic Save

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.

Project README

Prosodic

Prosodic is a metrical-phonological parser written in Python. Currently, it can parse English and Finnish text, but adding additional languages is easy with a pronunciation dictionary or a custom python function. Prosodic was built by Ryan Heuser, Josh Falk, and Arto Anttila. Josh also maintains another repository, in which he has rewritten the part of this project that does phonetic transcription for English and Finnish. Sam Bowman has contributed to the codebase as well, adding several new metrical constraints.

This version, "Prosodic 2", is a near-total rewrite of the original Prosodic.

Supports Python>=3.8.

Install

1. Install python package

For now, pip-install directly from github:

pip install git+https://github.com/quadrismegistus/prosodic

2. Install espeak

Install espeak, free text-to-speak (TTS) software, to ‘sound out’ unknown words.

Mac: brew install espeak. (First install homebrew if not already installed.)
Linux: apt-get install espeak
Windows: Download and install from http://espeak.sourceforge.net/download.html.

Usage

Web app

Prosodic has a new GUI (graphical user interface) in a web app. After installing, run:

prosodic

Then navigate to http://127.0.0.1:5000/. It should look like this:

Python

Read texts

# import prosodic
import prosodic

# load a text
sonnet = prosodic.Text("""
Those hours, that with gentle work did frame
The lovely gaze where every eye doth dwell,
Will play the tyrants to the very same
And that unfair which fairly doth excel;
For never-resting time leads summer on
To hideous winter, and confounds him there;
Sap checked with frost, and lusty leaves quite gone,
Beauty o’er-snowed and bareness every where:
Then were not summer’s distillation left,
A liquid prisoner pent in walls of glass,
Beauty’s effect with beauty were bereft,
Nor it, nor no remembrance what it was:
But flowers distill’d, though they with winter meet,
Leese but their show; their substance still lives sweet.
""")

# can also load by filename
shaksonnets = prosodic.Text(fn='corpora/corppoetry_en/en.shakespeare.txt')

Stanzas, lines, words, syllables, phonemes

Texts in prosodic are organized into a tree structure. The .children of a Text object is a list of Stanza's, whose .parent objects point back to the Text. In turn, in each stanza's .children is a list of Line's, whose .parent's point back to the stanza; so on down the tree.

# Take a peek at this tree structure 
# and the features particular entities have
sonnet.show(maxlines=30, incl_phons=True)

Text()
|   Stanza(num=1)
|       Line(num=1, txt='Those hours, that with gentle work did frame')
|           WordToken(num=1, txt='Those', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='Those', lang='en', num_forms=1)
|                   WordForm(num=1, txt='Those')
|                       Syllable(ipa='ðoʊz', num=1, txt='Those', is_stressed=False, is_heavy=True)
|                           Phoneme(num=1, txt='ð', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='o', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=1, round=1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=2, txt=' hours', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='hours', lang='en', num_forms=2)
|                   WordForm(num=1, txt='hours')
|                       Syllable(ipa="'aʊ", num=1, txt='ho', is_stressed=True, is_heavy=True, is_strong=True, is_weak=False)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                       Syllable(ipa='ɛːz', num=2, txt='urs', is_stressed=False, is_heavy=True, is_strong=False, is_weak=True)
|                           Phoneme(num=2, txt='ɛː', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=-1, long=1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                   WordForm(num=2, txt='hours')
|                       Syllable(ipa="'aʊrz", num=1, txt='hours', is_stressed=True, is_heavy=True)
|                           Phoneme(num=2, txt='a', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=-1, lo=1, back=-1, round=-1, velaric=-1, tense=1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=3, txt='ʊ', syl=1, son=1, cons=-1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=0, cor=-1, distr=0, lab=-1, hi=1, lo=-1, back=1, round=1, velaric=-1, tense=-1, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='r', syl=-1, son=1, cons=1, cont=1, delrel=0, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=0, lo=0, back=0, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|                           Phoneme(num=4, txt='z', syl=-1, son=-1, cons=1, cont=1, delrel=-1, lat=-1, nas=-1, strid=0, voi=1, sg=-1, cg=-1, ant=1, cor=1, distr=-1, lab=-1, hi=-1, lo=-1, back=-1, round=-1, velaric=-1, tense=0, long=-1, hitone=0, hireg=0)
|           WordToken(num=3, txt=',', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt=',', lang='en', num_forms=0, is_punc=True)
|           WordToken(num=4, txt=' that', sent_num=1, sentpart_num=1)
|               WordType(num=1, txt='that', lang='en', num_forms=3)

# take a peek at it in dataframe form
sonnet.df   # by-syllable dataframe representation
sonnet      # ...which will also be shown when text object displayed (in a notebook)

												word_num_forms	syll_is_stressed	syll_is_heavy	syll_is_strong	syll_is_weak	word_is_punc
stanza_num	line_num	line_txt	sent_num	sentpart_num	wordtoken_num	wordtoken_txt	word_lang	wordform_num	syll_num	syll_txt	syll_ipa
1	1	Those hours, that with gentle work did frame	1	1	1	Those	en	1	1	Those	ðoʊz	1	0	1
					2	hours	en	1	1	ho	'aʊ	2	1	1	1	0
								1	2	urs	ɛːz	2	0	1	0	1
								2	1	hours	'aʊrz	2	1	1
					3	,	en	0	0			0					1
	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
	14	Leese but their show; their substance still lives sweet.	1	1	7	substance	en	1	2	tance	stəns	1	0	1	0	1
					8	still	en	1	1	still	'stɪl	1	1	1
					9	lives	en	1	1	lives	'lɪvz	1	1	1
					10	sweet	en	1	1	sweet	'swiːt	1	1	1
					11	.	en	0	0			0					1

195 rows × 6 columns

Open Source Agenda is not affiliated with "Prosodic" Project. README Source: quadrismegistus/prosodic

Stars

267

Open Issues

Last Commit

1 month ago

Repository

quadrismegistus/prosodic

License

GPL-3.0

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/prosodic"><img src="https://www.opensourceagenda.com/projects/prosodic/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog

From the blog

Dec 11, 2022

How to Choose Which Programming Language to Learn First?

From the blog

Dec 11, 2022