The Musical Corpus of Flow


A key question in science is how to precisely measure and record the information we're interested in. Studying art (like rap flow) scientifically is difficult because art, and the emotional, social, and intellectual experience of art, is so complex and multifaceted that it is difficult to determine what to measure. We must inevitably decide what information to capture and what information to leave out. Measurements of art are also highly subjective: for instance, two people might disagree on which words rhyme, on the meaning of particular words, and on what makes for "good" flow.

We "measure" rap flow by encoding information about syllables. We identify each individual syllable in a rap verse, and encode a variety of theoretically interesting features about each syllable. To maximize the objectivity and consistency of encodings, it might seem preferable to use automated computer analysis to extract information from rap songs. However, the high-level theoretical features of rap flow which interest us are too complex to be reliably extracted using the automated techniques of Music Information Retrieval. Instead, we must rely on the processing abilities of humans, with all their failings and biases. Accordingly, transcriptions are encoded by ear, by me.

For the most complete discussion of the encoding scheme, consult section 4.2 of my dissertation.

Humdrum Representations

Transcriptions are encoded as Humdrum-syntax text files. Eight Humdrum spines are used to encode information in each transcription. Each spine has its own, original Humdrum interpretation. The eight interpretations are described in turn below.
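Because Humdrum files are tab-delimited text, the spines can be pulled apart with a few lines of code. The following is only a minimal sketch, not the tooling used to build the corpus: the function name `read_spines` and the two-spine excerpt are hypothetical, and the sketch assumes a file with no spine splits or joins.

```python
def read_spines(humdrum_text):
    """Split Humdrum text into one list of data tokens per spine.

    Assumes a fixed number of spines (no spine-path splits or joins).
    """
    names, spines = [], {}
    for line in humdrum_text.splitlines():
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment records
        fields = line.split("\t")
        if line.startswith("**"):
            # Exclusive interpretations name each spine.
            names = fields
            spines = {name: [] for name in names}
        elif line.startswith("*") or line.startswith("="):
            continue  # tandem interpretations, terminators, and barlines
        else:
            for name, token in zip(names, fields):
                spines[name].append(token)
    return spines


# Hypothetical excerpt for illustration; not taken from the corpus.
example = "\n".join([
    "!! hypothetical two-spine excerpt",
    "**recipx\t**stress",
    "=1\t=1",
    "8\t1",
    "16\t0",
    "16%5\t1",
    "*-\t*-",
])
spines = read_spines(example)
print(spines["**recipx"])  # ['8', '16', '16%5']
print(spines["**stress"])  # ['1', '0', '1']
```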


**recipx is used to encode the rhythmic duration of syllables in the corpus. **recipx is a variation of the standard Humdrum **recip representation (which is the same as the rhythmic aspect of the more popular **kern).

**recip is based on the rhythmic aspect of traditional music notation, wherein the durations of notes are represented as fractions of a measure; i.e., as "whole," "half," "quarter," "eighth," or "sixteenth" notes. "Recip" is short for "reciprocal," because in **recip a single number indicates its reciprocal: thus, "8" = 1/8 note (eighth-note), "4" = 1/4 (quarter-note), etc. Just as in traditional music notation, a dot can be added to a number to increase the duration by half. Unlike traditional notation, triple subdivisions are not represented by "triplet" groupings; instead, triplets are simply represented as whatever fraction of the measure they take up. Thus, what are traditionally called "triplet eighth-notes" are represented simply as "12" (twelfth-notes), while "triplet sixteenth-notes" are simply "24" (twenty-fourth-notes).

**recipx vs **recip

**recipx is a modification of **recip which allows us to encode any rhythmic duration in a single token, without using "ties." Ties are used in traditional notation for two reasons:

1) The first reason ties are used in traditional notation is to represent durations which cannot be written as a single note value, even with "dots": for instance, a duration of five sixteenth-notes. In **recipx, fractional tokens are used to represent such durations: for example, the token "16%5" indicates the reciprocal of 16/5, that is, 1 ÷ (16/5) = 5/16 of a whole note, which is the same as a duration of five sixteenth-notes.

Here is an example, illustrating **recipx fractional tokens in comparison to traditional notation and **recip:

Illustration of how **recipx differs from **recip (and traditional notation) by avoiding the use of ties by encoding durations of odd length with fractional tokens.

2) The other reason ties are used in traditional notation (and **recip) is to make it easier for musicians to see which notes land on stronger metric positions. This is done by "tying" durations which cross strong beats. This is especially true for durations which cross downbeats (the boundary from one measure to the next). In **recipx, durations are allowed to freely cross any metric position, including barlines. The result is more difficult to sight read, but simpler to analyze.

Here is an example, illustrating **recipx meter-insensitive tokens in comparison to traditional notation and **recip:

Illustration of how **recipx differs from **recip (and traditional notation) by avoiding the use of ties across strong metric positions, and barlines.
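The arithmetic behind **recipx tokens can be sketched in a few lines. This is an illustrative sketch only: the function names `recipx_to_duration` and `onsets` are hypothetical, durations are expressed as fractions of a whole note, and the sketch assumes tokens consist only of a number, an optional "%" fraction, and optional trailing dots.

```python
from fractions import Fraction


def recipx_to_duration(token):
    """Convert a **recipx token to a duration in whole notes."""
    body = token.rstrip(".")
    dots = len(token) - len(body)
    if "%" in body:
        # "16%5" is the reciprocal of 16/5, i.e. 5/16 of a whole note.
        denominator, numerator = body.split("%")
        base = Fraction(int(numerator), int(denominator))
    else:
        # "8" is the reciprocal of 8, i.e. 1/8 of a whole note.
        base = Fraction(1, int(body))
    # One dot multiplies the duration by 3/2, two dots by 7/4, and so on.
    return base * (2 - Fraction(1, 2 ** dots))


def onsets(tokens):
    """Return the onset of each token, measured from the start of the list."""
    position, result = Fraction(0), []
    for token in tokens:
        result.append(position)
        position += recipx_to_duration(token)
    return result


print(recipx_to_duration("8"))     # 1/8
print(recipx_to_duration("4."))    # 3/8
print(recipx_to_duration("12"))    # 1/12
print(recipx_to_duration("16%5"))  # 5/16
```

Assuming a 4/4 measure (one whole note long), the token "16%5" in the hypothetical sequence `["4", "4", "4", "16%5", "8."]` starts at 3/4 and ends at 17/16, so it crosses the barline as a single token with no tie; `onsets` places the following syllable at 17/16.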


The **stress spine in each transcription marks whether each syllable is stressed (1) or unstressed (0). Stress is, of course, often more complicated and subtle than just two categories; thus, the **stress spine provides only basic, limited information about stress. More information about stress can be found in the **tone spine, indicated by pitch accents. Transcriptions were made by ear; this is important because emcees sometimes "put the emphasis on the wrong syllable."



The **tone spine is inspired by the "tone tier" of the ToBI system. **tone spines contain information about pitch accents and other notable pitch contours in flow, including boundary tones.

**tone Key

**tone spines use five primary tokens to indicate information about the intonation of a particular syllable:

+ = "plus" = pitch peak (accent).
_ = "underscore" = pitch nadir (accent).
- = "dash" = pitch average.
/ = "forward slash" = upward glide.
\ = "backslash" = downward glide.

In some cases, an important pitch pattern happens across multiple syllables, in which case parentheses are used to group the intonation tokens. For example, a common two-syllable boundary tone pattern looks like: (\ _) = downward glide to pitch nadir.

Two more tokens are used to indicate overall changes in pitch. These tokens don't just contain information about a particular syllable, but about all subsequent syllables:

v = (vee) = overall drop in register.
^ = (hat) = overall increase in register.
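The **tone key can be written as a simple lookup table. The sketch below is illustrative only: the names `TONE_TOKENS`, `REGISTER_TOKENS`, and `describe_tone` are hypothetical, and it assumes a grouped pattern is written as space-separated symbols inside one pair of parentheses, as in the (\ _) example. It does not handle a register token combined with an accent token in a single field.

```python
# Primary tokens: intonation of a particular syllable.
TONE_TOKENS = {
    "+": "pitch peak (accent)",
    "_": "pitch nadir (accent)",
    "-": "pitch average",
    "/": "upward glide",
    "\\": "downward glide",
}

# Register tokens: apply to all subsequent syllables.
REGISTER_TOKENS = {
    "v": "overall drop in register",
    "^": "overall increase in register",
}


def describe_tone(pattern):
    """Describe a single tone token or a parenthesized multi-syllable group."""
    lookup = {**TONE_TOKENS, **REGISTER_TOKENS}
    symbols = pattern.strip("()").split()
    return [lookup[symbol] for symbol in symbols]


print(describe_tone("+"))       # ['pitch peak (accent)']
print(describe_tone("(\\ _)"))  # ['downward glide', 'pitch nadir (accent)']
```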



The **break spine is inspired by the "break tier" of the ToBI system. Specifically, **break spines mark the boundaries between prosodic units.

**break Key

Five numeric tokens (0, 1, 2, 3, or 4) are used to mark the relative strength of prosodic boundaries in the flow:

0 = weak syllable boundary. (This token is used when it's unclear if a word is pronounced as one or two syllables.)
1 = normal syllable boundary. (This token is usually omitted, since it's assumed by default.)
2 = intonation boundary without rhythmic break, or vice versa.
3 = sub-declination unit.
4 = complete declination unit.

Tokens are placed on the first syllable after a prosodic boundary.
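Because each **break token sits on the first syllable after a boundary, syllables can be grouped into prosodic units by starting a new unit wherever the token reaches a chosen strength. The following is a minimal sketch under stated assumptions: the function name `split_units` and the example data are hypothetical, and any non-numeric token (such as the Humdrum null token ".") is treated as the default strength of 1.

```python
def split_units(syllables, breaks, min_strength=3):
    """Group syllables into units that begin at boundaries >= min_strength."""
    units = []
    for syllable, token in zip(syllables, breaks):
        # Omitted tokens are assumed to mean a normal syllable boundary (1).
        strength = int(token) if token.isdigit() else 1
        if strength >= min_strength or not units:
            units.append([])  # this syllable starts a new unit
        units[-1].append(syllable)
    return units


# Hypothetical data for illustration; not taken from the corpus.
syllables = ["s1", "s2", "s3", "s4", "s5", "s6"]
breaks = ["4", ".", ".", "3", ".", "2"]

print(split_units(syllables, breaks, min_strength=3))
# [['s1', 's2', 's3'], ['s4', 's5', 's6']]
print(split_units(syllables, breaks, min_strength=2))
# [['s1', 's2', 's3'], ['s4', 's5'], ['s6']]
```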



In this corpus, the meaning of the word "rhyme" is taken very broadly, including things like assonance, alliteration, and slant rhyme. Rhymes are encoded in transcriptions using a "two-dimensional" system:

The first dimension simply identifies which syllables match as part of a rhyme. These "matches" are encoded by arbitrary Roman letters (A, B, C, etc.), reusing letters in each verse in the corpus. These rhyme tokens are case sensitive; lowercase letters are typically used for unstressed syllables.

The second dimension marks which adjacent syllables are grouped together into multi-syllable rhymes. The start and end of multi-syllable rhymes are marked with an opening parenthesis [(] and a closing parenthesis [)] respectively. In groups longer than two syllables, middle syllables are marked with a [_] (underscore). This 2D system is illustrated in the following example:

Illustration of MCFlow "two-dimensional" rhyme encodings.

This example contains three separate syllable matches:

1) syllables marked A, which are all stressed syllables with the nucleus ʌ.
2) syllables marked a, which are all identical unstressed syllables (following the stressed A syllable) with the phonemes ðɚ.
3) syllables marked B, which are all stressed syllables with the nucleus and coda of ʊd.

These three types of syllables are grouped together in several different ways: Syllables A and a are always grouped together as "A a," and in two cases they connect with B syllables to create a three-syllable rhyme (motherhood, brotherhood). Syllable B appears twice as part of this three-syllable rhyme, but also once by itself. These sorts of flexible, mixed-up rhyme patterns are common in rap flow, so this "2D" system is necessary to properly encode rhyme in rap.
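The two dimensions can be separated programmatically. The sketch below is an assumption-laden illustration, not the corpus's own tooling: the function name `parse_rhymes` is hypothetical, and the exact token layout (for example "(A", "_a", "B)") is a guess at how the letters and grouping markers combine within one token. Non-rhyming syllables are assumed to carry the Humdrum null token ".".

```python
def parse_rhymes(tokens):
    """Return (matches, groups) for a list of rhyme tokens.

    matches: letter -> indices of syllables sharing that rhyme letter.
    groups:  lists of adjacent syllable indices forming multi-syllable rhymes.
    """
    matches, groups, current = {}, [], None
    for index, token in enumerate(tokens):
        letter = token.strip("()_")
        if not letter.isalpha():
            continue  # null token: syllable takes part in no rhyme
        # First dimension: which syllables match each other.
        matches.setdefault(letter, []).append(index)
        # Second dimension: which adjacent syllables are grouped.
        if "(" in token:
            current = [index]
        elif current is not None:
            current.append(index)
        if ")" in token and current is not None:
            groups.append(current)
            current = None
    return matches, groups


# Hypothetical tokens: a three-syllable rhyme, a two-syllable rhyme, and a lone B.
tokens = ["(A", "_a", "B)", ".", "(A", "a)", ".", "B"]
matches, groups = parse_rhymes(tokens)
print(matches)  # {'A': [0, 4], 'a': [1, 5], 'B': [2, 7]}
print(groups)   # [[0, 1, 2], [4, 5]]
```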



The **ipa spine indicates the pronunciation of each syllable using the International Phonetic Alphabet. **ipa transcriptions are made by ear, in order to catch variations in emcees' accents. Transcriptions are somewhat biased towards hearing things using my dialect of English. For instance, I don't differentiate between the "ah" sound in "caught" and "cot."



The lyrics of the rap are recorded in the **lyrics spine using normal English spelling, along with some basic information about syntactic boundaries in the flow (where sentences and clauses begin and end) and semantic boundaries in the lyrics (places where the topic, tense, or scene of the lyrics changes). Multi-syllable words are broken up using dash (-) symbols. Actual hyphenated words, like "multi-syllable," use an underscore (_) instead. Underscores are also used to join other multi-word groups.


Since the **ipa spine records the pronunciation of words, the **lyrics spine is free to focus on encoding the meaning of the words alone. Thus, only one spelling of each word is used, regardless of how the word is pronounced. For example, the word "because" is often pronounced simply as kʌz, which is informally spelled as cause, 'cause, cuz, or even coz. However, in **lyrics spines it is always spelled "because." Similarly, present progressive verbs (i.e., words ending with "ing") are always spelled "ing," never "in'," even when they are pronounced ɪn instead of ɪŋ. This approach does not imply that these are wrong spellings or pronunciations; they are actually very common, normal spellings/pronunciations for many Americans. Rather, this approach is simply used to streamline analyses of word use in the corpus; if searching for the word "because," we don't have to guess all the ways it might have been pronounced or spelled. (Numbers are always spelled out; i.e., "ninety_nine" not "99.")


In normal English writing, we use capital letters for two reasons: 1) to indicate the beginning of a sentence; 2) to indicate proper nouns. Since **lyrics spines already encode details about syntactic boundaries (see below), the first of these reasons is made redundant. Thus, to avoid ambiguity about the meaning of letter capitalization, in **lyrics spines only proper nouns (i.e., specific people or places) have their first letter capitalized. This allows us to easily differentiate between, for instance, a "fifty cent cd" (a CD which costs 50¢) and a "Fifty_Cent cd" (an album by the emcee Fifty Cent). Notice that acronyms (like "C.D." for "Compact Disc") are not capitalized, unless they refer to a proper noun, as when Snoop Dogg spells his name out: "S_N_O_O_P_D_O_G_G."
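These spelling and capitalization conventions make simple text searches reliable. As a rough sketch (the function names `proper_nouns` and `word_counts` and the word list are hypothetical, and the words are assumed to have already been reassembled from their syllables):

```python
from collections import Counter


def proper_nouns(words):
    """Return the words whose first letter is capitalized."""
    return [word for word in words if word[:1].isupper()]


def word_counts(words):
    """Count word use; one spelling per word means no variants to merge."""
    return Counter(word.lower() for word in words)


# Hypothetical word list for illustration; not taken from the corpus.
words = ["because", "he", "bought", "a", "fifty", "cent", "cd",
         "and", "a", "Fifty_Cent", "cd"]

print(proper_nouns(words))            # ['Fifty_Cent']
print(word_counts(words)["because"])  # 1
print("Fifty_Cent".split("_"))        # ['Fifty', 'Cent']
```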

Syntactic Boundaries
Semantic Boundaries


The **hype spine encodes additional hype vocals that happen in a rap song. The spine encodes the same information as the **recipx, **ipa, and **stress spines, compressed into a single spine.
