This chapter details the range of possible words as the bottom layer of sentence structure; chapter 3 then considers phrase layers, and chapter 4 ascends to clause layers. Unravelling these layers of grammatical structure will show all elements to be inter-related components. Consequently, understanding what is possible at one layer requires integrating knowledge from all of the other layers. But the discovery of structure has to start somewhere, hence the focus here on words as foundational components.
In this chapter, words are categorised into classes according to how they function in phrases and clauses. Word analysis involves assigning a TAG that states the WORD CLASS to which a given word in context belongs. This task is known as TAGGING.
One criterion for tagging is whether a word can be the minimal component of a phrase layer: words for noun phrases are discussed in section 2.2; words for adjective phrases, in section 2.3; and words for adverb phrases, in section 2.4. Section 2.5 covers other word types that appear under a noun phrase as modifiers.
Section 2.6 presents tags for verbs, which are minimal components for clause layers. Section 2.8 adds other word types that occur under a clause layer. Section 2.9 considers words that serve as connectives to combine layers. Section 2.10 looks at the treatment of punctuation. Section 2.11 deals with interjections, reaction signals, and formulaic expressions. Section 2.12 notes other possible elements of a parse.
Section 1.2.1 identifies words through the conventions of the English writing system as character strings delimited by a space on each side, while noting exceptions. Word identification is not a complete word analysis: The same character string might be used as a word in different ways in different contexts. Tagging completes the word analysis. For example, down in (2.1) occurs as a noun (N), adjective (ADJ), verb (VB), adverb particle (RP), and preposition (P-ROLE).
The examples of (2.1) show that different instances of the same character string can receive different tags in different contexts. Tag differences due to context can happen even within the same sentence. For example, that in (2.2) occurs as a complementizer (C), determiner (D), noun (N), relative pronoun (RPRO), and demonstrative pronoun (D;_nphd_).
Yet another possible tag for that is as a subordinating conjunction word (P-CONN), as seen in (2.3).
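The separation between identifying a word and tagging it can be sketched in Python. This is a minimal illustration, not part of the annotation scheme: the hypothetical `identify_words` splits only on spaces (ignoring the exceptions of section 1.2.1), and `CANDIDATE_TAGS` is a hypothetical lexicon holding just the tags mentioned above for down and that. The lookup narrows the choices; context still has to pick one.

```python
# Hypothetical lexicon: candidate tags for two strings, taken from (2.1)-(2.3).
CANDIDATE_TAGS = {
    "down": {"N", "ADJ", "VB", "RP", "P-ROLE"},
    "that": {"C", "D", "N", "RPRO", "D;_nphd_", "P-CONN"},
}


def identify_words(sentence):
    """Word identification: character strings delimited by spaces."""
    return sentence.split()


def candidate_tags(word):
    """Tags a string could receive; choosing among them needs context."""
    return CANDIDATE_TAGS.get(word.lower(), set())


for word in identify_words("I know that that is down"):
    print(word, sorted(candidate_tags(word)))
```

Each occurrence of that receives the same candidate set, which is exactly why tagging must look beyond the string itself.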
Certain words form the minimal component required for the phrase layers that immediately contain them. Such a minimal component is called the HEAD of the phrase. The word class tags of Table 2.2 distinguish words that can head a noun phrase (NP; see section 3.2).
Table 2.2: Tags for words that can head a noun phrase
NS | plural common noun (e.g., children, revelations, times, wishes) |
N | common noun not subclassified as NS, that is, either singular (e.g., child, revelation, time, wish), or neutral for number (e.g., committee, fish, information) |
NPRS | plural proper noun (e.g., Heathers, Koreas) |
NPR | proper noun not subclassified as NPRS, that is, either singular (e.g., Heather, Tokyo), or neutral for number (e.g., Andes, IBM) |
PNX | reflexive pronoun (e.g., myself, yourself, itself, ourselves), or reciprocal pronoun (e.g., each_other) |
PRO | personal pronoun (e.g., I, you, them, us) |
PRO;_genm_ | possessive pronoun, pre-nominal (Genitive 1 of Table 2.4 below) (e.g., my, your, our) |
PRO;_ppge_ | nominal possessive personal pronoun that is the only daughter of a genitive marked preposition phrase (Genitive 2 of Table 2.4 below) (e.g., mine, yours, ours) |
WPRO | WH-pronoun (e.g., what, who, whom) |
WPRO;_genm_ | genitive WH-pronoun (whose) |
RPRO | relative pronoun (e.g., which, who, whom, that) |
RPRO;_genm_ | genitive relative pronoun (whose) |
Q;_nphd_ | Indefinite pronoun with quantification, which can be a compound (e.g., everybody, nothing), or a word which often occurs with the preposition of (e.g., much, many, a_lot) |
D;_nphd_ | Indefinite pronoun not subclassified as Q;_nphd_ (e.g., someone, anything, another), and demonstrative pronoun (e.g., this, that, these, those) |
The rest of this section describes morphosyntactic properties for identifying the words of Table 2.2.
There are many suffixes that derive nouns from other nouns or from words of other classes. Some examples include: {age} (package, usage), {er} (officer, teacher), {ness} (illness, awareness), {ship} (championship, relationship), {tion} (action, organisation), {ty} (ability, responsibility).
Many nouns take the suffix {(e)s} when they refer to plural items:
Such nouns are instances of COUNT nouns, so called because they can combine with numerals (e.g., three wishes). A few count nouns take the irregular suffix {(r)en} in the plural:
There are also count nouns that use the same form in the singular and plural. For example, the nouns for animal species in (2.6) look like singular nouns, and so are tagged N, but can also be used as plurals (e.g., three deer); conversely, (2.7) is tagged NS because it looks plural, although it may refer to a single pair.
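Because N and NS follow form rather than meaning, a first-pass guess can be made from spelling. The sketch below is only a heuristic under stated assumptions: the exception sets are hypothetical and illustrative (sheep and scissors are our own examples, not necessarily those of (2.6) and (2.7)), and the suffix rule will misfire on singulars such as bus. A real tagger would rely on a lexicon and context.

```python
# Illustrative exception lists (assumptions, far from exhaustive).
SAME_FORM_NOUNS = {"deer", "sheep", "fish"}     # look singular: tag N
PLURAL_FORM_NOUNS = {"scissors", "trousers"}    # look plural: tag NS
IRREGULAR_PLURALS = {"children", "oxen"}        # {(r)en} plurals: tag NS


def tag_common_noun(word):
    """Guess N or NS for a common noun from its form alone."""
    w = word.lower()
    if w in SAME_FORM_NOUNS:
        return "N"
    if w in PLURAL_FORM_NOUNS or w in IRREGULAR_PLURALS:
        return "NS"
    # Crude {(e)s} rule: a final s, but not ss (illness, awareness).
    if w.endswith("s") and not w.endswith("ss"):
        return "NS"
    return "N"
```

Mass nouns such as information fall through to N, which matches their tagging as neutral for number.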
In contrast to count nouns, there are MASS (or noncount) nouns that can neither combine with numerals nor inflect for number. For example:
It is possible for the same noun to belong to both count and mass categories: in Her hair is brown, hair is a mass noun, but in I found a hair in my soup, it is a count noun.
Proper nouns are written with an initial capital letter. Inflection for number is rare, but is possible when talking about more than one entity with the same name, e.g., There are two Heathers in my class.
Both proper nouns and common nouns can be case marked — but only for genitive case — by the genitive suffix {'s} (or {'} if the noun already has the plural suffix {s}). For parsing, the genitive marking is made to form a distinct word that is tagged GENM:
Table 2.3: Tag for genitive marker
GENM | The genitive marker {'s} or {'} |
This marker is placed leftmost under an NP-GENV projection that comprises the genitive content. For example:
Double marking of a genitive is possible when the content of the genitive has both a following genitive marker and a preceding of preposition, as in (2.12).
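Treating the genitive marker as a separate word means splitting it off the noun before tagging. A minimal sketch, assuming plain ASCII apostrophes; `split_genitive` is a hypothetical helper. It cannot tell a genitive {'s} from a contracted verb such as it's, so its output is only a candidate split to be confirmed in context.

```python
def split_genitive(token):
    """Split a genitive-marked token into [content, GENM] if possible."""
    if len(token) > 2 and token.endswith("'s"):
        return [token[:-2], "'s"]   # {'s}: John's, children's
    if len(token) > 2 and token.endswith("s'"):
        return [token[:-1], "'"]    # {'} after a plural {s}: girls'
    return [token]
```

The second element of a two-item result is the word that would be tagged GENM.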
This section presents an overview of reciprocal and reflexive pronouns (PNX), and personal pronouns (PRO). Such words are understood in the context of their occurrence, often by taking already mentioned noun phrases as antecedents. They can appear in full noun phrase positions, with the range of grammatical markings detailed in Table 2.4.
Table 2.4: Reciprocal and reflexive pronouns (PNX), and personal pronouns (PRO)
Type (tag) | 1st person singular | 1st person plural | 2nd person singular | 2nd person plural | 3rd person singular masculine | 3rd person singular feminine | 3rd person singular neuter | 3rd person plural |
---|---|---|---|---|---|---|---|---|
Reciprocal (PNX) | each_other, one_another (no person, number, or gender forms) | | | | | | | |
Reflexive (PNX) | myself | ourselves | yourself | yourselves | himself | herself | itself | themselves |
Subject (PRO) | I | we | you | you | he | she | it | they |
Non-subject (PRO) | me | us | you | you | him | her | it | them |
Genitive 1 (PRO;_genm_) | my | our | your | your | his | her | its | their |
Genitive 2 (PRO;_ppge_) | mine | ours | yours | yours | his | hers | | theirs |
The reciprocal pronouns each_other and one_another are used to indicate a mutual relationship between the referents of a plural or conjoined antecedent, for example, the love relationship in (2.13).
Reflexive personal pronouns can indicate that the subject and object are the same entity, as in (2.14).
Reflexive forms are also used for emphasis, as seen with (2.15), where myself is annotated as an adverbial noun phrase with reflexive function (NP-RFL).
A look across the columns of Table 2.4 shows that the ‘person’ category is useful when differentiating the various reflexive and personal pronouns. The first person refers to the speaker(s), the second person refers to the hearer(s), and the third person refers to other entities. Person is further distinguished into singular and plural reference. Third person singular pronouns (reflexive and personal) make a further distinction, since they inflect for gender: masculine he/him(self)/his, feminine she/her(self)/hers, and neuter it(self)/its.
A look down the rows of Table 2.4 shows that pronouns are marked for more grammatical cases than common nouns or proper nouns. In addition to genitive case, pronouns are marked for nominative case (marking the subject of the clause: I shrugged), and accusative case (marking the object acted on by the verb: James tickled me).
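The case and person distinctions of Table 2.4 can be stored as a form-to-tag lookup. The sketch below uses a hypothetical `PRONOUN_FORMS` table transcribed from Table 2.4 (reciprocals omitted) and shows that forms alone do not always settle the tag: her is both a non-subject PRO and a PRO;_genm_, and his is both PRO;_genm_ and PRO;_ppge_.

```python
# Rows of Table 2.4 (reciprocals omitted), keyed by tag.
PRONOUN_FORMS = {
    "PNX": {"myself", "ourselves", "yourself", "yourselves",
            "himself", "herself", "itself", "themselves"},
    "PRO": {"I", "we", "you", "he", "she", "it", "they",   # subject
            "me", "us", "him", "her", "them"},             # non-subject
    "PRO;_genm_": {"my", "our", "your", "his", "her", "its", "their"},
    "PRO;_ppge_": {"mine", "ours", "yours", "his", "hers", "theirs"},
}


def pronoun_tags(word):
    """Return every Table 2.4 tag that a pronoun form could receive."""
    w = "I" if word.lower() == "i" else word.lower()
    return {tag for tag, forms in PRONOUN_FORMS.items() if w in forms}
```

For example, `pronoun_tags("her")` returns both PRO and PRO;_genm_, leaving the choice to the surrounding noun phrase.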
Genitive pronouns can be either:
As shown by examples (2.18) and (2.19), in the parse a genitive pronoun (dependent PRO;_genm_ or independent PRO;_ppge_) is the only element of an NP-GENV layer. Furthermore, for PRO;_ppge_, the NP-GENV layer is the only element of the containing noun phrase.
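The structural constraint just stated can be expressed as a small bracket builder. This sketch assumes Penn-style labeled bracketing as the output notation; `genitive_pronoun_bracket` is a hypothetical helper, not part of any released tool.

```python
GENITIVE_PRONOUN_TAGS = {"PRO;_genm_", "PRO;_ppge_"}


def genitive_pronoun_bracket(word, tag):
    """Bracket a genitive pronoun as the only element of NP-GENV; for
    PRO;_ppge_, wrap that NP-GENV as the only element of an NP."""
    if tag not in GENITIVE_PRONOUN_TAGS:
        raise ValueError(f"not a genitive pronoun tag: {tag}")
    genv = f"(NP-GENV ({tag} {word}))"
    return f"(NP {genv})" if tag == "PRO;_ppge_" else genv
```

For PRO;_genm_ only the NP-GENV is returned, since that layer shares its containing noun phrase with other material such as the head noun.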
Other types of pronouns found in English include:
Table 2.5: Demonstrative pronouns (D;_nphd_)/determiners (D)
Number | Near speaker | Away from speaker |
---|---|---|
Singular | this | that |
Plural | these | those |
For a word to be an adjective, it must be able to function as the head of an adjective phrase (ADJP; see section 3.3), often as the only element of the phrase. Adjectives are tagged as in Table 2.6 to distinguish comparative and superlative forms from general forms.
Table 2.6: Tags for words that can head an adjective phrase
ADJ | General adjective: an adjective not subclassified as ADJR or ADJS (e.g., old, good, male) |
ADJR | Comparative adjective (e.g., older, better) |
ADJS | Superlative adjective (e.g., oldest, best) |
The comparative form of an adjective is typically indicated by the suffix {er}, whereas the superlative form is typically indicated by the suffix {est}; see Table 2.7.
Table 2.7: Forms of adjectives
Gradability | General (ADJ) | Comparative (ADJR) | Superlative (ADJS) |
---|---|---|---|
Gradable | old | older | oldest |
Gradable | good | better | best |
Non-gradable | male | | |
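Table 2.7 amounts to a set of paradigms from which a form-to-tag lexicon can be built. In this sketch, `regular_paradigm` is a hypothetical helper that simply appends {er} and {est}, so it ignores spelling changes such as big/bigger; irregular and non-gradable entries are listed by hand.

```python
def regular_paradigm(base):
    """General, comparative, superlative via {er}/{est} (no spelling changes)."""
    return (base, base + "er", base + "est")


# Paradigms from Table 2.7; None marks a missing form.
ADJ_PARADIGMS = [
    regular_paradigm("old"),
    ("good", "better", "best"),   # irregular gradable
    ("male", None, None),         # non-gradable
]


def build_adjective_lexicon(paradigms):
    """Map each attested form to ADJ, ADJR, or ADJS."""
    lexicon = {}
    for forms in paradigms:
        for form, tag in zip(forms, ("ADJ", "ADJR", "ADJS")):
            if form is not None:
                lexicon[form] = tag
    return lexicon


ADJ_LEXICON = build_adjective_lexicon(ADJ_PARADIGMS)
```

Non-gradable adjectives contribute only an ADJ entry, so a form such as maler is never recognised.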
For a word to be an adverb, it must be able to function as the head of an adverb phrase (ADVP; see section 3.4), often as the only element of the phrase. Adverbs are tagged as in Table 2.8 to distinguish comparative, superlative, and WH forms from general forms.
Table 2.8: Tags for words that can head an adverb phrase
ADV | General adverb: an adverb not subclassified as ADVR, ADVS, RADV, or WADV (e.g., often, well, really). |
ADVR | Comparative adverb (e.g., more, less, farther) |
ADVS | Superlative adverb (e.g., most, least, farthest) |
RADV | Wh-adverb that is the relative adverb of a relative clause (e.g., how, when, where, whereby) |
WADV | Wh-adverb (e.g., how, when, where, why) |
RP | Adverbial particle (e.g., up, off, out) |
Section 2.2 above has already discussed words that can head a noun phrase. Table 2.9 gives tags for other words that can be immediate components of a noun phrase, but that can't be a noun phrase head. Rather, these words precede the head within noun phrases and function to modify the head in terms of definiteness, item under question, or quantity.
Table 2.9: Tags for words that can be immediate components of a noun phrase
D | Determiner, which includes articles (e.g., a, the) and demonstratives (e.g., this, that) |
RD | Wh-determiner that is the relative determiner of a relative clause (e.g., what, whatever) |
WD | Wh-determiner (e.g., which, what, whichever) |
NUM | Numeral (e.g., one, 1975) |
Q | Quantifier (e.g., every, no) |
Prototypical nouns can take one of two articles:
Articles for the singular head noun wish and the plural head noun wishes are seen in (2.35).
Verbs occur at clause levels of structure in the annotation. There are tags to subclassify verbs according to their form:
Verbs can change in shape to show tense. For example, the verb SUPPORT in John supports Peter takes a third person present tense {s} inflection, while in John supported Peter it has a past tense {ed} inflection. A verb that has tense is called a finite verb.
Tenseless forms of verbs are called nonfinite verbs, which comprise:
Note that participle forms are tenseless despite the ‘present’ and ‘past’ in their names! Infinitive forms occur in infinitive clauses, often preceded by the infinitive marker to (e.g., John happened to support Peter.). Present participles are used in the progressive construction (e.g., John is supporting Peter). Past participles are used in the perfect construction (e.g., John has supported Peter) and the passive construction (e.g., Peter is supported by John).
The distinctions in verb forms just sketched are captured in the tags for lexical verbs of Table 2.10.
Table 2.10: Tags for lexical verbs
VBP | present tense form of lexical verbs (e.g., reaches, supports, writes, sinks, puts, reach, support, write, sink, put) |
VBD | past tense form of lexical verbs (e.g., reached, supported, wrote, sank, put) |
VB | infinitive form of lexical verbs (e.g., reach, support, write, sink, put) |
VAG | present participle ({ing}) form of lexical verbs (used in the progressive construction) (e.g., reaching, supporting, writing, sinking, putting) |
VVN | past participle ({ed}/{en}) form of lexical verbs (used in the perfect construction and the passive construction) (e.g., reached, supported, written, sunk, put) |
Table 2.11 further illustrates with examples the distinctions between the different lexical verb forms. This includes examples of irregular verbs that do not have a regular past tense inflection.
Table 2.11: Forms of lexical verb
Class | Tensed: present (VBP), 3rd person singular | Tensed: present (VBP), other | Tensed: past (VBD) | Tenseless: infinitive (VB) | Tenseless: present participle (VAG) | Tenseless: past participle (VVN) |
---|---|---|---|---|---|---|
Regular | reaches | reach | reached | reach | reaching | reached |
Regular | supports | support | supported | support | supporting | supported |
Irregular | writes | write | wrote | write | writing | written |
Irregular | sinks | sink | sank | sink | sinking | sunk |
Irregular | puts | put | put | put | putting | put |
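Table 2.11 also shows why form alone often leaves the tag open. The sketch below transcribes the table into a hypothetical `PARADIGMS` list and looks up every tag a form could take: reached may be VBD or VVN, and put may be any of VBP, VBD, VB, or VVN, so the surrounding construction (for instance, a preceding has or is) has to decide.

```python
# Column tags of Table 2.11, in table order.
COLUMN_TAGS = ("VBP", "VBP", "VBD", "VB", "VAG", "VVN")
FINITE_TAGS = {"VBP", "VBD"}  # tensed; VB, VAG, and VVN are tenseless

# Rows of Table 2.11.
PARADIGMS = [
    ("reaches", "reach", "reached", "reach", "reaching", "reached"),
    ("supports", "support", "supported", "support", "supporting", "supported"),
    ("writes", "write", "wrote", "write", "writing", "written"),
    ("sinks", "sink", "sank", "sink", "sinking", "sunk"),
    ("puts", "put", "put", "put", "putting", "put"),
]


def candidate_verb_tags(form):
    """All Table 2.10 tags that a lexical verb form could receive."""
    return {tag for row in PARADIGMS
            for cell, tag in zip(row, COLUMN_TAGS) if cell == form}


def could_be_finite(form):
    """True if at least one candidate tag is a tensed (finite) tag."""
    return bool(candidate_verb_tags(form) & FINITE_TAGS)
```

Irregular past participles such as written are unambiguous, whereas regular ones always collide with the past tense form.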
The forms of HAVE are tagged as in Table 2.12.
HVP | present tense forms of the verb HAVE: have, 've, has, 's |
HVD | past tense form of the verb HAVE: had, 'd |
HV | infinitive form of the verb HAVE: have |
HAG | present participle form of the verb HAVE: having |
HVN | past participle form of the verb HAVE: had |
HAVE has the non-contracted inflections of Table 2.13.
Tensed: present (HVP), 3rd person singular | Tensed: present (HVP), other | Tensed: past (HVD) | Tenseless: infinitive (HV) | Tenseless: present participle (HAG) | Tenseless: past participle (HVN) |
---|---|---|---|---|---|
has | have | had | have | having | had |
The forms of BE are tagged as in Table 2.14.
BEP | present tense forms of the verb BE: is, am, are, 'm, 're, 's |
BED | past tense forms of the verb BE: was, were |
BE | infinitive form of the verb BE: be |
BAG | present participle form of the verb BE: being |
BEN | past participle form of the verb BE: been |
Table 2.15 presents an overview of the eight different non-contracted forms of BE. This is the widest range of distinct forms for any single verb lexeme in English, with extra person-number contrasts in the past and present tenses.
Tensed: present (BEP), 3rd person singular | Tensed: present (BEP), 1st person singular | Tensed: present (BEP), other | Tensed: past (BED), 1st/3rd person singular | Tensed: past (BED), other | Tenseless: infinitive (BE) | Tenseless: present participle (BAG) | Tenseless: past participle (BEN) |
---|---|---|---|---|---|---|---|
is | am | are | was | were | be | being | been |
The forms of DO are tagged as in Table 2.16.
DOP | present tense forms of the verb DO: do, does, 's |
DOD | past tense form of the verb DO: did |
DO | infinitive form of the verb DO: do |
DAG | present participle form of the verb DO: doing |
DON | past participle form of the verb DO: done |
DO has the non-contracted inflections of Table 2.17.
Tensed: present (DOP), 3rd person singular | Tensed: present (DOP), other | Tensed: past (DOD) | Tenseless: infinitive (DO) | Tenseless: present participle (DAG) | Tenseless: past participle (DON) |
---|---|---|---|---|---|
does | do | did | do | doing | done |
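Pulling the contracted forms together from the HAVE, BE, and DO tables (plus the genitive marker of Table 2.3) shows that 's alone has four possible tags. A minimal sketch, assuming ASCII apostrophes; `CLITIC_TAGS` and `split_clitic` are hypothetical names. 'd is left out because Table 2.18 below also lists it as a modal form.

```python
# Contracted forms and their possible tags (Tables 2.3, 2.12, 2.14, 2.16).
CLITIC_TAGS = {
    "'s": {"BEP", "HVP", "DOP", "GENM"},
    "'re": {"BEP"},
    "'m": {"BEP"},
    "'ve": {"HVP"},
}


def split_clitic(token):
    """Split a trailing contracted form off a token, or return None."""
    # Try longer clitics first so that the longest match wins.
    for clitic in sorted(CLITIC_TAGS, key=len, reverse=True):
        if len(token) > len(clitic) and token.lower().endswith(clitic):
            return token[:-len(clitic)], token[-len(clitic):]
    return None
```

After splitting, `CLITIC_TAGS` gives the candidate tags for the second piece; for 's the choice between BEP, HVP, DOP, and GENM depends on what follows.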
Modal verbs express meanings such as certainty, ability, or obligation. The main modal verbs are WILL, WOULD, CAN, COULD, MAY, MIGHT, SHALL, SHOULD, MUST and OUGHT. A modal verb only has finite forms and has no suffixes (e.g., I sing — he sings, but I must — he must). Modal verbs are tagged as in Table 2.18.
Table 2.18: Tags for modal verbs
MD;~cat_Vi | modal auxiliary verb (e.g., will, would, can, could, 'll, 'd) |
MD;~cat_Vt | modal catenative (ought, used) |
EX | existential there, i.e., there of the there is ... or there are ... construction co-occurring with an existential subject (NP-ESBJ) |
PRO;_cleft_ | cleft it occurring as part of a cleft construction (so it was you that got them together) |
PRO;_expletive_ | expletive it, e.g., occurring in a weather construction (it's raining) |
PRO;_provisional_ | provisional it occurring with extraposition (it bothered her that she probably would never know) |
Besides verbs, other clause level components are words with the tags of Table 2.19.
Table 2.19: Tags for clause level components
NEG | negative particle not |
NEG;_clitic_ | negative clitic particle n't |
TO | Infinitive marker to |
CONJ;_cl_ | discourse coordination (e.g., And, But) |
We can see some of these clause level components in the annotation of (2.36). This begins with there (EX) to create an existential construction, and includes the negative clitic particle n't (NEG;_clitic_). The IP-INF-CAT, which is the selected complement of the existential verb (BED;~ex_cat_Vt; see section 8.2.5), includes the infinitive marker to (TO).
It is also possible for a word tagged RP to occur as a clause level component. The RP tag is used to mark adverbial particles (e.g., up, off, out) and was seen in section 2.4 as the tag for a word that can head an adverb phrase. When an RP tagged word occurs as a clause level component it is part of a phrasal verb, as in (2.37).
So far we have considered words that serve as components of either phrases or clauses. There is a further class of words with the tags of Table 2.20 that serve as the means to connect phrases and clauses.
Table 2.20: Tags for connective words
CONJ | Coordinating conjunction (e.g., and, or, but) |
C | The complementizer that |
WQ | Marker of indirect question (whether or if) |
P-CONN | Subordinating conjunction (e.g., although, when) |
P-ROLE | Role preposition (e.g., in, of, under) |
When phrases and clauses are connected, they are said to be COMPLEX.
Punctuation marks, quotation marks, and brackets (‘.’ ‘?’ ‘!’ ‘:’ ‘;’ ‘,’ ‘-’ ‘(’ ‘)’ etc.) are treated as words for the purposes of word tagging, with the tags of Table 2.21.
Table 2.21: Tags for punctuation
PUNC | Punctuation: general separating mark — i.e., . , ! : ; - or ? |
PULB | Punctuation: left bracket — i.e., ( or [ |
PURB | Punctuation: right bracket — i.e., ) or ] |
PULQ | Punctuation: left quotation mark — i.e., ‘ or “ |
PURQ | Punctuation: right quotation mark — i.e., ’ or ” |
This makes punctuation part of a sentence in its own right. When creating constituent structure, punctuation is placed as high as possible. For example, a full stop that ends a sentence is treated as the last constituent of the highest clause layer (IP/CP/FRAG).
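Treating punctuation as words implies a tokenization step that separates marks from the words they touch. This is a simplified sketch, not the corpus's own tokenizer: the hypothetical `tokenize` only splits marks off the edges of space-delimited chunks (so well-known stays whole), and it will wrongly split a word-final apostrophe written as a right single curly quote (as in a plural genitive) off as a right quotation mark.

```python
# Punctuation-to-tag map following Table 2.21 (curly quotes as escapes).
PUNCT_TAGS = {p: "PUNC" for p in ".,!:;-?"}
PUNCT_TAGS.update({"(": "PULB", "[": "PULB", ")": "PURB", "]": "PURB",
                   "\u2018": "PULQ", "\u201c": "PULQ",    # left single/double
                   "\u2019": "PURQ", "\u201d": "PURQ"})   # right single/double


def tokenize(text):
    """Split on spaces, then peel punctuation off each chunk's edges."""
    tokens = []
    for chunk in text.split():
        lead, trail = [], []
        while chunk and chunk[0] in PUNCT_TAGS:
            lead.append(chunk[0])
            chunk = chunk[1:]
        while chunk and chunk[-1] in PUNCT_TAGS:
            trail.append(chunk[-1])
            chunk = chunk[:-1]
        tokens += lead + ([chunk] if chunk else []) + trail[::-1]
    return tokens


def tag_punctuation(tokens):
    """Pair each token with its punctuation tag, or None for other words."""
    return [(t, PUNCT_TAGS.get(t)) for t in tokens]
```

A sentence-final full stop comes out as the last token, ready to be attached as the last constituent of the highest clause layer.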
Interjections, reaction signals, and formulaic expressions are treated as single words with the tags of Table 2.22. They have a high placement in structure, typically occurring as elements of clause or fragment layers.
Table 2.22: Tags for interjection, reaction signals, and formulaic expressions
INTJ | Interjection (e.g., aah, eh, ummmmm) |
REACT | Reaction signal (e.g., good_grief, really, yes, wow) |
FRM | Formulaic expression (e.g., good_afternoon, you_see, thank_you) |
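Since multiword items such as good_grief and thank_you are treated as single words (as are each_other and one_another in Table 2.4), they have to be joined before tagging. A minimal greedy sketch, assuming a hypothetical list `MULTIWORD_UNITS`; context-dependent items such as you_see are left out because a blind merge would also join the words of Did you see it?

```python
# Hypothetical list of units that this chapter writes with underscores.
MULTIWORD_UNITS = ["each other", "one another", "good grief",
                   "good afternoon", "thank you"]


def join_multiwords(words, units=MULTIWORD_UNITS):
    """Greedy left-to-right merge of listed units into single tokens."""
    split_units = sorted((u.split() for u in units), key=len, reverse=True)
    out, i = [], 0
    while i < len(words):
        for unit in split_units:
            span = words[i:i + len(unit)]
            if [w.lower() for w in span] == unit:
                out.append("_".join(span))
                i += len(unit)
                break
        else:
            out.append(words[i])
            i += 1
    return out
```

Matching is case-insensitive but the joined token keeps its original capitalisation, so Thank you becomes Thank_you.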
Tags for other possible elements of a parse are given in Table 2.23.
FO | Formula |
FW | Foreign word |
LS | List item (e.g., 1, a, i) |
SYM | Symbol |