TriSCA

SCA Documentation

TriSCA (Tri's Sound Change Applier, /ˈtɹaɪ.skə/ or /tɹaɪ ɛs si eɪ/) allows you to automatically apply a set of sound changes to a list of words. It is inspired by SCA2 by Mark Rosenfelder, but features a number of changes. All code is open-source under the GNU GPLv3 license and is available here.

UI

Rules - the list of sound changes that will be applied. The format is detailed below.
Input - put your list of words here, separated by spaces or newlines.
Output - when "Apply" is pressed, the sound changes will be applied and the results will be shown here.
Apply - apply the sound changes
Help - open this help page (in a new tab)
IPA Chart - open Wikipedia's IPA chart (in a new tab)

Basic syntax

The basic syntax is very similar to that used in the Index Diachronica. An example sound change is k/kʷ/_u, which reads as "replace all instances of k with kʷ whenever followed by u". The section before the first slash is what to find (the target), after the first slash is what to replace it with (the replacement), and (optionally) after the second slash is the environment to perform the replacement in (the context). Each rule must be listed on its own line in the "Rules" field. The first slash can optionally be replaced with an arrow (→) to improve readability.

If the replacement is left empty, the rule will delete anything matching the target. If the context is left empty or omitted, the rule will apply to all matches of the target. The target cannot be left empty.

Any line in the "Rules" field beginning with a single semicolon (;) will be ignored, which can be used to add comments. See "Intermediate results" below for lines beginning with two semicolons.
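To make the three-part structure concrete, here is a minimal Python sketch that splits a rule line into its target, replacement, and context. It is only an illustration, not TriSCA's actual parser: the function name parse_rule is hypothetical, the sketch assumes an omitted context behaves like a bare _, and it ignores category definitions and every feature described further down.

```python
def parse_rule(line):
    """Split one rule line into (target, replacement, context).

    Returns None for blank lines and comment lines. Hypothetical helper,
    not part of TriSCA itself.
    """
    # Whitespace other than newlines is ignored in the rules list.
    line = "".join(line.split())
    # Lines starting with ; are comments (;; lines are also skipped here).
    if not line or line.startswith(";"):
        return None
    # The first slash may be written as an arrow instead.
    line = line.replace("→", "/", 1)
    parts = line.split("/")
    target = parts[0]
    replacement = parts[1] if len(parts) > 1 else ""
    # Assumption: an empty or omitted context is treated as a bare "_".
    context = parts[2] if len(parts) > 2 and parts[2] else "_"
    if not target:
        raise ValueError("the target cannot be left empty")
    return target, replacement, context


print(parse_rule("k/kʷ/_u"))
print(parse_rule("k → kʷ / _u"))
print(parse_rule("; this line is a comment"))
```

Both spellings of the rule produce the same three parts, and the comment line produces nothing.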

Reserved characters

The following characters either perform a special function or may be given one in the future without warning. Avoid these unless you're using them for their designated purpose.

/ → = { } [ ] ( ) _ # , . ; 0 1 2 3 4 5 6 7 8 9 ? + *

Whitespace (besides newlines) in the rules list is ignored.

Categories

Say you have the following sound changes:

k/kʷ/_u
g/gʷ/_u
ŋ/ŋʷ/_u

These all follow the same format, so it would be nice if you could write them as one rule. To do this, define the category K=kgŋ. You can then use this category to rewrite the three rules above as K/Kʷ/_u.

On the left-hand side of the equals sign is the category's name, which must be a single character. On the right is a list of segments to include. If no commas are used, each character will be counted as its own segment, for example V=aeiou will create the class V containing a, e, i, o, and u. If you require multiple characters per segment, use commas to separate them. For example, A=tʃ,dʒ creates a category containing tʃ and dʒ. Categories should be written in the "Rules" field above your sound changes, each on their own line.
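The two ways of listing segments can be sketched in Python as follows. This is an illustrative reading of the description above rather than TriSCA's own code, and the name parse_category is hypothetical.

```python
def parse_category(line):
    """Parse a definition such as 'V=aeiou' into (name, segments).

    Hypothetical helper illustrating the comma rule described above.
    """
    name, _, body = line.partition("=")
    if len(name) != 1:
        raise ValueError("a category name must be a single character")
    # With commas, each comma-separated piece is one segment;
    # without commas, every character is its own segment.
    segments = body.split(",") if "," in body else list(body)
    return name, segments


print(parse_category("V=aeiou"))
print(parse_category("A=tʃ,dʒ"))
```

Without the comma, A=tʃdʒ would be read as four one-character segments, which is why multi-character segments need the comma form.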

If you have a category you only need to use once, instead of defining it you can create a nonce category, a list of segments enclosed in {curly braces}. For example, the above rule could be rewritten as {kgŋ}/{kgŋ}ʷ/_u. These can also be comma-separated, for example {td}/{tʃ,dʒ}/_i will replace t with tʃ and d with dʒ before i. Named categories can be included in nonce categories, resulting in the named category's contents being "extracted" into the nonce category. For example,

P=ptk
B=bdg
{Pfsʃ}/{Bvzʒ}/V_V

is equivalent to {ptkfsʃ}/{bdgvzʒ}/V_V. Named categories cannot be nested inside other named categories.
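The "extraction" of named categories can be pictured with a short Python sketch. It assumes that a nonce category is expanded segment by segment and that target and replacement segments are then paired by position; the helper name expand_nonce is hypothetical and this is not TriSCA's implementation.

```python
def expand_nonce(body, categories):
    """Expand the inside of a {nonce category} into a flat segment list.

    body is the text between the braces; categories maps names to segments.
    """
    pieces = body.split(",") if "," in body else list(body)
    segments = []
    for piece in pieces:
        if piece in categories:
            # A named category contributes all of its own segments.
            segments.extend(categories[piece])
        else:
            segments.append(piece)
    return segments


categories = {"P": list("ptk"), "B": list("bdg")}
targets = expand_nonce("Pfsʃ", categories)
replacements = expand_nonce("Bvzʒ", categories)
# Pair segments by position: p with b, t with d, and so on.
pairs = dict(zip(targets, replacements))
print(pairs)
```

The resulting mapping is the same one that {ptkfsʃ}/{bdgvzʒ} spells out by hand.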

Category indexing

If multiple categories are used in the target or the replacement, they will automatically be paired up based on the order they appear in. For example, in {td}{iu}/{cɟ}{eo}, {td} is paired with {cɟ} and {iu} is paired with {eo}. Sometimes this does not give the desired results. In {td}{iu}/ɟ{eo}, {td} instead gets paired with {eo} and {iu} is left unpaired. To fix this, matching numbers can be inserted after the categories that you want to be paired. The previous example would then become {td}{iu}1/ɟ{eo}1. This works for both categories and nonce categories.
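As a rough picture of what the indexed rule {td}{iu}1/ɟ{eo}1 is meant to do, the Python sketch below hard-codes that one rule with a regular expression. It is an analogy for the intended result, not a description of how TriSCA evaluates rules, and the names VOWEL_PAIRS and apply_indexed_rule are made up for the example.

```python
import re

# {iu}1 is paired with {eo}1: i becomes e, u becomes o.
VOWEL_PAIRS = dict(zip("iu", "eo"))


def apply_indexed_rule(word):
    """Mimic {td}{iu}1/ɟ{eo}1 for a single word."""
    # t or d followed by i or u: the consonant becomes ɟ regardless of
    # which one it was, and the vowel is replaced by its paired vowel.
    return re.sub(
        r"[td]([iu])",
        lambda m: "ɟ" + VOWEL_PAIRS[m.group(1)],
        word,
    )


print(apply_indexed_rule("tidu"))  # ɟeɟo
print(apply_indexed_rule("tada"))  # tada (no i or u after t/d)
```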

If two categories are given the same index in the target they will also be paired. In N1sN1/M1, for example, the two instances of N1 must be the same.

Category indexing in the context is not done by default, but can be done by explicitly adding numbers. This indexing is independent from that in the target and replacement, so in P1/D1/A1_E1, P1 and D1 will be paired, A1 and E1 will be paired, but no further pairing will occur. This will result in the rule only applying when the character before the match has the same index in A as the character after the match has in E.

Word boundaries

Use the # symbol (called a hash, pound sign, or octothorpe) to indicate a word boundary. This can be used before the _ (where it will search for the beginning of a word) or after the _ (where it will search for the end). Any characters before a beginning # or after an ending # will be ignored.

Word boundaries can be included as an option in a group by adding # as an element of the group. For example, e/i/_{mnŋ#} will change e to i whenever followed by m, n, ŋ, or a word boundary. The # must be its own option; it can't be used as part of another option, so s/ʃ/_{i,e#} would not work.
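For readers who know regular expressions, the rule e/i/_{mnŋ#} corresponds roughly to a lookahead that accepts either one of the listed consonants or the end of the word. The Python sketch below is an analogy applied to one word at a time, not TriSCA's matching code; the function name is hypothetical.

```python
import re


def raise_e(word):
    """Mimic e/i/_{mnŋ#}: e becomes i before m, n, ŋ, or the end of the word."""
    # (?=...) checks what follows without consuming it, like the context
    # after the underscore; $ plays the role of the # word boundary.
    return re.sub(r"e(?=[mnŋ]|$)", "i", word)


print(raise_e("temen"))  # timin
print(raise_e("sete"))   # seti
print(raise_e("teka"))   # teka
```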

Optional segments

Placing a question mark (?) after a segment (a single character, collection, or word boundary) marks it as optional. If this segment cannot be matched, it will be skipped. Optional segments are greedy, meaning they will match whenever they can, and if the match fails later they will not backtrack.
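The practical consequence of "greedy without backtracking" is easiest to see next to an engine that does backtrack. The Python sketch below implements a tiny matcher under that assumption and compares it with Python's re module; match_at and its pattern format are invented for this illustration and are not TriSCA's internals.

```python
import re


def match_at(pattern, word, pos=0):
    """Match a list of (segment, optional) pairs against word from pos.

    Returns the end position of the match, or None if it fails.
    Optional segments are consumed whenever possible and never given back.
    """
    for segment, optional in pattern:
        if word.startswith(segment, pos):
            pos += len(segment)
        elif not optional:
            return None
    return pos


# Pattern "a?a": an optional a followed by a required a.
maybe_a_then_a = [("a", True), ("a", False)]

print(match_at(maybe_a_then_a, "aa"))        # 2
print(match_at(maybe_a_then_a, "a"))         # None: the optional a took the only a
print(re.fullmatch("a?a", "a") is not None)  # True: re backtracks and succeeds
```

Under this reading, an optional segment followed by an identical required segment will fail on inputs where a backtracking engine would succeed, so such patterns are best avoided.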

Wildcard character

The asterisk (*) will match any single character. Note that this cannot be made optional (as it will match as long as the word has not ended), and it will not be paired with an asterisk in the replacement.

Rule exceptions

Sometimes you may want a rule to apply in all contexts except some. To do this, add another slash after the context and specify after it the context in which the rule should not apply. For example, r/ɾ/_/_{fv} will replace r with ɾ everywhere except when followed by f or v. If the exception string is a single _, it will be ignored.
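In regular-expression terms, the exception in r/ɾ/_/_{fv} behaves much like a negative lookahead. The following Python sketch shows that analogy for one word at a time; it is not how TriSCA itself is implemented, and the function name is hypothetical.

```python
import re


def tap_r(word):
    """Mimic r/ɾ/_/_{fv}: r becomes ɾ unless it is followed by f or v."""
    # (?![fv]) rejects any r that is immediately followed by f or v.
    return re.sub(r"r(?![fv])", "ɾ", word)


print(tap_r("ara"))   # aɾa
print(tap_r("arfa"))  # arfa
print(tap_r("rarv"))  # ɾarv
```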

Intermediate results

By default, only the final words are shown. The checkbox "Show output as before → after" will show both the initial and final form of the word. To show intermediate forms, add a line beginning with ;; to the rules list and check "Show intermediate results" (this implicitly enables the previous checkbox too). If desired, you may leave a comment after the two semicolons.

Unicode wackiness

If the "Normalize input and output" box is checked, all input text is normalized, meaning that whenever possible combining diacritics are merged with the characters they modify to create single characters. For example, é can be written as two characters (the letter 'e' followed by a combining accent); normalization converts this to é, a single character representing 'e' with an accent. You will usually want to keep this checked, but in some cases (e.g. specifying tone or stress) you may want to uncheck this.

This fixes most of the problems, but not every combination of diacritics and characters has its own code point. An example where this might be an issue is {ɛɨ}/{ɛ̃ɨ̃}/_n. Since the ɛ̃ and ɨ̃ in this example are each composed of two characters, the second category gets parsed into [ɛ, ˜, ɨ, ˜], which is clearly not what is intended. A similar problem occurs with the ? operator, since it will only apply to the character immediately preceding it. In general, avoid using combining diacritics, but if you must use them you can comma-separate your collections ({ɛ,ɨ}/{ɛ̃,ɨ̃}/_n).
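The difference between the two cases can be checked with Python's standard unicodedata module. The sketch assumes the normalization in question is Unicode NFC (canonical composition), which matches the behaviour described above but is not confirmed by this page; characters are written as escapes so the code points are unambiguous.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed_e = "e\u0301"
composed_e = unicodedata.normalize("NFC", decomposed_e)
print(len(decomposed_e), len(composed_e))  # 2 1

# U+025B (ɛ) followed by U+0303 COMBINING TILDE has no single-code-point
# form, so it is still two code points after normalization.
nasal_e = unicodedata.normalize("NFC", "\u025b\u0303")
print(len(nasal_e))  # 2

# Splitting character by character separates each vowel from its tilde...
per_character = list("\u025b\u0303\u0268\u0303")
print(len(per_character))  # 4

# ...while comma-separated segments keep each vowel and tilde together.
per_segment = "\u025b\u0303,\u0268\u0303".split(",")
print(len(per_segment))  # 2
```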

Debugging, bug reports, and contact

If your sound changes aren't working as expected, check "Log all changes applied to the console", open the browser console (Ctrl+Shift+K on Firefox, Ctrl+Shift+J on Chrome) and click "Apply" again to view a list of all changes that were applied. For more complete debug information, check "Print debug information to the console".

If you find a bug, please create an issue at this repository's issues page. If applicable, make sure to include example rules and input, expected output, actual output, and the debug output as described above.

For questions or help, feel free to DM TriMill#6898 on Discord.