pythonimmediate package

Module contents

The main module. Contains Pythonic wrappers for much of \(\TeX\)’s API.

Refer to simple for the “simple” API – which allows users to avoid the need to know \(\TeX\) internals such as category codes.

The fundamental data of \(\TeX\) is the token, which is represented in Python by a Token object. A list of tokens is represented by a TokenList object. If the list is balanced, BalancedTokenList should be used.

With that, you can manipulate the \(\TeX\) input stream with BalancedTokenList.get_next(), BalancedTokenList.get_until(), TokenList.put_next().

Furthermore, executing \(\TeX\) code is possible using continue_until_passed_back(). For example, the following code:

TokenList(r"\typeout{123}\pythonimmediatecontinuenoarg").put_next()
continue_until_passed_back()

will just use \(\TeX\) to execute the code \typeout{123}.

With the three functions above, you can do everything that can be done in \(\TeX\) (although maybe not very conveniently or quickly). Some other functions are provided and, for educational purposes, the way to implement them using the primitive functions is discussed below.

expand_once(): TokenList(r"\expandafter\pythonimmediatecontinuenoarg").put_next(); continue_until_passed_back()
BalancedTokenList.expand_o(): TokenList(r"\expandafter\pythonimmediatecontinuenoarg\expandafter", self).put_next(); continue_until_passed_back(); return BalancedTokenList.get_next()

For example, if the current token list is \test, the lines above will:
- put \expandafter\pythonimmediatecontinuenoarg\expandafter{\test} following in the input stream,
- pass control to \(\TeX\),
- after one expansion step, the input stream becomes \pythonimmediatecontinuenoarg{⟨content of \test⟩},
- \pythonimmediatecontinuenoarg is executed, and execution is returned to Python,
- finally BalancedTokenList.get_next() gets the content of \test, as desired.
TokenList.execute(): (self+TokenList(r"\pythonimmediatecontinuenoarg")).put_next(); continue_until_passed_back()
NToken.put_next(): TokenList(r"\expandafter\pythonimmediatecontinuenoarg\noexpand\abc").put_next(); continue_until_passed_back() (as an example of putting a blue \abc token following in the input stream)
etc.
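
The put/get mechanics described above can be illustrated without a running engine. The following is only a toy model, assuming tokens are plain Python strings and that "{" and "}" delimit groups; the names `input_stream`, `put_next` and `get_next` here are standalone stand-ins, not the library's functions.

```python
from collections import deque

# Toy model only: tokens are plain strings, and "{" / "}" delimit groups.
input_stream = deque()

def put_next(tokens: list[str]) -> None:
    """Push tokens to the front of the stream, preserving their order."""
    input_stream.extendleft(reversed(tokens))

def get_next() -> list[str]:
    """Pop one undelimited argument: a single token or a whole brace group."""
    first = input_stream.popleft()
    if first != "{":
        return [first]
    depth, group = 1, []
    while True:
        tok = input_stream.popleft()
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth == 0:
                return group  # the outer braces are dropped
        group.append(tok)

put_next(["x", "y"])
put_next(["{", "a", "{", "b", "}", "}"])
print(get_next())  # ['a', '{', 'b', '}']
print(get_next())  # ['x']
```

In this model, whatever is put last is read first, which is why the expansion helpers above place \pythonimmediatecontinuenoarg in front of the material they want to inspect afterwards.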

The following table lists \(\TeX\) primitives and their Python wrappers:

\(\TeX\)	Python
`\let`	`Token.set_eq()`
`\ifx`	`NToken.meaning_eq()`
`\meaning`	`NToken.meaning_str()`
`\futurelet`	`Token.set_future()`, `Token.set_future2()`
`\def`	`Token.tl()` (no parameter), `Token.set_func()` (define function to do some task)
`\edef`	`BalancedTokenList.expand_x()`
Get undelimited argument	`BalancedTokenList.get_next()`
Get delimited argument	`BalancedTokenList.get_until()`, `BalancedTokenList.get_until_brace()`
`\catcode`	`catcode`
`\count`	`count`, `Token.int()`
`\Umathcode`	`umathcode`
`\detokenize`	`BalancedTokenList.detokenize()`
`\begingroup`, `\endgroup`	`group`

In order to get a “value” stored in a “variable” (using expl3 terminology; this has various meanings, e.g. a \countdef token, or a typical macro storing a token list), use the corresponding method on the token object itself:

Token.int() for \int_use:N \int_set:Nn,
Token.tl() for \tl_use:N \tl_set:Nn,
Token.str() for \str_use:N \str_set:Nn,
Token.bool(),
etc.

A token list can be:

interpreted as a string (provided it is already a string) using TokenList.str(),
converted from a Python string (opposite of the operation above) using TokenList.fstr(),
interpreted as an integer using TokenList.int(),
detokenized using BalancedTokenList.detokenize(),
expanded with BalancedTokenList.expand_x() or BalancedTokenList.expand_o(),
etc.

Some debugging facilities are provided and can be enabled on the command line; refer to the pytotex documentation.

class pythonimmediate.BalancedTokenList(a: ~typing.Iterable = (), string_tokenizer: ~typing.Callable[[str], ~pythonimmediate.TokenList] = <bound method TokenList.e3 of <class 'pythonimmediate.TokenList'>>)[source]

Bases: TokenList

Represents a balanced token list.

Some useful methods to interact with \(\TeX\) include expand_o(), expand_x(), get_next() and put_next(). See the corresponding methods’ documentation for usage examples.

See also Token list construction for shorthands to construct token lists in Python code.

Note

Runtime checking is not strictly enforced. Call the is_balanced() method explicitly if you need to check.
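
As a rough illustration of what a balance check involves, here is a minimal sketch assuming tokens are single characters and only "{" and "}" change the nesting depth. The function names mirror the documented ones, but this is a standalone model, not the library's implementation.

```python
def degree(token: str) -> int:
    """Imbalance degree of one toy token: "{" -> 1, "}" -> -1, else 0."""
    return {"{": 1, "}": -1}.get(token, 0)

def is_balanced(tokens: list[str]) -> bool:
    """True if the running depth never goes negative and ends at zero."""
    depth = 0
    for token in tokens:
        depth += degree(token)
        if depth < 0:
            return False
    return depth == 0

print(is_balanced(list("a{b{c}}")))  # True
print(is_balanced(list("}{")))       # False
```

The second case shows why summing the degrees is not enough: the total is zero, but the closing brace comes first.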

detokenize() → str[source]

Returns: a string, equal to the result of \detokenize applied to this token list.

execute() → None[source]

Execute this token list. It must not “peek ahead” in the input stream.

For example the token list \catcode1=2\relax can be executed safely (and sets the corresponding category code), but there’s no guarantee what will be assigned to \tmp when \futurelet\tmp is executed.

expand_estr() → str[source]

Expand this token list according to Note on argument expansion of estr-type functions.

It’s undefined behavior if the expansion result is unbalanced.

expand_o() → BalancedTokenList[source]

Return the o-expansion of this token list.

The result must be balanced, otherwise the behavior is undefined.

expand_x() → BalancedTokenList[source]

Return the x-expansion of this token list.

The result must be balanced, otherwise the behavior is undefined.

static get_next() → BalancedTokenList[source]: Get an (undelimited) argument from the \(\TeX\) input stream.

static get_until(delimiter: BalancedTokenList, remove_braces: bool = True, long: bool = True) → BalancedTokenList[source]

Get a delimited argument from the \(\TeX\) input stream, delimited by delimiter.

The delimiter itself will also be removed from the input stream.

Parameters: long – Works the same as \long primitive in \(\TeX\) – if this is False then \(\TeX\) fatal error Runaway argument will be raised if there’s a \par token in the argument.

static get_until_brace(long: bool = True) → BalancedTokenList[source]: Get a TokenList from the input stream delimited by {. The brace is not removed from the input stream.

parse_keyval(allow_duplicate: bool = False) → dict[ImmutableBalancedTokenList, Optional[BalancedTokenList]][source]

Parse a key-value token list into a dictionary.

>>> BalancedTokenList("a=b,c=d").parse_keyval()
{<ImmutableBalancedTokenList: a₁₁>: <BalancedTokenList: b₁₁>, <ImmutableBalancedTokenList: c₁₁>: <BalancedTokenList: d₁₁>}
>>> BalancedTokenList("a,c=d").parse_keyval()
{<ImmutableBalancedTokenList: a₁₁>: None, <ImmutableBalancedTokenList: c₁₁>: <BalancedTokenList: d₁₁>}
>>> BalancedTokenList.doc("a = b , c = d").parse_keyval()
{<ImmutableBalancedTokenList: a₁₁>: <BalancedTokenList: b₁₁>, <ImmutableBalancedTokenList: c₁₁>: <BalancedTokenList: d₁₁>}
>>> BalancedTokenList.doc("a ={ b,c }, c = { d}").parse_keyval()
{<ImmutableBalancedTokenList: a₁₁>: <BalancedTokenList:  ₁₀ b₁₁ ,₁₂ c₁₁  ₁₀>, <ImmutableBalancedTokenList: c₁₁>: <BalancedTokenList:  ₁₀ d₁₁>}
>>> BalancedTokenList("a=b,a=c").parse_keyval()
Traceback (most recent call last):
    ...
ValueError: Duplicate key: <ImmutableBalancedTokenList: a₁₁>
>>> BalancedTokenList("a=b,a=c").parse_keyval(allow_duplicate=True)
{<ImmutableBalancedTokenList: a₁₁>: <BalancedTokenList: c₁₁>}

parse_keyval_items() → list[tuple[BalancedTokenList, Optional[BalancedTokenList]]][source]

Parse a key-value token list into a list of pairs.

>>> BalancedTokenList("a=b,c=d").parse_keyval_items()
[(<BalancedTokenList: a₁₁>, <BalancedTokenList: b₁₁>), (<BalancedTokenList: c₁₁>, <BalancedTokenList: d₁₁>)]
>>> BalancedTokenList("a,c=d").parse_keyval_items()
[(<BalancedTokenList: a₁₁>, None), (<BalancedTokenList: c₁₁>, <BalancedTokenList: d₁₁>)]
>>> BalancedTokenList.doc("a = b , c = d").parse_keyval_items()
[(<BalancedTokenList: a₁₁>, <BalancedTokenList: b₁₁>), (<BalancedTokenList: c₁₁>, <BalancedTokenList: d₁₁>)]
>>> BalancedTokenList.doc("a ={ b,c }, c = { d}").parse_keyval_items()
[(<BalancedTokenList: a₁₁>, <BalancedTokenList:  ₁₀ b₁₁ ,₁₂ c₁₁  ₁₀>), (<BalancedTokenList: c₁₁>, <BalancedTokenList:  ₁₀ d₁₁>)]
>>> BalancedTokenList.doc("{a=b},c=d").parse_keyval_items()
[(<BalancedTokenList: {₁ a₁₁ =₁₂ b₁₁ }₂>, None), (<BalancedTokenList: c₁₁>, <BalancedTokenList: d₁₁>)]

put_next() → None[source]: Put this token list forward in the input stream.

split_balanced(sep: BalancedTokenList, maxsplit: int = -1, do_strip_braces_in_result: bool = True) → List[BalancedTokenList][source]

Split the given token list at the given delimiter, but only if the parts are balanced.

Parameters

sep – the delimiter.
maxsplit – the maximum number of splits.
do_strip_braces_in_result – if True, each element of the result will have its outer braces stripped, if any.

It is recommended to set this to True (the default), otherwise the user will not have any way to “quote” the separator in each entry.

Raises

ValueError – if self or sep is not balanced.

For example:

>>> BalancedTokenList("a{b,c},c{d}").split_balanced(BalancedTokenList(","))
[<BalancedTokenList: a₁₁ {₁ b₁₁ ,₁₂ c₁₁ }₂>, <BalancedTokenList: c₁₁ {₁ d₁₁ }₂>]
>>> BalancedTokenList("a{b,c},{d,d},e").split_balanced(BalancedTokenList(","), do_strip_braces_in_result=False)
[<BalancedTokenList: a₁₁ {₁ b₁₁ ,₁₂ c₁₁ }₂>, <BalancedTokenList: {₁ d₁₁ ,₁₂ d₁₁ }₂>, <BalancedTokenList: e₁₁>]
>>> BalancedTokenList("a{b,c},{d,d},e").split_balanced(BalancedTokenList(","))
[<BalancedTokenList: a₁₁ {₁ b₁₁ ,₁₂ c₁₁ }₂>, <BalancedTokenList: d₁₁ ,₁₂ d₁₁>, <BalancedTokenList: e₁₁>]
>>> BalancedTokenList.doc(" a = b = c ").split_balanced(BalancedTokenList("="), maxsplit=1)
[<BalancedTokenList:  ₁₀ a₁₁  ₁₀>, <BalancedTokenList:  ₁₀ b₁₁  ₁₀ =₁₂  ₁₀ c₁₁  ₁₀>]
>>> BalancedTokenList(r"\{,\}").split_balanced(BalancedTokenList(","))
[<BalancedTokenList: \{>, <BalancedTokenList: \}>]
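
The splitting rule can be sketched on plain strings. This is a simplified model under stated assumptions: every character is a token, the separator is a single character, and only literal "{" and "}" count as braces. The helper names `strip_outer_braces` and `split_balanced_sketch` are hypothetical.

```python
def strip_outer_braces(part: str) -> str:
    """Remove one pair of braces if it wraps the whole part."""
    if len(part) < 2 or part[0] != "{" or part[-1] != "}":
        return part
    depth = 0
    for i, ch in enumerate(part):
        depth += {"{": 1, "}": -1}.get(ch, 0)
        if depth == 0 and i < len(part) - 1:
            return part  # the first brace closes early, e.g. "{a},{b}"
    return part[1:-1]

def split_balanced_sketch(text: str, sep: str, maxsplit: int = -1,
                          strip_braces: bool = True) -> list[str]:
    """Split text at a single-character sep, but only at brace depth zero."""
    parts, current, depth = [], [], 0
    for ch in text:
        depth += {"{": 1, "}": -1}.get(ch, 0)
        if ch == sep and depth == 0 and (maxsplit < 0 or len(parts) < maxsplit):
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [strip_outer_braces(p) for p in parts] if strip_braces else parts

print(split_balanced_sketch("a{b,c},{d,d},e", ","))
# ['a{b,c}', 'd,d', 'e']
print(split_balanced_sketch(" a = b = c ", "=", maxsplit=1))
# [' a ', ' b = c ']
```

Stripping the outer braces is what lets an entry such as {d,d} contain the separator, which is the "quoting" mechanism the parameter description refers to.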

strip_optional_braces() → BalancedTokenList[source]

Strip the optional braces from the given token list, if the whole token list is wrapped in braces.

For example:

>>> BalancedTokenList("{a}").strip_optional_braces()
<BalancedTokenList: a₁₁>
>>> BalancedTokenList("a").strip_optional_braces()
<BalancedTokenList: a₁₁>
>>> BalancedTokenList("{a},{b}").strip_optional_braces()
<BalancedTokenList: {₁ a₁₁ }₂ ,₁₂ {₁ b₁₁ }₂>
>>> BalancedTokenList([C.begin_group("X"), C.other("a"), C.end_group("Y")]).strip_optional_braces()
<BalancedTokenList: a₁₂>

Note that BalancedTokenList is mutable. A copy is returned in any case:

>>> x=BalancedTokenList("a")
>>> y=x.strip_optional_braces()
>>> x is y
False
>>> x.append(C.letter("b"))
>>> x
<BalancedTokenList: a₁₁ b₁₁>
>>> y
<BalancedTokenList: a₁₁>

strip_spaces() → BalancedTokenList[source]

Strip spaces from the beginning and end of the token list.

For example:

>>> BalancedTokenList.doc(" a ").strip_spaces()
<BalancedTokenList: a₁₁>
>>> BalancedTokenList([C.space(' '), C.space(' '), " a b "], BalancedTokenList.doc).strip_spaces()
<BalancedTokenList: a₁₁  ₁₀ b₁₁>
>>> BalancedTokenList().strip_spaces()
<BalancedTokenList: >

Note that only spaces with charcode 32 are stripped:

>>> BalancedTokenList([C.space('X'), C.space(' '), "a", C.space(' ')]).strip_spaces()
<BalancedTokenList: X₁₀  ₁₀ a₁₁>

Similar to strip_optional_braces(), a copy is returned in any case:

>>> x=BalancedTokenList("a")
>>> y=x.strip_spaces()
>>> x is y
False

class pythonimmediate.BlueToken(token: Token)[source]

Bases: NToken

Represents a blue token (see documentation of NToken).

property no_blue: Token: Return the result of this token after being “touched”, which drops its blue status if any.

property noexpand: BlueToken: Return the result of \noexpand applied on this token.

put_next() → None[source]: Put this token forward in the input stream.

pythonimmediate.C: alias of Catcode

class pythonimmediate.Catcode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

An enum consisting of begin_group, end_group, etc.

The corresponding enum value is the \(\TeX\) code for the catcode:

>>> Catcode.letter.value
11

This class contains a shorthand to allow creating a token with little Python code. The individual Catcode objects can be called with either a character or a character code to create the token:

>>> C.letter("a")  # creates a token with category code letter and character code "a"=chr(97)
<Token: a₁₁>
>>> C.letter(97)  # same as above
<Token: a₁₁>

Both of the above forms are equivalent to CharacterToken(index=97, catcode=Catcode.letter).

Another shorthand is available to check if a token has a particular catcode. Note that it is not safe to access CharacterToken.catcode directly, as it is not available for all tokens.

>>> C.letter("a") in C.letter
True
>>> C.letter("a") in C.space
False
>>> T.a in C.letter
False
>>> C.letter("a").catcode==C.letter
True
>>> T.a.catcode==C.letter
Traceback (most recent call last):
    ...
AttributeError: 'ControlSequenceToken' object has no attribute 'catcode'

The behavior with blue tokens might be unexpected, be careful:

>>> C.active("a").blue in C.active
True
>>> T.a.blue in C.letter
False
>>> T.a.blue in C.active
False

See also Token list construction for more ways of constructing token lists.

property for_token: bool

Return whether a CharacterToken may have this catcode.

>>> Catcode.escape.for_token
False
>>> Catcode.letter.for_token
True

static lookup(x: int) → Catcode[source]

Construct from \(\TeX\) code.

>>> C.lookup(11)
<Catcode.letter: 11>
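
For readers unfamiliar with the numbering, the following standalone sketch models \(\TeX\)'s sixteen category codes as a plain Enum. The class name `CatcodeSketch` and several member names (for example `math_toggle` and `line_end`) are assumptions made for illustration; only the numeric values are standard \(\TeX\).

```python
from enum import Enum

class CatcodeSketch(Enum):
    """Standalone model of TeX's 16 category codes (not the library class)."""
    escape = 0
    begin_group = 1
    end_group = 2
    math_toggle = 3
    alignment = 4
    line_end = 5
    parameter = 6
    superscript = 7
    subscript = 8
    ignored = 9
    space = 10
    letter = 11
    other = 12
    active = 13
    comment = 14
    invalid = 15

    @property
    def for_token(self) -> bool:
        # Characters with these catcodes never survive tokenization as
        # character tokens: they start control sequences, end lines,
        # are dropped, start comments, or are rejected.
        return self not in (CatcodeSketch.escape, CatcodeSketch.line_end,
                            CatcodeSketch.ignored, CatcodeSketch.comment,
                            CatcodeSketch.invalid)

    @staticmethod
    def lookup(x: int) -> "CatcodeSketch":
        return CatcodeSketch(x)  # Enum supports lookup by value

print(CatcodeSketch.lookup(11))        # CatcodeSketch.letter
print(CatcodeSketch.escape.for_token)  # False
print(CatcodeSketch.letter.for_token)  # True
```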

class pythonimmediate.CharacterToken(index: int, catcode: Catcode)[source]

Bases: Token

Represent a character token. The preferred way to construct a character token is using C.

property assignable: bool: Whether this token can be assigned to i.e. it’s control sequence or active character.

property can_blue: bool: Return whether this token can possibly be blue i.e. expandable.

catcode: Catcode

>>> C.letter("a").catcode
<Catcode.letter: 11>

Note that it is recommended to use the shorthand documented in Catcode to check the catcode of a token instead:

>>> C.letter("a") in C.letter
True

property chr: str

The character of this token.

>>> C.letter("a").chr
'a'

degree() → int[source]: Return the imbalance degree for this token ({ -> 1, } -> -1, everything else -> 0).

index: int

The character code of this token.

>>> C.letter("a").index
97

serialize() → str[source]: Internal function, serialize this token to be able to pass to \(\TeX\).

simple_detokenize(get_catcode: Callable[[int], Catcode]) → str[source]: Simple approximate detokenizer, implemented in Python.

str_code() → int[source]

self must represent a character of a \(\TeX\) string. (i.e. equal to itself when detokenized)

Returns: the character code.

Note

See TokenList.str_codes().

class pythonimmediate.ControlSequenceToken(csname: Union[str, bytes, list[int], tuple[int, ...]], is_unicode: Optional[bool] = None)[source]

Bases: Token

Represents a control sequence:

>>> ControlSequenceToken("abc")
<Token: \abc>
>>> ControlSequenceToken([97, 98, 99])
<Token: \abc>

The preferred way to construct a control sequence is T.

Some care is needed to construct control sequence tokens whose name contains Unicode characters, as the exact token created depends on whether the engine is Unicode-based:

>>> with default_engine.set_engine(None):  # if there's no default_engine...
...     ControlSequenceToken("×")  # this will raise an error
Traceback (most recent call last):
    ...
AssertionError: Cannot construct a control sequence with non-ASCII characters without specifying is_unicode

The same control sequences may appear differently on Unicode and non-Unicode engines, and conversely, different control sequences may appear the same between Unicode and non-Unicode engines:

>>> a = ControlSequenceToken("u8:×", is_unicode=False)
>>> a
<Token: \u8:×>
>>> a == ControlSequenceToken(b"u8:\xc3\x97", is_unicode=False)
True
>>> a.codes
(117, 56, 58, 195, 151)
>>> b = ControlSequenceToken("u8:×", is_unicode=True)
>>> b
<Token: \u8:×>
>>> b.codes
(117, 56, 58, 215)
>>> a == b
False
>>> a == ControlSequenceToken("u8:\xc3\x97", is_unicode=True)
True
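
The code tuples shown above follow from how a name is seen by each kind of engine. A minimal sketch, assuming a Unicode engine sees one code per code point and a non-Unicode engine sees one code per UTF-8 byte; `name_codes` is a hypothetical helper, not part of the library.

```python
def name_codes(csname: str, is_unicode: bool) -> tuple[int, ...]:
    """Character codes an engine would see for a control sequence name.

    Assumption of this sketch: a Unicode engine sees one code per code
    point, while an 8-bit engine sees one code per UTF-8 byte.
    """
    if is_unicode:
        return tuple(ord(ch) for ch in csname)
    return tuple(csname.encode("utf-8"))

print(name_codes("u8:×", is_unicode=False))  # (117, 56, 58, 195, 151)
print(name_codes("u8:×", is_unicode=True))   # (117, 56, 58, 215)
# The same five codes arise on a Unicode engine from a different name:
print(name_codes("u8:\xc3\x97", is_unicode=True))  # (117, 56, 58, 195, 151)
```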

Generally, the default way to construct the control sequence will give you what you want.

>>> with ChildProcessEngine("pdftex") as engine, default_engine.set_engine(engine):
...     print(T["u8:×"].meaning_str())
...     print(T["u8:×".encode('u8')].meaning_str())
macro:->\IeC {\texttimes }
macro:->\IeC {\texttimes }
>>> with ChildProcessEngine("luatex") as engine, default_engine.set_engine(engine):
...     print(C.active("\xAD").meaning_str())  # discretionary hyphen
...     BalancedTokenList([r"\expandafter\def\csname\string", C.active("\xAD"), r"\endcsname{123}"]).execute()
...     print(T["\xAD"].meaning_str())  # just a convoluted test since no control sequence with non-ASCII name is defined by default in LuaTeX (that I know of)
macro:->\-
macro:->123

is_unicode will be fetched from default_engine if not explicitly specified.

property assignable: bool: Whether this token can be assigned to i.e. it’s control sequence or active character.

property codes: Tuple[int, ...]: Return the codes of this control sequence – that is, if \detokenize{...} is applied on this token, the tokens with the specified character codes (plus \escapechar) will result.

property csname: str: Return some readable name of the control sequence. Might return None if the name is not representable in UTF-8.

make = <pythonimmediate.ControlSequenceTokenMaker object>: Refer to the documentation of ControlSequenceTokenMaker.

serialize() → str[source]: Internal function, serialize this token to be able to pass to \(\TeX\).

simple_detokenize(get_catcode: Callable[[int], Catcode]) → str[source]: Simple approximate detokenizer, implemented in Python.

class pythonimmediate.ControlSequenceTokenMaker(prefix: str)[source]

Bases: object

Shorthand to make creating ControlSequenceToken objects in Python easier.

>>> from pythonimmediate import T
>>> assert T is ControlSequenceToken.make
>>> T.hello
<Token: \hello>
>>> T["a@b"]  # for the "harder to construct" tokens
<Token: \a@b>
>>> P=ControlSequenceTokenMaker("__mymodule_")
>>> P.a
<Token: \__mymodule_a>
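
One plausible way to obtain this attribute-and-item syntax in Python is through `__getattr__` and `__getitem__`. The sketch below is an assumption about the general pattern, not the library's source; `CsMakerSketch` is a hypothetical class that returns plain strings instead of token objects.

```python
class CsMakerSketch:
    """Toy maker: attribute or item access returns a control sequence name."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def __getattr__(self, name: str) -> str:
        # Only called for attributes not found normally, so self.prefix is safe.
        return "\\" + self.prefix + name

    def __getitem__(self, name: str) -> str:
        # Item access covers names that are not valid Python identifiers.
        return "\\" + self.prefix + name

T_sketch = CsMakerSketch("")
P_sketch = CsMakerSketch("__mymodule_")
print(T_sketch.hello)   # \hello
print(T_sketch["a@b"])  # \a@b
print(P_sketch.a)       # \__mymodule_a
```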

pythonimmediate.DimensionUnit

\(\TeX\) dimension units. ex and em are font-dependent, so excluded.

alias of Literal['pt', 'in', 'pc', 'cm', 'mm', 'bp', 'dd', 'cc', 'sp']

class pythonimmediate.ImmutableBalancedTokenList(a: BalancedTokenList)[source]

Bases: Sequence, Hashable

Represents an immutable balanced token list.

Note that this class is not a subclass of TokenList, and is not mutable.

Not many operations are supported. Convert to BalancedTokenList to perform more operations.

Its main use is as a key in a dictionary.

>>> a=ImmutableBalancedTokenList(BalancedTokenList.e3(r'\def\a{b}'))
>>> b=ImmutableBalancedTokenList(BalancedTokenList.e3(r'\def\a{b}'))
>>> c=ImmutableBalancedTokenList(BalancedTokenList.e3(r'\def\a{c}'))
>>> hash(a)==hash(b)
True
>>> a==b
True
>>> a!=b
False
>>> a==c
False
>>> a!=c
True

class pythonimmediate.MathClass(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]: Bases: Enum

class pythonimmediate.NToken[source]

Bases: ABC

Represent a possibly-notexpanded token. For convenience, a notexpanded token is called a blue token. It’s not always possible to determine the notexpanded status of a following token in the input stream.

Implementation note: Token objects must be frozen.

degree() → int[source]: Return the imbalance degree for this token ({ -> 1, } -> -1, everything else -> 0).

meaning_eq(other: NToken) → bool[source]

Whether this token is the same in meaning as the token specified in the parameter other. Equivalent to \(\TeX\)’s \ifx.

Note that two tokens might have different meaning despite having equal meaning_str().

meaning_str(escapechar: Optional[Union[int, str]] = None) → str[source]

Get the meaning of this token as a string.

>>> C.other("-").meaning_str()
'the character -'
>>> T.relax.meaning_str(escapechar="?")
'?relax'
>>> T.relax.meaning_str()
'\\relax'

Note that all blue tokens have the meaning equal to \relax (or [unknown command code! (0, 1)] in a buggy LuaTeX implementation) with the backslash replaced by the current escapechar.

abstract property no_blue: Token: Return the result of this token after being “touched”, which drops its blue status if any.

abstract property noexpand: NToken: Return the result of \noexpand applied on this token.

abstract put_next() → None[source]: Put this token forward in the input stream.

str_code() → int[source]

self must represent a character of a \(\TeX\) string. (i.e. equal to itself when detokenized)

Returns: the character code.

Note

See TokenList.str_codes().

class pythonimmediate.NTokenList(a: ~typing.Iterable = (), string_tokenizer: ~typing.Callable[[str], ~pythonimmediate.TokenList] = <bound method TokenList.e3 of <class 'pythonimmediate.TokenList'>>)[source]

Bases: UserList

Similar to TokenList, but can contain blue tokens.

The class can be used identically to a Python list consisting of NToken objects, plus some additional methods to operate on token lists.

Refer to the documentation of TokenList for some usage example.

execute() → None[source]: See BalancedTokenList.execute().

expand_x() → BalancedTokenList[source]: See BalancedTokenList.expand_x().

is_balanced() → bool[source]: Check if this is balanced.

put_next() → None[source]: See BalancedTokenList.put_next().

simple_parts() → List[Union[BalancedTokenList, Token, BlueToken]][source]

Internal function.

Split this NTokenList into a list of balanced non-blue parts, unbalanced {/} tokens, and blue tokens.

class pythonimmediate.RedirectPrintTeX(t: Optional[IO])[source]

Bases: object

A context manager. Use like this, where t is some file object:

with RedirectPrintTeX(t):
    pass  # some code

Then all print_TeX() function calls will be redirected to t.

pythonimmediate.T = <pythonimmediate.ControlSequenceTokenMaker object>: See ControlSequenceTokenMaker.

class pythonimmediate.TTPBalancedTokenList(a: ~typing.Iterable = (), string_tokenizer: ~typing.Callable[[str], ~pythonimmediate.TokenList] = <bound method TokenList.e3 of <class 'pythonimmediate.TokenList'>>)[source]

Bases: TeXToPyData, BalancedTokenList

static read() → TTPBalancedTokenList[source]: Given that \(\TeX\) has just sent the data, read into a Python object.

send_code() → str

send_code_var() → str

class pythonimmediate.Token[source]

Bases: NToken

Represent a \(\TeX\) token, excluding the notexpanded possibility. See also documentation of NToken.

abstract property assignable: bool: Whether this token can be assigned to, i.e. it’s a control sequence or an active character.

property blue: BlueToken: Return a BlueToken containing self. can_blue must be true.

bool() → bool[source]

Manipulate an expl3 bool variable.

>>> BalancedTokenList(r'\bool_set_true:N \l_tmpa_bool').execute()
>>> T.l_tmpa_bool.bool()
True

abstract property can_blue: bool: Return whether this token can possibly be blue i.e. expandable.

defined() → bool[source]: Return whether this token is defined, that is, its meaning is not undefined.

static deserialize(s: str | bytes) → Token[source]

See documentation of TokenList.deserialize().

Always return a single token.

static deserialize_bytes(data: bytes) → Token[source]

See documentation of TokenList.deserialize_bytes().

Always return a single token.

dim(unit: DimensionUnit, val: int) → int[source]

dim(unit: DimensionUnit, val: float) → float

dim(unit: DimensionUnit, val: Fraction) → Fraction

dim(unit: DimensionUnit) → Fraction

dim(unit: str) → Any

dim(val: float | fractions.Fraction, unit: DimensionUnit) → Fraction

dim() → str

Manipulate an expl3 dimension variable.

>>> T.l_tmpa_dim.dim("100.5pt")
>>> T.l_tmpa_dim.dim()
'100.5pt'
>>> T.l_tmpa_dim.dim(100.5, "pt")
100.5
>>> T.l_tmpa_dim.dim("pt")
Fraction(201, 2)
>>> T.l_tmpa_dim.dim("1em")
>>> T.l_tmpa_dim.dim(1, "em")
1
>>> T.l_tmpa_dim.dim("em")
Traceback (most recent call last):
    ...
ValueError: Unknown unit "em"
>>> T.l_tmpa_dim.dim(100.5)
Traceback (most recent call last):
    ...
ValueError: Explicit unit is required (e.g. "cm")
>>> T.l_tmpa_dim.dim("6586368sp")
>>> T.l_tmpa_dim.dim("sp")
Fraction(6586368, 1)
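
The exact Fraction results above are consistent with converting from scaled points using \(\TeX\)'s fixed unit ratios. The sketch below illustrates that arithmetic only; `SP_PER_UNIT` and `sp_to_unit` are hypothetical names, and the library may compute or round its results differently.

```python
from fractions import Fraction

# Scaled points (sp) per unit, using TeX's exact ratios; 1pt = 65536sp.
SP_PER_UNIT = {
    "sp": Fraction(1),
    "pt": Fraction(65536),
    "pc": Fraction(65536) * 12,
    "in": Fraction(65536) * Fraction(7227, 100),
    "bp": Fraction(65536) * Fraction(7227, 7200),
    "cm": Fraction(65536) * Fraction(7227, 254),
    "mm": Fraction(65536) * Fraction(7227, 2540),
    "dd": Fraction(65536) * Fraction(1238, 1157),
    "cc": Fraction(65536) * Fraction(14856, 1157),
}

def sp_to_unit(sp: int, unit: str) -> Fraction:
    """Express a length given in scaled points in the requested unit."""
    if unit not in SP_PER_UNIT:
        raise ValueError(f'Unknown unit "{unit}"')
    return Fraction(sp) / SP_PER_UNIT[unit]

print(sp_to_unit(6586368, "pt"))  # 201/2
```

Font-dependent units such as em are absent from the table, so they raise ValueError here, matching the behaviour shown in the examples above.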

estr() → str[source]

Expand this token according to Note on argument expansion of estr-type functions.

It’s undefined behavior if the expansion result is unbalanced.

>>> T.l_tmpa_tl.tl(BalancedTokenList(r'ab\l_tmpb_tl'))
<BalancedTokenList: a₁₁ b₁₁ \l_tmpb_tl>
>>> T.l_tmpb_tl.tl(BalancedTokenList(r'cd123+$'))
<BalancedTokenList: c₁₁ d₁₁ 1₁₂ 2₁₂ 3₁₂ +₁₂ $₃>
>>> T.l_tmpa_tl.estr()
'abcd123+$'

See also BalancedTokenList.expand_estr().

static get_next() → Token[source]
static get_next(count: int) → TokenList: Get the following token.

Note

In LaTeX3 versions without the commit https://github.com/latex3/latex3/commit/24f7188904d6, this may sometimes raise an error.

Note

Because of the internal implementation of \peek_analysis_map_inline:n, this may tokenize up to 2 tokens ahead (including the returned token), and may occasionally return the wrong token in unavoidable cases.

int(val: Optional[int] = None) → int[source]

Manipulate an expl3 int variable.

>>> BalancedTokenList(r'\int_set:Nn \l_tmpa_int {5+6}').execute()
>>> T.l_tmpa_int.int()
11

Token list construction

The constructor of this class accepts parameters in various different forms to allow convenient construction of token lists.

Most generally, you can construct a token list from any iterable consisting of (recursively) iterables, tokens, or strings. For example:

>>> TokenList([Catcode.letter("a"), "bc", [r"def\gh"]])
<TokenList: a₁₁ b₁₁ c₁₁ {₁ d₁₁ e₁₁ f₁₁ \gh }₂>

This constructs the token list abc{def\gh }.

Note that the list that is recursively nested inside is used to represent a nesting level. A string will be “flattened” into the closest level, but a token list will not be flattened – they can be manually flattened with Python * syntax.

As a special case, you can construct from a string:

>>> TokenList(r"\let \a \b")
<TokenList: \let \a \b>

The constructor of other classes such as BalancedTokenList and NTokenList works the same way.

The behavior described above implies that:

If you construct a token list from an existing token list, it will be copied (because a TokenList is a UserList of tokens, and iterating over it gives Token objects), similar to how you can copy a list with the list constructor:
```
>>> a = TokenList(["hello world"])
>>> b = TokenList(a)
>>> b
<TokenList: h₁₁ e₁₁ l₁₁ l₁₁ o₁₁ w₁₁ o₁₁ r₁₁ l₁₁ d₁₁>
>>> a==b
True
>>> a is b
False
```
Construct a token list from a list of tokens:
```
>>> TokenList([Catcode.letter("a"), Catcode.other("b"), T.test])
<TokenList: a₁₁ b₁₂ \test>
```
The above constructs the token list ab\test, provided T is the object referred to in ControlSequenceTokenMaker.

See also Catcode for the explanation of the Catcode.letter("a") form.

By default, strings will be converted to token lists using TokenList.e3(), although you can customize it by:

Passing the second argument to the constructor.

Manually specify the type:

>>> TokenList([T.directlua, [*TokenList.fstr(r"hello%world\?")]])
<TokenList: \directlua {₁ h₁₂ e₁₂ l₁₂ l₁₂ o₁₂ %₁₂ w₁₂ o₁₂ r₁₂ l₁₂ d₁₂ \\₁₂ ?₁₂ }₂>

property balanced: BalancedTokenList

self must be balanced.

Returns: a BalancedTokenList containing the content of this object.

balanced_parts() → List[Union[BalancedTokenList, Token]][source]

Internal function, used for serialization and sending to \(\TeX\).

Split this TokenList into a list of balanced parts and unbalanced {/} tokens.

check_balanced() → None[source]

Ensure that this is balanced.

Raises: UnbalancedTokenListError – if this is not balanced.

classmethod deserialize(data: str | bytes) → TokenListType[source]: Internal function?

classmethod deserialize_bytes(data: bytes) → TokenListType[source]

Internal function.

Given a bytes object read directly from the engine, deserialize it.

classmethod doc(s: str) → TokenListType[source]

Approximate tokenizer in document (normal) catcode regime.

Refer to documentation of from_string() for details.

Usage example:

>>> BalancedTokenList.doc(r'\def\a{b}')
<BalancedTokenList: \def \a {₁ b₁₁ }₂>
>>> BalancedTokenList.doc('}')
Traceback (most recent call last):
    ...
pythonimmediate.UnbalancedTokenListError: Token list <BalancedTokenList: }₂> is not balanced
>>> BalancedTokenList.doc('\n\n')
Traceback (most recent call last):
    ...
NotImplementedError: Double-newline to \par not implemented yet!
>>> TokenList.doc('}')
<TokenList: }₂>

classmethod e3(s: str) → TokenListType[source]

Approximate tokenizer in expl3 (\ExplSyntaxOn) catcode regime.

Refer to documentation of from_string() for details.

Usage example:

>>> BalancedTokenList.e3(r'\cs_new_protected:Npn \__mymodule_myfunction:n #1 { #1 #1 }')
<BalancedTokenList: \cs_new_protected:Npn \__mymodule_myfunction:n #₆ 1₁₂ {₁ #₆ 1₁₂ #₆ 1₁₂ }₂>
>>> BalancedTokenList.e3('a\tb\n\nc')
<BalancedTokenList: a₁₁ b₁₁ c₁₁>

execute() → None[source]

Execute this token list. It must not “peek ahead” in the input stream.

For example the token list \catcode1=2\relax can be executed safely (and sets the corresponding category code), but there’s no guarantee what will be assigned to \tmp when \futurelet\tmp is executed.

expand_x() → BalancedTokenList[source]

Return the x-expansion of this token list.

The result must be balanced, otherwise the behavior is undefined.

classmethod from_string(s: str, get_catcode: Callable[[int], Catcode], endlinechar: str) → TokenListType[source]

Approximate tokenizer implemented in Python.

Convert a string to a TokenList (or some subclass of it such as BalancedTokenList) approximately.

This is an internal function and should not be used directly. Use one of e3() or doc() instead.

These are used to allow constructing a TokenList object in Python without being too verbose. Refer to Token list construction for alternatives.

The tokenization algorithm differs slightly from \(\TeX\)’s in the following respects:

multiple spaces are collapsed to one space, but only if they have the character code of space (32); i.e. in the expl3 catcode regime, ~~ gets tokenized to two spaces.
spaces with a character code different from space (32) after a control sequence are not ignored; i.e. in the expl3 catcode regime, ~ always becomes a space.
^^ syntax is not supported. Use Python’s escape syntax as usual (of course, that does not work in raw Python strings r"...").

Parameters: get_catcode – A function that given a character code, return its desired category code.
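
To give a feel for what an approximate tokenizer does, here is a deliberately small sketch for a document-like regime. It assumes backslash is the only escape character and treats every other character as its own token; comments, ^^ notation, \par and configurable category codes are not handled. `tokenize_sketch` is a hypothetical function and much simpler than from_string().

```python
def tokenize_sketch(s: str) -> list[str]:
    """Toy tokenizer: control words, control symbols, collapsed spaces."""
    tokens, i = [], 0
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            end = i + 1
            while end < len(s) and s[end].isalpha():
                end += 1
            if end == i + 1:
                end = min(end + 1, len(s))  # control symbol: take one character
                tokens.append(s[i:end])
            else:
                tokens.append(s[i:end])     # control word
                while end < len(s) and s[end] == " ":
                    end += 1                # spaces after a control word are skipped
            i = end
        elif ch == " ":
            tokens.append(" ")
            while i < len(s) and s[i] == " ":
                i += 1                      # a run of spaces collapses to one token
        else:
            tokens.append(ch)
            i += 1
    return tokens

print(tokenize_sketch(r"\def\a{b  c}"))  # ['\\def', '\\a', '{', 'b', ' ', 'c', '}']
print(tokenize_sketch(r"\let \a \b"))    # ['\\let', '\\a', '\\b']
```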

classmethod fstr(s: str, is_unicode: Optional[bool] = None) → TokenListType[source]

Approximate tokenizer in detokenized catcode regime.

Refer to documentation of from_string() for details. ^^J (or \n) is used to denote newlines.

>>> BalancedTokenList.fstr('hello world')
<BalancedTokenList: h₁₂ e₁₂ l₁₂ l₁₂ o₁₂  ₁₀ w₁₂ o₁₂ r₁₂ l₁₂ d₁₂>
>>> BalancedTokenList.fstr('ab\\c  d\n \t')
<BalancedTokenList: a₁₂ b₁₂ \\₁₂ c₁₂  ₁₀  ₁₀ d₁₂ \n₁₂  ₁₀ \t₁₂>

Some care needs to be taken for Unicode strings.

>>> with default_engine.set_engine(None): BalancedTokenList.fstr('α')
Traceback (most recent call last):
    ...
RuntimeError: Default engine not set for this thread!
>>> with default_engine.set_engine(luatex_engine): BalancedTokenList.fstr('α')
<BalancedTokenList: α₁₂>
>>> BalancedTokenList.fstr('α')
<BalancedTokenList: Î₁₂ ±₁₂>

int() → int[source]

Assume this token list contains an integer (as valid result of \number ...), returns the integer value.

At the moment, not much error checking is done.

is_balanced() → bool[source]: See NTokenList.is_balanced().

put_next() → None[source]: Put this token list forward in the input stream.

serialize_bytes() → bytes[source]

Internal function.

Given an engine, serialize it in a form that is suitable for writing directly to the engine.

str() → str[source]

self must represent a \(\TeX\) string. (i.e. equal to itself when detokenized)

Returns: the string content.

>>> BalancedTokenList([C.other(0xce), C.other(0xb1)]).str()
'α'
>>> with default_engine.set_engine(luatex_engine): BalancedTokenList([C.other('α')]).str()
'α'

str_codes() → list[int][source]

self must represent a \(\TeX\) string (i.e. it must be equal to itself when detokenized).

Returns: the string content, as a list of character codes.

>>> BalancedTokenList("abc").str_codes()
Traceback (most recent call last):
    ...
ValueError: this CharacterToken does not represent a string!
>>> BalancedTokenList("+-=").str_codes()
[43, 45, 61]

Note

In non-Unicode engines, each token will be replaced with a character with character code equal to the character code of that token. UTF-8 characters with character code >=0x80 will be represented by multiple characters in the returned string.

str_if_unicode(unicode: bool = True) → str[source]

Assuming this token list represents a string in a (Unicode or non-Unicode) engine, return the string content.

If the engine is not Unicode, assume the string is encoded in UTF-8.
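The decoding rule can be sketched in plain Python. This is an illustrative model that operates on a list of character codes, not the library's implementation; decode_codes is a hypothetical name.

```python
def decode_codes(codes: list[int], unicode: bool = True) -> str:
    """Turn character codes into a Python string.

    Unicode engine: each code is one code point.
    Non-Unicode engine: each code is one byte of a UTF-8 encoded string.
    """
    if unicode:
        return "".join(chr(c) for c in codes)
    return bytes(codes).decode("utf-8")
```

For instance, decode_codes([0xCE, 0xB1], unicode=False) yields 'α', while the same codes with unicode=True yield the two-character string 'Î±'.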

class pythonimmediate.Umathcode(family: int, cls: MathClass, position: int)[source]

Bases: object

Example of using active:

>>> Umathcode.parse(0x1000000)
Umathcode.active
>>> Umathcode.active.family
1
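The family of 1 is what one gets by reading 0x1000000 as a bit-packed value. The sketch below assumes a layout with the position in the low 21 bits, the class in the next 3 bits, and the family above that; this layout is an assumption that happens to agree with the example above, and unpack_umathcode and pack_umathcode are hypothetical helpers that return the class as a plain integer rather than a MathClass.

```python
def unpack_umathcode(value: int) -> tuple[int, int, int]:
    """Split a packed math code into (family, class, position).

    Assumed layout: bits 0-20 position, bits 21-23 class, bits 24 and up family.
    """
    position = value & 0x1FFFFF
    cls = (value >> 21) & 0x7
    family = value >> 24
    return family, cls, position


def pack_umathcode(family: int, cls: int, position: int) -> int:
    """Inverse of unpack_umathcode under the same assumed layout."""
    return (family << 24) | (cls << 21) | position


print(unpack_umathcode(0x1000000))  # (1, 0, 0)
```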

exception pythonimmediate.UnbalancedTokenListError[source]

Bases: ValueError

Exception raised when a token list is unbalanced.

class pythonimmediate._CatcodeManager[source]

Bases: object

Python interface to manage the category code. Example usage:

>>> catcode[97]
<Catcode.letter: 11>
>>> catcode["a"] = C.letter

class pythonimmediate._CountManager[source]

Bases: object

Manipulate count registers. Interface is similar to catcode.

For example:

>>> count[5]=6  # equivalent to `\count5=6`
>>> count[5]
6
>>> count["endlinechar"]=10  # equivalent to `\endlinechar=10`
>>> T.endlinechar.int()  # can also be accessed this way
10
>>> count["endlinechar"]=13

As shown in the last example, accessing named count registers can also be done through Token.int().

class pythonimmediate._FrozenRelaxToken[source]

Bases: Token

>>> frozen_relax_token
<Token: [frozen]\relax>
>>> BalancedTokenList(r'\ifnum 0=0\fi').expand_x()
<BalancedTokenList: [frozen]\relax>

serialize() → str[source]: Internal function, serialize this token to be able to pass to \(\TeX\).

simple_detokenize(get_catcode: Callable[[int], Catcode]) → str[source]: Simple approximate detokenizer, implemented in Python.

class pythonimmediate._GroupManager[source]

Bases: object

Create a semi-simple group.

Use as group.begin() and group.end(), or as a context manager:

>>> count[0]=5
>>> with group:
...     count[0]=6
...     count[0]
6
>>> count[0]
5

Note that the user must not manually change the group level in a context:

>>> with group:
...     group.begin()
Traceback (most recent call last):
    ...
ValueError: Group level changed during group

They must not change the engine either:

>>> tmp_engine=ChildProcessEngine("pdftex")
>>> with group:
...     c=default_engine.set_engine(tmp_engine)
Traceback (most recent call last):
    ...
ValueError: Engine changed during group
>>> tmp_engine.close()
>>> c.restore()
>>> group.end()
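The group-level check can be modelled without a \(\TeX\) engine. The following is a toy sketch that tracks nesting depth in a Python integer; ToyGroup is a hypothetical class, and the real manager's internals may differ.

```python
class ToyGroup:
    """Toy model of a group manager that tracks nesting depth in pure Python."""

    def __init__(self) -> None:
        self.level = 0
        self._expected: list[int] = []  # level recorded at each `with` entry

    def begin(self) -> None:
        self.level += 1

    def end(self) -> None:
        if self.level == 0:
            raise ValueError("No group to end")
        self.level -= 1

    def __enter__(self) -> "ToyGroup":
        self.begin()
        self._expected.append(self.level)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # The level on exit must equal the level right after entry.
        expected = self._expected.pop()
        if self.level != expected:
            raise ValueError("Group level changed during group")
        self.end()
```

Recording the level on entry and comparing it on exit is what lets the context manager detect a stray begin() or end() inside the with block.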

class pythonimmediate._ToksManager[source]

Bases: object

Manipulate tok registers. Interface is similar to catcode.

For example:

>>> toks[0]=BalancedTokenList('abc')
>>> toks[0]
<BalancedTokenList: a₁₁ b₁₁ c₁₁>

class pythonimmediate._UmathcodeManager[source]

Bases: object

Interface is similar to catcode.

For example:

>>> umathcode[0]
Traceback (most recent call last):
    ...
RuntimeError: umathcode is not available for non-Unicode engines!
>>> from pythonimmediate.engine import ChildProcessEngine
>>> with default_engine.set_engine(luatex_engine): umathcode["A"]
Umathcode(family=1, cls=<MathClass.variable_family: 7>, position=65 'A')

pythonimmediate.add_TeX_handler(t: BalancedTokenList, *, continue_included: bool = False) → str[source]

See call_TeX_handler().

Parameters

continue_included –

If this is set to True, the \pythonimmediatecontinuenoarg token must be placed in t at the point where control should return to Python.

>>> with group: identifier=add_TeX_handler(BalancedTokenList(
...     r"\afterassignment\pythonimmediatecontinuenoarg \toks0="), continue_included=True)
>>> BalancedTokenList([["abc"]]).put_next()
>>> call_TeX_handler(identifier)  # this will assign \toks0 to be the following braced group
>>> toks[0]
<BalancedTokenList: a₁₁ b₁₁ c₁₁>

pythonimmediate.add_TeX_handler_param(t: BalancedTokenList, param: int | BalancedTokenList, *, continue_included: bool = False) → str[source]

Similar to add_TeX_handler(), but the handler takes parameters that follow in the input stream.

Parameters: continue_included – See add_TeX_handler().

>>> identifier=add_TeX_handler_param(BalancedTokenList(r"\def\l_tmpa_tl{#2,#1}"), 2)
>>> BalancedTokenList(r'{123}{456}').put_next()
>>> call_TeX_handler(identifier)
>>> T.l_tmpa_tl.tl()
<BalancedTokenList: 4₁₂ 5₁₂ 6₁₂ ,₁₂ 1₁₂ 2₁₂ 3₁₂>
>>> remove_TeX_handler(identifier)

pythonimmediate.add_handler(f: Callable[[], None], *, all_engines: bool = False) → str[source]

This function provides a way to call Python code from \(\TeX\) efficiently and without polluting the global namespace.

First, note that with pyc() you can do the following:

>>> a=get_user_scope()["a"]=[]
>>> execute(r"\def\test{\pyc{a.append(1)}}")

Then every time \test is executed on \(\TeX\) side the corresponding Python code will be executed:

>>> a
[]
>>> execute(r"\test")
>>> a
[1]

However, this pollutes the Python global namespace, and the string a.append(1) has to be parsed into Python code every time it is called.

With this function, you can do the following:

>>> def myfunction(): execute(r"\advance\count0 by 1 ")  # it's possible to execute TeX code here
>>> identifier = add_handler(myfunction)
>>> execute(r"\def\test{\pythonimmediatecallhandler{" + identifier + r"}}")

>>> count[0]=5
>>> execute(r"\test")
>>> count[0]
6

The returned value, identifier, is a string consisting only of English alphabetical letters; pass it to the \pythonimmediatecallhandler \(\TeX\) command and to remove_handler().

The handlers must take a single argument of type Engine as input and return nothing.