wiktra2

Wiktra: transliteration tool using Wiktionary transliteration modules. Version 2 (fork)

Project on GitHub: twardoch/wiktra2

Wiktra: High-Quality Transliteration Powered by Wiktionary

Wiktra is a versatile Unicode transliteration tool that brings the linguistic precision of Wiktionary’s community-curated transliteration modules to your command line and Python projects. It allows you to convert text from one writing system (script) to another with a high degree of accuracy.

Project Locations:

What is Wiktra?

At its core, Wiktra transliterates text. This means it converts characters or words from one script (e.g., Cyrillic, Arabic, Devanagari) into another (e.g., Latin script). Unlike simple character-by-character replacement, Wiktra utilizes sophisticated rule-based transliteration modules written in Lua, developed and maintained by linguists and contributors on Wiktionary. These modules understand the nuances of how languages are written, leading to more accurate and contextually appropriate results.
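
To make the difference between character-by-character replacement and rule-based transliteration concrete, here is a minimal, self-contained sketch. It does not use Wiktra or the Wiktionary modules. The letter table and the single context rule (Russian "е" becomes "je" at the start of a word, after a vowel, or after ъ/ь) are simplified assumptions chosen for illustration, and the function names are hypothetical.

```python
# Toy letter table: only the letters needed for the demo words below.
# This is NOT the Wiktionary rule set, just an illustration.
NAIVE_MAP = {
    "е": "e", "л": "l", "ь": "ʹ", "с": "s",
    "п": "p", "о": "o", "з": "z", "д": "d",
}
VOWELS = set("аеёиоуыэюя")
SIGNS = ("ъ", "ь")


def naive(text: str) -> str:
    """Replace each character independently, ignoring its neighbours."""
    return "".join(NAIVE_MAP.get(ch, ch) for ch in text)


def contextual(text: str) -> str:
    """Same table, plus one context-sensitive rule for the letter 'е'."""
    out = []
    prev = ""  # empty string marks the start of the text
    for ch in text:
        at_word_start = prev == "" or prev.isspace()
        if ch == "е" and (at_word_start or prev in VOWELS or prev in SIGNS):
            out.append("je")
        else:
            out.append(NAIVE_MAP.get(ch, ch))
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    for word in ("ель", "поезд", "лес"):
        print(word, naive(word), contextual(word))
```

The two functions agree on "лес" but differ on "ель" and "поезд", where the result depends on what precedes the letter. The Wiktionary modules encode many such rules per language, which a flat lookup table cannot express.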

Wiktra provides:

Wiktra 1.0 was originally developed by Khuyagbaatar Batsuren. Wiktra 2 was significantly rewritten by Adam Twardoch.

Who is Wiktra For?

Wiktra is designed for a diverse range of users:

Why is Wiktra Useful?

Installation

Wiktra requires Python 3.9+ and Lua. LuaJIT is recommended for performance with the lupa bridge.

General Installation (using pip):

The primary way to install Wiktra is via pip:

python3 -m pip install wiktra

This installs Wiktra and its Python dependencies, including lupa, which bridges Python and Lua. If lupa has to be built from source, the Lua development headers must be present on your system.

macOS:

For macOS, a convenience script install-mac.sh (available in the source repository) can help install prerequisites like Lua via Homebrew:

  1. Download or clone the Wiktra repository.
  2. Navigate to the repository’s root directory in your terminal.
  3. Run the script:
    ./install-mac.sh
    
  4. Then, install Wiktra using pip (if the script doesn’t do it already, or to ensure you have the latest version from PyPI):
    # If installing from a local clone after running the script:
    python3 -m pip install --upgrade .
    # Or to get the latest from PyPI:
    python3 -m pip install --upgrade wiktra
    

Linux (Debian/Ubuntu Example):

You’ll need to install Python 3, pip, and Lua development files.

sudo apt update
sudo apt install python3 python3-pip liblua5.1-0-dev luajit
# lupa is often used with LuaJIT; to build against it, also install libluajit-5.1-dev.
# Depending on your distribution and lupa version, you might need a different Lua version, such as liblua5.3-dev.
python3 -m pip install wiktra

Windows:

Installation on Windows can be more complex because lupa may need to be compiled.

  1. Install Python 3.9+ (e.g., from python.org). Make sure to add Python to your PATH.
  2. Installing lupa typically requires a C compiler (like Microsoft C++ Build Tools, often installed with Visual Studio) and Lua (e.g., by compiling Lua from source, or using a package manager like Scoop or Chocolatey to install Lua/LuaJIT).
  3. Installation is simplest when PyPI offers a pre-compiled lupa wheel for your Python version and architecture. If it does not, you must set up the build environment manually.
  4. Once lupa can be installed (i.e., its prerequisites are met), Wiktra can be installed via pip:
    pip install wiktra
    

Note: The original README mentioned that, at one point, version 2 did not work well on Ubuntu and Windows 10. Cross-platform compatibility is a goal, but installing lupa correctly is often the main hurdle. Refer to the lupa documentation for specific guidance on its installation.
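
Because lupa is the usual stumbling block, it can help to check it in isolation before debugging Wiktra itself. The sketch below is a diagnostic under stated assumptions, not part of Wiktra: the function name is hypothetical, and it uses only lupa's LuaRuntime class and Lua's built-in _VERSION global.

```python
import importlib
from typing import Optional


def lua_version_via_lupa() -> Optional[str]:
    """Return the Lua version string seen through lupa, or None if lupa is missing."""
    try:
        lupa = importlib.import_module("lupa")
    except ImportError:
        return None
    # _VERSION is a standard Lua global holding the interpreter version string.
    return str(lupa.LuaRuntime().eval("_VERSION"))


if __name__ == "__main__":
    version = lua_version_via_lupa()
    if version is None:
        print("lupa is not importable; install it before installing Wiktra.")
    else:
        print(f"lupa works and reports: {version}")
```

If this reports a version, the Python-Lua bridge is working and any remaining problem is more likely in the Wiktra installation itself.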

Troubleshooting Installation:

Basic Usage

Wiktra offers two main ways to perform transliterations:

1. Command-Line Interface (wiktrapy)

The wiktrapy tool is perfect for quick transliterations or use in shell scripts.

Basic syntax:

wiktrapy [options] -t "Your text here"
# or pipe text into it
echo "Your text here" | wiktrapy [options]
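
When a shell script is not convenient, the same command can be driven from Python. The following is a minimal sketch that relies only on the -t option shown above. The helper names are hypothetical, and any additional options are passed through unchanged rather than assumed.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(text: str, extra_options: Optional[List[str]] = None) -> List[str]:
    """Build the argument list for: wiktrapy [options] -t TEXT."""
    return ["wiktrapy", *(extra_options or []), "-t", text]


def transliterate_via_cli(text: str) -> Optional[str]:
    """Run wiktrapy and return its output, or None if the tool is not on PATH."""
    if shutil.which("wiktrapy") is None:
        return None
    result = subprocess.run(
        build_command(text),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(transliterate_via_cli("Привет мир"))
```

Passing the arguments as a list avoids shell quoting problems with non-Latin text. For many transliterations in one process, the Python module avoids starting a new interpreter for each call.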

Examples:

2. Python Module (wiktra)

For more programmatic control, use the wiktra Python module. The recommended way is to use the Transliterator class.

Example (New API - Recommended):

from wiktra.Wiktra import Transliterator

# Create a Transliterator instance
# This is best done once if you're doing multiple transliterations
tr = Transliterator()

# Transliterate text with automatic language/script detection
# (Wiktra guesses the input script and uses 'und', the undefined language, for that script)
text_cyrillic = "Привет мир"
latin_text = tr.tr(text_cyrillic)
print(f"'{text_cyrillic}' -> '{latin_text}'")
# Expected Output: 'Привет мир' -> 'Privet mir'

text_devanagari = "नमस्ते दुनिया"
latin_text_dev = tr.tr(text_devanagari)
print(f"'{text_devanagari}' -> '{latin_text_dev}'")
# Expected Output: 'नमस्ते दुनिया' -> 'namaste duniyaa'

# Explicitly specify language, input script, and output script
text_russian = "Русский текст"
# lang='ru' (Russian), sc='Cyrl' (Cyrillic), to_sc='Latn' (Latin)
transliterated_explicit = tr.tr(text_russian, lang='ru', sc='Cyrl', to_sc='Latn', explicit=True)
print(f"'{text_russian}' (explicit) -> '{transliterated_explicit}'")
# Expected Output: 'Русский текст' (explicit) -> 'Russkij tekst'

# Using the class instance is more efficient for multiple transliterations
# as the Lua runtime and modules are initialized only once.

Legacy Function (translite):

A legacy translite function is also available, primarily for compatibility with older versions of Wiktra or specific use cases that relied on its distinct language code mapping.

from wiktra.Wiktra import translite as tr_legacy

# Example for Mongolian (Cyrillic) using its legacy code 'mon'
mongolian_text = "монгол бичлэг"
transliterated_mongolian = tr_legacy(mongolian_text, 'mon')
print(f"'{mongolian_text}' (legacy) -> '{transliterated_mongolian}'")
# Expected Output: 'монгол бичлэг' (legacy) -> 'mongol bichleg'

It is generally recommended to use the new Transliterator.tr() method for its more standardized approach to language/script codes and broader capabilities.
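
The advice to create the Transliterator once and reuse it can be packaged as a small helper. This is a sketch, assuming only the Transliterator class and its tr() method shown above. The names get_transliterator and transliterate_many are hypothetical and not part of Wiktra.

```python
from functools import lru_cache
from typing import Callable, Iterable, List, Optional


@lru_cache(maxsize=1)
def get_transliterator():
    """Create the Transliterator on first use and return the same instance afterwards."""
    # Imported lazily so this module can be loaded even where wiktra is absent.
    from wiktra.Wiktra import Transliterator
    return Transliterator()


def transliterate_many(
    texts: Iterable[str],
    tr: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Transliterate every text with one shared callable."""
    if tr is None:
        tr = get_transliterator().tr
    return [tr(text) for text in texts]


if __name__ == "__main__":
    try:
        print(transliterate_many(["Привет мир", "Русский текст"]))
    except ImportError:
        print("wiktra is not installed; pass your own callable as tr.")
```

The optional tr parameter also makes the helper easy to test with a stand-in function, without initializing the Lua runtime.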

Updating Wiktionary Modules

Wiktra can update its local collection of Wiktionary transliteration modules using the wiktrapy_update command:

wiktrapy_update -h # For options
wiktrapy_update

This helps keep your transliterations aligned with the latest rules from Wiktionary.

Technical Details

This section describes Wiktra’s architecture and core components, and gives guidelines for coding and contributing.

How Wiktra Works

Wiktra’s ability to perform complex transliterations stems from its use of Lua modules sourced directly from Wiktionary, executed within a Python environment.

1. Python-Lua Integration via lupa: The core of Wiktra’s cross-language functionality is the lupa library. lupa provides a bridge between Python and Lua (designed for LuaJIT, but also usable with standard Lua), allowing Python code to load Lua code and call Lua functions.

2. The Transliterator Class (wiktra/Wiktra.py): This is the central class orchestrating transliteration.

3. Wiktionary Lua Modules (wiktra/wikt/): This directory and its subdirectories contain the Lua code and data sourced from Wiktionary.

4. CLI Entry Point (wiktra/__main__.py): This script powers the wiktrapy command-line tool.

5. Module Update Mechanism (wiktra/update.py and wiktrapy_update): Wiktra includes a built-in mechanism to update its local cache of Wiktionary Lua modules and associated data.
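
The Python-Lua bridging described in item 1 can be illustrated with a toy example. This sketch does not load any Wiktionary module and does not reproduce the Transliterator internals. The Lua source, its six-letter table, and the function run_toy_module are assumptions made for illustration; only lupa's LuaRuntime and its execute() method are real API.

```python
from typing import Optional

# Toy Lua module: returns a table with a tr(text) function.
# The letter map is illustrative only and covers a single demo word.
LUA_SOURCE = r'''
local export = {}
local map = {
    ["п"] = "p", ["р"] = "r", ["и"] = "i",
    ["в"] = "v", ["е"] = "e", ["т"] = "t",
}
-- Lua strings are byte strings, so match one UTF-8 encoded character at a time.
local utf8_char = "[\1-\127\194-\244][\128-\191]*"
function export.tr(text)
    -- Parentheses drop gsub's second return value (the replacement count).
    return (text:gsub(utf8_char, function(ch) return map[ch] or ch end))
end
return export
'''


def run_toy_module(text: str) -> Optional[str]:
    """Load the toy Lua module through lupa and call its tr() from Python."""
    try:
        from lupa import LuaRuntime
    except ImportError:
        return None  # lupa is not installed
    lua = LuaRuntime(unpack_returned_tuples=True)
    module = lua.execute(LUA_SOURCE)  # the chunk's return value: a Lua table
    return module.tr(text)


if __name__ == "__main__":
    print(run_toy_module("привет"))
```

The Python side only passes a string in and receives a string back. All transliteration logic stays in Lua, which is the division of labour the architecture above describes.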

Coding and Contribution Guidelines

We welcome contributions to Wiktra! Here are some guidelines:

Coding Style:

Dependencies:

Testing:

Contribution Process:

  1. Fork the Repository: Start by forking the active Wiktra repository (see Project Locations above).
  2. Create a Branch: For your changes, create a new branch in your fork (e.g., feature/add-georgian-translit or fix/unicode-error-arabic).
  3. Code: Implement your changes, following the coding style guidelines.
  4. Test: Thoroughly test your modifications.
  5. Commit: Write clear, descriptive commit messages. A common convention is a short subject line (max 50 characters), a blank line, and then a more detailed explanation if necessary.
  6. Push: Push your changes to your branch in your forked repository.
  7. Create a Pull Request: Open a pull request against the main branch of the active Wiktra repository. Clearly describe your changes, the problem they solve, and how you tested them.
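
The commit message convention from step 5 can be checked mechanically before pushing. The following is a small sketch, not a project tool: the function name and the wording of the messages are hypothetical, and the 50-character limit is the one stated above.

```python
from typing import List


def check_commit_message(message: str, max_subject: int = 50) -> List[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing subject line"]
    problems = []
    if len(lines[0]) > max_subject:
        problems.append(f"subject line is longer than {max_subject} characters")
    # If there is a body, the second line must be blank to separate it.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems


if __name__ == "__main__":
    example = "Fix Unicode error in Arabic input\n\nExplain the cause and the fix here."
    print(check_commit_message(example) or "OK")
```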

Reporting Issues:

License:

Wiktra is distributed under the GPLv2 (GNU General Public License version 2). All contributions to the project are also expected to be made under this license.


This README aims to be a comprehensive guide for both users and developers of Wiktra. For further details, exploring the source code and the linked Wiktionary resources is encouraged.