Open Chinese Convert 1.3.0.dev2+g90b2a0f
A project for conversion between Traditional and Simplified Chinese
plugins/ contains segmentation plugins that are built and distributed separately from the OpenCC core library.
The current plugin layout is:
Runtime naming follows the segmentation type:
Windows loaders also accept the MSYS/MinGW runtime name msys-opencc-<type>.dll when that is the emitted DLL filename. On Windows, plugins must be built with a toolchain and runtime that are ABI-compatible with the host OpenCC binary. Mixing MSVC-built hosts with MinGW-built plugins, or the reverse, is unsupported.
For the jieba plugin, that means:
MSYS/MinGW builds may emit msys-opencc-jieba.dll, which is also accepted by the loader.
CMake installs plugin binaries into the platform plugin directory and plugin configs and resources into the OpenCC data directory.
Within a single plugin search directory, keep only one DLL per segmentation type. On Windows, both opencc-<type>.dll and msys-opencc-<type>.dll count as matches for a type, so having both names for the same type in one search directory is treated as an error.
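The one-DLL-per-type rule can be sketched in C++17 as follows. This is not the OpenCC loader. SelectWindowsPluginName and ListFileNames are hypothetical helpers, and the caller is assumed to pass the entries of one search directory at a time. Names are compared case-sensitively here, even though Windows file systems are usually case-insensitive.

```cpp
#include <filesystem>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: picks the plugin file for `type` among the file names
// of ONE search directory. Returns std::nullopt if neither accepted name is
// present and throws if both are present (two matches for one type).
// Note: names are compared case-sensitively in this sketch.
std::optional<std::string> SelectWindowsPluginName(
    const std::vector<std::string>& dir_entries, const std::string& type) {
  const std::string accepted[] = {"opencc-" + type + ".dll",
                                  "msys-opencc-" + type + ".dll"};
  std::vector<std::string> matches;
  for (const std::string& name : accepted) {
    for (const std::string& entry : dir_entries) {
      if (entry == name) {
        matches.push_back(entry);
        break;
      }
    }
  }
  if (matches.size() > 1) {
    throw std::runtime_error("multiple plugin DLLs for segmentation type '" +
                             type + "'");
  }
  if (matches.empty()) {
    return std::nullopt;
  }
  return matches.front();
}

// Hypothetical helper: collects the regular-file names in one directory.
std::vector<std::string> ListFileNames(const fs::path& dir) {
  std::vector<std::string> names;
  for (const fs::directory_entry& entry : fs::directory_iterator(dir)) {
    if (entry.is_regular_file()) {
      names.push_back(entry.path().filename().string());
    }
  }
  return names;
}
```

For example, SelectWindowsPluginName(ListFileNames(dir), "jieba") would return msys-opencc-jieba.dll for a directory that holds only that DLL. It would throw if opencc-jieba.dll sat beside it. Keeping the selection separate from the directory listing lets the rule be checked without touching the filesystem.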
Plugin JSON uses resource names rather than platform paths. Example:
The core passes these values to the plugin host. The plugin is responsible for resolving them at runtime. Relative resource paths are expected to resolve within the existing OpenCC data layout rather than a plugin-specific ad hoc directory tree.
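The resolution step might look like the minimal C++17 sketch below. It assumes the plugin already knows the OpenCC data directory. ResolvePluginResource is a hypothetical name. Rejecting rooted paths and names that climb out of the data directory with .. is a policy chosen for this sketch, not documented OpenCC behavior.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolves a resource name from plugin JSON against the
// OpenCC data directory, purely lexically (the file need not exist yet).
std::optional<fs::path> ResolvePluginResource(const fs::path& data_dir,
                                              const std::string& resource) {
  const fs::path rel(resource);
  // Plugin JSON holds resource names, not platform paths, so this sketch
  // rejects empty names and anything with a root (e.g. "/etc/x" or "C:\x").
  if (rel.empty() || rel.has_root_path()) {
    return std::nullopt;
  }
  const fs::path base = data_dir.lexically_normal();
  const fs::path resolved = (base / rel).lexically_normal();
  // Reject names such as "../x" that would resolve outside the data directory.
  const fs::path back = resolved.lexically_relative(base);
  if (back.empty() || back.begin()->string() == "..") {
    return std::nullopt;
  }
  return resolved;
}
```

With a data directory of /usr/share/opencc, a hypothetical resource name jieba/user.dict.utf8 would resolve to /usr/share/opencc/jieba/user.dict.utf8.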
The current segmentation plugin ABI entry point is:
Segmentation results are returned as a sequence of segment lengths measured in Unicode code points, not as copied token strings. The ABI contract is:
This keeps the ABI simpler and avoids allocating one string per token across the plugin boundary.
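Code-point lengths are enough for the host to recover each token. The sketch below shows one way a host might map the lengths back onto the UTF-8 input without copying tokens. It assumes valid UTF-8 input and lengths delivered as a std::vector<std::size_t>. The actual ABI types are not reproduced here. The final check that the lengths cover the whole input is also an assumption of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Sketch assumption: input is valid UTF-8 (continuation bytes are not checked).
std::size_t Utf8SequenceLength(unsigned char lead) {
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  throw std::invalid_argument("invalid UTF-8 lead byte");
}

// Splits `text` using per-segment lengths measured in Unicode code points.
// The returned views point into `text`; no per-token string is allocated.
std::vector<std::string_view> SplitByCodePointLengths(
    std::string_view text, const std::vector<std::size_t>& lengths) {
  std::vector<std::string_view> segments;
  segments.reserve(lengths.size());
  std::size_t pos = 0;  // byte offset into `text`
  for (std::size_t code_points : lengths) {
    const std::size_t start = pos;
    for (std::size_t i = 0; i < code_points; ++i) {
      if (pos >= text.size()) {
        throw std::out_of_range("segment lengths exceed input length");
      }
      pos += Utf8SequenceLength(static_cast<unsigned char>(text[pos]));
    }
    if (pos > text.size()) {
      throw std::out_of_range("truncated UTF-8 sequence");
    }
    segments.push_back(text.substr(start, pos - start));
  }
  // Assumption of this sketch: segments must cover the entire input.
  if (pos != text.size()) {
    throw std::invalid_argument("segment lengths do not cover the input");
  }
  return segments;
}
```

Each returned std::string_view borrows from the original buffer, so the caller must keep that buffer alive while the segments are in use.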
When using the jieba plugin, you can add custom terminology to the segmenter by providing your own user.dict.utf8 or editing the installed one.
Custom dictionaries must be encoded in UTF-8. Each line has the form [Word] [Frequency] [Part-of-Speech], with fields separated by spaces; the frequency and the part-of-speech (POS) tag are optional.
Example:
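The next sketch splits one dictionary line into its fields, assuming C++17. UserDictEntry and ParseUserDictLine are hypothetical names and are not part of OpenCC or jieba. Both trailing fields are optional, so a two-field line is ambiguous. This sketch treats an all-digit second field as a frequency and anything else as a POS tag. A given jieba implementation may read such lines differently.

```cpp
#include <cctype>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line: [Word] [Frequency] [Part-of-Speech].
struct UserDictEntry {
  std::string word;
  std::optional<long> frequency;       // absent when not given
  std::optional<std::string> pos_tag;  // absent when not given
};

// True if `s` is a non-empty run of ASCII digits.
bool IsAllDigits(const std::string& s) {
  if (s.empty()) return false;
  for (unsigned char c : s) {
    if (!std::isdigit(c)) return false;
  }
  return true;
}

// Parses one line; returns std::nullopt for blank or malformed lines.
// Fields are split on whitespace. std::stol throws std::out_of_range for a
// frequency that does not fit in a long.
std::optional<UserDictEntry> ParseUserDictLine(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> fields;
  for (std::string field; in >> field;) fields.push_back(field);
  if (fields.empty() || fields.size() > 3) return std::nullopt;

  UserDictEntry entry{fields[0], std::nullopt, std::nullopt};
  if (fields.size() == 2) {
    // Assumption: a numeric second field is a frequency, otherwise a POS tag.
    if (IsAllDigits(fields[1])) {
      entry.frequency = std::stol(fields[1]);
    } else {
      entry.pos_tag = fields[1];
    }
  } else if (fields.size() == 3) {
    if (!IsAllDigits(fields[1])) return std::nullopt;
    entry.frequency = std::stol(fields[1]);
    entry.pos_tag = fields[2];
  }
  return entry;
}
```

For instance, a hypothetical line 雲端運算 5 n would parse to the word 雲端運算 with frequency 5 and tag n.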
Each plugin should prefer integration tests that exercise:
Current jieba targets:
To fit the way downstream Linux distributions package software (e.g., Debian with apt, Arch Linux with pacman), OpenCC plugins support decoupled compilation. This lets maintainers build and distribute the OpenCC core separately from heavier third-party plugins such as opencc-jieba.
Compile the main tree normally, but disable the optional jieba plugin:
Plugins can detect standalone builds automatically. Build from the plugin directory and point OpenCC_DIR at the installed OpenCC CMake package:
Standalone default installation paths are intended to align with the core OpenCC layout: