Test suites and execution
To run a test suite, run:
python -m unittest tests.<test_suite_name>
This command will run a default set of test suites:
The default test suite includes all test suites listed in tests/__init__.py file.
Currently, it includes tests.core and tests.compact.
To optimize CI/CD resource utilization and manage dependency overhead,
tests are categorized into four tiers based on their resource
requirements and complexity: “core”, “compact”, “extra”, and “noauto”.
Adding a test case to a test suite
To add a test case to a test suite,
add it to tests_packages list in __init__.py
inside that test suite’s directory.
Test matrix for CI
The following table outlines the automated test coverage across
supported Python versions and operating systems:
| Python |
Ubuntu |
Windows |
macOS |
| 3.14 (Latest) |
O+C |
O |
O |
| 3.13 |
O+C+X |
O+C |
O+C |
| 3.12 |
O |
|
|
| 3.11 |
O |
|
|
| 3.10 |
O |
|
|
| 3.9 (Earliest) |
O+C |
O+C |
O+C |
The CI/CD test workflow is at
https://github.com/PyThaiNLP/pythainlp/blob/dev/.github/workflows/unittest.yml.
Core tests (test_*.py)
- Run
python -m unittest tests.core
- Focus on core functionalities.
- Do not rely on external dependencies beyond the standard library.
- Do not perform network access or corpus downloads.
- Tested on all supported operating systems and all active Python versions.
- Test case class suffix:
TestCase
Compact tests (testc_*.py)
- Run
python -m unittest tests.compact
- Need dependencies from
pip install "pythainlp[compact]"
- Test a limited set of functionalities that rely on a stable
and small set of dependencies.
- These dependencies are
PyYAML, nlpo3, numpy, pyicu,
and python-crfsuite.
- Includes corpus download/remove tests (may require network access).
- Tested on:
- All OSes: earliest and second-latest supported Python versions
- Ubuntu: additionally tested on the latest version
- Test case class suffix:
TestCaseC
- Run
python -m unittest tests.extra
- Need dependencies from
pip install "pythainlp[compact,extra]"
- Test more functionalities that rely on larger set of dependencies
or one that require more time or computation.
- Only tested on Ubuntu using the second-latest Python version.
- Test case class suffix:
TestCaseX
Noauto tests (testn_*.py)
The noauto (no-automated) test suite contains tests for functionalities
that require heavy dependencies which are not feasible to run in automated
CI/CD pipelines. These tests are organized into specialized suites based
on their dependency requirements.
Why separate noauto test suites?
Different ML/AI frameworks often have conflicting version requirements for
their dependencies. For example:
- PyTorch and TensorFlow may require different versions of numpy or protobuf
- Large frameworks take significant time to install (~1-3 GB each)
- Some packages require Cython compilation or system libraries
By separating tests by dependency group, we can:
- Test each framework independently without conflicts
- Optimize CI/CD resources by running only relevant test groups
- Make it easier for developers to test specific functionality
Noauto test suites
Umbrella suite: tests.noauto
- Run
python -m unittest tests.noauto
- Includes all modular noauto test suites
- Use this for comprehensive testing when all dependencies are available
- Test case class suffix:
TestCaseN
Modular suites by dependency
PyTorch-based: tests.noauto_torch
- Run
python -m unittest tests.noauto_torch
- Need dependencies from
pip install "pythainlp[noauto-torch]"
- Tests requiring PyTorch and its ecosystem:
- torch, transformers (PyTorch backend)
- attacut, thai-nner, wtpsplit, tltk
- Tests: spell correction (wanchanberta), NER/POS tagging (transformers-based),
tokenization (attacut), subword tokenization (phayathai, wangchanberta),
sentence tokenization (wtp)
- Dependencies: ~2-3 GB
- Test case class suffix:
TestCaseN
TensorFlow-based: tests.noauto_tensorflow
- Run
python -m unittest tests.noauto_tensorflow
- Need dependencies from
pip install "pythainlp[noauto-tensorflow]"
- Tests requiring TensorFlow:
- Dependencies: ~1-2 GB
- Note: May conflict with PyTorch dependencies
- Test case class suffix:
TestCaseN
ONNX Runtime-based: tests.noauto_onnx
- Run
python -m unittest tests.noauto_onnx
- Need dependencies from
pip install "pythainlp[noauto-onnx]"
- Tests requiring ONNX Runtime:
- oskut, sefr_cut tokenizers
- Dependencies: ~200-500 MB
- Test case class suffix:
TestCaseN
Cython-compiled: tests.noauto_cython
- Run
python -m unittest tests.noauto_cython
- Need dependencies from
pip install "pythainlp[noauto-cython]"
- Tests requiring Cython-compiled packages:
- Requires: Cython, C compiler, system libraries (hunspell)
- Platform-specific build requirements
- Test case class suffix:
TestCaseN
Network-dependent: tests.noauto_network
- Run
python -m unittest tests.noauto_network
- Need dependencies from
pip install "pythainlp[noauto-network]"
- Tests requiring network access:
- Hugging Face Hub model downloads
- External API calls
- Requires: Internet connection, may involve large downloads
- Test case class suffix:
TestCaseN
Robustness tests (test_robustness.py)
A comprehensive test suite within core tests that tests edge cases important
for real-world usage:
- Empty strings and various whitespace handling (spaces, tabs, Unicode spaces)
- Special characters from encoding issues, BOM, terminal copy/paste
- Truncated/malformed Unicode and surrogate pairs
- Emoji and modern Unicode sequences (ZWJ, modifiers, flags)
- Control and hidden/invisible characters (zero-width, control chars)
- Thai-specific edge cases with combining characters and mixed scripts
- Multi-engine robustness testing across all core tokenization engines
- Very long strings that can cause performance issues (issue #893)
Corpus test (corpus/)
A separate test suite that verifies the integrity, format, parseability,
and catalog functionality of corpus in PyThaiNLP.
These tests are separate from regular unit tests because they test actual
file loading and parsing (not mocked), require network access, and
can be resource intensive.
For detailed information about corpus test, see:
tests/corpus/README.md
The corpus test is triggered automatically via GitHub Actions
when changes are made to pythainlp/corpus/** or tests/corpus/**.
Run corpus test:
python -m unittest tests.corpus