pythainlp

Test suites and execution

To run a test suite, run:

python -m unittest tests.<test_suite_name>

This command will run a default set of test suites:

python -m unittest tests
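The same suites can also be started from Python code, for example from a notebook or a custom script. The following is a minimal sketch that uses only the standard unittest module. The helper name run_suite is hypothetical and is not part of PyThaiNLP.

```python
import unittest


def run_suite(name: str, verbosity: int = 1) -> unittest.TestResult:
    """Load the tests found under a dotted name and run them.

    `name` is a dotted module or package name such as "tests.core".
    This helper is illustrative only; it is not part of PyThaiNLP.
    """
    # loadTestsFromName imports the module or package and collects its tests.
    suite = unittest.defaultTestLoader.loadTestsFromName(name)
    runner = unittest.TextTestRunner(verbosity=verbosity)
    return runner.run(suite)
```

Calling run_suite("tests.core") from the repository root should behave roughly like python -m unittest tests.core, and the returned result object exposes wasSuccessful() for use in scripts.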

The default test suite includes every test suite listed in the tests/__init__.py file. Currently, these are tests.core and tests.compact.

To optimize CI/CD resource utilization and manage dependency overhead, tests are categorized into four tiers based on their resource requirements and complexity: “core”, “compact”, “extra”, and “noauto”.
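Each tier has its own file-name pattern, given in the tier headings later in this document (test_*.py, testc_*.py, testx_*.py, testn_*.py). As a small illustration, the sketch below maps a test file name to its tier. The names TIER_PATTERNS and tier_of are hypothetical helpers, not part of PyThaiNLP.

```python
from fnmatch import fnmatchcase

# File-name patterns for each tier, taken from the tier headings
# of this document.
TIER_PATTERNS = {
    "core": "test_*.py",
    "compact": "testc_*.py",
    "extra": "testx_*.py",
    "noauto": "testn_*.py",
}


def tier_of(filename):
    """Return the tier name for a test file name, or None if none matches."""
    for tier, pattern in TIER_PATTERNS.items():
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(filename, pattern):
            return tier
    return None
```

The patterns do not overlap: a name such as testc_example.py cannot match test_*.py, because the character after "test" must be an underscore.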

Adding a test case to a test suite

To add a test case to a test suite, add it to the tests_packages list in the __init__.py file inside that test suite’s directory.
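A sketch of what such an __init__.py could look like is shown here. It assumes that the list holds dotted module names and that the suite is assembled through unittest's load_tests hook. Both points are assumptions for illustration, as are the module names and the build_suite helper, so check the actual __init__.py before copying anything.

```python
import unittest

# Hypothetical contents of a suite's __init__.py.
# Each entry is assumed to be the dotted name of a test module.
tests_packages = [
    "tests.core.test_existing",    # hypothetical existing entry
    "tests.core.test_my_feature",  # hypothetical newly added test module
]


def build_suite(loader, names):
    """Collect the tests from every dotted name into one TestSuite."""
    suite = unittest.TestSuite()
    for name in names:
        suite.addTests(loader.loadTestsFromName(name))
    return suite


def load_tests(loader, standard_tests, pattern):
    """unittest calls this hook when it loads the package."""
    return build_suite(loader, tests_packages)
```

With a layout like this, adding a test case is a one-line change to the list, and python -m unittest picks it up through load_tests.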

Test matrix for CI

The following table outlines the automated test coverage across supported Python versions and operating systems:

Python          Ubuntu   Windows  macOS
3.14 (Latest)   O+C      O        O
3.13            O+C+X    O+C      O+C
3.12            O
3.11            O
3.10            O
3.9 (Earliest)  O+C      O+C      O+C

The CI/CD test workflow is at https://github.com/PyThaiNLP/pythainlp/blob/dev/.github/workflows/unittest.yml.

Core tests (test_*.py)

Compact tests (testc_*.py)

Extra tests (testx_*.py)

Noauto tests (testn_*.py)

The noauto (no-automated) test suite contains tests for functionality whose heavy dependencies make it impractical to run in automated CI/CD pipelines. These tests are organized into specialized suites based on their dependency requirements.

Why separate noauto test suites?

Different ML/AI frameworks often have conflicting version requirements for their dependencies. For example:

By separating tests by dependency group, we can:

Noauto test suites

Umbrella suite: tests.noauto

Modular suites by dependency

PyTorch-based: tests.noauto_torch
TensorFlow-based: tests.noauto_tensorflow
ONNX Runtime-based: tests.noauto_onnx
Cython-compiled: tests.noauto_cython
Network-dependent: tests.noauto_network
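Because each modular suite needs its own framework, it can help to check which frameworks are installed before choosing a suite to run. The sketch below does this for the three framework-based suites. The suite names come from the list above, while the importable package names (torch, tensorflow, onnxruntime) and the helper names are assumptions.

```python
from importlib.util import find_spec

# Suite names come from the list above; the importable package names
# on the right are assumptions, not taken from PyThaiNLP.
SUITE_REQUIREMENTS = {
    "tests.noauto_torch": "torch",
    "tests.noauto_tensorflow": "tensorflow",
    "tests.noauto_onnx": "onnxruntime",
}


def is_installed(package):
    """Return True if a top-level package can be found, without importing it."""
    return find_spec(package) is not None


def runnable_suites(is_available=is_installed):
    """List the suites whose assumed framework package is available."""
    return [
        suite
        for suite, package in SUITE_REQUIREMENTS.items()
        if is_available(package)
    ]
```

Each name returned by runnable_suites() can then be passed to python -m unittest in an environment that has the matching framework installed.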

Robustness tests (test_robustness.py)

A comprehensive test suite within the core tests that covers edge cases important for real-world usage:

Corpus test (corpus/)

A separate test suite that verifies the integrity, format, parseability, and catalog functionality of the corpora in PyThaiNLP.

These tests are kept separate from the regular unit tests because they test actual file loading and parsing (not mocked), require network access, and can be resource-intensive.

For detailed information about the corpus test, see: tests/corpus/README.md

The corpus test is triggered automatically via GitHub Actions when changes are made to pythainlp/corpus/** or tests/corpus/**.
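To see in advance whether a change set would touch those paths, a simple prefix check is usually enough. The sketch below approximates the two path filters with directory prefixes. It is not the matching logic GitHub Actions itself uses, and the names TRIGGER_PREFIXES and triggers_corpus_test are hypothetical.

```python
# Directory prefixes approximating the two path filters above.
TRIGGER_PREFIXES = ("pythainlp/corpus/", "tests/corpus/")


def triggers_corpus_test(changed_paths):
    """Return True if any changed path lies under a trigger directory.

    Paths are expected to be repository-relative with forward slashes.
    """
    # str.startswith accepts a tuple and matches any of its prefixes.
    return any(path.startswith(TRIGGER_PREFIXES) for path in changed_paths)
```

For example, a list of changed files obtained from git diff --name-only can be passed directly to triggers_corpus_test.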

To run the corpus test:

python -m unittest tests.corpus