Thoth: Mid-Training Bridges LLMs to Time Series Understanding


This is the official repository for the paper Thoth: Mid-Training Bridges LLMs to Time Series Understanding.

📄 Introduction

While Large Language Models (LLMs) demonstrate exceptional proficiency in general reasoning, they often exhibit a fundamental limitation in capturing intricate temporal dependencies. To bridge this gap, Thoth introduces the first family of mid-trained LLMs that transcend the constraints of task-specific Supervised Fine-Tuning (SFT) through a task- and domain-agnostic mid-training stage. By leveraging an automated synthesis pipeline to achieve bidirectional alignment between time-series-to-text and text-to-time-series generation, Thoth equips models with an intrinsic and foundational understanding of temporal dynamics. This internalized comprehension enables the model to effectively address and enhance performance across a wide range of complex, knowledge-intensive time series reasoning downstream tasks in real-world scenarios.


For more details, please check our paper.

✨ Quickstart

We have released Thoth-30B-A3B on Hugging Face, a model fine-tuned from Qwen3-30B-A3B-Instruct-2507 that is ready for immediate inference and testing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thuml/Thoth-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True
).eval()

# A simple time series anomaly detection task
question = """The following data represents the hourly electricity consumption (in kWh) of an office building over a 24-hour period, starting from midnight (00:00).
Data: [12.5, 11.8, 12.1, 11.5, 12.2, 11.9, 15.6, 32.4, 35.1, 34.8, 36.2, 65.5, 37.0, 35.5, 34.2, 33.9, 35.1, 31.8, 18.2, 14.5, 13.1, 12.8, 12.4, 11.9]
Task: 1. Specify the hour (0-23) when the anomaly occurs. 2. Provide a brief reasoning why you consider it an anomaly."""

messages = [
    {"role": "system", "content": "You are an expert in time series understanding and reasoning."},
    {"role": "user", "content": question}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate reasoning output (temperature only takes effect when sampling is enabled)
generated_ids = model.generate(**model_inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the echoed prompt
output_ids = generated_ids[0][model_inputs.input_ids.shape[1]:]
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```

💻 Installation

```shell
conda create -n thoth python=3.10
conda activate thoth
pip install -r requirements.txt
```

💬 Evaluation

Refer to the YAML files under evaluation/configs/ for setup. Note that additional configurations (e.g., api_key and base_url) are required for proprietary models.
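As an illustrative sketch only, a proprietary-model entry could look like the fragment below; the `api_key` and `base_url` fields are the ones mentioned above, while the other field names and values are assumptions, not the repository's actual schema (refer to `evaluation/configs/` for the authoritative format):

```yaml
# Illustrative sketch -- check the YAML files under evaluation/configs/ for the real schema.
model: gpt-4o                             # hypothetical proprietary model name
api_key: "sk-..."                         # your provider API key (required for proprietary models)
base_url: "https://api.example.com/v1"    # OpenAI-compatible endpoint (hypothetical URL)
```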

To start the evaluation with the default configuration, simply run:

```shell
cd evaluation
bash ./scripts/Thoth.sh
```

🚀 Release Progress

  • Thoth-30B-A3B model weights
  • Public benchmark evaluation pipeline
  • KnoTS benchmark
  • KnoTS evaluation code

📜 Citation

If you find our work useful, please cite our paper as:

@article{lin2026thoth,
  title={Thoth: Mid-Training Bridges LLMs to Time Series Understanding},
  author={Lin, Jiafeng and Wang, Yuxuan and Wu, Jialong and Luo, Huakun and Pei, Zhongyi and Wang, Jianmin},
  journal={arXiv preprint arXiv:2603.01042},
  year={2026}
}

🤝 Contact

If you have any questions, feel free to contact:

💡 Acknowledgment

We sincerely appreciate the following works for their valuable open-source models and evaluation benchmarks:
