This repository implements protocol pieces described in the HTML copies under docs/, with a focus on uTP (BEP 29) and BitTorrent v1 (BEP 3) plus several common extensions used in the wild.
| Area | Location | Role |
|---|---|---|
| uTP (micro transport) | `utp/` | UDP-based transport: packets, congestion, blocking `UtpSocket` |
| picotorrent | `picotorrent/` | Strict bencode, metainfo parsing, trackers, webseeds, DHT helpers, download/seed session, CLI |
| Assigned numbers (BEP 4) | `bittorrent_constants.py` | Wire message IDs and reserved handshake bits |
| Peer ID conventions (BEP 20) | `bittorrent_peer_ids.py` | Known client prefixes and peer-id parsing helpers |

Requirements: Python 3.10 or newer. `tomllib` is used when available (Python 3.11+); on Python 3.10 the code falls back to a small regex read of `pyproject.toml`.
Implemented from docs/bep_0029.rst_post.html:
- v1 header encode/decode (type/version nibble layout, extensions, selective ACK)
- Packet types: `ST_SYN`, `ST_STATE`, `ST_DATA`, `ST_FIN`, `ST_RESET`
- Congestion and timeout behavior aligned with the BEP
- Blocking UDP-backed `UtpSocket` (`connect`, `accept`, `sendall`, `recv`, `close`)

Example:

```python
from utp import UtpSocket

sock = UtpSocket()
sock.bind(("0.0.0.0", 0))           # any local interface, ephemeral port
sock.connect(("127.0.0.1", 9000))   # blocking connect to a local uTP listener
sock.sendall(b"hello over utp")
print(sock.recv())
sock.close()
```

This stack is suitable as a reference and for experiments; it is not a full production uTP implementation (pacing, PMTU, full interoperability tuning, and so on are out of scope unless extended).

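
To make the v1 header layout listed above concrete, here is a small standalone sketch of the 20-byte header from BEP 29. It does not use the `utp/` package; the class name `UtpHeader` and its field names are assumptions chosen for the example, and extension payloads such as the selective ACK bitmask are not parsed.

```python
import struct
from dataclasses import dataclass

# Packet type values from BEP 29.
ST_DATA, ST_FIN, ST_STATE, ST_RESET, ST_SYN = range(5)

# type/version byte, extension, connection_id, timestamp, timestamp_diff,
# wnd_size, seq_nr, ack_nr: 20 bytes, all big-endian.
_HEADER = struct.Struct(">BBHIIIHH")


@dataclass
class UtpHeader:
    type: int
    connection_id: int
    timestamp_us: int
    timestamp_diff_us: int
    wnd_size: int
    seq_nr: int
    ack_nr: int
    extension: int = 0  # 0 = no extension, 1 = selective ACK follows
    version: int = 1

    def encode(self) -> bytes:
        # Packet type sits in the high nibble, protocol version in the low one.
        first = ((self.type & 0x0F) << 4) | (self.version & 0x0F)
        return _HEADER.pack(first, self.extension, self.connection_id,
                            self.timestamp_us, self.timestamp_diff_us,
                            self.wnd_size, self.seq_nr, self.ack_nr)

    @classmethod
    def decode(cls, data: bytes) -> "UtpHeader":
        if len(data) < _HEADER.size:
            raise ValueError("packet shorter than the 20-byte uTP header")
        first, ext, conn, ts, diff, wnd, seq, ack = _HEADER.unpack_from(data)
        if first & 0x0F != 1 or first >> 4 > ST_SYN:
            raise ValueError("unsupported uTP version or packet type")
        return cls(first >> 4, conn, ts, diff, wnd, seq, ack, ext, first & 0x0F)
```

A round trip such as `UtpHeader.decode(UtpHeader(ST_SYN, 0x1234, 1000, 0, 65535, 1, 0).encode())` should return an equal header, which is a convenient property to check in packet unit tests.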
- Strict bencode decode with dictionary key ordering checks (suitable for info-dictionary hashing)
- Single-file and multi-file torrents
- Fields surfaced in reports include: `announce`, `announce-list`, `url-list` (web seeds, BEP 19), `nodes` (DHT bootstrap hints, BEP 5), `private` (BEP 27), comments, creation metadata, piece table, info hash (SHA-1 of the raw `info` bytes in the file)

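
The strict decoding and info-hash points above can be illustrated with a compact sketch. It is not the decoder in `picotorrent/`; the names `bdecode` and `info_hash` are hypothetical, and the strictness rules shown (sorted unique keys, canonical integers, no trailing bytes) are one reasonable reading of "strict".

```python
import hashlib


def _decode(data: bytes, i: int):
    """Decode one bencoded value starting at data[i]; return (value, end)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        digits = data[i + 1:end]
        unsigned = digits[1:] if digits.startswith(b"-") else digits
        # Strict: digits only, no leading zeros, no negative zero.
        if (not unsigned.isdigit() or digits == b"-0"
                or (len(unsigned) > 1 and unsigned[:1] == b"0")):
            raise ValueError("non-canonical integer")
        return int(digits.decode("ascii")), end + 1
    if lead == b"l":
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        result, prev, i = {}, None, i + 1
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            if not isinstance(key, bytes) or (prev is not None and key <= prev):
                raise ValueError("keys must be sorted, unique byte strings")
            result[key], i = _decode(data, i)
            prev = key
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        size = data[i:colon]
        if not size.isdigit() or (len(size) > 1 and size[:1] == b"0"):
            raise ValueError("bad string length")
        end = colon + 1 + int(size.decode("ascii"))
        if end > len(data):
            raise ValueError("truncated string")
        return data[colon + 1:end], end
    raise ValueError(f"unexpected byte at offset {i}")


def bdecode(data: bytes):
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after the top-level value")
    return value


def info_hash(torrent: bytes) -> bytes:
    """SHA-1 over the raw bytes of the top-level info value."""
    if not isinstance(bdecode(torrent), dict):
        raise ValueError("metainfo must be a dictionary")
    i = 1  # skip the leading "d"; the input was validated above
    while torrent[i:i + 1] != b"e":
        key, i = _decode(torrent, i)
        start = i
        _, i = _decode(torrent, i)
        if key == b"info":
            return hashlib.sha1(torrent[start:i]).digest()
    raise ValueError("metainfo has no info dictionary")
```

Hashing the original byte span, rather than re-encoding the decoded dictionary, is what makes the key-ordering check matter: a file with unsorted keys would hash differently after a round trip.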
Python API:

```python
from picotorrent import parse_torrent_file, format_torrent_report

meta = parse_torrent_file("example.torrent")
print(format_torrent_report(meta))
```

The `Downloader` and `Seeder` classes in `picotorrent/session.py` implement a basic end-to-end path:

- HTTP and UDP tracker announces (BEP 3 / BEP 15), compact peer lists (BEP 23)
- Optional DHT `get_peers` when `nodes` is present in the torrent and the torrent is not private
- PEX parsing when peers send `ut_pex`
- Web seeds (BEP 19 / GetRight-style): if tracker-based peers fail, the downloader tries `url-list` HTTP(S) URLs, validates piece hashes against the metainfo, and writes output (single file or multi-file tree)
- Transport: peer connections use TCP by default. The downloader uses uTP (BEP 29) when a peer was advertised with PEX flag `0x04` (supports uTP) (BEP 11), or after the remote extension handshake lists **`ut_holepunch`** while the current connection is still TCP (BEP 55 then implies uTP for that endpoint), then retries that peer over uTP once. Otherwise traffic stays on TCP.
- Piece picking (see `docs/BitTorrent Request and Choking Algorithms.md` for block pipelining): random-first piece until the first complete piece, then lowest missing piece index the peer offers (sequential in the concatenated payload). That avoids strict rarest-first, which often defers common low-index pieces (where early multi-file entries such as images live) until the end of the job even though each piece is written as it verifies. 16 KiB pipelined block requests per piece; wait for unchoke and learn availability from bitfield / `have*` before requesting; peer order favors peers that have delivered more bytes on failed rounds.
- On-disk layout: as soon as a download starts, the client creates the folder tree and preallocates each output file to the metainfo size (existing files are kept and only resized if the length is wrong). Each peer piece is written after SHA-1 check using **`os.write` + `fsync`** (no stdio buffering), with SIGINT briefly ignored around that critical section so Ctrl+C is less likely to interrupt between write and sync. Only complete, hash-verified pieces are persisted; the in-flight piece at interrupt time is not written. If the torrent uses a very large piece length, few pieces may finish before you stop, so the file can stay mostly zero until whole pieces complete.
- Persistent session loop with `--session-timeout` instead of failing on the first refused connection
- Extension protocol bit and fast-extension reserved bits set on the wire; extension handshake advertises `ut_metadata`, `ut_pex`, `ut_holepunch` where applicable

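
The piece-picking policy in the list above (random first piece, then lowest missing index, with 16 KiB block requests) can be sketched as below. The function names and the set-based bookkeeping are assumptions made for illustration and are not taken from `picotorrent/session.py`.

```python
import random


def pick_piece(have, peer_has, rng=random):
    """Choose the next piece index to request from one peer, or None.

    have: set of piece indices already verified locally.
    peer_has: set of piece indices the peer advertised (bitfield / have).
    """
    candidates = sorted(peer_has - have)
    if not candidates:
        return None
    if not have:
        # Random-first: nothing is complete yet, so take any offered piece.
        return rng.choice(candidates)
    # Afterwards: lowest missing index the peer offers (sequential order).
    return candidates[0]


def block_requests(piece_index, piece_length, block_size=16 * 1024):
    """Yield (index, begin, length) request tuples covering one piece."""
    for begin in range(0, piece_length, block_size):
        yield piece_index, begin, min(block_size, piece_length - begin)
```

For a 40000-byte piece this yields two full 16384-byte requests and a final 7232-byte one; the short last block matters mostly for the final piece of a torrent.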
This is still a reference client, not a full swarm engine (no parallel multi-peer piece picking, no full endgame cancel fan-out, no 10 s rechoke timer, and choking is not modeled on the seeder beyond basic unchoke).
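
The on-disk behavior described in the list above (preallocate, verify, then `os.write` + `fsync` with SIGINT briefly ignored) might look roughly like this. It is a sketch under assumptions: the helper names `preallocate` and `write_verified_piece` are hypothetical, and the piece is assumed to fall inside a single file.

```python
import hashlib
import os
import signal
import threading


def preallocate(path, size):
    """Create `path` if missing and resize it to `size` bytes, keeping data."""
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        if os.fstat(fd).st_size != size:
            os.ftruncate(fd, size)
    finally:
        os.close(fd)


def write_verified_piece(fd, offset, data, expected_sha1):
    """Write `data` at `offset` only if its SHA-1 matches; True if written."""
    if hashlib.sha1(data).digest() != expected_sha1:
        return False  # unverified data never reaches the disk
    # signal.signal() is only allowed in the main thread.
    in_main = threading.current_thread() is threading.main_thread()
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN) if in_main else None
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        view = memoryview(data)
        while len(view):
            view = view[os.write(fd, view):]  # os.write may be partial
        os.fsync(fd)
    finally:
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
    return True
```

Restoring the previous handler in `finally` keeps Ctrl+C working normally outside the short write-and-sync window.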
The client peer id is 20 bytes, Azureus-style (BEP 20):
- Prefix: `-pT` + four digits + `-`
- The four digits come from **`[project].version` in `pyproject.toml`**: major and minor are each encoded as two decimal digits (0–99), e.g. `0.2.0` → `0002`. Patch and pre-release labels are not encoded in those four digits.

If digit generation fails, the code falls back to 0001.
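
As an illustration of the peer id scheme above, the following sketch builds a 20-byte id from a version string. The function names are hypothetical, the random 12-byte tail is an assumption (the text only fixes the 8-byte prefix and the total length), and treating out-of-range components as a failure is likewise an assumption.

```python
import os
import re


def version_digits(version: str) -> str:
    """Encode major and minor as two decimal digits each; fall back to 0001."""
    match = re.match(r"(\d+)\.(\d+)", version or "")
    if not match:
        return "0001"
    major, minor = int(match.group(1)), int(match.group(2))
    if major > 99 or minor > 99:
        return "0001"  # assumption: out-of-range values count as a failure
    return f"{major:02d}{minor:02d}"


def make_peer_id(version: str) -> bytes:
    """Return a 20-byte Azureus-style peer id with the -pT prefix."""
    prefix = f"-pT{version_digits(version)}-".encode("ascii")  # 8 bytes
    # Assumption: the remaining 12 bytes are random.
    return prefix + os.urandom(20 - len(prefix))
```

With this encoding a version such as `1.4.2` gives the prefix `-pT0104-`, and an unparsable version string gives `-pT0001-`.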
The same entry point covers inspection, download, and seed. After installing the package (`pip install -e .`), the script name is **`btinspect`**. During development you can run without installing:

```text
python -m picotorrent <subcommand> ...
```

A subcommand is always required; there is no default subcommand.

```text
btinspect inspect <torrent> [--show-pieces]
python -m picotorrent inspect <torrent> [--show-pieces]
```

| Argument | Description |
|---|---|
| `torrent` | Path to `.torrent` |
| `--show-pieces` | Print every piece SHA-1 (can be very long) |

```text
btinspect download <torrent> [--out-dir DIR] [--peer HOST:PORT]
                   [--session-timeout SECONDS] [--debug-handshake]
python -m picotorrent download <torrent> [options...]
```

| Option | Default | Description |
|---|---|---|
| `torrent` | (required) | Path to `.torrent` |
| `--out-dir` | `.` | Directory for output (single file or top-level multi-file folder) |
| `--peer` | (none) | Force a single `host:port` peer; otherwise use tracker + DHT + PEX discovery |
| `--session-timeout` | `300` | Seconds to keep retrying discovery and connections before giving up |
| `--debug-handshake` | off | Log each connect attempt and handshake send/receive fields (hex where relevant) |

On success, the CLI prints the path to the written file or directory.
```text
btinspect seed <torrent> <data> [--host HOST] [--port PORT]
python -m picotorrent seed <torrent> <data> [options...]
```

| Argument | Description |
|---|---|
| `torrent` | Path to `.torrent` |
| `data` | Path to one file whose bytes are the torrent payload (single-file torrents). For multi-file torrents this must be the concatenation of all files in the order listed in the metainfo (same layout the protocol uses internally). |
| `--host` | Bind address (default `0.0.0.0`) |
| `--port` | Listen port (default `6881`) |

The seeder runs until interrupted (Ctrl+C). Clients must connect to this listener with the same info hash.
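
The concatenated layout required for `data` is the same one that maps piece indices onto files. A minimal sketch of that mapping follows; the function name `piece_spans` is hypothetical and the file lengths are assumed to be given in metainfo order.

```python
def piece_spans(file_lengths, piece_length, piece_index):
    """Map one piece onto (file_index, offset_in_file, length) spans."""
    total = sum(file_lengths)
    start = piece_index * piece_length
    if not 0 <= start < total:
        raise IndexError("piece index out of range")
    end = min(start + piece_length, total)  # the last piece may be short
    spans = []
    file_start = 0
    for index, length in enumerate(file_lengths):
        file_end = file_start + length
        # Overlap between the piece [start, end) and this file's byte range.
        lo, hi = max(start, file_start), min(end, file_end)
        if lo < hi:
            spans.append((index, lo - file_start, hi - lo))
        file_start = file_end
    return spans
```

For files of 10, 25 and 5 bytes with 16-byte pieces, piece 0 covers all of the first file plus the first 6 bytes of the second, which shows why a piece can only be verified once every file it touches has the relevant bytes.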
```text
btinspect gui
python -m picotorrent gui
```

The GUI is optional and uses Python's built-in tkinter (no extra package dependency). CLI remains the default mode.
Editable install (recommended once):

```text
pip install -e .
```

This registers the `btinspect` console script from `pyproject.toml`.

No install (development):

```text
python -m picotorrent inspect path\to\file.torrent
python -m picotorrent download path\to\file.torrent --out-dir .\out
python -m picotorrent seed path\to\file.torrent path\to\data.bin --port 6881
```

- `bittorrent_constants.py`: BEP 4 message IDs and reserved-bit tables
- `bittorrent_peer_ids.py`: BEP 20 client id tables and `parse_peer_id()`
- `picotorrent/project_version.py`: semver from `pyproject.toml` for peer id digits
- `tests/`: unit tests for bencode, metainfo, uTP packets, project version

Run tests:

```text
python -m pytest -q
```

- v2 / hybrid torrents are not implemented; metainfo and pieces are v1 SHA-1 only.
- Downloader is not a full BitTorrent client: choking, pipelining, endgame, and large swarms are only partially or not modeled.
- uTP is a standalone library here; the BitTorrent peer wire path in this repo uses TCP unless you integrate `utp/` yourself.

For protocol text, see the mirrored BEP HTML files under docs/.