HttpReader

class torchdata.datapipes.iter.HttpReader(source_datapipe: IterDataPipe[str], timeout: Optional[float] = None, skip_on_error: bool = False, **kwargs: Optional[Dict[str, Any]])

Takes file URLs (HTTP URLs pointing to files) and yields tuples of the file URL and an IO stream (functional name: read_from_http).

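Parameters:

source_datapipe – a DataPipe that contains URLs
timeout – timeout in seconds for the HTTP request
skip_on_error – whether to skip over URLs that cause problems; if False, an exception is raised instead
**kwargs – a dictionary of optional arguments that requests accepts. For the full list, see https://docs.python-requests.org/en/master/api/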
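The skip_on_error behavior described above can be illustrated offline. The following is a minimal sketch, not the torchdata implementation: read_urls and fake_fetch are hypothetical names introduced here, and fake_fetch stands in for the real HTTP request so that the example runs without network access. Emitting a warning when a URL is skipped is also an assumption of this sketch.

```python
import io
import warnings
from typing import IO, Callable, Iterable, Iterator, Tuple


def read_urls(
    urls: Iterable[str],
    fetch: Callable[[str], IO[bytes]],
    skip_on_error: bool = False,
) -> Iterator[Tuple[str, IO[bytes]]]:
    """Yield (url, stream) tuples; hypothetical stand-in for HttpReader."""
    for url in urls:
        try:
            stream = fetch(url)
        except Exception as exc:
            if skip_on_error:
                # Assumption: report the failure and move on to the next URL.
                warnings.warn(f"skipping {url}: {exc}")
                continue
            # Default: propagate the error to the caller.
            raise
        yield url, stream


def fake_fetch(url: str) -> IO[bytes]:
    # Hypothetical offline fetcher; any URL containing "bad" fails.
    if "bad" in url:
        raise ConnectionError("unreachable")
    return io.BytesIO(b"payload for " + url.encode())


urls = ["http://ok.example/a", "http://bad.example/b"]
results = [(u, s.read()) for u, s in read_urls(urls, fake_fetch, skip_on_error=True)]
print(results)
```

With skip_on_error=True only the reachable URL produces a tuple; with the default False, the ConnectionError raised by the failing URL would propagate out of the iterator.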

Example

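from torchdata.datapipes.iter import IterableWrapper, HttpReader

file_url = "https://raw.githubusercontent.com/pytorch/data/main/LICENSE"
# Despite the name, these are keyword arguments forwarded to requests
# (credentials and redirect handling), not URL query parameters.
query_params = {"auth" : ("fake_username", "fake_password"), "allow_redirects" : True}
timeout = 120  # seconds
http_reader_dp = HttpReader(IterableWrapper([file_url]), timeout=timeout, **query_params)
# readlines() turns each (url, stream) tuple into (url, line) tuples.
reader_dp = http_reader_dp.readlines()
it = iter(reader_dp)
path, line = next(it)  # first line of the downloaded file
print((path, line))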

Output

('https://raw.githubusercontent.com/pytorch/data/main/LICENSE', b'BSD 3-Clause License')
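The output shows that each item is a (path, line) tuple whose line is a bytes object without its trailing newline. A rough offline sketch of that line-splitting step follows. It is not the torchdata code: iter_lines is a hypothetical helper, the in-memory stream replaces the HTTP response, and stripping both "\r" and "\n" is an assumption of this sketch.

```python
import io
from typing import IO, Iterable, Iterator, Tuple


def iter_lines(
    pairs: Iterable[Tuple[str, IO[bytes]]],
) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, line) for every line of every stream (hypothetical helper)."""
    for path, stream in pairs:
        for line in stream:
            # Drop the line terminator but keep the data as bytes.
            yield path, line.rstrip(b"\r\n")


# In-memory stand-in for a downloaded file.
sample = io.BytesIO(b"BSD 3-Clause License\n\nCopyright notice\n")
lines = list(iter_lines([("LICENSE", sample)]))
print(lines[0])
```

Blank lines in the stream come through as empty bytes objects rather than being dropped, so the number of yielded tuples matches the number of lines in the file.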
