HashChecker
class torchdata.datapipes.iter.HashChecker(source_datapipe: IterDataPipe[Tuple[str, IOBase]], hash_dict: Dict[str, str], hash_type: str = 'sha256', rewind: bool = True)
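Computes and checks the hash of each file from an input DataPipe of (file name, data/stream) tuples (functional name: check_hash). If the computed hash matches the hash given for that file in the dictionary, it yields the (file name, data/stream) tuple. Otherwise, it raises an error.
Parameters: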
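source_datapipe – IterDataPipe yielding tuples of file name and data/stream
hash_dict – dictionary mapping each file name to its expected hash
hash_type – the type of hash function to apply
rewind – rewind the stream after it has been read to compute the hash (this does not work for non-seekable streams, e.g. HTTP)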
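The check described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not torchdata's implementation: the helper name `check_stream_hash`, the 64 KiB chunk size, and the use of `ValueError` are all hypothetical choices made for the example.

```python
import hashlib
import io


def check_stream_hash(name, stream, hash_dict, hash_type="sha256", rewind=True):
    """Hypothetical helper: verify one (name, stream) pair against hash_dict."""
    if name not in hash_dict:
        raise ValueError(f"No expected hash given for {name}")
    hasher = hashlib.new(hash_type)
    # Read in fixed-size chunks so large files are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(64 * 1024), b""):
        hasher.update(chunk)
    if rewind:
        # Only possible for seekable streams; a raw HTTP response would fail here.
        stream.seek(0)
    if hasher.hexdigest() != hash_dict[name]:
        raise ValueError(f"Hash mismatch for {name}")
    return name, stream


# In-memory stand-in for an opened file.
payload = b"BSD 3-Clause License\n"
expected = {"LICENSE.txt": hashlib.sha256(payload).hexdigest()}
name, stream = check_stream_hash("LICENSE.txt", io.BytesIO(payload), expected)
first_line = stream.readline()  # readable from the start because of the rewind
```

Because computing the hash consumes the stream, the rewind step is what lets a downstream reader see the data from the beginning again.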
Example
>>> from torchdata.datapipes.iter import IterableWrapper, FileOpener
>>> expected_MD5_hash = "bb9675028dd39d2dd2bf71002b93e66c"
>>> # File is from "https://raw.githubusercontent.com/pytorch/data/main/LICENSE"
>>> file_dp = FileOpener(IterableWrapper(["LICENSE.txt"]), mode='rb')
>>> # An exception is only raised when the hash doesn't match, otherwise (path, stream) is returned
>>> check_hash_dp = file_dp.check_hash({"LICENSE.txt": expected_MD5_hash}, "md5", rewind=True)
>>> reader_dp = check_hash_dp.readlines()
>>> it = iter(reader_dp)
>>> path, line = next(it)
>>> path
LICENSE.txt
>>> line
b'BSD 3-Clause License'
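The expected digest supplied in hash_dict has to be computed ahead of time. The following is a small sketch of one way to do that with `hashlib`; the helper name `hex_digest` is hypothetical, and in-memory bytes stand in for a real file so that the snippet is self-contained.

```python
import hashlib
import io


def hex_digest(stream, hash_type="md5"):
    """Hypothetical helper: return the hex digest of a binary stream."""
    hasher = hashlib.new(hash_type)
    # Feed the stream to the hash function chunk by chunk.
    for chunk in iter(lambda: stream.read(8192), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


# In-memory stand-in for open("LICENSE.txt", "rb").
data = io.BytesIO(b"example file contents")
hash_dict = {"LICENSE.txt": hex_digest(data, "md5")}
```

The resulting dictionary has the shape expected by the hash_dict parameter: file name keys mapped to hex digest strings.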