LineReader

class torchdata.datapipes.iter.LineReader(source_datapipe: IterDataPipe[Tuple[str, IO]], *, skip_lines: int = 0, strip_newline: bool = True, decode: bool = False, encoding='utf-8', errors: str = 'ignore', return_path: bool = True)

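Accepts a DataPipe of tuples, each holding a file name and a string data stream, and for each line in the stream yields a tuple of the file name and that line (functional name: readlines).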

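Parameters:

source_datapipe – a DataPipe of tuples, each holding a file name and a string data stream
skip_lines – number of lines to skip at the beginning of each file
strip_newline – if True, the newline character is stripped from each line
decode – if True, the file contents are decoded according to the specified encoding
encoding – the character encoding of the files (default='utf-8')
errors – the error handling scheme used while decoding
return_path – if True, each line is returned as a tuple of path and content rather than the content alone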

Example

>>> from torchdata.datapipes.iter import IterableWrapper
>>> import io
>>> text1 = "Line1\nLine2"
>>> text2 = "Line2,1\r\nLine2,2\r\nLine2,3"
>>> source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
>>> line_reader_dp = source_dp.readlines()
>>> list(line_reader_dp)
[('file1', 'Line1'), ('file1', 'Line2'), ('file2', 'Line2,1'), ('file2', 'Line2,2'), ('file2', 'Line2,3')]
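
To make the keyword parameters more concrete, the following is a minimal pure-Python sketch of the behavior described above. It is not torchdata's implementation: the function name read_lines_sketch is hypothetical, and the sketch assumes that stripping removes both "\r" and "\n" (consistent with the "file2" output above) and that decoding only applies to lines read as bytes.

```python
import io
from typing import IO, Iterable, Iterator, Tuple, Union

Line = Union[str, bytes]


def read_lines_sketch(
    source: Iterable[Tuple[str, IO]],
    *,
    skip_lines: int = 0,
    strip_newline: bool = True,
    decode: bool = False,
    encoding: str = "utf-8",
    errors: str = "ignore",
    return_path: bool = True,
) -> Iterator[Union[Line, Tuple[str, Line]]]:
    """Hypothetical stand-in for LineReader, written only for illustration."""
    for path, stream in source:
        for index, line in enumerate(stream):
            if index < skip_lines:
                continue  # drop the first `skip_lines` lines of each file
            if strip_newline:
                # Assumption: remove "\r" and "\n" characters from both ends.
                line = line.strip(b"\r\n" if isinstance(line, bytes) else "\r\n")
            if decode and isinstance(line, bytes):
                # Assumption: only bytes lines need decoding.
                line = line.decode(encoding, errors)
            # Yield (path, line) pairs, or bare lines when return_path is False.
            yield (path, line) if return_path else line


# Byte streams with a one-line header in each file.
source = [
    ("file1", io.BytesIO(b"header\nLine1\nLine2")),
    ("file2", io.BytesIO(b"header\r\nLine2,1\r\nLine2,2")),
]
lines = list(read_lines_sketch(source, skip_lines=1, decode=True, return_path=False))
print(lines)  # ['Line1', 'Line2', 'Line2,1', 'Line2,2']
```

With skip_lines=1 the header of each file is dropped, decode=True turns the bytes into str, and return_path=False yields bare lines instead of (path, line) tuples. Under the same assumptions, the corresponding call on a real DataPipe would presumably be source_dp.readlines(skip_lines=1, decode=True, return_path=False).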
