There is a bunch of text files on your disk. You'd like to load them into one big collection for further processing and analysis. Let's see how to do it.
#!/usr/bin/env python3
import pathlib


def file_to_transcript(filename, delimiter="\n\n", encoding="UTF8"):
    with open(filename, encoding=encoding) as ftt:
        return ftt.read().split(delimiter)


def main():
    base_dir = pathlib.Path('/tmp/foo/myfiles')
    transcripts = [
        file_to_transcript(file)
        for file in base_dir.glob('*.txt')
    ]
    print(transcripts)


if __name__ == '__main__':
    main()
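The glob method of pathlib.Path finds the files: base_dir.glob('*.txt') yields every .txt file directly inside base_dir. It does not descend into subdirectories; use rglob('*.txt') if you need to traverse them as well. Each file found is then opened with the built-in open().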
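A minimal sketch of the file discovery step, assuming you sometimes want subdirectories included. The helper name find_text_files and its recursive flag are made up for illustration; glob, rglob and is_file are the standard pathlib methods.

```python
import pathlib


def find_text_files(base_dir, pattern="*.txt", recursive=False):
    """Return matching files under base_dir as a sorted list of paths."""
    base = pathlib.Path(base_dir)
    # glob() looks only in base itself; rglob() also walks subdirectories.
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    # glob order is not guaranteed, so sort to get a reproducible result,
    # and skip directories that happen to match the pattern.
    return sorted(p for p in matches if p.is_file())
```

Sorting is optional, but it keeps the order of the loaded transcripts stable between runs.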
The code also splits each file into paragraphs. If you need this, first check that your files separate paragraphs consistently, then choose the delimiter to match. For example, the default delimiter "\n\n" assumes that paragraphs are separated by one blank line.
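If the separators turn out not to be perfectly consistent, a regular expression is more forgiving than a fixed delimiter. The sketch below is one possible alternative to str.split, not part of the script above; the function name split_paragraphs is hypothetical.

```python
import re


def split_paragraphs(text):
    """Split text on runs of one or more blank lines."""
    # Normalize Windows line endings so the pattern below stays simple.
    text = text.replace("\r\n", "\n")
    # A newline, optional whitespace (including further newlines), a newline.
    parts = re.split(r"\n\s*\n", text)
    # Drop empty chunks and trim stray whitespace around each paragraph.
    return [part.strip() for part in parts if part.strip()]
```

Unlike split("\n\n"), this treats two or three blank lines in a row as a single separator and never returns empty paragraphs.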
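Text files should always be opened with an explicit, correct encoding (UTF-8 here). Without the encoding argument, open() uses a platform-dependent default, so the same script can read the same file differently on another machine.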
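When the files come from mixed sources, you may not know the encoding in advance. Below is a rough sketch that tries a few candidates in order; the function name read_text_with_fallback and the candidate list ("utf-8", then "cp1252") are assumptions for illustration, so adjust them to what your files actually use.

```python
import pathlib


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Decode a file with the first encoding that works."""
    data = pathlib.Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
    # Nothing fit: keep going, but mark undecodable bytes with U+FFFD.
    return data.decode(encodings[0], errors="replace")
```

Note that a successful decode does not prove the guess was right: single-byte encodings accept almost any input, so it is worth spot-checking the result.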