There is a bunch of text files on your hard disk. You’d like to load them into one big file for further processing or analysis. Let’s see how to do it.
How to load files?
#!/usr/bin/env python3
import pathlib


def file_to_transcript(filename, delimiter="\n\n", encoding="UTF8"):
    with open(filename, encoding=encoding) as ftt:
        return ftt.read().split(delimiter)


def main():
    base_dir = pathlib.Path('/tmp/foo/myfiles')
    transcripts = [
        file_to_transcript(file)
        for file in base_dir.glob('*.txt')
    ]
    print(transcripts)


if __name__ == '__main__':
    main()
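The script above only prints the paragraphs it has read. If the goal is really one big file on disk, a minimal sketch could look like the following. The function name merge_text_files, the blank-line separator between files, and the sorted order are assumptions made for illustration, not part of the original script.

```python
import pathlib
import tempfile


def merge_text_files(base_dir, output_file, pattern="*.txt", encoding="utf-8"):
    """Concatenate every file matching `pattern` in base_dir into output_file.

    Hypothetical helper: returns the number of files that were merged.
    """
    base_dir = pathlib.Path(base_dir)
    output_file = pathlib.Path(output_file)
    count = 0
    with open(output_file, "w", encoding=encoding) as out:
        # sorted() gives a stable order; glob() itself promises none.
        for path in sorted(base_dir.glob(pattern)):
            # Skip the output file in case it lives inside base_dir.
            if path.resolve() == output_file.resolve():
                continue
            out.write(path.read_text(encoding=encoding).rstrip("\n"))
            out.write("\n\n")  # assumed separator: one blank line between files
            count += 1
    return count


if __name__ == "__main__":
    # Demo on a throwaway directory so nothing on disk is touched.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = pathlib.Path(tmp)
        (tmp / "a.txt").write_text("first\n", encoding="utf-8")
        (tmp / "b.txt").write_text("second\n", encoding="utf-8")
        merged = tmp / "merged.out"
        print(merge_text_files(tmp, merged), "files merged")
```

Writing file by file keeps memory use low, since only one input file is held in memory at a time.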
How loading works
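The glob method of pathlib.Path does the directory traversal: it yields every file in base_dir whose name matches *.txt, without descending into subdirectories. Each resulting path is then passed to open().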
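If the text files are spread over subdirectories, rglob is the recursive counterpart of glob. Here is a small sketch of the difference; the helper name list_text_files and the demo file names are made up for illustration.

```python
import pathlib
import tempfile


def list_text_files(base_dir, recursive=False):
    """Return the *.txt files below base_dir as a sorted list of paths."""
    base_dir = pathlib.Path(base_dir)
    # glob() looks only in base_dir itself, rglob() also in all subdirectories.
    matches = base_dir.rglob("*.txt") if recursive else base_dir.glob("*.txt")
    # Keep regular files only, in case a directory happens to be named *.txt.
    return sorted(p for p in matches if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = pathlib.Path(tmp)
        (tmp / "sub").mkdir()
        (tmp / "top.txt").write_text("top", encoding="utf-8")
        (tmp / "sub" / "nested.txt").write_text("nested", encoding="utf-8")
        print([p.name for p in list_text_files(tmp)])
        print([p.name for p in list_text_files(tmp, recursive=True)])
```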
The code also splits each file into paragraphs. If you need that, first check that the paragraph delimiter is used consistently in your files, and then split on it. For example, the default delimiter "\n\n" splits wherever two paragraphs are separated by a blank line.
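When the delimiter is not consistent (one blank line in some places, two or three in others), a plain split leaves stray newlines and empty entries behind. A more forgiving variant, given here only as a sketch with the hypothetical name split_paragraphs, treats any run of blank lines as a single paragraph break.

```python
import re

# A paragraph break: a newline followed by one or more lines that are
# empty or contain only spaces and tabs.
PARAGRAPH_BREAK = re.compile(r"\n(?:[ \t]*\n)+")


def split_paragraphs(text):
    """Split text into paragraphs on any run of blank lines."""
    # Normalise Windows and old Mac line endings. This is harmless if
    # open() in text mode has already translated them.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    parts = PARAGRAPH_BREAK.split(text)
    # Trim surrounding whitespace and drop empty entries.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "one\n\ntwo\n\n\nthree\n"
    print(sample.split("\n\n"))      # the plain split keeps stray newlines
    print(split_paragraphs(sample))  # the tolerant split does not
```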
Text files should always be opened with the correct encoding (UTF-8 here).
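If some files turn out not to be UTF-8, decoding them fails with a UnicodeDecodeError. One possible way to cope, sketched below, is to try a short list of candidate encodings in order. The helper name read_text_with_fallback and the choice of cp1252 as the second candidate are assumptions; use whatever encodings your files can actually have.

```python
import pathlib
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Decode a file with the first candidate encoding that works.

    Returns a (text, encoding_used) tuple. The candidate list is an
    assumption and should match your own data.
    """
    # Read raw bytes so that no encoding is guessed implicitly. Note that
    # decoding bytes does not translate "\r\n" line endings to "\n".
    data = pathlib.Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings!r} could decode {path}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        legacy = pathlib.Path(tmp) / "legacy.txt"
        legacy.write_bytes("café".encode("cp1252"))
        print(read_text_with_fallback(legacy))
```

A fallback like this only guarantees that decoding succeeds, not that the chosen encoding is the right one, so knowing the real encoding of your files remains the safer option.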