Pythoneo

Online How to Python stuff

Loading a bunch of text files

Posted on December 13, 2020, updated May 31, 2022, by Luke K

You have a bunch of text files on your hard disk and you’d like to load them into one big file for further processing or analysis. Let’s see how to do it.

How to load files?

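#!/usr/bin/env python3
import pathlib


def file_to_transcript(filename, delimiter="\n\n", encoding="UTF8"):
    """Read one text file and split its content on the delimiter."""
    with open(filename, encoding=encoding) as ftt:
        return ftt.read().split(delimiter)


def main():
    # Directory that holds the text files to load.
    base_dir = pathlib.Path('/tmp/foo/myfiles')
    # glob() yields paths in arbitrary order, so sort them for repeatable output.
    transcripts = [
        file_to_transcript(file) for file in sorted(base_dir.glob('*.txt'))
    ]
    print(transcripts)


if __name__ == '__main__':
    main()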
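
The script above prints a list of lists rather than producing a single file. If the goal is literally one combined file, a minimal sketch could look like the following. The function name merge_text_files and its parameters are assumptions made for illustration; they are not part of the original script.

```python
import pathlib


def merge_text_files(source_dir, output_file, pattern="*.txt", encoding="utf-8"):
    """Concatenate every matching file in source_dir into output_file.

    Returns the number of files that were merged.
    """
    source_dir = pathlib.Path(source_dir)
    output_file = pathlib.Path(output_file)
    # Collect the inputs first, skipping the output file in case it
    # lives in the same directory and matches the pattern.
    inputs = [
        path for path in sorted(source_dir.glob(pattern))
        if path.resolve() != output_file.resolve()
    ]
    with open(output_file, "w", encoding=encoding) as out:
        for path in inputs:
            text = path.read_text(encoding=encoding)
            out.write(text)
            # Make sure each file ends with a newline so that the
            # next file starts on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(inputs)
```

Calling merge_text_files('/tmp/foo/myfiles', '/tmp/foo/combined.txt') would write the files in sorted name order. Each file is read completely into memory, which is fine for ordinary text files but may need rethinking for very large ones.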

How loading works

The script uses the glob method of pathlib.Path to find the files. With the pattern '*.txt' it matches only the .txt files directly inside base_dir; each matching path is then passed to open.
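
To see what glob actually matches, here is a small sketch that builds a throwaway directory and compares glob with rglob, the recursive variant on pathlib.Path. The file names (a.txt, notes.md, sub/b.txt) are made up for the demonstration.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    base_dir = pathlib.Path(tmp)
    (base_dir / "sub").mkdir()
    (base_dir / "a.txt").write_text("top level", encoding="utf-8")
    (base_dir / "notes.md").write_text("not a txt file", encoding="utf-8")
    (base_dir / "sub" / "b.txt").write_text("nested", encoding="utf-8")

    # glob('*.txt') only looks at the directory itself.
    top_level = sorted(
        p.relative_to(base_dir).as_posix() for p in base_dir.glob("*.txt")
    )
    # rglob('*.txt') also descends into subdirectories.
    recursive = sorted(
        p.relative_to(base_dir).as_posix() for p in base_dir.rglob("*.txt")
    )

print(top_level)   # ['a.txt']
print(recursive)   # ['a.txt', 'sub/b.txt']
```

If your text files are spread over nested folders, swapping glob for rglob in the script is probably the smallest change that picks them all up.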

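The code also breaks each file down into paragraphs by splitting on the delimiter, which defaults to "\n\n" (one blank line between paragraphs). If you need this, check that paragraphs are separated consistently across your files and set the delimiter accordingly, for example "\n\n\n" if they are separated by two blank lines.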
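
When the number of blank lines between paragraphs varies from file to file, a fixed delimiter leaves empty entries behind. One possible workaround, sketched here with a hypothetical helper called split_paragraphs, is to split on a regular expression instead. This is an alternative to the plain str.split used in the script, not what the script itself does.

```python
import re


def split_paragraphs(text):
    """Split text on runs of one or more blank lines."""
    # \n\s*\n matches a newline, optional whitespace (including further
    # newlines), and another newline, so any number of blank lines counts
    # as a single separator.
    parts = re.split(r"\n\s*\n", text.strip())
    return [part for part in parts if part]


sample = "First paragraph.\n\nSecond paragraph.\n\n\n\nThird paragraph.\n"

# The fixed delimiter leaves an empty string and a trailing newline behind.
print(sample.split("\n\n"))
# ['First paragraph.', 'Second paragraph.', '', 'Third paragraph.\n']

# The regex version returns only the three paragraphs.
print(split_paragraphs(sample))
# ['First paragraph.', 'Second paragraph.', 'Third paragraph.']
```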

Text files should always be opened with the correct encoding (UTF8 here).
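
If some of the files turn out not to be UTF-8, open raises UnicodeDecodeError as soon as the content is read. The sketch below shows one way to cope: try UTF-8 first and fall back to a second encoding. The helper name read_text_safely and the choice of Latin-1 as the fallback are assumptions; pick whatever encoding your files really use.

```python
import pathlib
import tempfile


def read_text_safely(path, encoding="utf-8", fallback="latin-1"):
    """Read a text file, retrying with a fallback encoding on failure."""
    path = pathlib.Path(path)
    try:
        return path.read_text(encoding=encoding)
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never raises,
        # but the text may be wrong if the file uses another encoding.
        return path.read_text(encoding=fallback)


with tempfile.TemporaryDirectory() as tmp:
    legacy = pathlib.Path(tmp) / "legacy.txt"
    # "caf\u00e9" (cafe with an acute accent) encoded as Latin-1
    # is not valid UTF-8, so the first read attempt fails.
    legacy.write_bytes("caf\u00e9".encode("latin-1"))
    text = read_text_safely(legacy)

print(text)
```

A fallback like this hides the problem rather than solving it, so it is worth logging which files needed it and converting them to UTF-8 later.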
