Manage Files and Database Connections in Python Like a Pro

In data engineering and data science, we frequently need to retrieve data from databases or flat files. However, using data from external resources without closing the connections to the underlying databases or files may lead to unwanted consequences. In this article, we discuss how to build custom context managers that help us work with those resources in a secure and efficient way.

Context managers

Thanks to the context managers, we can read data from external resources and rest assured that the connections to underlying databases or files are closed, even if we encounter some unhandled exceptions in our code.

“Typical uses of context managers include saving and restoring various kinds of global state, locking and unlocking resources, closing opened files, etc.” [1]

Closing resources is crucial because there is a limit to how many files or database connections can be opened at the same time. The number of files that can be simultaneously opened depends on how many file descriptors are allowed by the respective OS. On Linux and macOS, we can examine this number by using ulimit -n[2]:

Similarly, there is a limit to the number of concurrent database connections. For instance, the maximum number of open PostgreSQL connections is, by default, limited to 100 [3].

A typical use of context managers

The code below shows the typical use of a context manager to:

a) write data to a file:

b) read data from the same file:

The reasons behind context managers

You may ask: why not just write and read the data directly to the file without this with addition? We could write the same code as follows:

But imagine what would happen if we would encounter an error while writing to this file? The connection would never be closed. We could improve this by enclosing the code inside of try-finally blocks, as follows:

But this can be tedious to write, and our code becomes verbose. Therefore, context managers are great — we can accomplish more within just 2 lines and our code becomes more readable.

To sum it up, the reasons for context managers are:

  • ensuring that the resources are released even if we encounter some unhandled exceptions
  • readability
  • convenience — we make it easier for ourselves, as we will no longer forget to close the connections to external resources.

Custom context managers

There are two ways of defining context managers: with a custom class, or with a generator [4]. Let’s look at the generator option to create a context manager that will let us manage a MySQL database connection.

Handle MySQL database connection with a context manager

The logic is similar to the one with try-finally block, except that we yield the connection object, rather than return it — this is due to the nature of a generator that lazily returns the objects when they are needed (i.e., when we iterate over them). In our example, the context manager will yield a single value — the connection object.

Note that all we have to do is to add a decorator @contextlib.contextmanager, and use yield in the try block.

To use this context manager in order to retrieve the data from MySQL into a pandas data frame, we just import and apply this function with our context manager:

Note how much simpler our bus_logic.py script became!

The try-block could include any custom setup code, such as retrieving configuration details and credentials from secrets manager, environment variables, or config files. In the same way, inside of the finally-block, you can define any teardown logic, such as deleting temporary files, closing connections, or changing back the working directory. This is what we’ll do next.

Temporarily changing the working directory with a context manager

Imagine that you are in the src directory, but when you download files from S3, you want to store them within the data directory. In this case, we may want to temporarily change the working directory to the data folder and change it back to the previous directory, when the function is completed. The following context manager can achieve that:

By using the above context manager, we can ensure that we are in the proper directory (data) when downloading files. Then, once the download is finished, we change back to the original working directory.

Conclusion

In this article, we looked at context managers in Python that let us handle external resources in a more elegant way. They also prevent mistakes that may happen when unhandled exceptions cause our script to end before the teardown code would have the chance to be executed. This way, context managers prevent leaving locked resources, open files, or database connections. Finally, they make our code much more readable by abstracting away the setup and teardown code.

Thank you for reading! If this article was helpful, feel free to follow me for the next articles.

References:

Resources:

[1] Python docs about context managers

[2] Stackoverflow

[3] Postgres docs

[4] Python book about context managers

Cover photo by David Yu from Pexels

Image contains an affiliate link. If you join Datacamp this way, you're not only learning a ton of valuable information, but you're also supporting this blog.