Fix Error "xlrd.biffh.XLRDError: Excel xlsx file; not supported"

Introduction Python has a rich ecosystem of libraries that make it an ideal language for data analysis. One of those libraries is pandas, which simplifies the process of reading and writing data between in-memory data structures and different file formats. However, while working with Excel files using pandas.read_excel

Introduction Python has a rich ecosystem of libraries that make it an ideal language for data analysis. One of those libraries is pandas, which simplifies the process of reading and writing data between in-memory data structures and different file formats. However, while working with Excel files using pandas.read_excel, you might run into an error that looks like this: xlrd.biffh.XLRDError: Excel xlsx file; not supported In this Byte, we'll dissect this error message, understand why it occurs, and learn how to fix it. What is the Error "xlrd.biffh.XLRDError" The xlrd.biffh.XLRDError is a specific error message that you might encounter while working with the pandas library in Python. This error is thrown when you try to read an Excel file with the .xlsx extension using pandas.read_excel method. Here's an example of the error: import pandas as pd df = pd.read_excel('file.xlsx') Output: xlrd.biffh.XLRDError: Excel xlsx file; not supported Cause of the Error The xlrd.biffh.XLRDError error is caused by a recent change in the xlrd library that pandas uses to read Excel files. The xlrd library now only supports the older .xls file format and no longer supports the newer .xlsx file format. This change can be a bit of a surprise if you've been using pandas.read_excel with xlrd. By default, pandas.read_excel uses the xlrd library to read Excel files, but as of xlrd version 2.0.0, this library no longer supports .xlsx files. As developers, we've all been there... How to Fix the Error The solution to this error is simple. You just need to install openpyxl and specify the engine argument in the pandas.read_excel method to use the openpyxl library instead of xlrd. The openpyxl library supports both .xls and .xlsx file formats. Here's how to do it: First, you need to install the openpyxl library. You can do this using pip: $ pip install openpyxl Then, you can specify the engine argument in the pandas.read_excel method like this: import pandas as pd df = pd.read_excel('file.xlsx', engine='openpyxl') This code will read the Excel file using the openpyxl library, and you will no longer encounter the xlrd.biffh.XLRDError error. Conclusion In this Byte, we've learned about the xlrd.biffh.XLRDError error that happens when using pandas.read_excel to read .xlsx files. We've learned why this error occurs and how to fix it by using the openpyxl library.