Loading Files With Varying Schema in Power BI using Power Query

The Problem

Imagine we have a folder with multiple csv files that we want to load into Power BI. These files have the same first few columns, but then an undefined number of date-related columns. For example, we might have:

Project, Category, SubCategory, Mon 7-3, Tue 7-4, Wed 7-5

and another file might have

Project, Category, SubCategory, Mon 7-3, Tue 7-4, Wed 7-5, Thu 7-6, Fri 7-7

In general

Project, Category, SubCategory, [ddd M-d], [ddd M-d], [ddd M-d], ...

We’ll need to eventually unpivot these date columns, but the standard ‘Get files from Folder’ wizard takes the first file as the schema, and then errors out when it encounters a file with more columns.

The Solution

The solution is pretty straight-forwards. We just need to Table.UnpivotOtherColumns in the function which loads each file. This way we’re normalising the schema as we load each file.

Here’s some M code to do this. The function takes the file contents, parses it as CSV (without specifying the number of columns) and promotes headers as normal. It then finds all the date columns and removes them from the list of “other” columns to unpivot. Finally it unpivots and returns the result.

Since we only have Mon-Sun prefixes, I’ve just coded them into the List.Select call. You could do something fancy with regex or some other way of determining which columns to unpivot.

LoadFile = (FileContents) => 
    let
        Source = Csv.Document(FileContents,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
        Table = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        ColNames = Table.ColumnNames(Table),
        StaticCols = List.Select(ColNames, each (not ( Text.StartsWith(_, "Mon") or Text.StartsWith(_,"Tue") or Text.StartsWith(_,"Wed")or Text.StartsWith(_,"Thu")or Text.StartsWith(_,"Fri")or Text.StartsWith(_,"Sat")or Text.StartsWith(_,"Sun"))) ),
        Unpivot = Table.UnpivotOtherColumns(Table, StaticCols, "key", "value"),
        Result = Unpivot
    in
        Result

Putting this into a query to load all files in a folder and adding a parameter for the DataFolder we get:

let
    LoadFile = (FileContents) => 
       let
            Source = Csv.Document(FileContents,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
            Table = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
            ColNames = Table.ColumnNames(Table),
            StaticCols = List.Select(ColNames, each (not ( Text.StartsWith(_, "Mon") or Text.StartsWith(_,"Tue") or Text.StartsWith(_,"Wed")or Text.StartsWith(_,"Thu")or Text.StartsWith(_,"Fri")or Text.StartsWith(_,"Sat")or Text.StartsWith(_,"Sun"))) ),
            Unpivot = Table.UnpivotOtherColumns(Table, StaticCols, "key", "value"),
            Result = Unpivot
        in
            Result,
    Source = Folder.Files(DataFolder),
    #"Filtered Hidden Files" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function" = Table.AddColumn(#"Filtered Hidden Files", "Transform File", each LoadFile([Content])),
    #"Renamed Columns" = Table.RenameColumns(#"Invoke Custom Function", {"Name", "Source.Name"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Renamed Columns", {"Source.Name", "Transform File"}),
    #"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File", {"Project", "Category", "SubCategory", "key", "value"}, {"Project", "Category", "SubCategory", "key", "value"})
in
    #"Expanded Transform File"

This is a pretty basic solution which can be extended to clean the loaded csvs, fill in any missing dates, process into facts and dims, etc.

The Problem#

The Solution#

The Problem

The Solution