The Problem
Imagine we have a folder with multiple csv files that we want to load into Power BI. These files have the same first few columns, but then an undefined number of date-related columns. For example, we might have:
Project, Category, SubCategory, Mon 7-3, Tue 7-4, Wed 7-5
and another file might have
Project, Category, SubCategory, Mon 7-3, Tue 7-4, Wed 7-5, Thu 7-6, Fri 7-7
In general
Project, Category, SubCategory, [ddd M-d], [ddd M-d], [ddd M-d], ...
We’ll need to eventually unpivot these date columns, but the standard ‘Get files from Folder’ wizard takes the first file as the schema, and then errors out when it encounters a file with more columns.
The Solution
The solution is pretty straight-forwards. We just need to Table.UnpivotOtherColumns
in the function which loads each file. This way we’re normalising the schema as we load each file.
Here’s some M code to do this. The function takes the file contents, parses it as CSV (without specifying the number of columns) and promotes headers as normal. It then finds all the date columns and removes them from the list of “other” columns to unpivot. Finally it unpivots and returns the result.
Since we only have Mon-Sun prefixes, I’ve just coded them into the List.Select call. You could do something fancy with regex or some other way of determining which columns to unpivot.
LoadFile = (FileContents) =>
let
Source = Csv.Document(FileContents,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
Table = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ColNames = Table.ColumnNames(Table),
StaticCols = List.Select(ColNames, each (not ( Text.StartsWith(_, "Mon") or Text.StartsWith(_,"Tue") or Text.StartsWith(_,"Wed")or Text.StartsWith(_,"Thu")or Text.StartsWith(_,"Fri")or Text.StartsWith(_,"Sat")or Text.StartsWith(_,"Sun"))) ),
Unpivot = Table.UnpivotOtherColumns(Table, StaticCols, "key", "value"),
Result = Unpivot
in
Result
Putting this into a query to load all files in a folder and adding a parameter for the DataFolder
we get:
let
LoadFile = (FileContents) =>
let
Source = Csv.Document(FileContents,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
Table = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
ColNames = Table.ColumnNames(Table),
StaticCols = List.Select(ColNames, each (not ( Text.StartsWith(_, "Mon") or Text.StartsWith(_,"Tue") or Text.StartsWith(_,"Wed")or Text.StartsWith(_,"Thu")or Text.StartsWith(_,"Fri")or Text.StartsWith(_,"Sat")or Text.StartsWith(_,"Sun"))) ),
Unpivot = Table.UnpivotOtherColumns(Table, StaticCols, "key", "value"),
Result = Unpivot
in
Result,
Source = Folder.Files(DataFolder),
#"Filtered Hidden Files" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function" = Table.AddColumn(#"Filtered Hidden Files", "Transform File", each LoadFile([Content])),
#"Renamed Columns" = Table.RenameColumns(#"Invoke Custom Function", {"Name", "Source.Name"}),
#"Removed Other Columns" = Table.SelectColumns(#"Renamed Columns", {"Source.Name", "Transform File"}),
#"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File", {"Project", "Category", "SubCategory", "key", "value"}, {"Project", "Category", "SubCategory", "key", "value"})
in
#"Expanded Transform File"
This is a pretty basic solution which can be extended to clean the loaded csvs, fill in any missing dates, process into facts and dims, etc.