Article
Evgeniy Potapov · Jul 19 7m read

Showing dates in missing periods

When we collect temporary data (the number of purchases in the store, the number of comments on the post), it may happen that there is no data for a certain period of time. In this case, this time period (hour, day, month) is not represented in the database, that is, there is not a single row for this period. In other words, there are no rows in the database for this period.

When we start building analytical reports based on data from IRIS or Adaptive Analytics cubes, a visualization problem appears. On charts and tables, we expect to see a value of 0 for the period with missing data, but the BI system simply does not display this time period. The BI system does not know about the existence of a missing period of time due to lack of data, and it can consider that 05/25/2022 is followed by 05/27/2022 or March immediately follows January. This is because we are distributing our data across the members of the time dimension, or, in other words, across all the unique time spans that exist in the database.

Another feature of using Adaptive Analytics is that we must specify a certain table as a data source for the time dimension, and only those dates that were in the table appear in the dimension. If we bind another table to this dimension, and it contains dates that are missing in the first table, they would be simply discarded as not existing among the members of the dimension. The built-in solution to this problem in Adaptive Analytics is to create dimensions that can have duplicate members. All records from the tables are entered into such a dimension. Here we can get 2 records about a specific date from one table, 3 more - from another, and 10,000 from the third, and they will all become members of this dimension. This allows you to be sure that not a single record from any table is eliminated due to the fact that there was no such record in the main table. On the other hand, this approach significantly reduces query performance.

Both of these problems can be solved by adding an additional table with dates to IRIS. It should contain all dates, starting with the first existing one in the dataset and ending with the distant future. Also, this table should contain the minimum intervals at which you need to analyze the data. In our case, these were days, so all the dates were simply added to the table.

The CSV file with data to fill the table was generated using a simple python script. Additionally, fields were generated to display larger periods of time in the format we need.

import time

import pandas as pd

from datetime import date, timedelta, datetime

 

start_date = date(1970,1,1)

end_date = date(2100,1,1)

iter_date = start_date

calendar = []

while iter_date <= end_date:

  calendar.append([iter_date,iter_date.strftime("%b-%Y "),iter_date.strftime("%Y%m"),iter_date.strftime("%Y")])

  iter_date = iter_date + timedelta(days=1)

pd.DataFrame(calendar, columns=["FullDate","MonthYear","MonthYearNum","Year"]).to_csv("calendar.csv",index=False)

Next, the table should be added to the IRIS Data Platform using the System Explorer - SQL section.

Firstly, you should create the table structure using the SQL query executor:

CREATE TABLE calendar (FullDate DATE, MonthYear varchar(20), MonthYearNum int, Year int)

After that, you can use the wizard import data to fill this structure with data from the CSV file created by the script. Don’t forget to mark that the separator in data is ‘,’ (comma). Of course, it is correct to load data with SQL commands as described here, but sometimes you can use a wizard.

Now we can add this table to cubes and use it as a data source for measuring time, and not a single date which is interesting to us will be lost. Below you can see a structure from DC Community analytics as an example:

It should be noted right away that after adding such a table, it is necessary to check all dashboard charts to make sure that you have an upper time limit filter in them. Otherwise, a chart might display the sum of the entire measure for each period of time (for example, the number of registered users of the site at the end of the month) starting with the year 2100 with the current number of users.

Next, we want to talk about Power BI and how to solve some of the problems there. When we connect to Adaptive Analytics cubes in Power BI with a build-in connector, we don't see the created calendar, and therefore we can't get the full list of dates directly from Adaptive Analytics. We only get measures and dimensions. Because of that, we need to create a complete list of dates one more. In addition to the described advantages of the calendar, it allows us to link tables to each other by date and filter them.

Calendar creation methods

We can create a calendar in two ways:

  1. DAX functions (preferred).
  2. Power Query

The first method requires less effort. Moreover, the date range can be specified based on existing dates. Thus, the dates will be updated when the data is updated.
The second method is more complicated, and if you need to use dates from sources, you will need additional parameters. But Power Query is more flexible and allows you to customize the table for a specific task.

Using built-in DAX functions.

Creating a date table using DAX functions. 

One way to create a date table is to use the built-in DAX CALENDAR  or CALENDARAUTO (not recommended, as there may be extra start or end dates, such as employees' birthdays, so it's better to set them manually in the CALENDAR). These functions return all days between the minimum and maximum dates.

Calendar = CALENDAR(<start_date>, <end_date>))

Now we can get the year, month name and desired month format for this date. You can use the date hierarchy ('Calendar'[Date].[Year], 'Calendar'[Date].[Month]).

Let's add a column in the format (MMM-YYYY). To do this, we will write the FORMAT function.
Month-Year = FORMAT('Calendar'[Date], "MMM-YYYY").

Yet, since the Month and Month-Year fields are text fields, they will be sorted alphabetically. Let's create columns for correct sorting: MonthNum with the numeric value of the month for the Month column, and MonthYearNum for the Month-Year column. 

MonthYearNum = FORMAT('Calendar'[Date], "YYYYMM")

Now we need to set the sort column. To do this, on the page with tools select colunt to sort by, under “Sort by column” button as shown below. 

 

Create a calendar in Power Query

To create a calendar in Power Query, first, you must create an empty query.

We will insert the following code into it:

 

let
Source = List.Dates,
#"Invoked FunctionSource" = Source(#date(2021, 1, 1), Duration.Days(#date(2022, 12, 31)-#date(2021 , 1, 1))+1, #duration(1, 0, 0, 0)),
#"Table from List" = Table.FromList(#"Invoked FunctionSource", Splitter.SplitByNothing(), null, null, ExtraValues .Error),
#"Date" = Table.RenameColumns(#"Table from List",{{"Column1", "Date"}}),
    #"AddedUpdateDate" = Table.AddColumn(#"Date", "UpdateDate" , each DateTime.LocalNow()),
#"Added Year" = Table.AddColumn(#"AddedUpdateDate", "Year", each Date.Year([Date])),
#"Added WeekDay" = Table.AddColumn(# "Added Year", "WeekDay", each Date.ToText([Date],"ddd")),
#"Added Month" = Table.AddColumn(#"Added WeekDay", "Month", each Date.ToText([ Date],"MMM")&"'"&Date.ToText([Date],"yy")),
#"Added Day" = Table.AddColumn(#"Added Month", "Day", each Date.ToText( [Date],"dd/MM/yy")),
#"Added Quarter" = Table.AddColumn(#"Added Day", "Quarter", each Number.ToText(Date.QuarterOfYear([Date]))&" Q-"&Date.ToText([Date],"y y")),
#"Added MonthBegin" = Table.AddColumn(#"Added Quarter", "MonthBegin", each Date.StartOfMonth([Date]), type date),
#"Added QuarterBegin" = Table.AddColumn(# "Added MonthBegin", "QuarterBegin", each Date.StartOfQuarter([Date])),
#"Added WeekBegin" = Table.AddColumn(#"Added QuarterBegin", "WeekBegin", each Date.StartOfWeek([Date]), type date),
#"Added YearBegin" = Table.AddColumn(#"Added WeekBegin", "YearBegin", each Date.StartOfYear([Date]), type date),
#"Added Week" = Table.AddColumn(#" Added YearBegin", "Week", each Date.ToText([WeekBegin],"d/M")&"-"&Date.ToText(Date.EndOfWeek([Date]),"d/M/yy")),
#"Added Today" = Table.AddColumn(#"Added Week", "Today", each if [Date] = DateTime.Date([UpdateDate]) then true else false),
#"Added CurrentMonth" = Table.AddColumn( #"Added Today", "CurrentMonth", each if [Month] = Date.ToText(DateTime.Date([UpdateDate]),"MMM")&"'"&Date.ToText(DateTime.Date([UpdateDate]), "yy") then true else false),
   #"Added DayDif" = Table.AddColumn(#"Added CurrentMonth", "DayDif", each Duration.Days([Date]-DateTime.Date([UpdateDate]))),
   #"Added YearDif" = Table.AddColumn(#"Added DayDif", "YearDif", each Date.Year([ Date])-Date.Year([UpdateDate])),
   #"Added MonthDif" = Table.AddColumn(#"Added YearDif", "MonthDif", each [YearDif]*12+(Date.Month([Date]) -Date.Month([UpdateDate]))),
   #"Change type" = Table.TransformColumnTypes(#"Added MonthDif",{{"Date", type date}, {"Year", type text}, {"WeekDay ", type text}, {"Month", type text}, {"Day", type text}, {"Quarter", type text}, {"Week", type text}, {"Today", type logical} , {"CurrentMonth", type logical}, {"MonthDif", Int64.Type}, {"YearDif", Int64.Type}, {"DayDif", Int64.Type}, {"UpdateDate", type datetime}})
in
   #"Change type"

 

After creation of the Calendar, you have to attach it to the data from Adaptive Analytics. This allows you to use all existing data in such a way that empty values become zero and columns are displayed even if there is no data for that period.

3
0 266
Discussion (1)1
Log in or sign up to continue

Hi @Evgeniy Potapov 

Thank you for your time writing this article. I'm happy to see you playing around with Python pandas, Power BI DAX, and power query. They together make everyone's job easy.