Lead & Lag: SQL Window Functions and a bit more ...

In this blog, we will examine some of the SQL window functions available within ABAP-SQL and delve into their application through the use of examples.

So, here’s the problem: We have a table called ZMOVIE_SEATS with two columns, SEAT_ID (Integer) and STATUS (Boolean). SEAT_ID is an auto increment primary key (this column will have values like 1,2,3 … so on). Our goal is to find out which consecutive seat ids are occupied (or unoccupied which is simple flip in the solution). This problem is also known as the “Consecutive Available Seat Problem” The trick is that we need to solve it using SQL alone. That’s what caught my attention. It’s easy enough to solve using programming constructs like loops and conditionals, but doing it with SQL alone is what makes it interesting according to me.

Definitely, there could be multiple variations of the solutions, but the one I intend to present here is to utilize window functions, namely LEAD and LAG, while concurrently introducing the concept of window functions to readers who may not yet be acquainted with it.

Sample Input Table Data:

Expected Output: 3, 4, 5

Before we get to solving it using window functions, lets learn what are the “window functions“

What are window functions?

Window functions in SQL are a powerful tool for performing calculations across a set of rows that are related to the current row. They provide the ability to perform aggregate-like operations (e.g., SUM, COUNT, AVG) or ranking calculations (e.g., RANK, ROW_NUMBER) without actually grouping the result set into fewer rows. This means that each row maintains its individual identity while still incorporating information from its surrounding “window” of rows.

Now, let’s look at definitions of some key terms and concepts:

Windows: A window is essentially a subset of rows within your result set. You define the window using the OVER clause. It can encompass the entire result set, a partition (group) of rows, or a specific range of rows relative to the current row.
Window Frame: The window frame is the specific set of rows within the window that are used in the calculation for a particular row. You can control the frame’s boundaries using keywords like ROWS BETWEEN, UNBOUNDED PRECEDING, and CURRENT ROW.
Partitioning (PARTITION BY): This clause divides the result set into distinct groups or partitions. Window functions are then applied separately within each partition.
Ordering (ORDER BY): This clause sorts the rows within the window or partition, which is crucial for calculations like running totals or rankings.

Types of Window Functions

Aggregate Window Functions: These functions (SUM, AVG, COUNT, MIN, MAX) work similarly to regular aggregate functions but without collapsing rows.
Ranking Window Functions: These functions (RANK, DENSE_RANK, ROW_NUMBER, NTILE) assign rankings or numerical positions to rows within their window.
Value Window Functions: These functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE) access values from other rows within the window.

Why Use Window Functions?

Flexibility: You can perform complex calculations without restructuring your result set.
Performance: They are often more efficient than solutions involving self-joins or subqueries.
Expressiveness: They make your SQL queries more concise and readable.

Sample solution for the Consecutive Available Seat Problem:

TYPES: BEGIN OF movie_seat,
         seat_id TYPE i,
         free    TYPE abap_bool,
       END OF movie_seat,
       movie_seats TYPE STANDARD TABLE OF movie_seat WITH NON-UNIQUE KEY seat_id.

DATA(lt_movie_seats) = VALUE movie_seats( ( seat_id = 1 free = abap_true )
                                          ( seat_id = 2 free = abap_false )
                                          ( seat_id = 3 free = abap_true )
                                          ( seat_id = 4 free = abap_true )
                                          ( seat_id = 5 free = abap_true ) ).

SELECT seat_id                               AS seat_id,
       free                                  AS current_free,
       LAG( free ) OVER( ORDER BY seat_id )  AS previous_free,
       LEAD( free ) OVER( ORDER BY seat_id ) AS next_free
  FROM _movie_seats AS movie_seats
  ORDER BY seat_id
  INTO TABLE (lt_tab).

SELECT seat_id FROM _tab AS data1
  WHERE current_free = 
    AND ( previous_free =  OR next_free =  )
  INTO TABLE (lt_output).

cl_demo_output=>display_data( value = lt_output ).

Program Output:

Now let’s explore what do these functions LEAD and LAG do?

The window functions LEAD and LAG are designed to give you access to data from rows other than the current row you’re processing within a result set. They operate within a specified window (a subset of rows) and help you analyze data in relation to its neighboring rows.

LEAD: Peels back the curtain to show you data from rows that come after the current row in your window. It’s like having a sneak peek into the future.
LAG: Takes you back in time to reveal data from rows that precede the current row. Think of it as looking through a rearview mirror.

These have the following syntax:

LEAD|LAG( sql_exp1[, diff[, sql_exp2]]

diff determines the number of rows to either look forward or backward for LEAD and LAG respectively. If diff is not specified a value of 1 is assumed implicitly.If the row determined by diff is not in the current window, the result is the null value by default.

OVER Clause is essential for both LEAD and LAG. It defines the window (set of rows) within which the functions operate. The ORDER BY clause within OVER is crucial to establish the sequence of rows.

The below illustrations should be helpful in understanding the concept:

LAG:

LEAD:

Let’s look at two more window functions …

Since we are on the topic of window functions, lets also explore two more window functions namely FIRST_VALUE and LAST_VALUE with an example:

Imagine we have a logging table with below entries:

Now, if we are tasked to identify the first and the last log entry for every user per day then we achieve it by using the above introduced window functions as shown below:

TYPES: BEGIN OF lty_entry_log,
         entry_id TYPE char5,
         time     TYPE tims,
         date     TYPE date,
         user_id  TYPE char10,
       END OF lty_entry_log,
       ltt_entry_log TYPE STANDARD TABLE OF lty_entry_log WITH NON-UNIQUE KEY entry_id.

DATA lt_entry_log TYPE ltt_entry_log.

lt_entry_log = VALUE #( ( entry_id = '10001' time = '084534' date = '20231010' user_id = 'mark' )
                        ( entry_id = '10002' time = '093012' date = '20231010' user_id = 'mark' )
                        ( entry_id = '10003' time = '111547' date = '20231010' user_id = 'john' )
                        ( entry_id = '10004' time = '123022' date = '20231010' user_id = 'john' )
                        ( entry_id = '10005' time = '142551' date = '20231010' user_id = 'mark' )
                        ( entry_id = '10006' time = '083609' date = '20231011' user_id = 'john' )
                        ( entry_id = '10007' time = '092841' date = '20231011' user_id = 'mark' )
                        ( entry_id = '10008' time = '113012' date = '20231011' user_id = 'mark' )
                        ( entry_id = '10009' time = '121855' date = '20231011' user_id = 'mark' )
                        ( entry_id = '10010' time = '143639' date = '20231011' user_id = 'john' ) ).

SELECT DISTINCT user_id,
                date,
                FIRST_VALUE( time ) OVER( PARTITION  BY user_id, date ORDER BY date ) AS first_entry,
                LAST_VALUE( time ) OVER( PARTITION BY user_id, date  ORDER BY date )  AS last_entry
  FROM _entry_log AS log_entry
  ORDER BY date
  INTO TABLE (lt_result).

cl_demo_output=>display_data( lt_result ).

Upon, execution we get the following output which is as expected:

So what do FIRST_VALUE and LAST_VALUE essentially do?

Consider a window, which is a subset of rows from a query result. These functions offer a glimpse into values within that window:

FIRST_VALUE: retrieves the value from the very first row in the window.
LAST_VALUE: obtains the value from the very last row in the window.

The window is specified using an OVER clause. Similar to LEAD and LAG, FIRST_VALUE and LAST_VALUE require this clause. It’s crucial to specify the ordering of the rows within the window using ORDER BY to determine the “first” or “last” row.

OVER Clause: The OVER clause defines the window. It shares similarities with LEAD and LAG. ORDER BY within the OVER clause is essential in determining which row is considered the “first” or “last”.

PARTITION BY: While optional, PARTITION BY is useful for dividing data into groups.