LeetCode #176 Second Highest Salary - Solved in Pandas
Sorting Over Distinct Salaries and Two Passes With No Sort
LeetCode 176 asks for the second highest distinct salary from an employee table. The task returns a single value and produces NULL when no second salary level exists. The problem treats salaries as levels rather than counts of employees, so repeated values collapse into a single tier before determining which one falls in the second position.
Solving this in pandas starts with thinking about salary levels rather than individual rows. The goal is to isolate distinct salaries, then decide whether to sort them or scan for boundary values. Both paths follow the mechanics of series operations, building unique sets, filtering with boolean masks, and extracting a single scalar that anchors the final one-row DataFrame.
LeetCode #176 Second Highest Salary
Pandas Schema:
Table:
Employee+-------------+------+ | Column Name | Type | +-------------+------+ | id | int | | salary | int | +-------------+------+ id is the primary key (column with unique values) for this table. Each row of this table contains information about the salary of an employee.Write a solution to find the second highest distinct salary from the
Employeetable. If there is no second highest salary, returnnull (return None in Pandas).The result format is in the following example.
Example 1:
Input: Employee table: +----+--------+ | id | salary | +----+--------+ | 1 | 100 | | 2 | 200 | | 3 | 300 | +----+--------+ Output: +---------------------+ | SecondHighestSalary | +---------------------+ | 200 | +---------------------+Example 2:
Input: Employee table: +----+--------+ | id | salary | +----+--------+ | 1 | 100 | +----+--------+ Output: +---------------------+ | SecondHighestSalary | +---------------------+ | null | +---------------------+
Solution 1: Sorting Over Distinct Salaries
This solution builds a list of distinct salary levels, sorts them in descending order, and returns the second value when it exists.
import pandas as pd
def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
unique_salaries = employee['salary'].drop_duplicates()
sorted_salaries = unique_salaries.sort_values(ascending=False)
if len(sorted_salaries) < 2:
return pd.DataFrame({'SecondHighestSalary': [None]})
second = sorted_salaries.iloc[1]
return pd.DataFrame({'SecondHighestSalary': [second]})Let’s walk through what each part does step by step:
unique_salaries = employee['salary'].drop_duplicates()This line pulls the salary column out of the employee frame as a Series. Every row contributes its salary value. The call to drop_duplicates collapses repeated salaries so that each level appears once, which matches the problem statement that talks about salary levels rather than people.
sorted_salaries = unique_salaries.sort_values(ascending=False)Here the salary levels move into descending order. The highest value lands at index position zero in sorted_salaries, and the next lower level lands at position one, as long as that level exists. Sorting happens on the distinct values, not on the full row count.
if len(sorted_salaries) < 2:
return pd.DataFrame({'SecondHighestSalary': [None]})This guard handles edge cases where a second level does not exist. When the table has zero rows or only one salary level, sorted_salaries has length zero or one, and the function returns a frame with a single row and a SecondHighestSalary value of None, which matches the LeetCode requirement to return NULL.
second = sorted_salaries.iloc[1]Index access with iloc[1] steps past the top salary level and grabs the second one. iloc treats the series like a zero-based array, so index 1 always refers to the second item in the sorted order.
return pd.DataFrame({'SecondHighestSalary': [second]})The final line packages the scalar second value into a one-row DataFrame with the expected column name. That matches the output format of the official pandas version of this problem.
For this solution, drop_duplicates scans all N rows in employee[‘salary’] and builds the set of distinct salary values, so time grows proportionally to N and memory tracks the number of distinct values U. The call to sort_values on sorted_salaries then works over those U values and runs in about O(U log U) time with memory proportional to U. Overall time can be written as O(N + U log U) and is usually treated as O(N log N) when U is large, while memory is dominated by the distinct salary series and stays near O(U). From a pandas angle, this solution leans on Series.drop_duplicates, Series.sort_values, and positional indexing with iloc, which mirrors the idea of “form the levels, order them, then index into the result.” In interview settings, this is a common baseline solution that closely follows the problem statement and focuses on clarity of the transformation steps.
Solution 2: Two Passes With No Sort
This solution follows a nested MAX strategy where one pass finds the highest salary, and a second pass finds the largest salary that falls strictly below it. Sorting is replaced by two scans over the data that first locate the top level and then the next lower level.
import pandas as pd
def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
if employee.empty:
return pd.DataFrame({'SecondHighestSalary': [None]})
max_salary = employee['salary'].max()
lower_salaries = employee.loc[employee['salary'] < max_salary, 'salary']
if lower_salaries.empty:
return pd.DataFrame({'SecondHighestSalary': [None]})
second = lower_salaries.max()
return pd.DataFrame({'SecondHighestSalary': [second]})Let’s break this one down:
if employee.empty:
return pd.DataFrame({'SecondHighestSalary': [None]})This early return covers the completely empty table. With no rows present, there is no salary level at all, so the function returns a one-row frame with None as the second highest salary value.
max_salary = employee['salary'].max()This line scans the salary column once and pulls out the highest value present. Pandas walks the series and tracks the largest number it has seen so far. That mirrors a MAX(salary) aggregate in SQL.
lower_salaries = employee.loc[employee['salary'] < max_salary, 'salary']A boolean filter narrows down to salaries that fall strictly below the top level. The expression employee[‘salary’] < max_salary builds a boolean mask with one entry per row. That mask selects rows where the salary is below the maximum, and the final , ‘salary’ narrows the result to the salary column again. When every row carries the same salary as max_salary, this series ends up empty.
if lower_salaries.empty:
return pd.DataFrame({'SecondHighestSalary': [None]})This guard handles cases where no lower level exists. That covers both the “only one distinct salary level” case and the situation where max_salary is present but no row has a smaller salary. The return value again uses a single row with None to match the LeetCode requirements.
second = lower_salaries.max()The second pass through the data now works only over rows that survived the filter. max() on lower_salaries gives the highest salary below the top level, which is exactly the second highest salary level in the table.
return pd.DataFrame({'SecondHighestSalary': [second]})Just like the first solution, the last line wraps the scalar value in a frame with the required column name for submission.
Time complexity for this version centers on scans rather than sorting. The first max() over employee[‘salary’] traverses N rows to find max_salary. Building lower_salaries with a boolean mask touches N rows again to evaluate employee[‘salary’] < max_salary and extract the filtered series, and the final max() over lower_salaries walks only the remaining entries, which in the worst case is still on the order of N. Total time is therefore O(N). Memory use includes the temporary boolean mask and the lower_salaries series, both sized to N, so the space cost is O(N) and can exceed the O(U) space of the distinct salary sort when U is much smaller than N. Mechanically, this solution relies on Series.max, boolean indexing with DataFrame.loc, and a second aggregate, mirroring a pair of MAX aggregates in SQL and replacing the distinct sort from the first solution with two scans and a filter, where the first solution spends extra time on sort_values over U distinct levels with space near O(U), while this version trades that sort work for linear scans and higher O(N) temporary space.


