Sunday, April 18, 2021

heap

 A MinHeap is a binary tree where the node's value is smaller than its children. (vs MaxHeap)

https://realpython.com/python-heapq-module/

https://en.wikipedia.org/wiki/Heap_(data_structure)

https://stackoverflow.com/questions/2501457/what-do-i-use-for-a-max-heap-implementation-in-python

https://www.quora.com/What-are-the-time-complexities-of-heap

insertion O(logn)

deletion O(logn) to remove the first node and reorder to maintain the heap


def heapsort(nums):
    h = []
    for i in range(len(nums)):
        heapq.heappush(h, nums[i])
        print('h',h)
    for i in range(len(nums)):
        print('pop',heapq.heappop(h))
heapsort([29,38,2,10,1])

Tuesday, March 30, 2021

MySQL 8.0 RANK()

 https://towardsdatascience.com/mysql-how-to-write-a-query-that-returns-the-top-records-in-a-group-12865695f436#:~:text=MySQL%208.0%20introduces%20a%20rank,each%20row%20within%20each%20partition.

For each year and each month, rank records by decreasing order_amount.

year, month, amount, rank
2018, 11, 255.0, 1
2018, 11, 200.0, 2
2018, 11, 175.0, 3
...
2018, 12, 321.0, 1
2018, 12, 321.0, 1
2018, 12, 145.0, 2
...


MySQL 5.7 using variable assignment and IF(expr, true, false)

SELECT order_number, customer_number, customer_name, order_date,
YEAR(order_date) AS order_year,
MONTH(order_date) AS order_month,
order_amount,
@order_rank := IF(@current_month = MONTH(order_date),
@order_rank + 1, 1) AS order_rank,
@current_month := MONTH(order_date)
FROM orders
ORDER BY order_year, order_month, order_amount DESC


MySQL 8.0 using RANK() OVER(PARTITION BY ... ORDER BY ...)

SELECT order_number, customer_number, customer_name, order_date,
 YEAR(order_date) AS order_year, 
 MONTH(order_date) AS order_month, 
 order_amount,
 RANK() OVER (
 PARTITION BY YEAR(order_date), MONTH(order_date)
ORDER BY YEAR(order_date), MONTH(order_date), order_amount DESC) order_value_rank
FROM orders;



SELECT  RANK() OVER(ORDER BY cars_sold DESC) AS rank_sales,
        first_name,
        last_name,
        cars_sold
FROM salespeople;


Window functions:

Table 12.26 Window Functions

NameDescription
CUME_DIST()Cumulative distribution value
DENSE_RANK()Rank of current row within its partition, without gaps
FIRST_VALUE()Value of argument from first row of window frame
LAG()Value of argument from row lagging current row within partition
LAST_VALUE()Value of argument from last row of window frame
LEAD()Value of argument from row leading current row within partition
NTH_VALUE()Value of argument from N-th row of window frame
NTILE()Bucket number of current row within its partition.
PERCENT_RANK()Percentage rank value
RANK()Rank of current row within its partition, with gaps
ROW_NUMBER()Number of current row within its partition


mysql> SELECT val, ROW_NUMBER() OVER w AS 'row_number', CUME_DIST() OVER w AS 'cume_dist', PERCENT_RANK() OVER w AS 'percent_rank' FROM numbers WINDOW w AS (ORDER BY val); +------+------------+--------------------+--------------+ | val | row_number | cume_dist | percent_rank | +------+------------+--------------------+--------------+ | 1 | 1 | 0.2222222222222222 | 0 | | 1 | 2 | 0.2222222222222222 | 0 | | 2 | 3 | 0.3333333333333333 | 0.25 | | 3 | 4 | 0.6666666666666666 | 0.375 | | 3 | 5 | 0.6666666666666666 | 0.375 | | 3 | 6 | 0.6666666666666666 | 0.375 | | 4 | 7 | 0.8888888888888888 | 0.75 | | 4 | 8 | 0.8888888888888888 | 0.75 | | 5 | 9 | 1 | 1 | +------+------------+--------------------+--------------+


mysql> SELECT time, subject, val, SUM(val) OVER (PARTITION BY subject ORDER BY time ROWS UNBOUNDED PRECEDING) AS running_total, AVG(val) OVER (PARTITION BY subject ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS running_average FROM observations; +----------+---------+------+---------------+-----------------+ | time | subject | val | running_total | running_average | +----------+---------+------+---------------+-----------------+ | 07:00:00 | st113 | 10 | 10 | 9.5000 | | 07:15:00 | st113 | 9 | 19 | 14.6667 | | 07:30:00 | st113 | 25 | 44 | 18.0000 | | 07:45:00 | st113 | 20 | 64 | 22.5000 | | 07:00:00 | xh458 | 0 | 0 | 5.0000 | | 07:15:00 | xh458 | 10 | 10 | 5.0000 | | 07:30:00 | xh458 | 5 | 15 | 15.0000 | | 07:45:00 | xh458 | 30 | 45 | 20.0000 | | 08:00:00 | xh458 | 25 | 70 | 27.5000 | +----------+---------+------+---------------+-----------------+

mysql> SELECT time, subject, val, FIRST_VALUE(val) OVER w AS 'first', LAST_VALUE(val) OVER w AS 'last', NTH_VALUE(val, 2) OVER w AS 'second', NTH_VALUE(val, 4) OVER w AS 'fourth' FROM observations WINDOW w AS (PARTITION BY subject ORDER BY time ROWS UNBOUNDED PRECEDING); +----------+---------+------+-------+------+--------+--------+ | time | subject | val | first | last | second | fourth | +----------+---------+------+-------+------+--------+--------+ | 07:00:00 | st113 | 10 | 10 | 10 | NULL | NULL | | 07:15:00 | st113 | 9 | 10 | 9 | 9 | NULL | | 07:30:00 | st113 | 25 | 10 | 25 | 9 | NULL | | 07:45:00 | st113 | 20 | 10 | 20 | 9 | 20 | | 07:00:00 | xh458 | 0 | 0 | 0 | NULL | NULL | | 07:15:00 | xh458 | 10 | 0 | 10 | 10 | NULL | | 07:30:00 | xh458 | 5 | 0 | 5 | 10 | NULL | | 07:45:00 | xh458 | 30 | 0 | 30 | 10 | 30 | | 08:00:00 | xh458 | 25 | 0 | 25 | 10 | 30 | +----------+---------+------+-------+------+--------+--------+

https://en.wikipedia.org/wiki/Percentile_rank