Understanding Proximal Policy Optimization (PPO) in Reinforcement Learning

Today's Progress

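Started on some heap problems. The solution below returns the indices of the k weakest rows of a matrix, where a row with fewer 1s is weaker and ties go to the lower row index.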


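  import java.util.PriorityQueue;

  class Solution {
      public int[] kWeakestRows(int[][] mat, int k) {
          // Min-heap of {count, rowIndex}: fewest 1s first, ties broken by lower row index.
          // The element type must be int[]; a raw PriorityQueue would not compile here.
          PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> {
              if (a[0] == b[0]) return a[1] - b[1];
              return a[0] - b[0];
          });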

          // Count the 1s in each row and push {count, rowIndex} onto the heap.
          for (int i = 0; i < mat.length; i++) {
              int count = 0;
              for (int j = 0; j < mat[i].length; j++) {
                  if (mat[i][j] == 1) count++;
              }
              pq.offer(new int[]{count, i});
          }

          // Pop the k smallest entries; their row indices are the answer.
          int[] result = new int[k];
          for (int i = 0; i < k; i++) {
              result[i] = pq.poll()[1];
          }
          return result;
      }
  }
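
To check the heap ordering, here is a minimal self-contained sketch that runs the same approach on a sample matrix. The class name `KWeakestRowsDemo` and the sample input are my own assumptions for illustration and are not part of the original solution. The comparator uses `Integer.compare` instead of subtraction, which behaves the same for these small non-negative values.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class KWeakestRowsDemo {
    // Same idea as the solution above: a min-heap of {count, rowIndex} pairs.
    static int[] kWeakestRows(int[][] mat, int k) {
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) ->
            a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));

        for (int i = 0; i < mat.length; i++) {
            int count = 0;
            for (int v : mat[i]) {
                if (v == 1) count++;
            }
            pq.offer(new int[]{count, i});
        }

        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = pq.poll()[1];
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input (an assumption, not from the original notes).
        // Counts of 1s per row: 2, 4, 1, 2, 5.
        int[][] mat = {
            {1, 1, 0, 0, 0},
            {1, 1, 1, 1, 0},
            {1, 0, 0, 0, 0},
            {1, 1, 0, 0, 0},
            {1, 1, 1, 1, 1}
        };
        // Prints [2, 0, 3]: row 2 has one 1; rows 0 and 3 tie at two, so row 0 comes first.
        System.out.println(Arrays.toString(kWeakestRows(mat, 3)));
    }
}
```

Pushing every row costs O(m log m) for m rows on top of the O(m * n) counting pass, and polling k times costs O(k log m).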