
Just as the title reads, I need to check whether the number of unique entries within an array exceeds n.

Array.prototype.some() seems to fit perfectly here, as it stops cycling through the array the moment a positive answer is found. So please do not suggest methods that filter out the non-unique records and measure the length of the resulting dataset, as performance matters here.

So far, I use the following code to check if there are more than n=2 unique numbers:

const res = [1,1,2,1,1,3,1,1,4,1].some((e,_,s,n=2) => s.indexOf(e) != s.lastIndexOf(e) ? false : n-- ? false : true);

console.log(res);


And it returns false, while there are, obviously, 3 unique numbers (2, 3, 4).

Your help in figuring out my (stupid) mistake here is much appreciated.

p.s. I'm looking for a pure JS solution

  • You've got an O(n^2) algorithm, if you're worried about performance. – Pointy Commented Sep 20, 2019 at 14:49
  • @Pointy : now it seems to be O(n) to me. Nonetheless, no luck still. – Yevhen Horbunkov Commented Sep 20, 2019 at 14:56
  • 3 Well you completely changed the code; now the .sort() makes it O(n log(n)). You have to make a complete pass over the array to determine what you're looking for. The simplest way to do it would be to keep a mapping of values to repeat counts, and then after looking at the entire array just check to see if there are at least n entries with repeat count 1. – Pointy Commented Sep 20, 2019 at 14:59
  • 1 Time complexity can't get better than O(n) because every number in the array must be visited to find the number of unique numbers. – Nikhil Commented Sep 20, 2019 at 15:31
  • 1 Any solution involving sorting wouldn't give better time complexity than O(n log(n)). We can achieve a better O(n) time complexity using a map as mentioned in my answer. – Nikhil Commented Sep 20, 2019 at 15:41

7 Answers

You can use a Map with the array values as keys and counts as values, then iterate over the map's values to find the count of unique numbers. If that count exceeds the limit, return true; if not, return false.

Time complexity is O(n). It can't get better than O(n), because every number in the array must be visited to find the count of unique numbers.

var data = [1, 1, 2, 1, 1, 3, 1, 1, 4, 1];

function exceedsUniqueLimit(data, limit) {
  // Map each value to the number of times it occurs.
  var map = new Map();

  for (let value of data) {
    const count = map.get(value);
    if (count) {
      map.set(value, count + 1);
    } else {
      map.set(value, 1);
    }
  }

  // Count the values that occur exactly once,
  // returning early as soon as the limit is exceeded.
  var uniqueNumbers = 0;

  for (let count of map.values()) {
    if (count === 1) {
      uniqueNumbers++;
    }

    if (uniqueNumbers > limit) {
      return true;
    }
  }

  return false;
}

console.log(exceedsUniqueLimit(data, 2)); // true

To know whether a value is unique or a duplicate, the whole array needs to be scanned at least once. (On a very large array one could keep testing how many elements are left to scan, but the overhead of such a test would make it slower.)

This version uses two Sets:

function uniqueLimit(data, limit) {
  let
    dup = new Set(),    // values seen more than once
    unique = new Set(), // values seen exactly once so far
    value = null;
  for (let i = 0, len = data.length; i < len; ++i) {
    value = data[i];
    if ( dup.has(value) ) continue;  // already a known duplicate
    if ( unique.has(value) ) {       // second sighting: demote to duplicate
      dup.add(value);
      unique.delete(value);
      continue;
    }
    unique.add(value);               // first sighting
  }
  return unique.size > limit;
}
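For example, with the array from the question and a limit of 2 (a quick usage check, not part of the original answer):

console.log(uniqueLimit([1,1,2,1,1,3,1,1,4,1], 2)); // true: 2, 3 and 4 each occur once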

I also tried this version, using arrays:

function uniqueLimit(data, limit) {
  let unique=[], dup = [];
  for (let idx = 0, len = data.length; idx < len; ++idx) {
    const value = data[idx];
    if ( dup.indexOf(value) >= 0 ) continue;
    const pos = unique.indexOf(value); // get position of value
    if ( pos >= 0 ) {
      unique.splice(pos,1); // remove value
      dup.push(value);
      continue;
    }
    unique.push(value);
  }
  return unique.length > limit;
};

I tested several of the solutions in this thread, and you can find the result here. If there are only a few unique values, the method using arrays is the fastest, but if there are many unique values it quickly becomes the slowest, and on large arrays it is slower by several orders of magnitude.

More profiling

I did some more tests with node v12.10.0. The results are normalized after the fastest method for each test.

Worst case scenario: 1000000 entries, all unique:

Set     1.00     // See this answer
Map     1.26     // See answer by Nikhil
Reduce  1.44     // See answer by Bali Balo
Array   Infinity // See this answer

Best case scenario: 1000000 entries, all the same:

Array   1.00
Set     1.16
Map     2.60
Reduce  3.43

Question test case: [1, 1, 2, 1, 1, 3, 1, 1, 4, 1]

Array    1.00
Map      1.29
Set      1.47
Reduce   4.25

Another test case: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1, 1,1,1,1,1,1,1,3,4,1,1,1,1,1,1,1,2,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,5 ]

Array    1.00
Set      1.13
Map      2.24
Reduce   2.39

Conclusion

The method that uses Sets works for both small and large arrays, and performs well regardless of whether there are many unique values or not. The version that uses arrays can be faster if there are few unique values, but quickly becomes very slow if there are many.
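The answer doesn't include the harness itself; below is a minimal sketch of the kind of timing loop that could produce normalized numbers like the above (the helper name, run count and data shapes are assumptions), using Node's high-resolution timer:

// Hypothetical harness: fn is one of the implementations above,
// e.g. the two-Set uniqueLimit.
function bench(label, fn, data, limit, runs = 20) {
  const start = process.hrtime.bigint();          // Node.js >= 10.7
  for (let i = 0; i < runs; i++) fn(data, limit);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(label, (ms / runs).toFixed(3), 'ms/run');
}

const allUnique = Array.from({ length: 1000000 }, (_, i) => i); // worst case
const allSame = new Array(1000000).fill(1);                     // best case
bench('Set, worst case', uniqueLimit, allUnique, 2);
bench('Set, best case', uniqueLimit, allSame, 2);

Normalizing then just divides every result by the fastest time in the same test.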

Using Sets, we start from the hypothetical unique-set size (all distinct values) and delete an element from the unique set for each duplicate found. If the unique-set size drops to n, we stop iterating.

function uniqueGtN(res, n) {
  let uniqSet = new Set(res); // all distinct values
  let max = uniqSet.size;     // upper bound on the number of exactly-once values
  if (max <= n) return false;
  let dupSet = new Set();     // values seen at least once
  return !res.some(e => {
    if (dupSet.has(e)) {      // e occurs more than once
      if (uniqSet.has(e)) {
        uniqSet.delete(e);        // demote e from the unique set
        console.log(...uniqSet);  // debug output: the shrinking unique set
        return (--max <= n);      // stop once the bound drops to n
      }
    } else {
      dupSet.add(e);
    }
  });
}
console.log(uniqueGtN([1, 1, 2, 1, 1, 3, 3, 1], 2));

Starting from your original solution, I have changed a few things, and it seems to work fine:

(function() {
  const array = [1,1,2,1,1,3,1,1,4,1];
  
  function hasExceedingUniqueNumber(array, number) {
    return array.some((e,_,s,n=number) => {
      let firstIndex = s.indexOf(e);
      let lastIndex = s.lastIndexOf(e);

      // NOT unique 
      if (firstIndex != lastIndex) {
        return false;
      }

      // unique
      return e > n;
    });
  }

  console.log('1', hasExceedingUniqueNumber(array, 1));
  console.log('2', hasExceedingUniqueNumber(array, 2));
  console.log('3', hasExceedingUniqueNumber(array, 3));
  console.log('4', hasExceedingUniqueNumber(array, 4));
 })();

So the shorter version looks like this:

(function() {
  const array = [1,1,2,1,1,3,1,1,4,1];

  function hasExceedingUniqueNumber(array, number) {
    return array.some((e,_,s,n=number) => s.indexOf(e) != s.lastIndexOf(e) ? false : e > n);
  }

  console.log('1', hasExceedingUniqueNumber(array, 1));
  console.log('2', hasExceedingUniqueNumber(array, 2));
  console.log('3', hasExceedingUniqueNumber(array, 3));
  console.log('4', hasExceedingUniqueNumber(array, 4));
 })();

The code listed in your question does not work because the counter (n in your snippet, m below) is not shared across the calls to the some callback function. It is a parameter, so its value is reset to 2 on every call.

To fix this, either keep m outside the callback, or use the thisArg of the some function (but that means you can't use an arrow function):

let m = 2;
const res = [1,1,1,2,1,1,3,1,1,1,4,1,1]
    .sort((a,b) => a-b)
    .some((n,i,s) => i > 0 && n == s[i-1] ? !(m--) : false);

// ----- or -----

const res = [1,1,1,2,1,1,3,1,1,1,4,1,1]
    .sort((a,b) => a-b)
    .some(function(n,i,s) { return i > 0 && n == s[i-1] ? !(this.m--) : false; }, { m: 2 });

Note: this code seems to count whether the number of duplicates exceeds a certain value, not whether the number of unique values does.
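For completeness, here is a hedged sketch of the same sort-based approach adjusted to count values that occur exactly once (the function name is mine, not from the answer):

// Counts values appearing exactly once in a sorted copy; O(n log n) due to the sort.
function uniqueCountExceeds(arr, n) {
  const s = [...arr].sort((a, b) => a - b);
  let uniques = 0;
  for (let i = 0; i < s.length; i++) {
    // a value occurs exactly once if it differs from both neighbours
    if (s[i] !== s[i - 1] && s[i] !== s[i + 1] && ++uniques > n) return true;
  }
  return false;
}

console.log(uniqueCountExceeds([1,1,1,2,1,1,3,1,1,1,4,1,1], 2)); // true (2, 3, 4)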


As another side note, I know you mentioned you did not want to use a duplicate-removal algorithm, but performant ones (for example hash-based) run in something close to O(n).
Here is a solution that counts all the values appearing exactly once in the initial array. It is a bit obfuscated and hard to read, but you seem to want something concise. It is the most performant I can think of, using two objects to store the values seen at least once and the values seen multiple times:

let res = [1,1,2,3,4].reduce((l, e) => (l[+!l[1][e]][e] = true, l), [{},{}]).map(o => Object.keys(o).length).reduce((more,once) => once-more) > 2;

Here is the less minified version for people who don't like the short version:

let array = [1,1,2,3,4];
let counts = array.reduce((counts, element) => {
    if (!counts.atLeastOne[element]) {
        counts.atLeastOne[element] = true;
    } else {
        counts.moreThanOne[element] = true;
    }
    return counts;
}, { atLeastOne: {}, moreThanOne: {} });
let exactlyOnceCount = Object.keys(counts.atLeastOne).length - Object.keys(counts.moreThanOne).length;
let isOverLimit = exactlyOnceCount > 2;
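For the sample array [1,1,2,3,4], atLeastOne ends up with the keys 1, 2, 3 and 4, while moreThanOne holds just 1, so:

console.log(exactlyOnceCount, isOverLimit); // 3 true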

Whenever I have a problem like this, I always like to peek at how the Underscore.js folks have done it.

[Ed again: removed _.countBy as it isn't relevant to the answer]

Use the _.uniq function to return a list of unique values in the array:

var u = _.uniq([1,1,2,2,2,3,4,5,5]); // [1,2,3,4,5]
if (u.length > n) { ...};

[ed:] Here's how we might use that implementation to write our own, opposite function that returns only the non-unique collection items:

function nonUnique(array) {
  var result = [];
  var seen = [];
  for (var i = 0, length = array.length; i < length; i++) {
    var value = array[i];
    if (seen.indexOf(value) === -1) { // warning! naive assumption
      seen.push(value);
    } else {
      result.push(value);
    }
  }
  console.log("non-unique result", result);
  return result;
};

function hasMoreThanNUnique(array, threshold) {
  var uArr = nonUnique(array);
  var accum = 0;
  for (var i = 0; i < array.length; i++) {
    var val = array[i];
    if (uArr.indexOf(val) === -1) {
      accum++;
    }
    if (accum > threshold) return true;
  }
  return false;
}

var testArrA = [1, 1, 2, 2, 2, 3, 4, 5]; // unique values: [3, 4, 5]
var testArrB = [1, 1, 1, 1, 4]; // [4]
var testResultsA = hasMoreThanNUnique(testArrA, 3)
console.log("testArrA and results", testResultsA);

var testResultsB = hasMoreThanNUnique(testArrB, 3);
console.log("testArrB and results", testResultsB);

So far, I came up with the following:

const countNum = [1,1,1,2,1,1,3,1,1,1,4,1,1].reduce((r,n) => (r[n]=(r[n]||0)+1, r), {});
const res = Object.entries(countNum).some(([n,q]) => q == 1 ? !(m--) : false, m=2);
console.log(res);

But I don't really like the array → object → array conversion in there. Is there a faster and, at the same time, more compact solution?
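One way to avoid the object round-trip, sketched along the lines of the Map-based answer above (a suggestion, not a benchmarked claim), is to count in a single pass and only then scan the counts:

// Single-pass counting with a Map, then an early-exit scan of the counts.
function moreThanNUnique(arr, n) {
  const counts = new Map();
  for (const v of arr) counts.set(v, (counts.get(v) || 0) + 1);
  let once = 0;
  for (const c of counts.values()) {
    if (c === 1 && ++once > n) return true; // stop as soon as the limit is passed
  }
  return false;
}

console.log(moreThanNUnique([1,1,1,2,1,1,3,1,1,1,4,1,1], 2)); // true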
