locked
C# Find IndexOf Array in Array in Huge arrays comparing RRS feed

  • Question

  • Hi All,

    I have an array and i want to find the index of that array in a bigger array. For example:

    arr1 = [8,10,3,4,7,2]
    arr2 = [10,3]

    The result should be '1'.

    But i want to make this bigger, i have like 20 arrays of +/-10 000 bytes. I take the first array split that into 16-byte arrays and loop with these chunks the other 19bytearrays to know where i find matches of these bytes. But this is happening very slowly. I already made it a bit faster, but it's not enough.

    int length = 16;
    int number_of_counting = 0;
    
    Parallel.For(0, bytes.Length - length, i =>
    {
        byte[] base_bytes = bytes.Slice(i, length);
    
        if (AllFilesContainBytes(files, base_bytes))
        {
            Console.WriteLine("{0}", BitConverter.ToString(base_bytes).Replace('-', ' '));
            number_of_counting++;
        }
    });
    
    private bool AllFilesContainBytes(List<byte[]> files, byte[] base_bytes)
    {
        if (base_bytes.Sum(b => (int)b) > 0)
        {
            bool is_found = true;
    
            Parallel.ForEach(files, (file, state) =>
            {
                if (file.FindArrayIndex(base_bytes) < 0)
                {
                    is_found = false;
                    state.Stop();
                    return;
                }
            });
    
            return is_found;
        }
    
        return false;
    }
    
    public static int FindArrayIndex<T>(this T[] haystack, T[] needle) //fastest way untill now
    {
        //int result_idx = -1;
    
        //Parallel.For(0, haystack.Length - needle.Length, (i, state) =>
        for (int i = 0; i < haystack.Length - needle.Length; i++)
        {
            if (needle[0].Equals(haystack[i]))
            {
                int idx = i;
    
                for (int j = 1; j < needle.Length; j++)
                {
                    if (!needle[j].Equals(haystack[i + j]))
                    {
                        idx = -1;
                        break;
                    }
                }
    
                if (idx >= 0)
                {
                    return idx;
                    //result_idx = idx;
                    //state.Stop();
                }
            }
        }
        //});
    
        //return result_idx;
        return -1;
    }
    //public static int FindArrayIndex<T>(this T[] haystack, T[] needle)
    //{
    //    int idx = Array.IndexOf(haystack, needle[0]);
    
    //    while (idx >= 0 && !FindArrayIndexMatch(haystack, needle, idx))
    //    {
    //        idx = Array.IndexOf(haystack, needle[0], idx);
    //    }
    
    //    return idx;
    //}
    //private static bool FindArrayIndexMatch<T>(T[] haystack, T[] needle, int idx)
    //{
    //    for (int i = 0; i < needle.Length; i++)
    //    {
    //        if (!haystack[idx + i].Equals(needle[i]))
    //        {
    //            return false;
    //        }
    //    }

    //    return true;
    //} public static T[] Slice<T>(this T[] haystack, int index, int length) { T[] result = new T[length]; Array.Copy(haystack, index, result, 0, length); return result; }

    Thanks a lot!

    Tuesday, October 4, 2016 8:34 AM

Answers

  • Hi Condra963,

    Thank you for posting here.

    For your question, you want to find index of arrays.

    In your code, you want to split the array into 1616-byte arrays and then find index of arrays. You want to make it faster?

    How about use LINQ to group all items by the chunk size and create new Arrays afterwards.

    Here is a simple example.

    // build sample data with 1200 Strings
    string[] items = Enumerable.Range(1, 1200).Select(i => "Item" + i).ToArray();
    // split on groups with each 100 items
    String[][] chunks = items
                        .Select((s, i) => new { Value = s, Index = i })
                        .GroupBy(x => x.Index / 100)
                        .Select(grp => grp.Select(x => x.Value).ToArray())
                        .ToArray();
    
    for (int i = 0; i < chunks.Length; i++)
    {
        foreach (var item in chunks[i])
            Console.WriteLine("chunk:{0} {1}", i, item);
    }

    And you could use this way to find index of arrays quickly. Convert the array into a hash. Then look for the key like the code below.

    array = ['a', 'b', 'c']
    hash = Hash[array.map.with_index.to_a]    # => {"a"=>0, "b"=>1, "c"=>2}
    hash['b'] # => 1

    I hope this would be helpful to you.

    If you have something else, please feel free to contact us.

    Best Regards,

    Wendy


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.


    Wednesday, October 5, 2016 8:51 AM

All replies

  • Hi Condra963,

    Thank you for posting here.

    For your question, you want to find index of arrays.

    In your code, you want to split the array into 1616-byte arrays and then find index of arrays. You want to make it faster?

    How about use LINQ to group all items by the chunk size and create new Arrays afterwards.

    Here is a simple example.

    // build sample data with 1200 Strings
    string[] items = Enumerable.Range(1, 1200).Select(i => "Item" + i).ToArray();
    // split on groups with each 100 items
    String[][] chunks = items
                        .Select((s, i) => new { Value = s, Index = i })
                        .GroupBy(x => x.Index / 100)
                        .Select(grp => grp.Select(x => x.Value).ToArray())
                        .ToArray();
    
    for (int i = 0; i < chunks.Length; i++)
    {
        foreach (var item in chunks[i])
            Console.WriteLine("chunk:{0} {1}", i, item);
    }

    And you could use this way to find index of arrays quickly. Convert the array into a hash. Then look for the key like the code below.

    array = ['a', 'b', 'c']
    hash = Hash[array.map.with_index.to_a]    # => {"a"=>0, "b"=>1, "c"=>2}
    hash['b'] # => 1

    I hope this would be helpful to you.

    If you have something else, please feel free to contact us.

    Best Regards,

    Wendy


    We are trying to better understand customer views on social support experience, so your participation in this interview project would be greatly appreciated if you have time. Thanks for helping make community forums a great place.
    Click HERE to participate the survey.


    Wednesday, October 5, 2016 8:51 AM
  • Thank you very much Wendy!

    The linq method is something i already tried before, but for what i need it's not good enough.

    The part about the hash was very helpful, something i didn't knew.

    Kind regards!

    Friday, February 10, 2017 8:57 AM
  • Hi Condra963,

    I am glad that the reply would be helpful to you.

    Best Regards,

    Wendy


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Friday, February 10, 2017 9:13 AM