Regex match the pattern considering any character in the middle

  • Question

  • In order to list all the files in a directory I'm trying to ignore part of the filename in the regex pattern, but it is not working. My files have the following format:

    20200101_TableA_100000_output.csv

    20200101_TableA_100000_output.zip

    20200101_TableA_200000_output.csv

    20200101_TableA_200000_output.zip

    20200101_TableB_100000_output.csv

    20200101_TableB_100000_output.zip

    I want to get all the TableA and TableB csv files, so I did the following just as a test (the table values will come from a dictionary):

    var pattern = "TableA(.*)_IRRBB.csv";
    var queries= from f in Directory.EnumerateFiles(inputSourceFiles, pattern) 
                          select f;

    But it returns nothing. Could you please help me with that?

    Thursday, January 2, 2020 9:15 AM

Answers

  • As documented, Directory.EnumerateFiles does not support regular expressions (RE). You are limited to the simple pattern matching supported by the underlying Win32 call: basically * and ?, and even those are limited. For example, this pattern will give you the results you expect for the sample set you provided.

    var pattern = "*_Table?_*.csv";
    
    //Returns 3 files
    var files = Directory.EnumerateFiles(inputSourceFiles, pattern);
    
    

    There is nothing wrong with filtering further after you retrieve the initial list of files. Use the pattern to eliminate most of the files, then filter from there using RE or whatever else you need to get the final list. For example, suppose you then only want files where the last number is greater than 100000.

    //Do some more filtering, could be combined with EnumerateFiles call if desired
    files = from f in files
            let tokens = Path.GetFileName(f).Split('_')  //Split the file name only, not the directory part
            where tokens.Length >= 3 && Int32.TryParse(tokens[2], out var num) && num > 100000
            select f;

    In short, use the simple search pattern to filter out the bulk of the files, then use more powerful tools (RE or whatever) for the more complex cases.
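
A minimal sketch of that two-step approach, assuming the file names follow the `<date>_<table>_<number>_output.csv` layout from the question, with the date prefix and the number both treated as optional. `FilterByTable` is a hypothetical helper name, not a framework API, and the in-memory sample names stand in for the paths that `Directory.EnumerateFiles(inputSourceFiles, "*.csv")` would return. It is written as top-level statements (C# 9 or later); on older compilers the same code can go inside a `Main` method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps only the paths whose file name looks like
// [prefix_]<table>_[digits_]output.csv (assumed naming convention).
static IEnumerable<string> FilterByTable(IEnumerable<string> paths, string table)
{
    // Regex.Escape protects against regex metacharacters in the table name.
    var regex = new Regex(@"^(\w+_)?" + Regex.Escape(table) + @"_(\d+_)?output\.csv$");

    // Match the file name only, so the directory part cannot interfere.
    return paths.Where(p => regex.IsMatch(Path.GetFileName(p)));
}

// Stand-in for: Directory.EnumerateFiles(inputSourceFiles, "*.csv")
var sampleNames = new[]
{
    "20200101_TableA_100000_output.csv",
    "20200101_TableA_100000_output.zip",
    "20200101_TableA_200000_output.csv",
    "20200101_TableB_100000_output.csv",
    "TableA_output.csv",
};

// In real use the table names would come from the dictionary mentioned in the question.
foreach (var path in FilterByTable(sampleNames, "TableA"))
{
    Console.WriteLine(path);
}
```

Building the regex per table name keeps the wildcard passed to EnumerateFiles trivial while the precise rules live in one place.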

    Michael Taylor http://www.michaeltaylorp3.net

    • Marked as answer by FcabralJ Friday, January 3, 2020 3:37 PM
    Thursday, January 2, 2020 3:50 PM
    Moderator

All replies

  • Try this pattern (Table[A-Z]_[0-9]+)
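
To see what this pattern captures, here is a small sketch run against sample names from the question. Two things are worth noting: the pattern is unanchored and says nothing about the extension, so it matches the .zip files as well, and it requires a digit group after the table name. It is meant for the Regex class, not for the search pattern argument of Directory.EnumerateFiles.

```csharp
using System;
using System.Text.RegularExpressions;

// The suggested pattern, used unanchored with the Regex class.
var tablePattern = new Regex(@"(Table[A-Z]_[0-9]+)");

// Sample names taken from the question.
var names = new[]
{
    "20200101_TableA_100000_output.csv",
    "20200101_TableB_100000_output.zip",
};

foreach (var name in names)
{
    var m = tablePattern.Match(name);
    // Groups[1] holds the table name plus the number that follows it.
    var shown = m.Success ? m.Groups[1].Value : "no match";
    Console.WriteLine(name + ": " + shown);
}
```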

    If this is the answer to your problem, mark it with "Propose as Answer"; if it is a helpful comment, you can contribute to the working of the forum by "Voting".


    Thursday, January 2, 2020 10:56 AM
  • Egoist Developer, I think the part you suggested works well, but the overall pattern still does not match, either because of how I concatenate the parts or because I misunderstand some of them.

    My code now looks like this:

    var inputSourceFiles = @"C:\testFolder\";
    var file = @"TableA";
    var pattern = new Regex(@"(^\w+)" + file + @"_(\d+_)?IRRBB.csv", RegexOptions.Compiled);
    var inputCfgFiles = Directory.GetFiles(inputSourceFiles, ".csv").Where(path => pattern.IsMatch(path)).ToList();

    To clarify, the number in the middle is optional, which is why I tried to include the ? symbol. Sometimes the files inside inputSourceFiles look like "C:\testFolder\AnotherTable_output.csv" and sometimes like "C:\testFolder\AnotherTable_9000_output.csv".




    • Edited by FcabralJ Thursday, January 2, 2020 12:43 PM
    Thursday, January 2, 2020 12:05 PM