Large DSC File Sets (> petabyte)

  • Question

  • I am creating DSC file sets of IIS log files that will continue to grow on a scheduled basis. The initial file set size will be 1.2 petabytes. We want to keep 2 years of data in the file set, with incremental updates occurring every 15 minutes.

    What are my options?

    Sealed file sets cannot be changed. Can they be reopened and have additional data added? With .AddExistingFile, can I take existing files in one file set and add them to another file set?

     


    --Patrick Gallucci
    Wednesday, September 21, 2011 3:11 PM

All replies

  • Yes, this is one of the scenarios which .AddExistingFile is intended to address. For your incremental updates, you can create a new fileset and use AddExistingFile to add the old data, then AddNewFile to add the incremental update.
    Wednesday, September 21, 2011 4:42 PM
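The pattern in the reply above (create a new file set, re-add the old files by reference, then add the increment) can be sketched with a toy in-memory model. This is not the DSC API: `LogFile`, `FileSet`, `roll_forward`, and the file names are hypothetical, and the 2-year retention filter is an assumption taken from the question rather than something the reply describes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogFile:
    """Hypothetical stand-in for one physical log file."""
    name: str
    written_at: datetime


@dataclass
class FileSet:
    """Toy model of a file set: a named list of file references that can be sealed."""
    name: str
    files: list = field(default_factory=list)
    sealed: bool = False

    def _check_open(self):
        if self.sealed:
            raise RuntimeError(f"file set {self.name!r} is sealed")

    def add_existing_file(self, f):
        # Models .AddExistingFile: reference a file another set already holds.
        self._check_open()
        self.files.append(f)

    def add_new_file(self, f):
        # Models AddNewFile: register a file not yet in any set.
        self._check_open()
        self.files.append(f)

    def seal(self):
        self.sealed = True


def roll_forward(old, new_name, new_files, now, retention):
    """Build the next file set from the previous one plus an increment."""
    new = FileSet(new_name)
    cutoff = now - retention
    for f in old.files:
        if f.written_at >= cutoff:  # assumed retention rule
            new.add_existing_file(f)
    for f in new_files:
        new.add_new_file(f)
    new.seal()
    return new


t0 = datetime(2011, 9, 21, 15, 0)
v1 = FileSet("iis-logs-v1")
v1.add_new_file(LogFile("u_ex090101.log", datetime(2009, 1, 1)))
v1.add_new_file(LogFile("u_ex110921_1500.log", t0))
v1.seal()

now = t0 + timedelta(minutes=15)
v2 = roll_forward(
    v1,
    "iis-logs-v2",
    [LogFile("u_ex110921_1515.log", now)],
    now,
    retention=timedelta(days=730),
)
print([f.name for f in v2.files])
# ['u_ex110921_1500.log', 'u_ex110921_1515.log']
```

In this sketch the sealed `v1` is never modified; each 15-minute update produces a new sealed set that shares file references with the previous one.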
  • Does this mean that physical files can be in more than 1 file set? Is Replication affected by this?
    --Patrick Gallucci
    Wednesday, September 21, 2011 5:15 PM
  • Yes, the physical files can be in multiple file sets. Replication will ensure that there are n replicas of a file, where n is the largest replication factor of any fileset to which the file belongs. File sets inherit the cluster replication factor at the time they're created, so if you change the cluster replication factor after creating some file sets, the existing file sets will not be affected, but newly created file sets will get the new replication factor.

    So, if file A belongs to file sets X and Y, where X has replication factor 2 and Y has replication factor 3, replication will ensure that there are 3 replicas of file A. However, if all your filesets have the same replication factor, then adding an existing file to a new fileset won't affect replication.

    Wednesday, September 21, 2011 5:22 PM
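The replication rule in the reply above can be illustrated with a small model. This is only a sketch of the rule as described, not DSC code: the `Cluster` class and its methods are hypothetical names.

```python
class Cluster:
    """Toy model: file sets capture the cluster replication factor when created."""

    def __init__(self, replication_factor):
        self.replication_factor = replication_factor
        self.file_sets = []

    def create_file_set(self, name, files):
        file_set = {
            "name": name,
            # Inherited once, at creation time; later cluster changes do not apply.
            "replication_factor": self.replication_factor,
            "files": set(files),
        }
        self.file_sets.append(file_set)
        return file_set

    def required_replicas(self, file_name):
        # n = largest replication factor of any file set containing the file.
        factors = [
            fs["replication_factor"]
            for fs in self.file_sets
            if file_name in fs["files"]
        ]
        return max(factors, default=0)


cluster = Cluster(replication_factor=2)
x = cluster.create_file_set("X", ["A", "B"])

cluster.replication_factor = 3  # affects only file sets created after this point
y = cluster.create_file_set("Y", ["A"])

print(x["replication_factor"])         # 2
print(cluster.required_replicas("A"))  # 3
print(cluster.required_replicas("B"))  # 2
```

File A is in both X and Y, so it needs 3 replicas; file B is only in X, so it stays at 2.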
  • Awesome! Thanks!
    --Patrick Gallucci
    Wednesday, September 21, 2011 5:50 PM