SharePoint Web App randomly gives 302 redirect to re-authenticate even with valid FedAuth cookie (from an ADFS STS)

    Question

  • My SP2010 Web App will at seemingly random intervals redirect a user logged in through ADFS with a valid token to the login screen causing loss of work.

    This does not occur for the users of the same Web App that are logged in under Windows Basic Authentication. 

    I have programmatically inspected SPSessionAuthenticationModule.ContextSessionSecurityToken, and the timeouts all look good: IsValid, KeyEffectiveTime, etc. all show a token that should be valid for 10+ more hours.

    Sniffing the packets with Wireshark, I see the GET request go in and everything looks fine (FedAuth token, etc.).  However, I see that the response is a 302 redirect to /_layouts/authenticate.aspx with a Set-Cookie: FedAuth=; expires=Thu, 01-Jan-1970 06:00:00 GMT; path=/
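
    A minimal sketch of how such a cookie-clearing response could be recognised in a captured header, assuming the date format shown in the trace above. The class and method names (CookieCheck, ClearsCookie) are hypothetical and not part of SharePoint or WIF; this only parses a header string and does not touch the server.

```csharp
using System;
using System.Globalization;

public static class CookieCheck
{
    // Returns true when a Set-Cookie header value blanks the named cookie
    // or gives it an expiry that is already in the past.
    public static bool ClearsCookie(string setCookie, string name, DateTime nowUtc)
    {
        string[] parts = setCookie.Split(';');
        string[] pair = parts[0].Split(new[] { '=' }, 2);
        if (!pair[0].Trim().Equals(name, StringComparison.OrdinalIgnoreCase))
            return false;

        bool emptyValue = pair.Length < 2 || pair[1].Trim().Length == 0;
        bool expired = false;

        // Look for an "expires" attribute among the remaining parts.
        for (int i = 1; i < parts.Length; i++)
        {
            string[] attr = parts[i].Split(new[] { '=' }, 2);
            if (attr.Length == 2 &&
                attr[0].Trim().Equals("expires", StringComparison.OrdinalIgnoreCase))
            {
                DateTime when;
                if (DateTime.TryParseExact(attr[1].Trim(),
                        "ddd, dd-MMM-yyyy HH:mm:ss 'GMT'",
                        CultureInfo.InvariantCulture,
                        DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
                        out when))
                {
                    expired = when < nowUtc;
                }
            }
        }
        return emptyValue || expired;
    }

    public static void Main()
    {
        // Header value copied from the trace described above.
        string header = "FedAuth=; expires=Thu, 01-Jan-1970 06:00:00 GMT; path=/";
        Console.WriteLine(ClearsCookie(header, "FedAuth", DateTime.UtcNow));
    }
}
```

    A check like this could be run over exported response headers to find the exact request on which the cookie was reset.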

    We do have the 2012 April CU applied if that matters.  We are using Persistent Cookies.

    We are using AAM, but it appears to be configured correctly: there is nothing in the ULS logs about any issues here (checking this was a suggestion in a related post).

    We do not have any search service crawls occurring on the system (a suggestion someone had in a related post)

    The app-pool never recycles...

    At this point we are desperate for any suggestions.  I can provide any further information requested.

    Regards,

    Steven

    Wednesday, June 27, 2012 8:39 PM

Answers


  • OK, our test results are in now.  This is what we see through testing:

    MaxServiceTokenCacheItems = 250
    MaxLogonTokenCacheItems = 250
    Result = Booted in 5-10 minutes

    MaxServiceTokenCacheItems = 10000
    MaxLogonTokenCacheItems = 10000
    Result = Cannot break +30 minutes

    MaxServiceTokenCacheItems = 25
    MaxLogonTokenCacheItems = 25
    Result = Booted quickly (1-2 minutes)


    MaxServiceTokenCacheItems = 25
    MaxLogonTokenCacheItems = 10000
    Result = Cannot break (40  minutes)

    MaxServiceTokenCacheItems = 10000
    MaxLogonTokenCacheItems = 25
    Result =  Booted (1-3 minutes)
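
    The thread does not say how these values were changed. One possible way, sketched here as an assumption, is through the SharePoint server object model (SPSecurityTokenServiceManager in Microsoft.SharePoint.Administration.Claims). It would need to run on a farm server under a farm administrator account with a reference to Microsoft.SharePoint.dll, and an IIS reset is probably needed before the new limits take effect. Try it on a test farm first.

```csharp
using System;
using Microsoft.SharePoint.Administration.Claims;

class RaiseTokenCacheLimits
{
    static void Main()
    {
        // Farm-wide security token service settings.
        SPSecurityTokenServiceManager sts = SPSecurityTokenServiceManager.Local;

        Console.WriteLine("Before: logon={0}, service={1}",
            sts.MaxLogonTokenCacheItems, sts.MaxServiceTokenCacheItems);

        // 10000 is the value from the tests above, not a recommended figure.
        sts.MaxLogonTokenCacheItems = 10000;

        // Persist the change to the configuration database.
        sts.Update();
    }
}
```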

    What we have here is what would appear to be something that alleviates our problem...  MaxLogonTokenCacheItems being set to a higher value is providing relief.  I am not saying this is a solution until I hear from SOMEONE AT MICROSOFT as to why this is or whether it is a valid thing to do.

    Microsoft states about MaxLogonTokenCacheItems: The strong cache keeps the most recently used items to guarantee the token is alive during the life of the request. The weak cache can release resources by garbage collection under memory pressure.
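
    To illustrate why a small item limit could produce this symptom, here is a toy model. It assumes the strong cache behaves like a bounded least-recently-used cache, which is an assumption and not a description of SharePoint's actual implementation. All names (BoundedTokenCache, Demo, the "user-N" keys) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy bounded cache that evicts the least recently used key.
public class BoundedTokenCache
{
    private readonly int capacity;
    private readonly LinkedList<string> order = new LinkedList<string>(); // most recent first
    private readonly Dictionary<string, LinkedListNode<string>> index =
        new Dictionary<string, LinkedListNode<string>>();

    public BoundedTokenCache(int capacity) { this.capacity = capacity; }

    // Returns true on a cache hit. On a miss the key is added, and the least
    // recently used key is evicted if the cache is now over capacity.
    public bool Touch(string key)
    {
        LinkedListNode<string> node;
        if (index.TryGetValue(key, out node))
        {
            order.Remove(node);
            order.AddFirst(node);
            return true;
        }
        index[key] = order.AddFirst(key);
        if (index.Count > capacity)
        {
            string oldest = order.Last.Value;
            order.RemoveLast();
            index.Remove(oldest);
        }
        return false;
    }
}

public static class Demo
{
    // Counts how often "user-0" misses while other keys churn through the cache.
    public static int MissesForFirstUser(int capacity, int otherKeys, int rounds)
    {
        var cache = new BoundedTokenCache(capacity);
        cache.Touch("user-0"); // initial sign-in, not counted as a miss
        int misses = 0;
        for (int r = 0; r < rounds; r++)
        {
            for (int k = 1; k <= otherKeys; k++) cache.Touch("user-" + k);
            if (!cache.Touch("user-0")) misses++;
        }
        return misses;
    }

    public static void Main()
    {
        Console.WriteLine(MissesForFirstUser(25, 100, 10));    // small cache
        Console.WriteLine(MissesForFirstUser(10000, 100, 10)); // large cache
    }
}
```

    In this model an entry disappears once more distinct keys than the limit have been touched since it was last used, even though the token behind it is nowhere near expiry. Whether SharePoint treats such a miss as an expired session is not confirmed anywhere in this thread.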

    Until I have the following questions answered I consider this a temporary fix:

    1. What does it mean for us to make it 10000?  Does it mean that we will start experiencing issues that are memory related in another area?  

    2. Why does this only appear for our ADFS users and not our Active Directory/Windows Auth users, or forms based users?

    3. Is setting the value to 10000 simply covering up a real bug that we may see in an upcoming patch?

    If someone at Microsoft could answer these questions for us, we would so appreciate it. :)

    Steve

    Thursday, September 20, 2012 10:41 PM

All replies

  • I have been doing more research and see how the 2012 April CU was recalled.  Could anyone provide any sort of confirmation whether this problem is related and if it will be fixed soon?  

    Our SharePoint installation lives on a VM, we could roll back that VM to prior to the 2012 April CU but the database would still be post the April patch.  If no schemas changed, would that be a possibility?

    Regards,

    Steven

    Thursday, June 28, 2012 2:05 PM
  • We have been throwing everything we can think of (including the kitchen sink) at this problem, to no avail.  That includes adding the code below, which runs on every page to re-issue a new security token (you see Set-Cookie: <new FedAuth token> on every HTTP response).  Still, at random intervals, you get the 302 with Set-Cookie: FedAuth set to blank!

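        // Requires: using System; using Microsoft.IdentityModel.Web; (WIF)

        public void Page_Init(object sender, EventArgs e)
        {
            // Experiment only: hook the event raised when the session token is read from the FedAuth cookie.
            // Note: subscribing here adds the handler again on every page load.
            FederatedAuthentication.SessionAuthenticationModule.SessionSecurityTokenReceived += FixSession;
        }

        protected void FixSession(object sender, SessionSecurityTokenReceivedEventArgs e)
        {
            SessionAuthenticationModule sam = FederatedAuthentication.SessionAuthenticationModule;

            // Replace the incoming token with a fresh one valid for 960 minutes (16 hours).
            DateTime now = DateTime.UtcNow;
            e.SessionToken = sam.CreateSessionSecurityToken(e.SessionToken.ClaimsPrincipal, e.SessionToken.Context, now,
                                                            now.AddMinutes(960), e.SessionToken.IsPersistent);

            // Ask the module to write the new token back to the browser as a new FedAuth cookie.
            e.ReissueCookie = true;
        }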

    Does anyone have any further suggestions?

    Steve

    Monday, July 02, 2012 7:07 PM
  • Hello,

    Thank you for your post.

    This is a quick note to let you know that we are performing research on this issue.

    Thanks,

    Tuesday, July 03, 2012 2:23 AM
    Moderator
  • Hey Steven,

    I do not believe that the April CU has anything to do with your issue. The SharePoint regression in the April CU was related to SharePoint lists having issues. This has already been addressed in the latest April CU release that is now on the MSFT downloads page. So, are you trying to jump from one web application to another and pass the credentials from one to the other? I have seen people develop a "SingleSignOn HTTPModule which handles the re-authentication when you go between the web applications."

    In their custom code they ran into a similar issue that you are seeing, the solution is below:

    Initially we were setting a redirection in the Response at the BeginRequest stage of the page lifecycle. We have moved the action to the EndRequest stage of the lifecycle and we no longer get the error from the SPFederationAuthenticationModule.

    Here is a related thread, though not exactly the same as your issue: http://social.technet.microsoft.com/Forums/en-US/sharepoint2010general/thread/1ad56eb1-8762-4c4b-ac8f-f29692012a58

    If you could give me a little more verification of what you are trying to do in the browser that results in this action, that would be helpful.  Also, what authentication method are you using on your web applications?  Thanks,


    Tuesday, July 03, 2012 3:42 PM
  • The shackles did not thrust themselves upon him like a convicted prisoner abruptly.  No, instead they slithered up in an insidiously nonchalant manner.   At first he brushed at them not knowing their origin or their veracity.  A minor annoyance.  But they would not yield to his will, rather they slowly tightened their grip upon him the more he resisted.  As he became more aware of his new reality he had to ask.  How could this be? 

    For weeks he wrestled with the chains, challenging them with every aspect of his being.  Back and forth he thrust at them with every thought, every idea.  The shackles were relentless in their grip.  The ever growing laughter of their victory echoing thunderously.

    Alas, he had had enough.  He slumped down in defeat.  How could he accept this new reality?  He would have to learn to come to grips with his new future...

    And then he was RELEASED!

    With one great act of mercy Mr. Gao put an end to his torment....  HE WAS FREE AT LAST!

    Jack,

    Thank you for your note.  If you need anything from me please let me know (I can easily reproduce the behavior at will on our VM's).  I would like to ask please to keep me informed as to the status of the research.

    Regards,

    Steve

    Tuesday, July 03, 2012 4:13 PM
  • baileybb123,

    I agree with you that the April CU is not my issue.  We rolled back the VM to before the patch was applied and still see the same behavior.

    We only have a single web application running.  As for that app's settings: it has Anonymous Access enabled and uses Windows Authentication with Basic Authentication only.  We have FBA enabled, and we have a Trusted Identity Provider enabled.

    In addition we use an apache reverse proxy to funnel all requests to the system (external url uses https, the internal site url does not)

    The trusted identity provider is ADFS (using an ADFS proxy in the DMZ).  

    The actions in the browser that can cause the issue are simple page requests for various pages in the web application.  Any request appears to be able to trigger it.  We have not seen a pattern of times or exact pages.

    Using wireshark for our testing on the server, I can see requests being made with no problem and then suddenly we get the 302 with the cookie reset.  The failure would occur on a request that had previously succeeded a few seconds before. 

    Suspecting perhaps apache to be the issue, I have inspected the packets all the way down to the IP level and there was no difference whatsoever between the request that succeeded a little earlier and the one that failed.

    Users that are logged in via the Windows Basic Authentication never experience this issue.  Only the ADFS users.  The FBA account is for workflow so we really haven't done testing on that but have not seen a problem with workflow.

    The in-memory representation of the session token for the ADFS users appears identical to that of the Windows authenticated user (except of course that their identity is different).  I say "appears" because I only looked at the two objects in the Visual Studio debugger and really did not know which properties to compare.  The obvious ones seemed fine.

    Steve

    Tuesday, July 03, 2012 5:20 PM
  • Have you considered both the ADFS TokenLifetime and LogonTokenCacheExpirationWindow? Please read http://msdn.microsoft.com/en-us/library/hh446526.aspx for details.
    Wednesday, July 04, 2012 8:01 AM
    Moderator
  • GuYuming thank you for your response.  We have played with those settings quite a bit with no help to our situation.  

    For one test we set the TokenLifetime to 12 minutes (LogonTokenCacheExpirationWindow was still set to the original 10 minutes) and, as you would expect, we would see a logout after two minutes (12 minus 10).  The logout was in the form of a 302 redirect with a Set-Cookie that would clear out the FedAuth cookie.

    Setting those to realistic values, such as a TokenLifetime of 960 and a LogonTokenCacheExpirationWindow of one second, does not help.  At random, usually within 30 minutes of clicking around the site, we get the undesired logout (causing loss of work).  This logout looks identical in Wireshark to the 302 redirect that happens when the token *does* expire (as in the test I just mentioned).
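
    The arithmetic behind those two tests can be sketched as follows, assuming the effective SharePoint session is simply the SAML token lifetime minus LogonTokenCacheExpirationWindow. That matches the two-minute logout seen in the first test. SessionMath and EffectiveSession are hypothetical names.

```csharp
using System;

public static class SessionMath
{
    // Effective session = token lifetime minus the expiration window, floored at zero.
    public static TimeSpan EffectiveSession(TimeSpan tokenLifetime, TimeSpan expirationWindow)
    {
        TimeSpan remaining = tokenLifetime - expirationWindow;
        return remaining < TimeSpan.Zero ? TimeSpan.Zero : remaining;
    }

    public static void Main()
    {
        // First test: 12-minute token, 10-minute window.
        Console.WriteLine(EffectiveSession(TimeSpan.FromMinutes(12), TimeSpan.FromMinutes(10)));

        // "Realistic" settings: 960-minute token, 1-second window.
        Console.WriteLine(EffectiveSession(TimeSpan.FromMinutes(960), TimeSpan.FromSeconds(1)));
    }
}
```

    Under that assumption the second configuration should give a session of just under 16 hours, which is why a logout within 30 minutes is not explained by these two settings alone.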

    Some pictures may help:

    Here is my AdfsRelyingPartyTrust, you can see the TokenLifetime is set to 960:

    https://docs.google.com/folder/d/0B6mexPEk0z68WHRfT2pZNEk1Z2s/edit?pli=1#docId=0B6mexPEk0z68WnhNZzh3SlNaYmc

    Here is my SPSecurityTokenServiceConfig, you can see my LogonTokenCacheExpirationWindow is set to 1 second:

    https://docs.google.com/folder/d/0B6mexPEk0z68WHRfT2pZNEk1Z2s/edit?pli=1#docId=0B6mexPEk0z68TU5lTW9laHRFLTQ

    Now, even more interesting would be the actual in-memory token representation.  Using the debugger I have two tokens displayed.  The first is the windows token that works perfectly.  We never get a time out with this token:

    https://docs.google.com/folder/d/0B6mexPEk0z68WHRfT2pZNEk1Z2s/edit?pli=1#docId=0B6mexPEk0z68QmdBRWhlUkhWUUk

    Here is the ADFS token in memory representation that WILL fail at random times:

    https://docs.google.com/folder/d/0B6mexPEk0z68WHRfT2pZNEk1Z2s/edit?pli=1#docId=0B6mexPEk0z68QWxkamlYcFZqMXc

    Finally, Jack Gao you had marked GuYuming's post as the answer.  My question is does that mean that there is not active research on this issue at Microsoft outside of this thread? 

    Thanks,

    Steve

    Thursday, July 05, 2012 3:07 PM
  • Steve,

    You describe exactly the problem I've seen. Only Trusted Provider sessions are affected.

    Did you try to 'touch' the AAM settings in CA? I suspect this did it in my case. I did not change anything, I only clicked OK on the existing URL. The log wasn't full of AAM warnings - only one that brought me to this attempt.

    best regards

    Markus

    Thursday, July 05, 2012 3:43 PM
  • Hi Markus,

    I have seen your thread you posted earlier last month.  Thanks for the reply.  When I read the thread I did not understand exactly what you meant by "touched", but now I do. 

    When you say "The log wasn't full of AAM warnings - only one that brought me to this attempt", which log are you referring to, may I ask?  The ULS application log or the SharePoint logs?  I do not see any error messages about that in our logs.

    We are using AAM here (because of a reverse proxy) and this is our setup:

    https://docs.google.com/folder/d/0B6mexPEk0z68WHRfT2pZNEk1Z2s/edit?pli=1&docId=0B6mexPEk0z68S1hqcWZCOFF1LUU

    I went ahead and "touched" every one of the AAM's.  It is interesting because "touching" them actually deletes and recreates them as per the sharepoint LOG.  These entries were added to the log when I did the "touch" (truncated lines for readability):

    ...Information Attempting to delete Alternate Access Mapping https://encompass.iecokc.com in zone Default.  Web application affected: Encompass. Current user: IECOKC\Callahan. ac04ffc9-cf46-4487-af96-a6d241a05c8d

    ...Information Attempting to add Alternate Access Mapping https://encompass.iecokc.com in zone Default.  Web application affected: Encompass. Current user: IECOKC\Callahan. ac04ffc9-cf46-4487-af96-a6d241a05c8d

    ...Medium   Updating SPPersistedObject SPAlternateUrlCollection Name=Encompass. Versi...

    ...Medium   ...em.Web.UI.Page.ProcessRequest(Boolean includeStagesBeforeAsyncPoint, B...

    ...Medium   ...System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper(IntPtr managedHttpContext, I...

    ...Information Alternate Access Mapping successfully updated.  Web application affected: Encompass. Current user: IECOKC\Callahan. ac04...

    So as you can see, "touching" the AAM by simply clicking on it then hitting OK does do something! 

    After doing the touches, I was able to reproduce our re-authenticate problem again.  So, unfortunately, this does not seem to fix our issue.  I will continue to play with this idea (using different iterations of "touches", etc) to see if I can figure out some magic permutation.

    For the failure I had after performing the "touches", these are the SharePoint logs that show our page request and the subsequent re-authentication request (they do not show any sort of problem or error; they just show how a request is made to a generic page on our site and the next request is to re-authenticate):

    07/05/2012 11:21:59.69  w3wp.exe (0x1970)                        0x1C38 SharePoint Foundation          Logging Correlation Data       xmnv Medium   Name=Request (GET:https://encompass.iecokc.com:443/_layouts/CincomAcquireApplications/GenericMasterDocumentPortalPage.aspx?args=143FDDFBABEBCFAA248B4859EC1D62318CF4677817D618AB01FC60FDEB23944B4446AE8C7124276DF53921702B76EE0D6E8A5D05A5D7F9B82D6D5630CE8A0E96CFB5858AF80E0F047A920BE9196462FA134F6D314DE6AD028B77CC406613BE0D0409D030B0CE73AF4DF1C0270C2554840D245E5CEC0578BDF8E8928520232806DC98D66E4214FD26385AD067C1E234C36C966C0EA27326BE68963FB961EB08EBF5D2D2221202E7892B8F1EE4660A6EED00038DAF0886D316DE484DB8D040BCF707C5CFA6439274698B89B62C5FB9A3C7062B8B008E12D1615D3A90E8A4795D6DB8A45618D13DB7CB78A1F0FFCC7B21589AE3AB5BF22E231F7E7B90915E7A397188942F67C76D78D7B6E1C35599B8AEB1C327F1DD2157917DAED120C8DBDF754C5361444E544C1AC25FE13EC5EE280BDE3A39BE35EA327A5029FD10173D3879841F556913C6691E00C5BED0F8DA3F5AC8 aeec5e0d-d648-4663-b26a-2d529a6c1d02
    07/05/2012 11:21:59.70  w3wp.exe (0x1970)                        0x1C38 SharePoint Foundation          Monitoring                     b4ly Medium   Leaving Monitored Scope (Request (GET:https://encompass.iecokc.com:443/_layouts/CincomAcquireApplications/GenericMasterDocumentPortalPage.aspx?args=143FDDFBABEBCFAA248B4859EC1D62318CF4677817D618AB01FC60FDEB23944B4446AE8C7124276DF53921702B76EE0D6E8A5D05A5D7F9B82D6D5630CE8A0E96CFB5858AF80E0F047A920BE9196462FA134F6D314DE6AD028B77CC406613BE0D0409D030B0CE73AF4DF1C0270C2554840D245E5CEC0578BDF8E8928520232806DC98D66E4214FD26385AD067C1E234C36C966C0EA27326BE68963FB961EB08EBF5D2D2221202E7892B8F1EE4660A6EED00038DAF0886D316DE484DB8D040BCF707C5CFA6439274698B89B62C5FB9A3C7062B8B008E12D1615D3A90E8A4795D6DB8A45618D13DB7CB78A1F0FFCC7B21589AE3AB5BF22E231F7E7B90915E7A397188942F67C76D78D7B6E1C35599B8AEB1C327F1DD2157917DAED120C8DBDF754C5361444E544C1AC25FE13EC5EE280BDE3A39BE35EA327A5029FD10173D3879841F556913C669... aeec5e0d-d648-4663-b26a-2d529a6c1d02
    07/05/2012 11:21:59.70* w3wp.exe (0x1970)                        0x1C38 SharePoint Foundation          Monitoring                     b4ly Medium   ...1E00C5BED0F8DA3F5AC8E45BC4F76A37A45A1224BC02A8F536505496D0FD1D535E553C327610618CD3267BA855AD6896DBA51A00CD618CC2B238E2DDD675FD8D34AF8E22E821D3799E505C5B9F142B383026DEF9F07AA8FB577A)). Execution Time=3.9361 aeec5e0d-d648-4663-b26a-2d529a6c1d02
    07/05/2012 11:21:59.70  w3wp.exe (0x1970)                        0x1C38 SharePoint Foundation          Logging Correlation Data       xmnv Medium   Site=/ 
    07/05/2012 11:21:59.97  w3wp.exe (0x1970)                        0x1C38 SharePoint Foundation          Monitoring                     nasq Medium   Entering monitored scope (Request (GET:https://encompass.iecokc.com:443/_layouts/Authenticate.aspx?Source=%2F%5Flayouts%2FCincomAcquireApplications%2FGenericMasterDocumentPortalPage%2Easpx%3Fargs%3D143FDDFBABEBCFAA248B4859EC1D62318CF4677817D618AB01FC60FDEB23944B4446AE8C7124276DF53921702B76EE0D6E8A5D05A5D7F9B82D6D5630CE8A0E96CFB5858AF80E0F047A920BE9196462FA134F6D314DE6AD028B77CC406613BE0D0409D030B0CE73AF4DF1C0270C2554840D245E5CEC0578BDF8E8928520232806DC98D66E4214FD26385AD067C1E234C36C966C0EA27326BE68963FB961EB08EBF5D2D2221202E7892B8F1EE4660A6EED00038DAF0886D316DE484DB8D040BCF707C5CFA6439274698B89B62C5FB9A3C7062B8B008E12D1615D3A90E8A4795D6DB8A45618D13DB7CB78A1F0FFCC7B21589AE3AB5BF22E231F7E7B90915E7A397188942F67C76D78D7B6E1C35599B8AEB1C327F1DD2157917DAED120C8DBDF754C5361444E544C1AC25FE13EC5EE... 
    07/05/2012 11:21:59.97* w3wp.exe (0x1970)                        0x1C38 SharePoint Foundation          Monitoring                     nasq Medium   ...280BDE3A39BE35EA327A5029FD10173D3879841F556913C6691E00C5BED0F8DA3F5AC8E45BC4F76A37A45A1224BC02A8F536505496D0FD1D535E553C327610618CD3267BA855AD6896DBA51A00CD618CC2B238E2DDD675FD8D34AF8E22E821D3799E505C5B9F142B383026DEF9F07AA8FB577A)) 
    07/05/2012 11:21:59.97  w3wp.exe (0x1970)                        0x1C38 SharePoint Foundation          Logging Correlation Data       xmnv Medium   Name=Request (GET:https://encompass.iecokc.com:443/_layouts/Authenticate.aspx?Source=%2F%5Flayouts%2FCincomAcquireApplications%2FGenericMasterDocumentPortalPage%2Easpx%3Fargs%3D143FDDFBABEBCFAA248B4859EC1D62318CF4677817D618AB01FC60FDEB23944B4446AE8C7124276DF53921702B76EE0D6E8A5D05A5D7F9B82D6D5630CE8A0E96CFB5858AF80E0F047A920BE9196462FA134F6D314DE6AD028B77CC406613BE0D0409D030B0CE73AF4DF1C0270C2554840D245E5CEC0578BDF8E8928520232806DC98D66E4214FD26385AD067C1E234C36C966C0EA27326BE68963FB961EB08EBF5D2D2221202E7892B8F1EE4660A6EED00038DAF0886D316DE484DB8D040BCF707C5CFA6439274698B89B62C5FB9A3C7062B8B008E12D1615D3A90E8A4795D6DB8A45618D13DB7CB78A1F0FFCC7B21589AE3AB5BF22E231F7E7B90915E7A397188942F67C76D78D7B6E1C35599B8AEB1C327F1DD2157917DAED120C8DBDF754C5361444E544C1AC25FE13EC5EE280BDE3A39BE35EA327A5 956692ef-f702-4b6d-b40e-c6d19825c79f

    Steve


    Thursday, July 05, 2012 4:46 PM
  • Hi Steve,

    In the last 3 weeks I still had a few users complaining about "data loss while editing". Re-authentication through a browser redirect in a CBA web application may lead to data loss; this is by design.

    However, I could never observe unexpected authentication prompts. Instead, my FedAuth cookies survived their 8 hours of validity most of the time.

    The July 2012 CU seems to address an "unexpected authentication prompt" Problem: (Forms not Claims but..)

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;2598373

    "... Consider the following scenario:
    You create nine task lists on a team site that uses forms-based authentication.
    You move from task list one to task list nine one by one.
    You go to the Home page of the team site.
    In this scenario, you are prompted unexpectedly for authentication...."

    Guess what: in my farm (CBA with trusted IP) I can reproduce this behavior on both web apps. In my case the re-authentication occurred after creating the 6th task list.

    This week I won't have the time, but I will try to install it ASAP.

    regards

    Markus

    Thursday, July 05, 2012 8:56 PM
  • Here is my request that is made using a valid FedAuth token that should not expire for many hours:

    https://docs.google.com/folder/d/0B6mexPEk0z68WHRfT2pZNEk1Z2s/edit?pli=1#docId=0B6mexPEk0z68OTdDQjJVc3NQbFk

    And the response that clears my FedAuth cookie making me re-authenticate

    https://docs.google.com/folder/d/0B6mexPEk0z68WHRfT2pZNEk1Z2s/edit?pli=1&docId=0B6mexPEk0z68ZjYyQjdIUU9rRTQ

    Steve

    Thursday, July 05, 2012 8:57 PM
  • My understanding is that the request comes with a FedAuth cookie which is not expired, but still there is a redirect to ADFS.

    According to http://msdn.microsoft.com/en-us/library/ee517293.aspx and http://blogs.msdn.com/b/besidethepoint/archive/2012/05/10/ws-federation-authentication-module-wsfam-and-sharepoint-extensions.aspx , reading the authentication cookie and redirect is done by SPSessionAuthenticationModule and SPFederationAuthenticationModule, which inherit SessionAuthenticationModule and WSFederationAuthenticationModule respectively.  

    So, I would suggest attaching to the SharePoint w3wp process in VS2010, setting breakpoints in the methods of those modules, and debugging with the help of .NET Reflector.

    Friday, July 06, 2012 5:13 AM
    Moderator
  •  

    I noticed in the screen shot that in the get-spsecurityTokenServiceConfig you have the logonTokenCacheExpirationWindow set to 1 minute. Have you tried increasing that value to the default 10 minutes like it says here? http://msdn.microsoft.com/en-us/library/microsoft.sharepoint.administration.claims.spsecuritytokenservicemanager.logontokencacheexpirationwindow.aspx

    I am just wondering if that switch is setting them to expired immediately.


    Do you have any Load Balancers in your farm that could have settings on them that limit the timeout session?
    Monday, July 09, 2012 9:48 PM
  • GuYuming,

    Thank you for your suggestions but after authentication is performed there are no more redirects to ADFS.  The browser has its FedAuth cookie and that is that. 

    I can see the cookie in the debugger.  I can continually re-issue the cookie in a code event handler (token received).  After reading the patch message for the soon to be official June 2012 CU (?I'm guessing here)

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;2598373

    It appears that there are situations where an internal module of the sharepoint process can fail and thus lose its internal data representation of valid tokens.  Again this is a guess but certainly would explain the patch message and would explain how I could see the behavior I see.

    BTW, applying the patch did not solve the issue. 

    steve



    Tuesday, July 10, 2012 2:41 PM
  • baileybb123,

    My original LogonTokenCacheExpirationWindow was set to 10 minutes before we started playing around with different values to see if it made any difference.  The actual value shown in my screenshot is 1 second, as per some advice given on MSDN sites.

    The tokens are also not expiring immediately.  It happens at random, usually within 30 minutes.  We have no load balancers, just a single-server farm.

    Thanks,

    Steve

    Tuesday, July 10, 2012 2:43 PM
  • You seem to be implementing sliding sessions, but this is usually done by subclassing HttpApplication. Why did you put your code in a Page_Init event?

    Please read http://blogs.msdn.com/b/scicoria/archive/2011/06/15/hack-forcing-fba-token-refresh-against-spclaimprovider-with-no-credential-challenge.aspx and http://msdn.microsoft.com/en-us/library/hh446526.aspx#sec7

    Wednesday, July 11, 2012 4:36 AM
    Moderator
  • We are not implementing sliding sessions.  We set our session to 960 minutes (16 hours).  I was experimenting to see what could possibly fix our spurious timeouts (usually within the first 30 minutes) by having a new token re-created from the current principal and re-issuing the cookie on every single HTTP request.

    Sure enough, in Wireshark you can see that Set-Cookie: FedAuth ....  is in every single response coming from SharePoint, with a fresh 16-hour cookie.

    To no avail: sooner or later we still get a 302 redirect to re-authenticate, and that redirect carries a destroy instruction for the FedAuth cookie (Set-Cookie: FedAuth=; expires=Thu, 01-Jan-1970 06:00:00 GMT) to the browser.

    I hate to say it, but all signs point to us hitting some obscure bug inside the Microsoft code.

    Thursday, July 12, 2012 2:09 PM
  • I have put up some screenshots to help explain: https://skydrive.live.com/#!/view.aspx?cid=6F40FB61D28CF147&resid=6F40FB61D28CF147%211176

    You are right about the phenomenon you described, but it cannot be called a bug. You can find the following description for step 4 in figure 4 in http://msdn.microsoft.com/en-us/library/hh446526.aspx :

    • SharePoint uses the session lifetime and the LogonTokenCacheExpirationWindow property to determine when the user must sign in again. At this point, SharePoint determines that the session has expired, so it begins the sign-in process again. If the ADFS SSO cookie has expired, Rick will have to enter his credentials to obtain a new SAML token.
      Note:

      To force users to re-enter their credentials whenever they are redirected back to ADFS, you should set the web SSO lifetime in ADFS to be less than or equal to SAMLtokenlifetime minus the value of LogonTokenCacheExpirationWindow. In the Adatum scenario, the web SSO lifetime in ADFS should be set to eight minutes to force users to re-authenticate when SharePoint redirects them to ADFS.
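
    The rule in the quoted note can be written as a small check. This is a sketch only. The 10-minute token lifetime and 2-minute window below are hypothetical inputs chosen to reproduce the eight-minute figure, and SsoLifetimeRule and its methods are made-up names.

```csharp
using System;

public static class SsoLifetimeRule
{
    // Largest web SSO lifetime that still satisfies the quoted rule:
    // ssoLifetime <= samlTokenLifetime - logonTokenCacheExpirationWindow.
    public static TimeSpan MaxSsoLifetime(TimeSpan samlTokenLifetime, TimeSpan expirationWindow)
    {
        return samlTokenLifetime - expirationWindow;
    }

    // True when users would have to re-enter credentials on redirect to ADFS.
    public static bool ForcesCredentialPrompt(TimeSpan ssoLifetime,
        TimeSpan samlTokenLifetime, TimeSpan expirationWindow)
    {
        return ssoLifetime <= MaxSsoLifetime(samlTokenLifetime, expirationWindow);
    }

    public static void Main()
    {
        // Hypothetical inputs that yield the eight-minute figure.
        TimeSpan saml = TimeSpan.FromMinutes(10);
        TimeSpan window = TimeSpan.FromMinutes(2);
        Console.WriteLine(MaxSsoLifetime(saml, window));
        Console.WriteLine(ForcesCredentialPrompt(TimeSpan.FromMinutes(8), saml, window));

        // Equal 480-minute SSO and token lifetimes with a 1-second window.
        Console.WriteLine(ForcesCredentialPrompt(TimeSpan.FromMinutes(480),
            TimeSpan.FromMinutes(480), TimeSpan.FromSeconds(1)));
    }
}
```

    With equal SSO and token lifetimes the check is false, meaning a redirect back to ADFS would normally sign the user in again silently instead of prompting for credentials.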

    The same article suggests a sliding session solution similar to your FixSession code. What is your current situation after implementing the sliding session solution?

    Update: my test result is that, after deploying the SlidingSessionModule sample from http://claimsid.codeplex.com/ , the FedAuth cookie is renewed without a redirect to /_layouts/authenticate.aspx




    Friday, July 13, 2012 9:39 AM
    Moderator
  • Yes, session duration (FedAuth cookie) in a CBA environment depends on the lifetime of the SAML token plus the configuration of the SharePoint security token service.

    But we are talking here about random session loss.

    I also experienced this behavior, and while it currently seems to have improved after an 'AAM touch', I still have a few users complaining about wiki edit data loss.

    The July 2012 CU addresses an issue, "unexpected authentication prompt", which I can reproduce in my farm: the '9 Task List Problem'.

    Interesting: I could reproduce the '9 Task List Problem' only with Internet Explorer (in both the Trusted Sites and Internet zones), not with Firefox or Chrome.

     

    Friday, July 13, 2012 10:23 AM
  • Hi GuYuming,

    Thank you for that great explanation.  I did learn a couple of things that I did not previously know:

    1. the TokenLifetime = 0 (the default) actually means 1 hour!  (somebody should document that somewhere)

    2. How easily you can view the cookies and their expiration in the F12 window.

    That being said, I think there is some confusion regarding the sliding sessions and my fixsession code:

    1. We are *not* implementing sliding sessions.

    2. The only reason I put that Fixsession code was just to experiment with different attempts to fix my original problem.  I apologize for the confusion that caused.

    3. We basically do not care about sliding sessions, we simply want the session to last for 8 hours.

    Original problem is we have these settings:

    ADFS SsoLifetime: 480

    TokenLifetime: 480

    LogonTokenCacheExpirationWindow: 1second

    UseSessionCookies=False

    If you look at the cookie it looks great!  The expiration is 8 hours in the future as you expect, all seems well!  HOWEVER.... at some random interval waaayyyy before the 8 hours is up, we get a redirect that is identical to the path taken when the cookie actually does expire (/_layouts/Authenticate.aspx).  But it is not set to expire for MANY more hours!  There is no rhyme or reason as to what causes the issue; it never happens while performing the same action, etc.

    As wehr pointed out earlier:

    The July 2012 CU addresses an issue, "unexpected authentication prompt", which I can reproduce in my farm: the '9 Task List Problem'

    Well, that "unexpected authentication prompt" is identical to what we see.   It is unexpected because all our settings, the cookie, etc. indicate that the session should be valid for many more hours.  That is why I had stated before that perhaps there is an additional opportunity inside the Microsoft code (read: bug) that could cause the "unexpected authentication prompt" that we are seeing.

    I installed that patch but still see the same behavior unfortunately, prompting me to think there was an additional issue (bug) that could be addressed (patched).

    Regards,

    Steve

    wehr: Do you have windows authentication as well as the adfs trusted provider?  or are you doing windows authentication through adfs as well?  If the former, are you seeing this issue with your windows users?  (we are not)

    Friday, July 13, 2012 7:11 PM
  • Hi Steve,

    I have Windows Authentication (account like i:0#.w|MyDomain\wehr) and could not reproduce the '9 Task List Problem' with it.

    Only with IE8 on XP or IE9 on Win7 and the Trusted Provider (account like i:0ǵ.t|My-id|wehr) can I reproduce it every time. KB2598373 says it would happen with Forms Authentication.

    Nothing appears in the ULS log when premature authentication happens, just the request for authenticate.aspx.

    regards

    Markus

    Friday, July 13, 2012 10:00 PM
  • We have the same issue. No problem with SSO.
    But when we use the ADFS forms login from outside, we get random re-authentication before the session has expired. Please help us.
    Tuesday, July 24, 2012 2:43 AM
  • Guys, please refer to this post by Steve Peschka, where he discusses load-balancer sticky sessions, which could very well be the culprit of your issue.

    http://blogs.technet.com/b/speschka/archive/2011/10/28/make-sure-you-know-this-about-sharepoint-2010-claims-authentication-sticky-sessions-are-required.aspx

    Sunday, August 26, 2012 2:05 AM
  • SharePoint 2013 uses the AppFabric distributed cache service to cache logon tokens, so this should no longer be a problem there.
    Monday, August 27, 2012 2:39 AM
    Moderator
  • We have a single server farm.  I believe this would not apply to us.  BTW.  We still have this issue!  :(

    Regards

    Steve

    Thursday, August 30, 2012 11:37 PM
  • I am having the same issues as Steven with the FedAuth cookie expiring prematurely.

    I have a 2010 farm consisting of 4 servers, 2 of which are WFE. The 2 WFE servers are load balanced with sticky sessions.

    I had started off with having an 8 hour session and have changed it to 88 minutes to be in line with the sticky session cookie of the load balancer which is set to 90 minutes.

    I have a custom login page which also tracks user logins by writing the machine name of the WFE where the user logged in to a database table. I am seeing that the load balancer is not switching the user's WFE when the session times out, which means it's not a load balancer/sticky session issue, as the user is getting routed correctly. I also have a response header that tells me which server responded, and I can see in the browser that my WFE stayed the same when the session timed out.

    It seems to happen when there are a lot of users on the servers. I don't see any app pool recycles. For some reason, SharePoint decides that session is invalid and clears the FedAuth cookie.

    Thanks,

    Nilesh

    Wednesday, September 19, 2012 4:52 PM

  • Ok, our tests are in now.  This is what we see through testing:

    MaxServiceTokenCacheItems = 250
    MaxLogonTokenCacheItems = 250
    Result = Booted in 5-10 minutes

    MaxServiceTokenCacheItems = 10000
    MaxLogonTokenCacheItems = 10000
    Result = Cannot break +30 minutes

    MaxServiceTokenCacheItems = 25
    MaxLogonTokenCacheItems = 25
    Result = Booted quickly (1-2 minutes)


    MaxServiceTokenCacheItems = 25
    MaxLogonTokenCacheItems = 10000
    Result = Cannot break (40  minutes)

    MaxServiceTokenCacheItems = 10000
    MaxLogonTokenCacheItems = 25
    Result =  Booted (1-3 minutes)

    What we have here is what would appear to be something that alleviates our problem...  MaxLogonTokenCacheItems being set to a higher value is providing relief.  I am not saying this is a solution until I hear from SOMEONE AT MICROSOFT as to why this is or whether it is a valid thing to do.

    Microsoft states about MaxLogonTokenCacheItems: The strong cache keeps the most recently used items to guarantee the token is alive during the life of the request. The weak cache can release resources by garbage collection under memory pressure.

    Until I have the following questions answered I consider this a temporary fix:

    1. What does it mean for us to make it 10000?  Does it mean that we will start experiencing issues that are memory related in another area?  

    2. Why does this only appear for our ADFS users and not our Active Directory/Windows Auth users, or forms based users?

    3. Is setting the value to 10000 simply covering up a real bug that we may see in an upcoming patch?

    If someone at Microsoft could answer these questions for us, we would really appreciate it. :)

    Steve

    Thursday, September 20, 2012 10:41 PM
  • Hello Steven,

    As for your questions:

    1) What does it mean for us to make it 10000? …

    This means that SharePoint keeps up to 10'000 logon tokens in its cache and keeps them cached even when garbage collection runs and tries to free memory.

    (Oldest) tokens are removed from cache under the following conditions:

      • There are more than SPSecurityTokenServiceManager.MaxLogonTokenOptimisticCacheItems logon tokens in the cache.
      • There are more than 10'000 (or any other value of SPSecurityTokenServiceManager.MaxLogonTokenCacheItems) logon tokens in the cache, but less than SPSecurityTokenServiceManager.MaxLogonTokenOptimisticCacheItems, and garbage collection runs to free more memory.
      • Tokens are expired or older than the timeout specified by SPSecurityTokenServiceManager.FormsTokenLifetime (for tokens issued by forms based IdP’s) or SPSecurityTokenServiceManager.WindowsTokenLifetime (for tokens issued by Windows authentication based IdP’s).
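
    The three removal conditions above can be sketched as a toy model. This is only an illustration of the rules as described in this post, not SharePoint's implementation: the class TokenCacheModel, its method names and the explicit collect_garbage() call are hypothetical, and eviction is assumed to be oldest-first.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenCacheModel:
    """Toy model of the three removal rules listed above (not SharePoint code)."""

    strong_limit: int      # stands in for MaxLogonTokenCacheItems
    optimistic_limit: int  # stands in for MaxLogonTokenOptimisticCacheItems
    lifetime: float        # stands in for the token lifetime, in seconds
    # Maps cache key -> time the token was added, oldest entry first.
    items: OrderedDict = field(default_factory=OrderedDict, init=False)

    def add(self, key: str, now: float) -> None:
        self.items.pop(key, None)
        self.items[key] = now
        # Rule 1: never hold more than the optimistic limit.
        while len(self.items) > self.optimistic_limit:
            self.items.popitem(last=False)

    def collect_garbage(self) -> None:
        # Rule 2: when memory is reclaimed, only the newest
        # strong_limit entries are guaranteed to survive.
        while len(self.items) > self.strong_limit:
            self.items.popitem(last=False)

    def get(self, key: str, now: float) -> Optional[float]:
        added = self.items.get(key)
        # Rule 3: tokens older than the lifetime are dropped.
        if added is not None and now - added >= self.lifetime:
            del self.items[key]
            return None
        return added
```

    In this model, a user whose entry falls outside the newest strong_limit entries loses the token as soon as collect_garbage() runs, even though the token itself has not expired.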

    As far as I know, there is no way in SharePoint 2010 to monitor when tokens are removed from the cache and for which reason.

    1. … Does it mean that we will start experiencing issues that are memory related in another area?

    That might be possible. The token cache caches incoming SAML tokens. These tokens are typically > 5’000 Bytes in uncompressed plain-text format and, depending on the number of claims your AD FS implementation returns to SharePoint, might even be much larger, up to tens or even hundreds of kBytes per token. You can check the form post when your browser returns from AD FS to the SharePoint /_trust/ path to determine the size of the token.

    I don’t know the format in which the cache stores these tokens internally. Assuming it is the same format as the incoming token, that the incoming tokens are 10 kBytes in size, and that cache overhead (for storing additional information such as key, expiration time etc.) is negligible, caching 10’000 tokens would imply a storage requirement of ca. 100 MBytes in RAM. Assuming your operating system has enough free memory, you could increase the overall RAM available to your web application by 100 MBytes to compensate for the larger cache. You can also monitor your web application for frequent garbage collection with little freed memory, low available free memory and out-of-memory messages to determine whether your web application has enough memory allocated.
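
    The estimate above, written out as a small calculation. The 10 kByte token size and the zero overhead are the assumptions stated in the paragraph, not measured values; the function name is made up for illustration, and 1 kByte is taken as 1,000 bytes.

```python
def token_cache_bytes(token_count: int, token_size_bytes: int, overhead_bytes: int = 0) -> int:
    """Rough upper bound on RAM used by a cache holding token_count tokens."""
    return token_count * (token_size_bytes + overhead_bytes)


# 10,000 cached tokens of 10 kBytes each, cache overhead ignored.
estimate = token_cache_bytes(token_count=10_000, token_size_bytes=10_000)
print(f"{estimate / 1_000_000:.0f} MBytes")  # prints "100 MBytes"
```

    Plugging in the token size actually observed in the form post to /_trust/ gives a figure that fits your own farm better than the 10 kByte guess.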

    2. Why does this only appear for our ADFS users and not our Active Directory/Windows Auth users, or forms based users?

    The SPSecurityTokenServiceManager manages only Federated Authentication tokens:

    When a user authenticates via Federated Authentication (such as via AD FS), a SAML token is posted to the SharePoint /_trust/ path. SharePoint then puts the incoming SAML token in its token cache and returns a FedAuth cookie containing the cache key to the SAML token to the browser. The browser passes the key to SharePoint on each succeeding web request. SharePoint uses the key to retrieve the SAML token from its cache and then uses the token for authorization within SharePoint. If the token can’t be retrieved from the cache or it is (almost) expired, SharePoint redirects back to the token issuer (AD FS) so the user can obtain a new token.
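
    The lookup described above, as a minimal sketch. The function name, the dictionary standing in for the token cache and the returned strings are illustrative assumptions; the "almost expired" margin is left out for brevity.

```python
from typing import Dict, Optional


def handle_request(fedauth_key: Optional[str], token_cache: Dict[str, float], now: float) -> str:
    """Decide how to answer a request, given the cache key from the FedAuth cookie.

    token_cache maps a cache key to the expiry time of the cached SAML token.
    """
    expires_at = token_cache.get(fedauth_key) if fedauth_key is not None else None
    if expires_at is None:
        # No cookie, or the token behind the key is no longer in the cache.
        return "302 to token issuer"
    if expires_at <= now:
        # The cached token has expired.
        return "302 to token issuer"
    return "200 token found"
```

    The first branch is the interesting one for this thread: the browser can still send a FedAuth cookie with a key that was valid when issued, but once the entry is gone from the cache the request is redirected anyway.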

    The token is stored in the cache instead of directly in the cookie for performance reasons: sending a large (> 5000 Bytes) token back and forth from browser to SharePoint is much less efficient than sending the key (a few hundred Bytes in size) back and forth.

    Credentials are managed and cached differently for users that authenticate with Active Directory or forms. These users don’t get the FedAuth cookie and SharePoint doesn’t need to look up the key in its SPSecurityTokenServiceManager Token Cache.

    Notice that SharePoint 2013 will have a distributed token cache that eliminates several problems related to token caching in SharePoint 2010, including optimized memory management and monitoring of removals of tokens from the cache.

    3. Is setting the value to 10000 simply covering up a real bug that we may see in an upcoming patch?

    Increasing the number of tokens to be cached makes sense if you have many users simultaneously accessing your SharePoint farm. The default values might be too small for the number of simultaneous users your farm accommodates. If a user's token has been removed from the cache by the time the user returns, the user is redirected to AD FS to obtain a new token. This might be transparent to the user if his session on the IdP is still valid. Otherwise, he might have to provide his credentials to the IdP again.

    Under normal circumstances, SharePoint “remembers” the HTTP GET request that caused the redirect to AD FS and the user continues in SharePoint as if the redirect hadn’t occurred. This is NOT the case for an HTTP POST (e.g., submission of a form), since the POST data isn’t preserved during the redirect. Hence the loss of work your users experience occasionally when the cache setting is too small.

    Regards,

    Beat Nideröst

    Some background info:

    Overview of Federated Identity for SharePoint Applications: http://msdn.microsoft.com/en-us/library/hh446526.aspx

    Garbage collection and weak references: http://msdn.microsoft.com/en-us/magazine/bb985011.aspx

    Distributed Cache in SharePoint 2013: http://technet.microsoft.com/library/jj219700(office.15).aspx

    • Proposed as answer by Beat Nideröst Friday, November 30, 2012 12:20 PM
    Friday, September 28, 2012 11:10 AM
  • I suspected it is connected to AAM.

    SharePoint 2013 is supposed to give more logging info on SAML claims authentication.

    Recently I did some testing with a SharePoint 2013 Foundation install and made this observation:

    - using SAML claims and a custom STS.

    - in AAM I only configured one URL: https://sp.domain.com

    At one point there was an infinite logon redirect loop.

    The redirect-loop in the ULS:

    xmnv Medium   Name=Request (POST:https://sp.domain.com:443/_trust/)
    adyrv High     Cannot find site lookup info for request Uri http://sp/.
    agb9s Medium   Non-OAuth request. IsAuthenticated=False, UserIdentityName=, ClaimsCount=0
    nasq Medium   Entering monitored scope (Request (POST:https://sp.domain.com:443/_trust/default.aspx)).

    Solution: I added http://sp as an internal URL in AAM.

    Friday, November 30, 2012 11:05 AM
  • Hello Wehr,

    What issue are you responding to?

    The original issue described random 302 redirects even with a valid FedAuth cookie in a SharePoint 2010 farm. I've seen several SharePoint 2010 farms where this was caused by token cache sizes that were too small for the large number of concurrent users on those farms. Increasing the token cache size offers relief, as Steven Callahan answered before.

    It seems from your description that you were experiencing a different issue. The redirect loops you experienced on a SharePoint 2013 farm do indeed seem to be related to an incorrect configuration of the Alternate Access Mappings.

    However, with a large number of concurrent users on a SharePoint 2010 farm and small token cache sizes, one might experience random 302 redirects even when the Alternate Access Mappings are configured correctly.

    On a SharePoint 2013 farm, the logon token cache now is a "Distributed Logon Token Cache" (http://technet.microsoft.com/library/jj219700(office.15).aspx) and SharePoint 2013 can log events when tokens are dropped from the cache.

    Regards,

    Beat Nideröst

    • Proposed as answer by Beat Nideröst Friday, November 30, 2012 12:20 PM
    Friday, November 30, 2012 12:19 PM
  • Hallo Beat,

    I experienced "random 302 redirects even with a valid FedAuth cookie" months ago with a very small SP2010 farm with no more than a few hundred users (only a few of them concurrent).

    Web Application will re-authenticate to soon with custom STS and valid FedAuth cookie

    The problem went away somehow. I related it to AAM.

    You say "incorrect configuration" of AAM. Does that mean I must always add e.g. http://sp, even if I only use one URL for everything, like https://sp.company.com?

    Do you know of any documentation about this?

    thanks

    Markus

    Friday, November 30, 2012 12:54 PM
  • Hello Markus,

    The alternate access mappings are needed to tell SharePoint which absolute URLs to render for a certain client. SharePoint allows for up to 5 external URLs (one for each zone: default, internet, extranet, intranet, custom). With each external URL, you can associate one or more internal URLs.

    When SharePoint receives a request on a SharePoint front-end server for a certain (internal) URL, SharePoint looks up the external URL it is associated with and uses it to render absolute URLs in its replies based on that URL. This way, SharePoint can generate the correct external URLs that users can use in their web browsers, even when load balancers or reverse proxy publishing rewrite the URLs in incoming requests so that SharePoint receives alternate, internal URLs on its front-end servers.
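
    A minimal sketch of that lookup, using the URLs mentioned earlier in this thread. The mapping table and function names are hypothetical and only model the idea; real alternate access mappings also carry zones and are managed by SharePoint itself.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical table: internal URL (as seen by the front-end server) -> external URL.
MAPPINGS: Dict[str, str] = {
    "http://sp": "https://sp.domain.com",
    "https://sp.domain.com": "https://sp.domain.com",
}


def public_url_for(request_url: str, mappings: Dict[str, str]) -> Optional[str]:
    """Return the external URL mapped to the request's scheme and host, if any."""
    parts = urlsplit(request_url)
    internal = f"{parts.scheme}://{parts.netloc}".lower()
    return mappings.get(internal)


def absolute_link(request_url: str, path: str, mappings: Dict[str, str]) -> Optional[str]:
    """Build an absolute link for a reply, based on the mapped external URL."""
    base = public_url_for(request_url, mappings)
    return None if base is None else base + path
```

    With this table, a request that reaches the server as http://sp/... is answered with links under https://sp.domain.com, while a host that has no mapping yields None. That is loosely comparable to the "Cannot find site lookup info" entry in the ULS excerpt earlier in the thread.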

    See TechNet for more details, e.g. http://technet.microsoft.com/en-us/library/dd903064.aspx

    Regards, Beat Nideröst

    Friday, December 21, 2012 1:06 PM
  • Hello Beat,

    thanks.

    It's also possible to have different authentication configurations per zone (trusted identity provider (IP), Windows, anonymous...).

    Now i did a setup like this (SP2013 Foundation):

    1. create web-app http://name 

    Default zone, auth: Windows and trusted IP. This zone will be used for search crawling.

    2. extend web app to https://name.domain.edu

    Custom zone, auth: trusted IP only. Users only use this zone. This makes it possible to prevent the authentication selector from being displayed to users without custom code.

    I hope this is the supported way to do this. So far I haven't seen any problems with premature FedAuth cookie invalidation. (The redirect loops I saw may have been a result of my testing environment.)

    With the 2010 installation I did not choose the (supported?) way of extending the web app but rather added the 443 binding in IIS Manager. I still would not rule out the possibility that the occasional FedAuth cookie problems were connected to AAM.

    SP2013 brings a new setting: CookieLifetime in SPSecurityTokenServiceConfig, with a default of 5 days.

    I haven't found any documentation for this. I can say that I found sessions lasting longer than the actual validity of the SAML token. Feature or bug? But that's another topic.

    best regards

    Markus

     

    Friday, December 21, 2012 2:35 PM