none
I want to select more than one key column in Time Series algorithm,its possible? RRS feed

  • Question

  • HI All

    am using time series algorithm for predict the crime 

    so far i completed in following details ,

    input data

    Date-KeyTime

    CFS-Key

    CountofIncident as a Predict

    Based on  i got  particular date ,what is the crime ,how many incident  going to happen ,like that i have complete.

    but now i want to add one more key Columns,that is PatrolArea .

     I tried PatrolArea Also  One of the Key colyumn.but i got error(one key Column only possible).

    so what i can do ?

    i want to select another one key columns..

    i want result like this:

    Date PatrolArea CFS   Howmanys

    2012-04-01 PatrolArea1  Asulted 10

    like this i want ,plz let me know 


    pandiyan


    • Edited by Pandiyanpvp Tuesday, March 31, 2015 3:21 PM
    Tuesday, March 31, 2015 3:20 PM

Answers

  • You should understand that since we have added another one key to the Key Column Collection this Key Column became a composite key. It still may have the same name but it is represented for model input as two different columns. And to make a meaningful representation for you, you specify the Name Column, in our case it just combines CFS and PA in one string and let us understand what the model is talking about.

    Wish you knew Russian, I would explain it much faster :)

    As for clients. There is one difficult moment, I'm working in the banking area and know nothing about crimes you are researching. In my situation, we often predicts the fact of some action. For example, whether the client is going to buy a particular product or not. Of course, even if model predicts he is, it doesn't give us 100% guarantee! It gives us a probability of this action. This probability is based on the most frequent cases/patterns/behavior etc. And the only way to prove it - to test in real world and assess the accuracy of your model prediction. If prediction data differs from the real data with 5-10% deviation, it's ok for small numbers you mentioned in your example. And if there are some possible ways your customer may act having this knowledge, it's great. He may have some KPIs, for example, the time of reaction when the information about a crime appears. This may depends on how many operating groups are in the area of the crime at this moment. If you predict that tomorrow there will be more crimes then average, they can send more people to be ready to react. Thus the time of reaction may decrease.

    Something that comes to my mind without background data of what you are researching.


    Andrey V Artemyev | Saint-Petersburg, Russia

    • Marked as answer by Pandiyanpvp Friday, April 3, 2015 11:41 AM
    Friday, April 3, 2015 11:23 AM

All replies

  • Hi, try the following:

    Go to the Mining Structure and select the current Key Column. Select it in the left window where the hierarchy of the structure is, not in the Data Source View area.

    Go to the Properties (F4) and find General section. There is KeyColumns property. Now you should have CFS here. At the right hand of the name of CFS column there is a button with 3 dots. Click it.

    Click Add in the opened window and then click OK.

    Now there is a word Collection instead of CFS. Click again 3 dots button near New Binding (WChar).

    You can see a new window where you can assign an existing column to the new Key Column. Choose

    Binding Type = Column Binding

    Source Table = the name of your table

    Source Column = PatrolArea

    Congrats, you have a complex key column.


    Andrey V Artemyev | Saint-Petersburg, Russia

    Wednesday, April 1, 2015 6:36 AM
  • HI Andrey..

    thanks for you reply,

    as per your reply i did, see below screen short.

    then i tried to process, it completed sucessfully,

    but i did not get any difference in result.

    see this screeen ...

    plz andrey if you know let me know..


    pandiyan


    • Edited by Pandiyanpvp Wednesday, April 1, 2015 10:41 AM
    Wednesday, April 1, 2015 10:39 AM
  • Ah, that's another part of the question. :)

    As you can seem you have 5 results for every CFS. I suppose there are 5 different PAs but you can't see them. That's the matter of NameColumn property.

    Name Column can't be represented as a collection of columns, thus it can't be complex. Assuming you want to have one row per every CFS and every PA, you want to see something like this:

    Burglary.PA1 4/5/2015

    Burglary.PA2 5/5/2015

    etc.

    There are 2 steps.

    1. We have to create a Named Calculation which is in fact just a calculated field in a Data Source View table.

    Go to the Data Source Views in Solution Explorer and double click your DSV.

    Right click on the table's header (where you can see the name of the table) and click New Named Calculation...

    Fill in the form like this:

    If expression doesn't work (e.g. returns an error), try to CAST or CONVERT the values to the data type which is suitable for string concatenation (e.g. (n)(var)char).

    If it still fails, try to explicitly specify a table name, I mean [Next 30 days].[PA] instead of just [PA].

    2. Now we have a column which shows us CFS and PA divided by "." sign.

    The only thing we have to do is to assign it to our KeyColumn. In other words, we have to let a model know what fields to give as input key fields and which field to show a user instead of the actual key fields. Since every pair of CFS and PA is unique, correspondingly, every Named Calculation will uniquely identify this pair.

    Go again to the mining structure, but this time find Misc section in Properties. There is NameColumn property, click 3 dots button and choose the Named Calculation we have just created.

    Reprocess the model and check whether the result is what you want it to be.


    Andrey V Artemyev | Saint-Petersburg, Russia


    Wednesday, April 1, 2015 11:21 AM
  • Thanks Andrey,

    what ever you told the 1st step, that I have tried already.

    let me  confirm one thing with you ,will  it gives the exact prediction what i am expecting?

    or Please suggest me if u know any other ways other than this.

    thanks ,

    pandiyan.


    pandiyan

    Wednesday, April 1, 2015 1:43 PM
  • What about training model? Have it been trained correct? You show me the result of the prediction query. Can you show the query itself? Maybe join is incorrect.

    Andrey V Artemyev | Saint-Petersburg, Russia

    Wednesday, April 1, 2015 1:53 PM
  • This is my Source Query :

    SELECT        Date, PACFS, Incidents
    FROM            (SELECT        DD.Date, (CASE WHEN DL.PatrolAreaDesc IS NULL THEN 'No PatrolArea' ELSE DL.PatrolAreaDesc END) + ' -- ' + (CASE WHEN DI.HighestCFSClassDesc IS NULL 
                                                        THEN 'No Crimes' ELSE dI.HighestCFSClassDesc END) AS PACFS, COUNT(FI.IncidentID) AS Incidents
                              FROM            DimDate AS DD LEFT OUTER JOIN
                                                        FactIncident AS FI ON DD.DateKey = FI.DateKey LEFT OUTER JOIN
                                                        DimIncident AS DI ON FI.IncidentID = DI.IncidentID LEFT OUTER JOIN
                                                        DimLocation AS DL ON FI.LocationKey = DL.LocationKey LEFT OUTER JOIN
                                                        DimTime AS DT ON DT.TimeKey = FI.TimeKey
                              WHERE        left(dd.DateKey,4) Between 2010 and left(Getdate(),4)
                              GROUP BY DD.Date, (CASE WHEN DL.PatrolAreaDesc IS NULL THEN 'No PatrolArea' ELSE DL.PatrolAreaDesc END) + ' -- ' + (CASE WHEN DI.HighestCFSClassDesc IS NULL 
                                                        THEN 'No Crimes' ELSE dI.HighestCFSClassDesc END)) AS a
    WHERE        (Date <= GETDATE())

    This is my Predcition Query what i am getting from BIDS:

    SELECT flattened
      [Next30 Days].[PACFS],
      PredictTimeSeries([Next30 Days].[Incidents],5)
    From
      [Next30 Days]

    Note: I am Create mining structure using SSAS Wizard.


    pandiyan

    Wednesday, April 1, 2015 2:10 PM
  • Hi,

    what if you change your prediction query to:

    SELECT flattened
       [Next30 Days].[PA],
       [Next30 Days].[CFS],
       PredictTimeSeries([Next30 Days].[Incidents],5)
     From
       [Next30 Days]


    Andrey V Artemyev | Saint-Petersburg, Russia

    Thursday, April 2, 2015 9:56 AM
  • HI andrey ,

    am unable to select PA .


    pandiyan

    Thursday, April 2, 2015 10:34 AM
  • Switch to Query mode, copy and paste my proposed piece of code and run it.


    Andrey V Artemyev | Saint-Petersburg, Russia

    Thursday, April 2, 2015 10:43 AM

  • i used this Query its is giving error:


    pandiyan

    Thursday, April 2, 2015 10:51 AM
  • Because for this mining strucuture 

    i used 

    CFS as Key column,

    Date as Key time Column

    Count(Incident) as Predic column

    for PA i did not select any thing, what i can select for that andrey,

    if i select PA also Key column means its throwing error(More than one key column not allowed)

    if i select PA as a Input ,unable to use prediction Query .


    pandiyan


    • Edited by Pandiyanpvp Thursday, April 2, 2015 10:55 AM
    Thursday, April 2, 2015 10:54 AM
  • "CFS as KeyColumn"

    and then here in Properties of CFS Column you added another one column to Key Column Collection, right?

    If so, your query should look like that

    SELECT flattened
       [Next30 Days].[CFS],
       PredictTimeSeries([Next30 Days].[Incidents],5)
     From
       [Next30 Days]

    If you did not change the name of KeyColumn! If it is still CFS.

    Let's see my example. I have no your data, so I created my own sample data for prediction the number of site users. That's my structure:

    For your understanding:

    [День недели] means Week day

    [Кол-во входов] means Number of logins

    [Выходной] means Weekend

    [День] means Day and it's a Named Calculation: cast([День недели] as char(1)) + '.' + case [Выходной] when 1 then 'Выходной' else 'Рабочий' end

    'Рабочий' means Working day.

    My prediction query is

    SELECT
      [TS_MIXED].[День Недели],
      PredictTimeSeries([TS_MIXED].[Кол-во Входов],14)
    From
      [TS_MIXED]

    and it works fine. Note that the field in the prediction query SELECT clause should be same as the name of Key Column in the structure. Not the name of one of KeyColumns in the Collection, but the name of the whole Column in structure.

    Does it make sense?


    Andrey V Artemyev | Saint-Petersburg, Russia

    Thursday, April 2, 2015 11:35 AM
  • Thanks for your response andrey,

    i understood the prediction query what you have provided, but in that only one column(Week Day) in select list you have mentioned. but i am expecting there should be taken 2 key columns(Week day,Weekend) in the select list.

    for my scenario, i have to select 2 COLUMNS(PatrlolArea, CFS), Means.., i have to create the prediction for the mentioned 2 COLUMNS.

    and i have one business doubt regarding my project. ie, all of my customers they do not have belief on predicted data. they are asking how can we proceed further with this predicted Information. this may or may not occur in future , like that they are asking.

    mean while how you are satisfying your customers with your predicted data.


    pandiyan

    Friday, April 3, 2015 10:27 AM
  • You should understand that since we have added another one key to the Key Column Collection this Key Column became a composite key. It still may have the same name but it is represented for model input as two different columns. And to make a meaningful representation for you, you specify the Name Column, in our case it just combines CFS and PA in one string and let us understand what the model is talking about.

    Wish you knew Russian, I would explain it much faster :)

    As for clients. There is one difficult moment, I'm working in the banking area and know nothing about crimes you are researching. In my situation, we often predicts the fact of some action. For example, whether the client is going to buy a particular product or not. Of course, even if model predicts he is, it doesn't give us 100% guarantee! It gives us a probability of this action. This probability is based on the most frequent cases/patterns/behavior etc. And the only way to prove it - to test in real world and assess the accuracy of your model prediction. If prediction data differs from the real data with 5-10% deviation, it's ok for small numbers you mentioned in your example. And if there are some possible ways your customer may act having this knowledge, it's great. He may have some KPIs, for example, the time of reaction when the information about a crime appears. This may depends on how many operating groups are in the area of the crime at this moment. If you predict that tomorrow there will be more crimes then average, they can send more people to be ready to react. Thus the time of reaction may decrease.

    Something that comes to my mind without background data of what you are researching.


    Andrey V Artemyev | Saint-Petersburg, Russia

    • Marked as answer by Pandiyanpvp Friday, April 3, 2015 11:41 AM
    Friday, April 3, 2015 11:23 AM
  • HI Andrey , 

    thank you so much for  spending time for me,it was a great experience to discuss about Data mining with you.


    pandiyan

    Friday, April 3, 2015 11:44 AM
  • Андрей, доброго времени суток!

    Я правильно Вас понял, что если задать составной ключ вышеописанным Вами способом, то при написании prediction query я не смогу получить два поля, каждое из которых являлось бы частью составного ключа?

     В моей задаче, помимо KeyTime-ключа, есть еще два ключевых поля: возрастная категория и параметр, определяющий данные, которые будут предсказываться. Хотелось бы узнать возможно ли что-нибудь придумать, не создавая вычисляемое поле?

    Заранее спасибо!

    Monday, February 6, 2017 2:12 PM
  • Виталий, добрый день!

    Думаю, нет. А в чем смысл? Во временных рядах есть только идентификатор ряда и сам ряд. Предсказываться в любом случае будут следующие несколько значений определенного временного ряда. Можно сделать хоть 100 полей, но подавать их придется в любом случае в качестве идентификатора. Можно сделать таблицу соответствий, куда засунуть все варианты значений Ваших 2 факторов и пронумеровать (сделать новый ID), тогда этот новый ID будет простым одинарным ключом. Но опять же, зачем? :) Это все косметические изощрения, чтобы однозначно определить тот или иной временной ряд.

    Да, иногда возникает желание подать на вход модели, использующей алгоритм временных рядом, "еще какой-то фактор", но этот не тот алгоритм, который принимает факторы, он принимает только предыдущие исторические значения в виде последовательности.

    Другой вариант - брать другие алгоритмы. Ту же кластеризацию последовательностей, но там надо аккуратно. Если значения прогнозируемого поля сильно отличаются, получится слишком много комбинаций, тогда сначала надо нормировать эти значения. В общем, вариантов много. Посмотреть сначала, как справляется простой временной ряд, если совсем плохо, то уже браться за другие алгоритмы, предварительно подготовив данные.


    Andrey V Artemyev | Saint-Petersburg, Russia

    • Proposed as answer by Sidorov Vitaly Friday, February 10, 2017 7:15 AM
    Wednesday, February 8, 2017 12:53 PM
  • Андрей, спасибо за ответ!

    Буду пробовать) Возможно, рассмотрю другие алгоритмы. Просто у меня стоит задача прогнозирования числового ряда на несколько периодов вперед. Я подумал, что для этой задачи подходит только алгоритм MS Time Series.

    Т.е. есть возможность делать прогноз на несколько периодов вперед и с помощью алгоритма кластеризации последовательностей?

    Friday, February 10, 2017 7:15 AM