How to implement NCQ in kernel mode? Is this the same as read DMA in a for loop?

  • Question

  • I am trying to write a function to implement NCQ, and I have identified the following steps for a simple program.

    • Step 1: Find the maximum NCQ queue depth supported.
    • Step 2: Get queue length.
    • Step 3: If queue length > maximum NCQ queue depth, go to exit.
    • Step 4: Find all free Command Slots
    • Step 5: Build ‘queue length’ number of command FISes
    • Step 6: Set corresponding SActive bits.
    • Step 7: Set ‘queue length’ number of bits in the CI register.
    • Step 8: Wait until the CI bits clear.
    • Step 9: Return Status
    • Step 10: Exit

    Could anybody tell me if I am doing anything wrong here? I do not know whether the procedure above is the right approach, and I would like some confirmation before starting the implementation.


    varun rao K M

    Friday, January 13, 2017 6:27 AM

Answers

  • Typically a miniport specifies a maximum transfer size that can be handled as a single read or write.  The upper layers of the storage stack then break larger requests into requests of at most that size.  With NCQ, you queue each request and let the hardware service it, so several requests can be outstanding at once and may complete concurrently.

    So your model is basically wrong: you do not need to worry about multiple slots for one request, because the upper layers have already broken it into pieces that each take a single slot.
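
The one-request-per-slot model described above can be sketched roughly as follows. This is only an illustration, not miniport code: `fake_port`, `ncq_queue`, `reg_set`, `ncq_issue` and `ncq_reap` are hypothetical names, and the two struct fields merely model the PxSACT and PxCI registers, including their write-1-to-set behaviour. The sketch assumes a completed tag shows up as a cleared SACT bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two AHCI port registers involved. */
typedef struct {
    uint32_t sact;  /* models PxSACT: tags of queued commands still outstanding */
    uint32_t ci;    /* models PxCI: slots the HBA has not fetched yet */
} fake_port;

typedef struct {
    fake_port *port;
    uint32_t   in_flight;  /* driver's own record of tags it has issued */
    int        num_slots;  /* usable queue depth, 1..32 */
} ncq_queue;

/* Models write-1-to-set register semantics. */
static void reg_set(uint32_t *reg, uint32_t bits) { *reg |= bits; }

/* Queue ONE request in ONE free slot. Returns the tag used, or -1 if all are busy. */
int ncq_issue(ncq_queue *q)
{
    assert(q->num_slots >= 1 && q->num_slots <= 32);
    for (int tag = 0; tag < q->num_slots; tag++) {
        uint32_t bit = (uint32_t)1 << tag;
        if ((q->in_flight | q->port->sact | q->port->ci) & bit)
            continue;                   /* slot is busy, try the next one */
        /* A real driver fills the command header and FIS for this slot here. */
        reg_set(&q->port->sact, bit);   /* mark the tag active first ... */
        reg_set(&q->port->ci, bit);     /* ... then issue the command */
        q->in_flight |= bit;
        return tag;
    }
    return -1;
}

/* Completion path: returns a bitmask of tags that finished since the last call. */
uint32_t ncq_reap(ncq_queue *q)
{
    uint32_t done = q->in_flight & ~q->port->sact;
    q->in_flight &= ~done;
    return done;
}
```

The issue path never waits: each request is queued and the function returns, and completions are collected later in whatever order the device finishes them.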


    Don Burn Windows Driver Consulting Website: http://www.windrvr.com

    Friday, January 13, 2017 1:04 PM

All replies

  • Thank you for your reply.

    From the website "osdev.org" I got the code below, which gave me a rough idea of how a simple read command works.

    #define ATA_DEV_BUSY 0x80
    #define ATA_DEV_DRQ 0x08
     
    BOOL read(HBA_PORT *port, DWORD startl, DWORD starth, DWORD count, WORD *buf)
    {
    	port->is = (DWORD)-1;		// Clear pending interrupt bits
    	int spin = 0; // Spin lock timeout counter
    	int slot = find_cmdslot(port);
    	if (slot == -1)
    		return FALSE;
     
    	HBA_CMD_HEADER *cmdheader = (HBA_CMD_HEADER*)port->clb;
    	cmdheader += slot;
    	cmdheader->cfl = sizeof(FIS_REG_H2D)/sizeof(DWORD);	// Command FIS size
    	cmdheader->w = 0;		// Read from device
    	cmdheader->prdtl = (WORD)((count-1)>>4) + 1;	// PRDT entries count
     
    	HBA_CMD_TBL *cmdtbl = (HBA_CMD_TBL*)(cmdheader->ctba);
    	memset(cmdtbl, 0, sizeof(HBA_CMD_TBL) +
     		(cmdheader->prdtl-1)*sizeof(HBA_PRDT_ENTRY));
     
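    	// 8K bytes (16 sectors) per PRDT entry
    	DWORD remaining = count;	// Sectors not yet described; count itself is still needed for the FIS
    	int i;	// Declared outside the loop because the last entry uses it too
    	for (i=0; i<cmdheader->prdtl-1; i++)
    	{
    		cmdtbl->prdt_entry[i].dba = (DWORD)buf;
    		cmdtbl->prdt_entry[i].dbc = 8*1024-1;	// 8K bytes (DBC is zero-based: byte count minus 1)
    		cmdtbl->prdt_entry[i].i = 1;
    		buf += 4*1024;	// 4K words
    		remaining -= 16;	// 16 sectors
    	}
    	// Last entry
    	cmdtbl->prdt_entry[i].dba = (DWORD)buf;
    	cmdtbl->prdt_entry[i].dbc = (remaining<<9)-1;	// 512 bytes per sector, zero-based
    	cmdtbl->prdt_entry[i].i = 1;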
     
    	// Setup command
    	FIS_REG_H2D *cmdfis = (FIS_REG_H2D*)(&cmdtbl->cfis);
     
    	cmdfis->fis_type = FIS_TYPE_REG_H2D;
    	cmdfis->c = 1;	// Command
    	cmdfis->command = ATA_CMD_READ_DMA_EX;
     
    	cmdfis->lba0 = (BYTE)startl;
    	cmdfis->lba1 = (BYTE)(startl>>8);
    	cmdfis->lba2 = (BYTE)(startl>>16);
    	cmdfis->device = 1<<6;	// LBA mode
     
    	cmdfis->lba3 = (BYTE)(startl>>24);
    	cmdfis->lba4 = (BYTE)starth;
    	cmdfis->lba5 = (BYTE)(starth>>8);
     
    	cmdfis->countl = LOBYTE(count);
    	cmdfis->counth = HIBYTE(count);
     
    	// The below loop waits until the port is no longer busy before issuing a new command
    	while ((port->tfd & (ATA_DEV_BUSY | ATA_DEV_DRQ)) && spin < 1000000)
    	{
    		spin++;
    	}
    	if (spin == 1000000)
    	{
    		trace_ahci("Port is hung\n");
    		return FALSE;
    	}
     
    	port->ci = 1<<slot;	// Issue command
     
    	// Wait for completion
    	while (1)
    	{
    		// In some longer duration reads, it may be helpful to spin on the DPS bit 
    		// in the PxIS port field as well (1 << 5)
    		if ((port->ci & (1<<slot)) == 0) 
    			break;
    		if (port->is & HBA_PxIS_TFES)	// Task file error
    		{
    			trace_ahci("Read disk error\n");
    			return FALSE;
    		}
    	}
     
    	// Check again
    	if (port->is & HBA_PxIS_TFES)
    	{
    		trace_ahci("Read disk error\n");
    		return FALSE;
    	}
     
    	return TRUE;
    }
    
    Similarly, could you give me a rough idea of how I can implement an "NCQ read"?
     

    varun rao K M

    Monday, January 16, 2017 4:52 AM
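
As a rough idea of how the command FIS for an NCQ read differs from the READ DMA EXT one in the code above, here is a minimal sketch. The function name `build_read_fpdma_fis` and the raw 20-byte array are assumptions made for illustration; the byte offsets follow the Register Host-to-Device FIS layout. The command becomes READ FPDMA QUEUED (0x60), the sector count moves into the Features field, and the Count field carries the NCQ tag in bits 7:3. Under AHCI the tag should match the command slot number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill a 20-byte Register H2D FIS for READ FPDMA QUEUED (hypothetical helper). */
void build_read_fpdma_fis(uint8_t fis[20], uint64_t lba, uint16_t sectors, int tag)
{
    assert(tag >= 0 && tag < 32);       /* NCQ tags are 0..31 */
    memset(fis, 0, 20);

    fis[0]  = 0x27;                     /* FIS type: Register H2D */
    fis[1]  = 0x80;                     /* C bit set: this is a command */
    fis[2]  = 0x60;                     /* READ FPDMA QUEUED */

    fis[3]  = (uint8_t)sectors;         /* Features 7:0  = sector count low */
    fis[11] = (uint8_t)(sectors >> 8);  /* Features 15:8 = sector count high */

    fis[4]  = (uint8_t)lba;             /* LBA 7:0 */
    fis[5]  = (uint8_t)(lba >> 8);      /* LBA 15:8 */
    fis[6]  = (uint8_t)(lba >> 16);     /* LBA 23:16 */
    fis[7]  = 1 << 6;                   /* Device: bit 6 set, as in the read above */
    fis[8]  = (uint8_t)(lba >> 24);     /* LBA 31:24 */
    fis[9]  = (uint8_t)(lba >> 32);     /* LBA 39:32 */
    fis[10] = (uint8_t)(lba >> 40);     /* LBA 47:40 */

    fis[12] = (uint8_t)(tag << 3);      /* Count 7:3 = NCQ tag */
}
```

With the FIS built this way, the issue sequence also changes: the slot's bit is set in PxSACT before it is set in PxCI, and completion is detected when that PxSACT bit clears, rather than by spinning on PxCI after every command.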