Http request and Unicode - different behavior

  • Question

  • I'm developing an MFC application for a client that sends a request to a URL.

    I have tested the application on two different computers (both running XP with Unicode support).

    The application lets the user type a URL, which is then requested by the code below.

    On computer A, the request is sent correctly, with the Unicode characters intact in the URL/body.

    On computer B, the request is sent with question marks ('?') in place of the Unicode characters in the URL/body.

    Both computers allow typing Unicode characters; there is no problem with that.

    Here's the part of the code where I'm having trouble:

    CString MyExeHttp(CString strUrl2, CString strAction, LPCSTR pPostData, DWORD dwPostDataSize, LPCSTR sHead, CString strProxy)
    {
    	CString strUrl = strUrl2;
    	CString strDomain;
    	CString strPath;
    	CString strHead;
    	CString strOut;
    	CString strAgent = L"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36";
    
    	int n;
    	DWORD dwSize;
    	int nPort = 80;
    
    	bool bIsHttps = false;
    
    	if (strUrl.Left(7).CompareNoCase(L"http://") == 0) strUrl = strUrl.Mid(7);
    	if (strUrl.Left(8).CompareNoCase(L"https://") == 0)
    	{
    		strUrl = strUrl.Mid(8);
    		bIsHttps = true;
    		nPort = 443;
    	}
    
    	n = strUrl.Find(L"/");
    	if (n == -1)
    	{
    		strDomain = strUrl;
    		strPath   = "/";
    	} 
    	else
    	{
    		strDomain = strUrl.Left(n);
    		strPath   = strUrl.Mid(n);
    	}
    
    //	nPort = INTERNET_DEFAULT_HTTPS_PORT;
    
    	n = strDomain.Find(L":");
    	if (n != -1)
    	{
    		nPort = atoi(CT2CA(strDomain.Mid(n + 1)));
    		strDomain = strDomain.Left(n);
    	}
    
    	if (!strAction.GetLength()) strAction = L"GET";
    	if (sHead) strHead = sHead;
    	if (strAction.CompareNoCase(L"POST") == 0)
    	{
    		CString strPostHead = L"Content-Type: application/x-www-form-urlencoded";
    		if (strHead.GetLength() == 0 || strHead.Find(L"Content-Type: ") == -1)
    		{
    			if (strHead.GetLength() != 0 && strHead.Right(2) != L"\r\n") strHead += L"\r\n";
    			strHead += strPostHead;
    		}
    	}
    
    	HINTERNET hInternet = NULL;
    	HINTERNET hConnect  = NULL;
    	HINTERNET hRequest = NULL;
    
    	if (strProxy.GetLength() == 0)
    		hInternet = InternetOpen(strAgent, NULL, NULL, NULL, NULL);
    	else
    		hInternet = InternetOpen(strAgent, INTERNET_OPEN_TYPE_PROXY, strProxy, NULL, NULL);
    
    	hConnect  = InternetConnect(hInternet, strDomain, nPort, NULL, NULL, INTERNET_SERVICE_HTTP, NULL, NULL);
    	if (hConnect)
    	{
    		LPCWSTR pAccpet[2];
    		pAccpet[0] = L"text/html";
    		pAccpet[1] = NULL;
    
    		int nFlags = INTERNET_FLAG_NO_CACHE_WRITE | INTERNET_FLAG_RELOAD | INTERNET_FLAG_PRAGMA_NOCACHE;
    		if (bIsHttps) nFlags |= INTERNET_FLAG_SECURE;
    
    		hRequest = HttpOpenRequestW(hConnect, strAction, strPath, HTTP_VERSION, NULL, pAccpet, nFlags, NULL);
    	}
    
    	bool bIsUtf8 = false;
    
    	if (hRequest)
    	{
    		dwSize = strHead.GetLength();
    	
    		if (HttpSendRequestW(hRequest, strHead, dwSize, (LPSTR)pPostData, dwPostDataSize))
    		{
    			char  sBuff[1024];	// raw response bytes from InternetReadFile
    			WCHAR szInfo[256];	// header values; HttpQueryInfoW returns wide strings
    			DWORD dwIndex = 0;
    
    			dwSize = sizeof(szInfo);	// buffer size in bytes
    			if (HttpQueryInfoW(hRequest, HTTP_QUERY_STATUS_CODE, szInfo, &dwSize, &dwIndex))
    			{
    				if (szInfo[0] == L'2' || szInfo[0] == L'5')
    				{
    					dwSize = sizeof(szInfo);
    					dwIndex = 0;
    					if (HttpQueryInfoW(hRequest, HTTP_QUERY_CONTENT_TYPE, szInfo, &dwSize, &dwIndex))
    					{
    						CString strTemp = szInfo;
    						strTemp.MakeLower();
    						if (strTemp.Find(L"utf-8") != -1 || strTemp.Find(L"utf8") != -1) bIsUtf8 = true;
    					}
    
    					// Collect the whole body first so that a multi-byte sequence
    					// split across two reads is not corrupted by the conversion.
    					CStringA strRaw;
    					while (InternetReadFile(hRequest, sBuff, sizeof(sBuff), &dwSize) && dwSize != 0)
    						strRaw.Append(sBuff, (int)dwSize);
    
    					// Falls back to the ANSI code page when the server does not declare
    					// UTF-8 (an assumption about the intent of bIsUtf8).
    					strOut = CString(CA2W(strRaw, bIsUtf8 ? CP_UTF8 : CP_ACP));
    				}
    			}
    		}
    	}
    	if (hRequest) InternetCloseHandle(hRequest);
    	if (hConnect)  InternetCloseHandle(hConnect);
    	if (hInternet) InternetCloseHandle(hInternet);
    
    	return strOut;
    }

    Might it be a problem with InternetConnect?

    Thanks in advance for any help. If you need any more info, please don't hesitate to ask.

    Sunday, November 3, 2013 4:44 AM

Answers

  • Try InternetSetOption with INTERNET_OPTION_CODEPAGE_EXTRA and specify the L"utf-8" encoding.

    Check which values are returned by InternetQueryOption on both computers. Is there any difference?

    Otherwise you probably have to convert the query part to UTF-8, then encode it using the '%XX' form.

    Sunday, November 3, 2013 8:48 AM
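
The last suggestion above (convert to UTF-8, then encode as '%XX') can be sketched as follows. This is only an illustration, not code from the thread: `PercentEncodeUtf8` is a hypothetical helper name, it assumes the input is already a UTF-8 byte string, and it is meant for a single query value rather than a whole URL, since it would also encode '=', '&' and '/'.

```cpp
#include <string>

// Hypothetical helper (not part of WinInet): percent-encodes one URL
// component that is already UTF-8. Only the unreserved characters of
// RFC 3986 (letters, digits, '-', '.', '_', '~') pass through unchanged.
std::string PercentEncodeUtf8(const std::string& utf8)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(utf8.size() * 3);  // worst case: every byte becomes %XX

    for (unsigned char c : utf8)
    {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';

        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += kHex[c >> 4];    // high nibble
            out += kHex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

In the MFC code from the question, the UTF-8 bytes could presumably be obtained with `CW2A(strValue, CP_UTF8)` (where `strValue` is a hypothetical CString holding one query value) before calling the helper. The result is plain ASCII, so it should survive whatever code page conversion WinInet applies to the path.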
  • The difference between the two machines could be the Internet Options > Advanced > International > Send UTF-8 URLs setting. I'm not sure exactly how it affects WinInet, though. It might be prudent to use HttpOpenRequestA instead, where WinInet doesn't perform any conversions and you are in control of exactly what bytes the URL is comprised of.

    As to POST requests, the last two parameters of HttpSendRequest specify an array of bytes. For all WinInet knows, you are sending binary data; it doesn't perform any conversions on it whatsoever. If you are getting bad data on the server, it means bad data was passed to MyExeHttp to begin with. Find out why.


    Igor Tandetnik

    Sunday, November 3, 2013 3:07 PM
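
One way to follow the "find out why" advice for POST data is to dump the bytes of the buffer just before it is handed to HttpSendRequest. The sketch below is an illustration only: `HexDump` is a hypothetical helper, not a WinInet or MFC function. If the dump already shows 3F (the ASCII code of '?') where a non-ASCII character was expected, the data was lost in a conversion before the call, not by WinInet.

```cpp
#include <cstddef>
#include <string>

// Hypothetical diagnostic helper: formats a byte buffer as space-separated
// uppercase hex pairs, e.g. "63 61 66 C3 A9".
std::string HexDump(const void* data, std::size_t size)
{
    static const char kHex[] = "0123456789ABCDEF";
    const unsigned char* bytes = static_cast<const unsigned char*>(data);

    std::string out;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (i != 0) out += ' ';
        out += kHex[bytes[i] >> 4];    // high nibble
        out += kHex[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

For example, a correctly UTF-8 encoded "café" dumps as `63 61 66 C3 A9`, while a buffer that went through a lossy conversion dumps as `63 61 66 3F`. In the code from the question, the call would be something like `HexDump(pPostData, dwPostDataSize)`, logged on both computers for comparison.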
