Re: An STL Problem: Reading Unicode File in VC++ 2005

Thread Starter: Pomelo   Started: 20 Jan 2006 1:57 PM UTC   Replies: 5
  20 Jan 2006, 1:57 PM UTC
Pomelo

Posts 10
An STL Problem: Reading Unicode File in VC++ 2005

Hi,

I wrote some code to load a Unicode text file:

#include <tchar.h>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

using namespace std;

int _tmain(int /*argc*/, _TCHAR* /*argv[]*/)
{
 wstring szresource;
 wstringstream strstm;
 wifstream file;
 file.open(L"E:\\Page1.xml");

 // Test the extraction itself so that a failed read is never processed.
 while (file >> szresource)
 {
  wcout << szresource.c_str() << endl;

  strstm << szresource;
 }

 // other operation

 return 0;
}

The file "Page1.xml" is not a standard XML file, just plain text. However, wcout printed only the first character stored in szresource, and after the while loop strstm also held only the first character.

I debugged the code and found something strange. The text in Page1.xml contained only one word:

Dialog

Since it was read into wchar_t, I expected the memory to look like

44 00 69 00 61 00 6c 00 6f 00 67 00

However, the corresponding memory dump showed

44 00 00 00 69 00 00 00 61 00 00 00 6c 00 00 00 6f 00 00 00 67 00 00 00

I am really confused. Can anybody help? By the way, the solution was built with the Unicode character set, which is the default.

Thanks a lot!




Get Busy Living, or Get Busy Dying

  20 Jan 2006, 3:24 PM UTC
OShah

Posts 188
Answer Re: An STL Problem: Reading Unicode File in VC++ 2005

Taka Muraoka's article still applies:

http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp

 


  21 Jan 2006, 1:52 AM UTC
Pomelo

Posts 10
Re: An STL Problem: Reading Unicode File in VC++ 2005

Thank you!

Are you suggesting that this behavior (which looks like a bug to me) is by design, and that to handle Unicode we have to do the conversions ourselves, as in Taka Muraoka's article?

By the way, I doubt this is how ANSI C++ is meant to handle wchar_t with Unicode.




  20 Jan 2006, 4:05 PM UTC
vbvan

Posts 73
Answer Re: An STL Problem: Reading Unicode File in VC++ 2005

It seems that the STL wide file streams assume the file is ANSI-encoded, so they always convert ANSI -> Unicode for you when reading, whether or not the file really is ANSI-encoded.

Please see the URL OShah gave (http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp) for the workaround.
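
The memory dump in the question fits this explanation. Here is a minimal sketch that mimics the byte-by-byte widening; it is a simulation written for illustration, not the library's actual code, and both function names are hypothetical. It assumes the file held the UTF-16 little endian bytes of "Dialog" without a byte order mark, as the dump suggests.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Mimics the default narrow-to-wide conversion: every byte read from the
// file becomes one wchar_t, regardless of the file's real encoding.
std::wstring widen_each_byte(const std::string& bytes)
{
    std::wstring wide;
    wide.reserve(bytes.size());
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        // Go through unsigned char so bytes above 0x7F do not sign-extend.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(bytes[i])));
    }
    return wide;
}

// The UTF-16 little endian bytes of "Dialog" as they would sit in the file.
std::string dialog_utf16le_bytes()
{
    const char raw[] = { 'D', 0, 'i', 0, 'a', 0, 'l', 0, 'o', 0, 'g', 0 };
    return std::string(raw, sizeof raw);
}
```

Calling widen_each_byte(dialog_utf16le_bytes()) yields 12 wide characters: 'D', a null, 'i', a null, and so on. Because the second element is a null, anything that treats c_str() as a null-terminated string, such as the wcout call in the question, stops after the first character.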


  21 Jan 2006, 2:02 PM UTC
OShah

Posts 188
Re: An STL Problem: Reading Unicode File in VC++ 2005

 Pomelo wrote:

Thank you!

Are you suggesting that this behavior (which looks like a bug to me) is by design, and that to handle Unicode we have to do the conversions ourselves, as in Taka Muraoka's article?

By the way, I doubt this is how ANSI C++ is meant to handle wchar_t with Unicode.

I checked the relevant part of the C++ standard (27.8), but couldn't understand it. Therefore, I'll quote from Taka's article (in the section Wide File I/O):

"It turns out that the C++ standard dictates that wide-streams are required to convert double-byte characters to single-byte when writing to a file. So in the example above, the wide string L"ABC" (which is 6 bytes long) gets converted to a narrow string (3 bytes) before it is written to the file. And if that wasn't bad enough, how this conversion is done is implementation-dependent."

(but I'm not sure how he came to this conclusion).
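
The behavior described in the quote can at least be observed directly. This is a minimal sketch, assuming the default "C" locale is in effect and that the scratch file path passed in is writable; the function name is hypothetical.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Writes the wide string L"ABC" through a wide file stream and returns the
// size of the resulting file in bytes.
long bytes_written_for_abc(const char* path)
{
    {
        std::wofstream out(path, std::ios::binary | std::ios::trunc);
        out << L"ABC";
    } // leaving the scope closes the stream and flushes the converted bytes

    // Reopen as a plain byte stream, positioned at the end, to get the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    std::streamoff size = in.tellg();
    return static_cast<long>(size);
}
```

With the default locale on the common implementations I know of, the function returns 3: one byte per character, not the 6 bytes the wide string occupies in memory on Windows.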


  21 Jan 2006, 4:02 PM UTC
vbvan

Posts 73
Re: An STL Problem: Reading Unicode File in VC++ 2005

The only relevant information I could find in the C++ standard is the following:

Multibyte character and Files

A File provides byte sequences. So the streambuf (or its derived classes) treats a file as the external source/sink byte sequence. In a large character set environment, multibyte character sequences are held in files. In order to provide the contents of a file as wide character sequences, wide-oriented filebuf, namely wfilebuf should convert wide character sequences.

And something about codecvt:

The class codecvt<internT,externT,stateT> is for use when converting from one codeset to another, such as from wide characters to multibyte characters or between wide character encodings such as Unicode and EUC

BTW: In my opinion, a text file saved as "Unicode" can itself use various encodings, such as Unicode (UTF-16 little endian), Unicode big endian, and UTF-8 (all of which Windows Notepad supports), so there is no portable way to define a default encoding for the file. C++ provides the codecvt class to customize the conversion, so I think it is up to the programmer to tell the fstream which encoding the file really uses.
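
Here is a minimal sketch of telling the stream which encoding the file uses. It assumes a compiler that ships the C++11 <codecvt> header, which VC++ 2005 does not, and which later standards deprecate; on VC++ 2005 a hand-written codecvt facet like the one in the article would be needed instead. The facet std::codecvt_utf16 is standard, but the function name read_utf16_file and the choice of little endian as the fallback byte order are assumptions made for illustration.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: reads a whole UTF-16 file into a wstring.
// A byte order mark, if present, is consumed and decides the byte order;
// otherwise little endian is assumed.
std::wstring read_utf16_file(const char* path)
{
    typedef std::codecvt_utf16<
        wchar_t, 0x10ffff,
        std::codecvt_mode(std::little_endian | std::consume_header)> utf16_facet;

    std::wifstream file;
    // Imbue before opening so the very first read already uses the facet.
    // The locale takes ownership of the facet and deletes it.
    file.imbue(std::locale(std::locale::classic(), new utf16_facet));
    // Binary mode keeps the runtime from translating bytes such as 0x0A or 0x1A.
    file.open(path, std::ios::binary);

    // If the file cannot be opened, the result is simply an empty string.
    return std::wstring(std::istreambuf_iterator<wchar_t>(file),
                        std::istreambuf_iterator<wchar_t>());
}
```

For a file like the one in the question, saved by Notepad as "Unicode", a call such as read_utf16_file("E:\\Page1.xml") should return the word Dialog as six wide characters instead of twelve.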

