Date: 2014-05-20

A question about UTF-8
I wrote some code that reads a UTF-8 encoded file from a web server.
Relevant code:
while ((str = urlReader.readLine()) != null) {
    // read str
}
resultText.setText(new String(str.toString().getBytes(),
        "UTF8"));

Note: resultText is an SWT Text widget, and str holds the data read from the web server, which is UTF-8 encoded.

/*******************/

English text in the same file comes out fine, but the Chinese is garbled, mostly right after punctuation marks. Hoping an expert can help~~

Example of the error (copied from resultText):

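??般来??,压缩档案不应包含??有档案压缩目录下,例如 Java语言文字档案和档案卷宗应排除.
(The intended text reads roughly: "Generally speaking, a compressed archive should not include all the files under the compressed directory; for example, Java-language text files and file folders should be excluded." The "??" sequences stand in for 一, 说 and 所.)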



------Solution--------------------
Remove the "UTF8" argument and try the default.
------Solution--------------------
A character conversion should do it! Search online for how.
------Solution--------------------
resultText.setText(new String(str.toString().getBytes("GBK"),
        "UTF8"));

Give it a try.
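Re-encoding and decoding again can only undo the damage if the wrong decoding step lost nothing. The GBK decoder replaces byte sequences it cannot map, so the "GBK" version above may not fully restore the text. The sketch below swaps in ISO-8859-1 as the assumed wrong charset, because it maps every byte to exactly one char and back, which makes the repair lossless. The class `RepairDemo` and its `repair` method are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;

public class RepairDemo {
    // Undo a wrong ISO-8859-1 decoding: ISO-8859-1 maps each byte to exactly one
    // char (and back), so getBytes recovers the original bytes unchanged, and
    // they can then be decoded with the charset they were actually written in.
    static String repair(String mojibake) {
        byte[] rawBytes = mojibake.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four Chinese characters, written as escapes so the source file's
        // own encoding does not matter.
        String original = "\u4E00\u822C\u6765\u8BF4";
        // Simulate the bug: UTF-8 bytes decoded with the wrong charset.
        String mojibake = new String(original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(mojibake.equals(original));         // false
        System.out.println(repair(mojibake).equals(original)); // true
    }
}
```

Repairing after the fact is fragile, though; decoding with the right charset in the first place avoids the problem entirely.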
------Solution--------------------
Set the encoding used for the data transfer!
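On the client side, one way to act on this advice is to take the charset from the server's Content-Type response header rather than relying on the platform default. This is a minimal sketch: the class and method names are hypothetical, and falling back to UTF-8 when no usable charset is declared is an assumption, not something the server guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/plain; charset=UTF-8", or UTF-8 (an assumed default) when the
    // value is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // Illegal charset name: fall through to the default.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=GBK").name()); // GBK
        System.out.println(charsetOf("text/html").name());               // UTF-8
    }
}
```

With a real connection, the header value would come from `URLConnection.getContentType()`, and the resulting `Charset` would be passed to `new InputStreamReader(conn.getInputStream(), charset)`.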
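------Solution--------------------
> resultText.setText(new String(str.toString().getBytes(), "UTF8"));

This line is wrong. Written this way, it garbles even a string that was decoded correctly: getBytes() encodes with the platform default charset (for example GBK on Chinese Windows), and those bytes are then decoded as if they were UTF-8.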
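The damage is easy to reproduce. In this sketch the hypothetical `reencode` helper mirrors `new String(str.getBytes(), "UTF8")`, but takes the "platform default" charset as a parameter so both cases can be shown. It assumes the GBK charset is available in the JRE, as it is in standard JDK builds.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleDecodeDemo {
    // Mirrors new String(str.getBytes(), "UTF8") with the default charset
    // passed in explicitly so the effect does not depend on the machine.
    static String reencode(String good, Charset platformDefault) {
        return new String(good.getBytes(platformDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String good = "\u4E2D\u6587"; // the two characters for "Chinese"
        // Default charset is UTF-8: the round trip happens to be harmless.
        System.out.println(reencode(good, StandardCharsets.UTF_8).equals(good)); // true
        // Default charset is GBK: the GBK bytes are not valid UTF-8, so the
        // decoder substitutes U+FFFD replacement characters.
        String broken = reencode(good, Charset.forName("GBK"));
        System.out.println(broken.equals(good));           // false
        System.out.println(broken.indexOf('\uFFFD') >= 0); // true
    }
}
```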

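All you actually need is resultText.setText(str).

If the text is still garbled, the problem lies in urlReader.readLine(): the String it returns is already garbled, most likely because the reader decodes the stream with the platform default charset instead of UTF-8.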
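A minimal sketch of reading the response correctly by naming the charset when the reader is created. A `ByteArrayInputStream` stands in for the server connection (in the real code it would be something like `url.openStream()`); the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8ReaderDemo {
    // Reads every line from the stream, decoding the bytes as UTF-8.
    // readLine() strips line terminators, so '\n' is appended back.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the server response: UTF-8 bytes of Chinese text.
        byte[] body = "\u4E00\u822C\u6765\u8BF4,\nJava".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(body));
        System.out.print(text);
        // In the SWT code this String is used directly: resultText.setText(text);
    }
}
```

Because the decoding is correct at the source, no getBytes()/new String(...) round trip is needed afterwards.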
------Solution--------------------
Below is source code quoted from the com.sun.tools.javac.util.Convert class in OpenJDK javac 1.7:

/** Convert `len' bytes from utf8 to characters.
* Parameters are as in System.arraycopy
* Return first index in `dst' past the last copied char.
* @param src The array holding the bytes to convert.
* @param sindex The start index from which bytes are converted.
* @param dst The array holding the converted characters..
* @param dindex The start index from which converted characters
* are written.
* @param len The maximum number of bytes to convert.
*/
public static int utf2chars(byte[] src, int sindex,
                            char[] dst, int dindex,
                            int len) {
    int i = sindex;
    int j = dindex;
    int limit = sindex + len;
    while (i < limit) {
        int b = src[i++] & 0xFF;
        if (b >= 0xE0) {
            b = (b & 0x0F) << 12;
            b = b | (src[i++] & 0x3F) << 6;
            b = b | (src[i++] & 0x3F);
        } else if (b >= 0xC0) {
            b = (b & 0x1F) << 6;
            b = b | (src[i++] & 0x3F);
        }
        dst[j++] = (char)b;
    }
    return j;
}

/** Return bytes in Utf8 representation as an array of characters.
* @param src The array holding the bytes.
* @param sindex The start index from which bytes are converted.
* @param len The maximum number of bytes to convert.
*/
public static char[] utf2chars(byte[] src, int sindex, int len) {
    char[] dst = new char[len];
    int len1 = utf2chars(src, sindex, dst, 0, len);
    char[] result = new char[len1];
    System.arraycopy(dst, 0, result, 0, len1);
    return result;
}

/** Return all bytes of a given array in Utf8 representation
* as an array of characters.
* @param src The array holding the bytes.