Date: 2014-05-18  Views: 20664

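Garbled Chinese characters when fetching UTF-8 XML remotely in Java
I recently needed to fetch XML data from a remote server. The XML is encoded as UTF-8, and after fetching it all of the Chinese text comes out garbled, whereas XML encoded as GB2312 comes back correctly!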
Java code
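import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.ProtocolException;
import java.net.URL;
import java.net.URLConnection;

public class XmlTransfer{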
  private String urlAddr;
  private String xmlStr;
  HttpURLConnection urlCon = null;

  public XmlTransfer(String _urlAddr,String _xmlStr) {
    this.urlAddr = _urlAddr;
    this.xmlStr = _xmlStr;
  }
  public InputStream get() throws Exception
  {
    if(urlCon==null){urlCon=getUrlConnection();}
    if(urlCon==null){throw new Exception("Connection failed");}
    PrintWriter out = new PrintWriter(urlCon.getOutputStream());
   

    urlCon.disconnect();
    InputStream fin1 = urlCon.getInputStream();
    return fin1;
  }

  private HttpURLConnection getUrlConnection(){

    try{
      URL url = new URL(urlAddr);
      URLConnection conn = url.openConnection();
      urlCon = (HttpURLConnection)conn;
      urlCon.setRequestProperty("Content-type", "text/html;charset=utf-8");
      urlCon.setDoOutput(true);
      urlCon.setRequestMethod("GET");
      urlCon.setUseCaches(false);
    }
    catch (MalformedURLException mex) {
      mex.printStackTrace();
    }
    catch (ProtocolException pex) {
      pex.printStackTrace();
    }
    catch (IOException iex) {
      iex.printStackTrace();
    }

    return urlCon;
  }


  public   String getHttp( String strURL ){
      XmlTransfer xt=new XmlTransfer(strURL,"");
      StringBuffer sb = new StringBuffer();
      try{
          InputStream is = xt.get();
          
          byte[] b = new byte[1024];
          int iCount = 0;
          while ((iCount = is.read(b)) > 0) {
              sb.append(new String(b, 0, iCount));
          }
      }catch(Exception e){
          sb.append("An error occurs in XmlTransfer.getHttp\n");
          sb.append(e.getMessage());
      }

     return (sb.toString());
  }
  
  
}





Any pointers from the experts would be appreciated. I'll close the thread tomorrow.

------Solution--------------------
This doesn't depend on the encoding you pick; it depends on the encoding the other side used.

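GB2312 stores each Chinese character in two bytes,
while UTF-8 is variable-length (one to four bytes per character, in any language, not just C#), and most common Chinese characters take three bytes.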

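So if the other side's XML is encoded as GB2312 and you decode it as UTF-8,
bytes that were written in two-byte groups get forcibly interpreted in three-byte groups,
and the result is likely to be garbled text. The same thing happens in the other direction.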
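
To make the mismatch concrete, here is a minimal sketch that encodes a string with one charset and decodes the bytes with another. The class name `EncodingMismatchDemo` and the method `roundTrip` are made up for illustration, and GBK is only exercised if the running JVM happens to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class EncodingMismatchDemo {

    // Encodes text with one charset, then decodes the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes, so the
        // source file's own encoding cannot interfere with the demo.
        String text = "\u4e2d\u6587";
        Charset utf8 = StandardCharsets.UTF_8;

        System.out.println("UTF-8 byte count: " + text.getBytes(utf8).length);
        System.out.println("UTF-8 read as UTF-8 intact: "
                + roundTrip(text, utf8, utf8).equals(text));

        // GBK (a superset of GB2312) is an optional charset, so check first.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println("GBK byte count: " + text.getBytes(gbk).length);
            System.out.println("UTF-8 read as GBK intact: "
                    + roundTrip(text, utf8, gbk).equals(text));
        }
    }
}
```

The text survives only when the decoding charset matches the one used for encoding, which is the point the answer is making.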

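In short, decode the XML with whatever encoding the other side used to encode it; simply using UTF-8 everywhere is not enough.
UTF-8 is the usual choice only because it is the encoding most XML documents use.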
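
One way to follow this advice without guessing is to hand the raw bytes to an XML parser, which reads the `encoding` attribute of the XML declaration itself. The following is a rough sketch using the JDK's built-in DOM parser. The class name `XmlDeclaredEncodingDemo` and the method `rootText` are hypothetical, and a byte array stands in for the stream that would come from the HTTP connection.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original post.
class XmlDeclaredEncodingDemo {

    // Parses raw XML bytes and returns the text content of the root element.
    // The parser picks the charset from the XML declaration, so the caller
    // never converts the bytes to a String by hand.
    static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The bytes are encoded as UTF-8 and the declaration says so.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<name>\u4e2d\u6587</name>";
        String text = rootText(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("decoded correctly: " + text.equals("\u4e2d\u6587"));
    }
}
```

This only works when the declaration is truthful about how the bytes were actually encoded; if the server sends mislabeled data, the charset still has to be supplied by hand.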

For details, see
http://www.cnblogs.com/mjgforever/archive/2008/02/27/1083135.html
------Solution--------------------
That's right, I've run into this before. You need to convert the encoding when you read the stream, and then it works.
Java code
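StringBuilder temp = new StringBuilder();
InputStream in = new BufferedInputStream(urlCon.getInputStream());
// Decode the bytes as UTF-8 explicitly instead of relying on the platform default charset
Reader rd = new InputStreamReader(in, "UTF-8");
int c;
while ((c = rd.read()) != -1) {
    temp.append((char) c);
}
rd.close(); // closing the reader also closes the underlying stream
String xml = temp.toString(); // the decoded XML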
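
As a sketch of why going through a `Reader` helps: the `getHttp` method in the question turns each 1024-byte chunk into a `String` separately, using the platform default charset. Besides the default charset possibly not being UTF-8, a multi-byte character that straddles a chunk boundary gets cut in half. The class and method names below are hypothetical, and a deliberately tiny chunk size is used to force the split.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class StreamDecodeDemo {

    // Decodes the whole stream through a Reader, which keeps incomplete
    // byte sequences buffered until the rest of the character arrives.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Decodes each byte chunk on its own, as the question's loop does.
    // A character split across two chunks is decoded incorrectly.
    static String readChunked(InputStream in, Charset charset, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        byte[] b = new byte[chunkSize];
        try {
            int n;
            while ((n = in.read(b)) > 0) {
                sb.append(new String(b, 0, n, charset));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two characters, three UTF-8 bytes each
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        Charset utf8 = StandardCharsets.UTF_8;

        System.out.println("Reader intact: "
                + readAll(new ByteArrayInputStream(bytes), utf8).equals(text));
        // A 4-byte chunk ends in the middle of the second character.
        System.out.println("4-byte chunks intact: "
                + readChunked(new ByteArrayInputStream(bytes), utf8, 4).equals(text));
    }
}
```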

------Solution--------------------
A single filter will take care of it!

package com.shop.filter;

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;
/**************************************************
 * author:East
 * date:2008-6-13
 * note: EncodingFilter is used to fix garbled Chinese characters
 **************************************************/

public class EncodingFilter extends HttpServlet implements Filter {
private FilterConfig filterConfig;
//Handle the passed-in FilterConfig
public void init(FilterConfig filterConfig) throws Servle