(1) How to escape HTML code in Java
How to escape HTML tags
- /**
- * Escapes HTML special characters in a string and removes control (non-printable) characters.
- *
- * @param str
- * the HTML string
- * @return the escaped string
- */
- public static String escapeHTML(String str) {
- int length = str.length();
- int newLength = length;
- boolean someCharacterEscaped = false;
- for (int i = 0; i < length; i++) {
- char c = str.charAt(i);
- int cint = 0xffff & c;
- if (cint < 32)
- switch (c) {
- case 11:
- default:
- newLength--;
- someCharacterEscaped = true;
- break;
- case '\t':
- case '\n':
- case '\f':
- case '\r':
- break;
- }
- else
- switch (c) {
- case '"':
- newLength += 5;
- someCharacterEscaped = true;
- break;
- case '&':
- case '\'':
- newLength += 4;
- someCharacterEscaped = true;
- break;
- case '<':
- case '>':
- newLength += 3;
- someCharacterEscaped = true;
- break;
- }
- }
- if (!someCharacterEscaped)
- return str;
- StringBuilder sb = new StringBuilder(newLength);
- for (int i = 0; i < length; i++) {
- char c = str.charAt(i);
- int cint = 0xffff & c;
- if (cint < 32)
- switch (c) {
- case '\t':
- case '\n':
- case '\f':
- case '\r':
- sb.append(c);
- break;
- }
- else
- switch (c) {
- case '"':
- sb.append("&quot;");
- break;
- case '\'':
- sb.append("&#39;");
- break;
- case '&':
- sb.append("&amp;");
- break;
- case '<':
- sb.append("&lt;");
- break;
- case '>':
- sb.append("&gt;");
- break;
- default:
- sb.append(c);
- break;
- }
- }
- return sb.toString();
- }
Test:
- @Test
- public void test_001(){
- String input="<html><input type=\"button\" onlick=\"abc()\" > </html>";
- System.out.println(input);
- System.out.println(StringUtil.escapeHTML(input));
- }
Output:
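The escaping rules above can also be written in a single pass. The following is a minimal, self-contained sketch, not the article's `StringUtil.escapeHTML`: the class name `EscapeHtmlSketch` and the method `escapeHtml` are hypothetical, and it assumes the same mapping as the listing (`"` to `&quot;`, `'` to `&#39;`, `&` to `&amp;`, `<` to `&lt;`, `>` to `&gt;`, with control characters other than tab, LF, FF and CR dropped).

```java
public class EscapeHtmlSketch {

    // Hypothetical helper (an assumption, not the article's method):
    // escapes the five HTML special characters and drops control
    // characters below 32, except tab, LF, FF and CR.
    static String escapeHtml(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length() + 16);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\t':
                case '\n':
                case '\f':
                case '\r':
                    sb.append(c); // whitespace control characters are kept
                    break;
                default:
                    if (c >= 32) {
                        sb.append(c); // other control characters are dropped
                    }
                    break;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "<input type=\"button\" onclick=\"abc()\">";
        System.out.println(input);
        System.out.println(escapeHtml(input));
    }
}
```

This version skips the first pass that precomputes the exact output length. That trades a possible buffer resize for shorter code, which is usually acceptable for short strings.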
(2) How to strip HTML tags in Java, keeping only the text
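- /**
- * Strips the HTML markup from the input string and truncates the result.
- *
- * @param input
- * the HTML string
- * @param length
- * the maximum number of characters to keep
- * @return the plain text, followed by "......" if it was truncated
- */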
- public static String splitAndFilterString(String input, int length) {
- if (input == null || input.trim().equals("")) {
- return "";
- }
- // Remove all HTML elements and named entities
- String str = input.replaceAll("\\&[a-zA-Z]{1,10};", "").replaceAll(
- "<[^>]*>", "");
- str = str.replaceAll("[(/>)<]", "");
- int len = str.length();
- if (len <= length) {
- return str;
- } else {
- str = str.substring(0, length);
- str += "......";
- }
- return str;
- }
- /**
- * Returns plain text: strips all HTML tags and removes blank lines.
- *
- * @param input
- * the HTML string
- * @return the plain text
- */
- public static String splitAndFilterString(String input) {
- if (input == null || input.trim().equals("")) {
- return "";
- }
- // Remove all HTML elements and named entities
- String str = input.replaceAll("\\&[a-zA-Z]{1,10};", "").replaceAll(
- "<[^>]*>", "");
- str = str.replaceAll("[(/>)<]", "");
- return SystemHWUtil.deleteCRLF(str);
- }
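The two methods above share the same stripping step. The following is a minimal, self-contained sketch of that step plus truncation, so it can be run without `SystemHWUtil`. The class name `StripHtmlSketch` and the method `stripTags` are hypothetical. It reuses the two regular expressions from the listing, and leaves out the third replacement, which also deletes parentheses and slashes from the text. Regex-based stripping is only an approximation and does not parse HTML.

```java
public class StripHtmlSketch {

    // Hypothetical helper (an assumption, not the article's method):
    // removes named entities and tags, then truncates to maxLength.
    static String stripTags(String input, int maxLength) {
        if (input == null || input.trim().isEmpty()) {
            return "";
        }
        String text = input
                .replaceAll("&[a-zA-Z]{1,10};", "") // named entities such as &nbsp;
                .replaceAll("<[^>]*>", "");          // anything that looks like a tag
        if (text.length() <= maxLength) {
            return text;
        }
        // Same truncation marker as the listing above.
        return text.substring(0, maxLength) + "......";
    }

    public static void main(String[] args) {
        String html = "<p>Hello&nbsp;<b>world</b>, this is a test</p>";
        System.out.println(stripTags(html, 100)); // Helloworld, this is a test
        System.out.println(stripTags(html, 5));   // Hello......
    }
}
```

Note that the entity is deleted rather than decoded, so `Hello&nbsp;<b>world</b>` becomes `Helloworld` with no space. If that matters, replacing `&nbsp;` with a space before stripping is one option.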
- /**
- * Deletes all line breaks, together with any whitespace that follows them
- *
- * @param input
- * @return
- */
- public static String deleteAllCRLF(String input) {
- return input.replaceAll("((\r\n)|\n)[\\s\t ]*", "").replaceAll(
- "^((\r\n)|\n)", "");
- }
- /**
- * Deletes empty and blank lines by applying deleteCRLFOnce twice
- *
- * @param input
- * @return
- */
- public static String deleteCRLF(String input) {
- input = SystemHWUtil.deleteCRLFOnce(input);
- return SystemHWUtil.deleteCRLFOnce(input);
- }
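The `deleteCRLFOnce` method called above is not shown here. The following is a minimal sketch of one way to remove blank lines, under the assumption that the goal is to collapse each run of empty or whitespace-only lines into a single line break. The class name `BlankLineSketch` and the method `deleteBlankLines` are hypothetical and are not part of `SystemHWUtil`.

```java
public class BlankLineSketch {

    // Hypothetical stand-in (an assumption; the article's deleteCRLFOnce
    // is not shown). Collapses a line break followed by one or more
    // empty or whitespace-only lines into a single "\n", then strips
    // any line breaks at the very start of the input.
    static String deleteBlankLines(String input) {
        return input
                .replaceAll("\\r?\\n\\s*\\n", "\n")
                .replaceAll("^(\\r?\\n)+", "");
    }

    public static void main(String[] args) {
        String text = "line 1\r\n\r\n   \nline 2\n\n\nline 3";
        System.out.println(deleteBlankLines(text));
    }
}
```

Because `\s*` is greedy and also matches line breaks, a single pass already handles several consecutive blank lines. Indentation on the next non-blank line is preserved.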