Reading a UTF-8 txt file line by line without garbled text, and detecting UTF-8 encoding
Last edited by null. on 2025-5-11 15:07
Solved: reading a UTF-8 txt file line by line without garbled text.
The test file e:\\123.txt contains these three lines:
设置为二进制模式
设置3为二进制模式
设置为4二进制模式
(vl-load-com)
(defun read-utf8-file (sm / file lines pos str stream)
  ;; sm is not used; it is kept so that calls like (read-utf8-file nil) still work
  (setq file "e:\\123.txt")
  (setq stream (vlax-create-object "ADODB.Stream"))
  (vlax-put-property stream 'Type 2)             ; 2 = adTypeText: read as text
  (vlax-put-property stream 'Charset "utf-8")    ; decode the bytes as UTF-8
  (vlax-invoke-method stream 'Open)
  (vlax-invoke-method stream 'LoadFromFile file) ; position is at the start after loading
  (setq str (vlax-invoke-method stream 'ReadText -1)) ; -1 = adReadAll
  (vlax-invoke-method stream 'Close)
  (vlax-release-object stream)
  ;; Split on CRLF; vl-string-search returns a 0-based index
  (while (setq pos (vl-string-search "\r\n" str))
    (setq lines (cons (substr str 1 pos) lines))
    (setq str (substr str (+ pos 3)))
  )
  (reverse (cons str lines))
)
(read-utf8-file nil)
("设置为二进制模式" "设置3为二进制模式" "设置为4二进制模式")
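The splitting loop in read-utf8-file only recognizes CRLF, so a file saved with LF-only line endings would come back as one long string. Below is a rough sketch of a splitter that accepts both. The name split-lines is my own and not part of the original post; treat it as one possible approach.

```lisp
;; Hypothetical helper (not from the original post):
;; split a string into lines on LF, dropping a trailing CR from each line.
(defun split-lines (str / line lines pos)
  (while (setq pos (vl-string-search "\n" str)) ; 0-based index of the next LF
    (setq line (substr str 1 pos))              ; text before the LF
    (setq lines (cons (vl-string-right-trim "\r" line) lines))
    (setq str (substr str (+ pos 2)))           ; continue after the LF
  )
  (reverse (cons (vl-string-right-trim "\r" str) lines))
)

(split-lines "a\r\nb\nc") ; returns ("a" "b" "c")
```

Like the original loop, a file that ends with a line break yields an empty string as the last element.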
Adding a check for whether a TXT file is UTF-8:
(defun c:CheckUTF8 (/ bomFlag byte1 bytes check_continuation_byte data filePath stream utf8Flag)
  (vl-load-com)
  ;; Take the next byte off `bytes`; T if it is a continuation byte (10xxxxxx, 128..191)
  (defun check_continuation_byte (/ byte)
    (setq byte (car bytes))
    (setq bytes (cdr bytes))
    (and byte (<= 128 byte 191))
  )
  (setq filePath (getfiled "Select a TXT file" "" "txt" 16))
  (if filePath
    (progn
      ;; Read the whole file as raw bytes
      (setq stream (vlax-create-object "ADODB.Stream"))
      (vlax-put-property stream 'Type 1) ; 1 = adTypeBinary
      (vlax-put-property stream 'Mode 3) ; 3 = adModeReadWrite
      (vlax-invoke-method stream 'Open)
      (vlax-invoke-method stream 'LoadFromFile filePath)
      (setq data (vlax-invoke-method stream 'Read -1)) ; -1 = adReadAll
      (vlax-invoke-method stream 'Close)
      (vlax-release-object stream)
      ;; Assumption: an empty file gives a Null variant instead of a byte array
      (setq bytes (if (>= (vlax-variant-type data) vlax-vbArray)
                    (vlax-safearray->list (vlax-variant-value data))
                  )
      )
      (setq utf8Flag t)
      ;; Stage 1: BOM check (EF BB BF); skip the BOM if present
      (setq bomFlag (and (= (car bytes) 239) (= (cadr bytes) 187) (= (caddr bytes) 191)))
      (if bomFlag (setq bytes (cdddr bytes)))
      ;; Stage 2: UTF-8 rule check (simplified: overlong forms and surrogates are not rejected)
      (while (and utf8Flag (setq byte1 (car bytes)))
        (setq bytes (cdr bytes))
        (cond
          ((< byte1 128)) ; single-byte character (0xxxxxxx)
          ((<= 194 byte1 223) ; two-byte character (110xxxxx), lead bytes C2..DF
           (if (null (check_continuation_byte))
             (setq utf8Flag nil)
           )
          )
          ((<= 224 byte1 239) ; three-byte character (1110xxxx)
           (if (or (null (check_continuation_byte))
                   (null (check_continuation_byte)))
             (setq utf8Flag nil)
           )
          )
          ((<= 240 byte1 244) ; four-byte character (11110xxx)
           (if (or (null (check_continuation_byte))
                   (null (check_continuation_byte))
                   (null (check_continuation_byte)))
             (setq utf8Flag nil)
           )
          )
          (t (setq utf8Flag nil)) ; stray continuation byte or invalid lead byte
        )
      )
      ;; Report the result; a BOM only counts if the rest of the file is valid UTF-8
      (alert
        (strcat "File encoding: "
                (cond
                  ((not utf8Flag) "not UTF-8 (possibly ANSI/GBK)")
                  (bomFlag "UTF-8 with BOM")
                  (t "UTF-8 without BOM")
                )
        )
      )
    )
  )
  (princ)
)
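For anyone who wants to try the byte rules at the command line without picking a file, here is a sketch of the stage-two check as a standalone function on a list of integers. The name utf8-bytes-p is my own, not from the original post. It also applies the tighter second-byte ranges from RFC 3629, so overlong encodings, surrogates, and code points above U+10FFFF are rejected. Treat it as an illustration under those assumptions rather than a finished validator.

```lisp
;; Hypothetical helper (not from the original post):
;; T if `bytes` (a list of integers 0..255) is well-formed UTF-8 per RFC 3629.
(defun utf8-bytes-p (bytes / b hi lo n ok)
  (setq ok t)
  (while (and ok bytes)
    (setq b (car bytes) bytes (cdr bytes))
    ;; n = number of continuation bytes expected;
    ;; lo..hi = allowed range for the first continuation byte
    (setq lo 128 hi 191)
    (cond
      ((< b 128) (setq n 0))            ; 0xxxxxxx
      ((<= 194 b 223) (setq n 1))       ; C2..DF (C0 and C1 would be overlong)
      ((= b 224) (setq n 2 lo 160))     ; E0: second byte A0..BF
      ((= b 237) (setq n 2 hi 159))     ; ED: second byte 80..9F, no surrogates
      ((<= 225 b 239) (setq n 2))       ; remaining three-byte leads
      ((= b 240) (setq n 3 lo 144))     ; F0: second byte 90..BF
      ((= b 244) (setq n 3 hi 143))     ; F4: second byte 80..8F
      ((<= 241 b 243) (setq n 3))       ; F1..F3
      (t (setq ok nil n 0))             ; stray continuation or invalid lead byte
    )
    (repeat n
      (setq b (car bytes) bytes (cdr bytes))
      (if (not (and b (<= lo b hi))) (setq ok nil))
      (setq lo 128 hi 191)              ; later bytes use the plain 80..BF range
    )
  )
  ok
)

(utf8-bytes-p '(232 174 190)) ; valid three-byte sequence E8 AE BE, returns T
(utf8-bytes-p '(201 232))     ; lead byte without a continuation byte, returns nil
(utf8-bytes-p '(237 160 128)) ; encoded surrogate, returns nil
```

An empty list returns T, and so does pure ASCII, so this alone cannot tell ASCII apart from UTF-8 without BOM.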
Last edited by null. on 2025-5-11 23:20
This can detect the TXT format automatically and read the TXT contents. (It only distinguishes UTF-8, UTF-8 with BOM, and ANSI.)