I'm writing a small JavaScript function that shows a preview of a file if its content can be inserted into a div.innerText, so I'm looking for all text-based MIME types.
For example, I know that all text/* types satisfy this requirement, but there's also application/json, for example.
The expected result is knowing whether a file contains enough English-language alphanumeric characters to justify opening it as a readable text file. It must contain words that make sense to an English-language reader (such as "the", "and", "of", etc.), including JSON symbols like "]".
- The complete list can be found here. Which of them you want to consider plain text is your choice. – Bergi Commented Jun 18, 2017 at 23:56
- What about just looking at the encoding? – toddmo Commented Jun 18, 2017 at 23:57
- @toddmo Well the idea is, given the mime type, I want to tell whether I should actually poll for more data from this file to display it in the preview, so I can't look at the encoding without actually copying the whole file from the server. – yelsayed Commented Jun 18, 2017 at 23:58
- I would just start your whitelist and log every unknown type (one that is on neither the white nor the black list). Inspect the log periodically and build up the white and black lists. You'll get very high coverage in short order. – toddmo Commented Jun 19, 2017 at 0:05
- @yelsayed is application/javascript or application/json "readable"? Take someone who is not involved in programming and show them a compacted JS or JSON file and a JPEG file in some text editor. What do you think, would they consider one of them "readable", or be able to tell them apart? It's up to you to define what you consider readable. Who's your audience? What is the purpose of your application? What would your audience expect you to support? And what would be a "nice to have"? These are the questions you should ask yourself. – Thomas Commented Jun 19, 2017 at 0:14
2 Answers
You can find this info in various resources, such as the MDN docs and Wikipedia.
However, at times mime types may be incorrectly set.
Here's an alternative approach.
function isTextFile(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = function (event) {
      const uint8Array = new Uint8Array(event.target.result);
      // Check the first 512 bytes (or the entire file if it is smaller)
      const maxBytesToCheck = Math.min(uint8Array.length, 512);
      for (let i = 0; i < maxBytesToCheck; i++) {
        const byte = uint8Array[i];
        // Check for control characters, excluding common ones:
        // tab (9), line feed (10), and carriage return (13)
        if (byte < 32 && byte !== 9 && byte !== 10 && byte !== 13) {
          resolve(false); // Likely a binary file
          return;
        }
      }
      resolve(true); // Likely a text file
    };
    reader.onerror = () => reject(reader.error);
    // Read only the first 512 bytes instead of loading the whole file
    reader.readAsArrayBuffer(file.slice(0, 512));
  });
}
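A stricter variant of the same idea is to ask whether the sampled bytes decode as valid UTF-8. The following is only a sketch under that assumption; the name looksLikeUtf8Text and its truncated option are hypothetical and not part of the answer above. It relies on the standard TextDecoder, which throws when constructed with fatal: true and given malformed input.

```javascript
// Hypothetical helper: returns true if the bytes contain no NUL byte
// and decode as valid UTF-8.
// Set `truncated` when the bytes are only the start of a larger file, so that
// a multi-byte character cut off at the end of the sample is not an error.
function looksLikeUtf8Text(bytes, { truncated = false } = {}) {
  // NUL bytes are valid UTF-8 but rare in text and common in binary formats.
  if (bytes.includes(0)) return false;
  try {
    // With fatal: true, decode() throws a TypeError on malformed input.
    // With stream: true, an incomplete trailing sequence is buffered instead.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes, { stream: truncated });
    return true;
  } catch {
    return false;
  }
}
```

Unlike the control-character check, this rejects bytes such as 0xFF that can never occur in UTF-8, but it would also reject text stored in legacy encodings such as Latin-1.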
Or a hybrid function:
async function isTextContent(file) {
  // Step 1: MIME type check
  const textMimeTypes = [
    "text/",
    "application/json",
    "application/javascript",
    "application/xml",
    "application/ld+json",
    "application/yaml",
    "message/"
  ];
  if (textMimeTypes.some((type) => file.type.startsWith(type))) {
    return true;
  }
  // Step 2: Content-based heuristic check
  return await isTextFile(file);
}
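When the MIME type comes from an HTTP Content-Type header rather than from file.type, it may carry parameters or mixed case, and many text-like types use a structured syntax suffix such as +json or +xml. A minimal sketch of a MIME-only check that handles both follows; the name isTextMimeType and the exact list of types are assumptions made for illustration, not a definitive list.

```javascript
// Exact types treated as text in this sketch (illustrative, not exhaustive).
const TEXT_APPLICATION_TYPES = new Set([
  "application/json",
  "application/javascript",
  "application/xml",
  "application/yaml",
]);

// Hypothetical helper: decides from the MIME type alone, without reading content.
function isTextMimeType(mimeType) {
  // Drop parameters and normalize case: "Text/HTML; charset=UTF-8" -> "text/html"
  const essence = mimeType.split(";")[0].trim().toLowerCase();
  if (essence.startsWith("text/") || essence.startsWith("message/")) return true;
  if (TEXT_APPLICATION_TYPES.has(essence)) return true;
  // Structured syntax suffixes, e.g. application/ld+json or image/svg+xml
  return /\+(json|xml|yaml)$/.test(essence);
}
```

The suffix test means types such as image/svg+xml are previewed as text, which may or may not be what you want for your audience.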
A while ago, Google Chrome changed the behavior of its DevTools to no longer display multipart/mixed payloads in the network trace, because their content is not guaranteed to be readable text. A bug report was quickly filed, and its outcome was:
- The rules for determining text types (which you may want to follow; see the function isTextType in MimeType.ts) were changed to allow more mime types, including all multipart types.
- The network trace offers a side-by-side view of non-text responses in hex and (attempted) UTF-8 decoding (to be joined by a base64 view in Chrome 132, see here).
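A side-by-side view of this kind can be approximated in a few lines. The sketch below is only loosely modeled on the idea and is not Chrome's implementation; the function name hexAndText and the row layout are assumptions.

```javascript
// Hypothetical helper: renders bytes as rows of "offset  hex  decoded text".
function hexAndText(bytes, width = 16) {
  // Non-fatal decoder: bytes that are not valid UTF-8 become U+FFFD.
  const decoder = new TextDecoder("utf-8");
  const lines = [];
  for (let offset = 0; offset < bytes.length; offset += width) {
    const chunk = bytes.subarray(offset, offset + width);
    const hex = Array.from(chunk, (b) => b.toString(16).padStart(2, "0")).join(" ");
    // Each row is decoded on its own, so a multi-byte character split across
    // two rows shows up as replacement characters. Control characters are
    // replaced with "." so every row stays on one line.
    const text = decoder.decode(chunk).replace(/[\x00-\x1f\x7f]/g, ".");
    const address = offset.toString(16).padStart(8, "0");
    lines.push(`${address}  ${hex.padEnd(width * 3 - 1)}  ${text}`);
  }
  return lines.join("\n");
}
```

Assigning the result to the innerText of a pre element (or any element styled with a monospace font) keeps the columns aligned.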
The reasoning behind this is that certain non-textual mime types still contain parts that are decodable as UTF-8, and users will value being able to read these parts, like the XML fragment at the end of the image/webp response in the example below. Compare its presentation in the div and in the Chrome DevTools.
fetch("https://httpbin/image/webp")
.then(r => r.text())
.then(t => d.innerText = t);
<div id="d"></div>
To answer the original question: Simply loading any response into div.innerText gives a useful preview.