I'm writing a small JavaScript function that shows a preview of a file if its content can be inserted into a div.innerText, so I'm looking for all text-based MIME types.

For example, I know that text/* types all satisfy this requirement, but there's also application/json, for example.

The expected result is knowing whether a file contains enough English-language alphanumeric characters to justify opening it as a readable text file. It must contain words that make sense to an English-language reader (such as "the", "and", "of", etc., including JSON symbols like ]).


asked Jun 18, 2017 at 23:45 by yelsayed; edited Dec 26, 2024 at 12:28 by VC.One
  • The complete list can be found here. Which of them you will want to consider as plain text is your choice. – Bergi, Jun 18, 2017 at 23:56
  • What about just looking at the encoding? – toddmo, Jun 18, 2017 at 23:57
  • @toddmo Well, the idea is that, given the MIME type, I want to tell whether I should actually poll for more data from this file to display it in the preview, so I can't look at the encoding without actually copying the whole file from the server. – yelsayed, Jun 18, 2017 at 23:58
  • I would just start your white list and log every unknown type (not on either the white or black list). Inspect the log periodically and build up the white and black lists. You'll get very high coverage in short order. – toddmo, Jun 19, 2017 at 0:05
  • @yelsayed Is application/javascript or application/json "readable"? Take someone who is not involved in programming and show them a compacted JS or JSON file and a JPEG file in some text editor. Do you think they would consider either of them "readable", or be able to tell them apart? It's up to you to define what you consider readable. Who's your audience? What is the purpose of your application? What would your audience expect you to support? And what would be "nice to have"? These are the questions you should ask yourself. – Thomas, Jun 19, 2017 at 0:14

2 Answers

You can find this information in resources such as the MDN docs and Wikipedia.

However, MIME types are sometimes set incorrectly.

Here's an alternative approach that inspects the file's content instead of trusting its declared type.

function isTextFile(file) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = function (event) {
            const arrayBuffer = event.target.result;
            const uint8Array = new Uint8Array(arrayBuffer);

            // Check first 512 bytes (or entire file if smaller)
            const maxBytesToCheck = Math.min(uint8Array.length, 512);
            for (let i = 0; i < maxBytesToCheck; i++) {
                const byte = uint8Array[i];
                // Check for non-printable characters (excluding common ones: tab, newline, carriage return)
                if (byte < 32 && byte !== 9 && byte !== 10 && byte !== 13) {
                    resolve(false); // Likely a binary file
                    return;
                }
            }
            resolve(true); // Likely a text file
        };
        reader.onerror = reject;
        // Read only the first 512 bytes; the loop above never looks further
        reader.readAsArrayBuffer(file.slice(0, 512));
    });
}
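
The control-byte test above accepts any byte of 128 or higher, so some binary data can slip through. One possible refinement, offered as a sketch rather than as part of the original answer, is to also require that the sample decodes as valid UTF-8. The helper name looksLikeUtf8Text is hypothetical, and the assumption that previewable files are UTF-8 encoded may not hold for your data.

```javascript
// Hypothetical helper (not from the answer above): returns true when the
// bytes are valid UTF-8 and contain no control characters other than
// tab, line feed, and carriage return.
function looksLikeUtf8Text(bytes) {
    let text;
    try {
        // fatal: true makes decode() throw a TypeError on invalid UTF-8.
        // stream: true tolerates a multi-byte character that is cut off
        // at the end of the sample.
        const decoder = new TextDecoder("utf-8", { fatal: true });
        text = decoder.decode(bytes, { stream: true });
    } catch (err) {
        return false; // Not valid UTF-8, so treat it as binary
    }
    // Same control-character rule as the byte loop above
    return !/[\x00-\x08\x0B\x0C\x0E-\x1F]/.test(text);
}
```

In a browser, the sample could come from new Uint8Array(await file.slice(0, 512).arrayBuffer()).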

Or a hybrid function that checks the MIME type first and falls back to the content check:

async function isTextContent(file) {
    // Step 1: MIME type check
    const textMimeTypes = [
        "text/",
        "application/json",
        "application/javascript",
        "application/xml",
        "application/ld+json",
        "application/yaml",
        "message/"
    ];
    if (textMimeTypes.some((type) => file.type.startsWith(type))) {
        return true;
    }

    // Step 2: Content-based heuristic check
    return await isTextFile(file);
}
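
If only the MIME type string is available (for example, from a Content-Type header before the file has been downloaded), the type check can be done on its own. The following is a minimal sketch under a few assumptions: the function name isTextMimeType and the allow-list are hypothetical, and treating every +json and +xml structured-suffix type as text is a judgment call you may not share. Unlike the startsWith test above, it ignores parameters such as charset and matches the listed types exactly.

```javascript
// Hypothetical allow-list; extend it to suit your audience.
const TEXT_TYPES = new Set([
    "application/json",
    "application/javascript",
    "application/xml",
    "application/yaml",
]);

// Hypothetical helper: decides from the MIME type string alone.
function isTextMimeType(contentType) {
    // Drop parameters such as "; charset=utf-8" and normalise case
    const essence = contentType.split(";")[0].trim().toLowerCase();
    if (essence.startsWith("text/") || essence.startsWith("message/")) {
        return true;
    }
    if (TEXT_TYPES.has(essence)) {
        return true;
    }
    // Structured syntax suffixes, e.g. application/ld+json, image/svg+xml
    return /\+(json|xml)$/.test(essence);
}
```

For example, isTextMimeType("text/plain; charset=utf-8") and isTextMimeType("application/ld+json") return true, while isTextMimeType("image/webp") returns false.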

A while ago, Google Chrome changed the behavior of their DevTools to no longer display multipart/mixed payloads in the network trace, because their content is not guaranteed to be readable text. A bug report was quickly filed, and its outcome was:

  • The rules for determining text types (which you may want to follow, see function isTextType in MimeType.ts) were changed to allow more mime types, including all multipart types.
  • The network trace offers a side-by-side view of non-text responses in hex and (attempted) UTF-8 decoding (to be joined by a base64 view in Chrome 132, see here).

The reasoning behind this is that certain non-textual mime types still contain parts that are decodable as UTF-8, and users will value being able to read these parts, like the XML fragment at the end of the image/webp response in the example below. Compare its presentation in the div and in the Chrome DevTools.

fetch("https://httpbin.org/image/webp")
    .then(r => r.text())
    .then(t => d.innerText = t);
<div id="d"></div>
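
A side-by-side hex and UTF-8 view like the one described above can be approximated outside DevTools. This is a rough sketch, not Chrome's implementation; the function name hexAndTextRows and the row layout are assumptions made for illustration. Because each row is decoded separately, a multi-byte character that straddles two rows shows up as replacement characters.

```javascript
// Hypothetical helper: formats bytes as rows of "offset  hex  text".
function hexAndTextRows(bytes, width = 16) {
    // Non-fatal decoder: invalid byte sequences become U+FFFD
    const decoder = new TextDecoder("utf-8", { ignoreBOM: true });
    const rows = [];
    for (let offset = 0; offset < bytes.length; offset += width) {
        const chunk = bytes.subarray(offset, offset + width);
        const hex = Array.from(chunk, (b) => b.toString(16).padStart(2, "0")).join(" ");
        // Show control characters as "." so each row stays on one line
        const text = decoder.decode(chunk).replace(/[\x00-\x1F\x7F]/g, ".");
        const address = offset.toString(16).padStart(8, "0");
        rows.push(`${address}  ${hex.padEnd(width * 3 - 1)}  ${text}`);
    }
    return rows;
}
```

With a response from fetch, the input could be new Uint8Array(await r.arrayBuffer()), and the rows could be joined with newlines and assigned to the innerText of a pre element.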

To answer the original question: Simply loading any response into div.innerText gives a useful preview.
