
How do I write out a new file for every 50,000 lines of the original file?

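Here is my current code. It's simple: it reads the file once and writes each line to a new file named after the original, with _part and a number appended; the number goes up every 50,000 lines. Once reading is finished, each of those file names is passed to the function that processes the files. For some reason, though, it only grabs the end of each line and writes that out, for all 10,000 lines of the original file. It worked at first. Then I changed a few things and it started doing this, and it keeps doing it even after I undid those changes.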

const fs = require('fs');
const csv = require('csv-parser');
//File containing unprocessed addresses
let fileName = ("Refinitiv_Address_GBR_10000.csv");
//Country we are looking at address of
let country = "UK";

let fileRead;
let fileWrite;
let fileNum = 1;

DivideFile();

async function DivideFile() {
    let lineNum = 0;

    fileWrite = fs.createWriteStream(`./Originals/${fileName.split('.')[0]}_part${fileNum}.${fileName.split('.')[1]}`);

    fileRead = fs.createReadStream(`./Originals/${fileName}`)
        .pipe(csv())
        //Indicate start of reading
        .on('resume', () => {
            console.log("Processing file");
        })
        .on('data', (data) => {
            lineNum++;
            console.log(Object.values(data).toString());
            fs.appendFile(`./Originals/${fileName.split('.')[0]}_part${fileNum}.${fileName.split('.')[1]}`, Object.values(data).toString() + '\n', () => {
                //Nothing to go here at the moment
            });

            if (lineNum == 50000) {
                fileNum++;
                lineNum = 0;
            }
        })
        .on('end', () => {
            for (var file in fileNum) {
                RunFunc(`${fileName.split('.')[0]}_part${file}.${fileName.split('.')[1]}`);
            }
        });
}
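
Separately from the garbled output, the 'end' handler never calls RunFunc. `for (var file in fileNum)` enumerates the properties of a number, and a number has none, so the loop body runs zero times. `fileName.split('.')` also produces the wrong name for any file name containing more than one dot. A minimal sketch of the naming step using Node's built-in `path` module; `partFileNames` is a hypothetical helper, and `RunFunc` is the question's own function:

```javascript
const path = require('path');

// Hypothetical helper: builds "<name>_part<N><ext>" for N = 1..count,
// e.g. "data.csv" -> "data_part1.csv". path.parse splits off only the last
// extension, so "my.data.csv" keeps "my.data" as its base name.
function partFileNames(fileName, count) {
  const { name, ext } = path.parse(fileName);
  const names = [];
  // A counting loop, not for...in: for...in over a number runs zero times.
  for (let i = 1; i <= count; i++) {
    names.push(`${name}_part${i}${ext}`);
  }
  return names;
}

// In the 'end' handler this would replace the for...in loop, e.g.:
//   partFileNames(fileName, fileNum).forEach(RunFunc);
console.log(partFileNames('Refinitiv_Address_GBR_10000.csv', 2));
```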

Here is a sample of the original data. All of the information comes from public sources; none of it is private.

,,,,GBR,
"Todd Campus, West of Scotland Science Park,Maryhill Road",GLASGOW,UNITED KINGDOM-NA,G20 0UA,GBR,GBR
,,,,GBR,GBR
,,,,GBR,
"Horsfield Way,, Bredbury Industrial Park",STOCKPORT,CHESHIRE,SK6 2SU,GBR,GBR
"Brunel Way, The Nucleus",Dartford,KENT,DA1 5GA,GBR,
,,,,GBR,
,,,,GBR,
5 New Street Square,London,London,EC4A 3TW,GBR,
"Pentwyn Farm, Huntingdon",,,HR5 3PQ,GBR,GBR
124 Horseferry Road,LONDON,UNITED KINGDOM-NA,SW1P 2TX,GBR,GBR
,,,,GBR,
Unit 700 Fareham Reach Fareham Road,,,,GBR,GBR
"Eastwood House, Glebe Road",CHELMSFORD,ESSEX,CM1 1RS,GBR,GBR
Fineshade Abbey,CORBY,NORTHAMPTONSHIRE,NN17 3BA,GBR,GBR
,,,,,GBR
,,,,GBR,
3 Hempstead Close,,ESSEX,IG9 5JQ,GBR,GBR
,,,,GBR,
,,,,,GBR
,,,,GBR,
,,,,GBR,
25 Farringdon Street,LONDON,UNITED KINGDOM-NA,EC4A 4AB,GBR,GBR
100 Wigmore St,London,X0,,GBR,GBR
,,,,GBR,

Here are the first 25 lines written to _part1:

GBR,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR
,GBR
GBR,GBR
GBR,GBR
,GBR
GBR,GBR
GBR,GBR
GBR,
,GBR
,GBR
GBR,GBR
GBR,
,GBR
GBR,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR

I even stripped the code down to just printing each line, and it still does this.
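
A likely explanation, worth checking against the csv-parser version in use: when no `headers` option is given, csv-parser treats the first line of the file as the header row. If that line is `,,,,GBR,`, five of the six columns share the header `''` and one is `GBR`. Each row becomes an object keyed by header, repeated keys overwrite each other, and only two properties survive: `''` (holding the last column) and `GBR` (holding the fifth). `Object.values()` then yields exactly the two-field lines shown above. The sketch below mimics that mapping in plain JavaScript; `rowToObject` is a hypothetical stand-in, not csv-parser's code, and it splits on commas only, so it ignores quoted fields:

```javascript
// Hypothetical stand-in for header-based CSV parsing (not csv-parser's code):
// each value is stored under the header in the same column position.
// Naive comma split: no quoted-field handling, so only unquoted rows are used.
function rowToObject(headers, line) {
  const values = line.split(',');
  const row = {};
  headers.forEach((header, i) => {
    row[header] = values[i]; // a repeated header overwrites the earlier value
  });
  return row;
}

// First line of the sample data, used as the header row: five '' and one 'GBR'.
const headers = ',,,,GBR,'.split(',');
const line = '5 New Street Square,London,London,EC4A 3TW,GBR,';

const row = rowToObject(headers, line);
console.log(Object.keys(row));              // [ '', 'GBR' ]
console.log(Object.values(row).toString()); // ,GBR

// Keying by column index instead (the effect of headers: false) keeps all six values.
const indexKeys = headers.map((_, i) => String(i));
console.log(Object.values(rowToObject(indexKeys, line)).toString());
// 5 New Street Square,London,London,EC4A 3TW,GBR,
```

csv-parser documents a `headers: false` option that keys each value by its column index, so `csv({ headers: false })` should keep all six fields. Joining them with `toString()` still drops the quotes around fields that contain commas, such as "Brunel Way, The Nucleus".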

Answer:

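This isn't the ideal way to do it, but it is basically your code, and it splits the file into nice small chunks. You should use the csv-parse library instead of csv-parser, and update the file reference on each loop iteration. As others have mentioned, the Unix split command would also be a good option. I used file.csv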
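
A minimal sketch of the approach the answer describes: start a new output file, and update the reference to it, every 50,000 lines. It swaps in Node's built-in `readline` module for a CSV library and copies raw lines, which sidesteps header handling entirely and passes quoted fields through untouched. It assumes no quoted field contains a line break. `splitFile` and its parameters are hypothetical names:

```javascript
const fs = require('fs');
const path = require('path');
const readline = require('readline');
const { once } = require('events');

// Hypothetical helper: copies inputPath into files named
// "<name>_part<N><ext>" in the same directory, linesPerFile lines each, and
// resolves with the list of part paths once every part is flushed to disk.
// Assumes one record per line (no newlines inside quoted CSV fields).
async function splitFile(inputPath, linesPerFile = 50000) {
  const { dir, name, ext } = path.parse(inputPath);
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  const parts = [];
  let out = null;    // write stream for the current part
  let lineCount = 0; // lines written to the current part

  for await (const line of rl) {
    // Open the first part, or roll over once the current part is full.
    if (out === null || lineCount === linesPerFile) {
      if (out !== null) {
        out.end();
        await once(out, 'finish'); // wait until the previous part is flushed
      }
      const partPath = path.join(dir, `${name}_part${parts.length + 1}${ext}`);
      parts.push(partPath);
      out = fs.createWriteStream(partPath);
      lineCount = 0;
    }
    // Respect backpressure: wait for 'drain' when the buffer is full.
    if (!out.write(line + '\n')) {
      await once(out, 'drain');
    }
    lineCount++;
  }

  if (out !== null) {
    out.end();
    await once(out, 'finish');
  }
  return parts;
}
```

With the question's names, the call would be along the lines of `splitFile('./Originals/Refinitiv_Address_GBR_10000.csv').then((parts) => parts.forEach(RunFunc))`. Note that `splitFile` resolves with paths that include the directory, whereas the question's `RunFunc` receives bare file names. Each part is written through a single stream, so lines stay in order. The question's un-awaited `fs.appendFile` calls, by contrast, can complete in any order. `RunFunc` also only runs after every part is complete.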
