awk to compare fields of multiple lines and combine - Stack Overflow-软件玩家

admin管理员组
文章数量:1430083

I have data in file with pattern as shown below -

  TRSSOEC TRBSOEC      ADICOES                        DELETE            TROEC
  TRSSOEC TRBSOEC      ADICOES                        INSERT            TROEC
  TRSSOEC TRBSOEC      ADICOES                        SELECT            TROEC
  SOBCXYZ SCHEMA1      AGENTES_CARGA                  INSERT            TROEC
  SOBCXYZ SCHEMA1      AGENTES_CARGA                  SELECT            TROEC
  YASU999 ASDF123      BS_ADICOES_SISCOMEX            UPDATE            TRBSOEC
  USER123 WAREHOU      BS_ADICOES_TRIBUTOS_BKP        DELETE            TRBSOEC
  USER123 WAREHOU      BS_ADICOES_TRIBUTOS_BKP        INSERT            TRBSOEC

I would like to compare lines with field 1,2,3 and if they are same then capture field 4 to diplay as shown below -

grant DELETE,INSERT,SELECT on TRBSOEC.ADICOES to TRSSOEC;
grant INSERT,SELECT on SCHEMA1.AGENTES_CARGA to SOBCXYZ;
grant UPDATE on ASDF123.BS_ADICOES_SISCOMEX to YASU999;
grant DELETE,INSERT on WAREHOU.BS_ADICOES_TRIBUTOS_BKP to USER123;

Basically logic is to group based on field 1,2,3 and print distinct field 4 along with other common fields. grant <comma separated array of filed 4> on filed2.field3 to field1 ;

So far I tried below but not sure how to capture field 4, any guidance would be helpful.

awk '
    # Each line
    {
        if ($1 == prev[1] && $2 == prev[2] && $3 == prev[3])
        {
            # Capture and use field 4 of last match
            if(!nrMatched) { nrMatched = prev[4] }
            print "grant " $nrMatched " on " $2"."$3 " to " $1";"
        }
        else
        {
            # No match, reset matched flag and just copy record through
            nrMatched = 0
        }
    
        # Save fields from this line for next comparison
        split($0, prev)
    }
' test.txt

I have data in file with pattern as shown below -

  TRSSOEC TRBSOEC      ADICOES                        DELETE            TROEC
  TRSSOEC TRBSOEC      ADICOES                        INSERT            TROEC
  TRSSOEC TRBSOEC      ADICOES                        SELECT            TROEC
  SOBCXYZ SCHEMA1      AGENTES_CARGA                  INSERT            TROEC
  SOBCXYZ SCHEMA1      AGENTES_CARGA                  SELECT            TROEC
  YASU999 ASDF123      BS_ADICOES_SISCOMEX            UPDATE            TRBSOEC
  USER123 WAREHOU      BS_ADICOES_TRIBUTOS_BKP        DELETE            TRBSOEC
  USER123 WAREHOU      BS_ADICOES_TRIBUTOS_BKP        INSERT            TRBSOEC

I would like to compare lines with field 1,2,3 and if they are same then capture field 4 to diplay as shown below -

grant DELETE,INSERT,SELECT on TRBSOEC.ADICOES to TRSSOEC;
grant INSERT,SELECT on SCHEMA1.AGENTES_CARGA to SOBCXYZ;
grant UPDATE on ASDF123.BS_ADICOES_SISCOMEX to YASU999;
grant DELETE,INSERT on WAREHOU.BS_ADICOES_TRIBUTOS_BKP to USER123;

Basically logic is to group based on field 1,2,3 and print distinct field 4 along with other common fields. grant <comma separated array of filed 4> on filed2.field3 to field1 ;

So far I tried below but not sure how to capture field 4, any guidance would be helpful.

awk '
    # Each line
    {
        if ($1 == prev[1] && $2 == prev[2] && $3 == prev[3])
        {
            # Capture and use field 4 of last match
            if(!nrMatched) { nrMatched = prev[4] }
            print "grant " $nrMatched " on " $2"."$3 " to " $1";"
        }
        else
        {
            # No match, reset matched flag and just copy record through
            nrMatched = 0
        }
    
        # Save fields from this line for next comparison
        split($0, prev)
    }
' test.txt

Share Improve this question asked Nov 19, 2024 at 14:10 Yasser 777 bronze badges

is the input file guaranteed to be sorted by the first 3 fields? – markp-fuso Commented Nov 19, 2024 at 14:34
input file is NOT guaranteed to be sorted based on first 3 fields. – Yasser Commented Nov 19, 2024 at 15:32
I think @markp-fuso meant "grouped by" rather than "sorted by" in their question. Clearly the input is not "sorted" on the first 3 fields but is it grouped by them, i.e. will all lines with the same first-3-field values will be contiguous in the input? If not then please fix your example to reflect that reality as you're getting answers that assume your input IS grouped by the key fields since that's what your example shows and it's what the script you posted relies on. – Ed Morton Commented Nov 19, 2024 at 16:34
awk '{print "grant",$4,"on",$2 "." $3,"to",$1 ";"}' file – Cyrus Commented Nov 19, 2024 at 19:57
This could be a job for datamash (see my answer for an example), plus a bit of awk for the final formatting. Is datamash available? – Renaud Pacalet Commented Nov 20, 2024 at 7:18

Add a comment |

5 Answers 5

Sorted by: Reset to default 4

Using any awk and assuming the input is grouped by the first 3 fields as shown in the sample input and as the OPs code requires:

$ cat tst.awk
{ tgt = "on " $2 "." $3 " to " $1 ";" }
tgt != prev {
    if ( NR > 1 ) {
        print grant, prev
    }
    prev = tgt
    grant = "grant " $4
    next
}
{ grant = grant "," $4 }
END { print grant, prev }

$ awk -f tst.awk file
grant DELETE,INSERT,SELECT on TRBSOEC.ADICOES to TRSSOEC;
grant INSERT,SELECT on SCHEMA1.AGENTES_CARGA to SOBCXYZ;
grant UPDATE on ASDF123.BS_ADICOES_SISCOMEX to YASU999;
grant DELETE,INSERT on WAREHOU.BS_ADICOES_TRIBUTOS_BKP to USER123;

What I would do:

awk '
{
    key = $1"|"$2"|"$3
    if (!(key in ops)) {
        groups[key]=$1","$2","$3
    }
    if (!seen[key"|"$4]) {
        ops[key]=ops[key](ops[key] ? "," : "") $4
        seen[key"|"$4]=1
    }
}
END {
    for (key in groups) {
        split(groups[key], fields, ",")
        print "grant " ops[key] " on " fields[2] "." fields[3] " to " fields[1]";"
    }
}' file

If datamash is available we can use it for most of the work: sorting (-s option), grouping by fields 1, 2 and 3 (-g1,2,3), concatenating the (unique) privileges per group with a comma as separator (uniq 4)... Then we can use awk to do the final formating:

datamash -Wsg1,2,3 uniq 4 < test.txt |
awk '{print "grant " $4 " on " $2 "." $3 " to " $1 ";"}'

The -W option of datamash changes the default single tab separator for whitespace (one or more spaces and/or tabs), and ignores leading whitespaces.

If datamashis not available, and your file is not too large to fit in memory, with any awk, you can try:

awk '{ s = $2 "." $3 " to " $1 ";"
       a[s] = a[s] ? a[s] "," $4 : $4 }
END  { for(s in a) print "grant " a[s] " on " s }' test.txt

For each line we store the database.table to user; string in variable s, and we store the comma-separated list of privileges in a[s]. In the END block we print all grant commands.

If your file is really large this solution that stores everything before printing may exceed the available memory. If it is the case you can sort first:

sort -k1,4 test.txt | awk '
       { s = $2 "." $3 " to " $1 ";" }
s == t { p = p "," $4; next }
NR > 1 { print "grant " p " on " t }
       { t = s; p = $4 }
END    { print "grant " p " on " s }'

Overhauling OP's current code and taking into consideration the need to pre-sort the input file before feeding to awk:

awk '
function print_grant() {
    if (prev[1] != "")
       print "grant", perms, "on", prev[2] "." prev[3], "to", prev[1] ";"
    perms = ""
}

    { if ($1 != prev[1] || $2 != prev[2] || $3 != prev[3])  # if a different key then ...
         print_grant()                                      # print previous key to stdout
      perms = perms (perms=="" ? "" : ",") $4               # append field $4 to variable "perms"
      split($0,prev)                                        # save current fields in prev[] array
    }
END { print_grant() }                                       # flush last key to stdout
' <(sort test.txt)                                          # per comment from OP we cannot assume file is already sorted

This generates:

grant INSERT,SELECT on SCHEMA1.AGENTES_CARGA to SOBCXYZ;
grant DELETE,INSERT,SELECT on TRBSOEC.ADICOES to TRSSOEC;
grant DELETE,INSERT on WAREHOU.BS_ADICOES_TRIBUTOS_BKP to USER123;
grant UPDATE on ASDF123.BS_ADICOES_SISCOMEX to YASU999;

If OP needs the output sorted in a particular order we can either pipe this output to an additional sort call or we can modify the initial sorting of the input file, for example ...

Sorting input so that output is ordered by object name, schema name, grantee and then permissions we can change <(sort file.txt) to <(sort -k3,3 -k2,2 -k1,1 -k4,4 file.txt) to generate:

grant DELETE,INSERT,SELECT on TRBSOEC.ADICOES to TRSSOEC;
grant INSERT,SELECT on SCHEMA1.AGENTES_CARGA to SOBCXYZ;
grant UPDATE on ASDF123.BS_ADICOES_SISCOMEX to YASU999;
grant DELETE,INSERT on WAREHOU.BS_ADICOES_TRIBUTOS_BKP to USER123;

Here is a Ruby that allows that type of grouping:

ruby -e 'puts $<.map(&:split).
    group_by{|a| "#{a[1]}.#{a[2]} to #{a[0]}"}.
    map{|grp, op| [grp, op.map{|x| x[3]}] }.
    map{|grp, op| "grant #{op.join(",")} on #{grp};" }
' file

Prints:

grant DELETE,INSERT,SELECT on TRBSOEC.ADICOES to TRSSOEC;
grant INSERT,SELECT on SCHEMA1.AGENTES_CARGA to SOBCXYZ;
grant UPDATE on ASDF123.BS_ADICOES_SISCOMEX to YASU999;
grant DELETE,INSERT on WAREHOU.BS_ADICOES_TRIBUTOS_BKP to USER123;

本文标签： awk to compare fields of multiple lines and combineStack Overflow

版权声明：本文标题：awk to compare fields of multiple lines and combine - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1745556133a2663197.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

awk to compare fields of multiple lines and combine - Stack Overflow

5 Answers 5

更多相关文章