¿Cómo puedo convertir un archivo .TXT en un archivo .CSV sin perder el contenido de ese documento?

Question

¿Cómo puedo convertir un archivo .TXT en un archivo .CSV sin perder el contenido de ese documento?

Preguntado el 30 de Junio, 2015: Cuando se hizo la pregunta
1477 visitas: Cuantas visitas ha tenido la pregunta
1 Respuestas: Cuantas respuestas ha tenido la pregunta
Resuelta: Estado actual de la pregunta

Llevo unos 4 días intentando completar la aparentemente sencilla actividad de convertir un volcado de texto del llavero de OS X a un .csv correctamente formateado sin ningún progreso y mucho menos éxito. He publicado una pregunta diferente sobre mi situación ayer, que ha pasado sin comentarios ni sugerencias.

Simplemente estoy tratando de convertir un archivo .txt a un archivo .csv formateado, sin perder nada del contenido del documento original en el proceso.

Lo que ya he probado:

(1) Gist de RWest y Ben-Oni y christophervalles variaciones del mismo. El ingrediente principal de este script es un archivo keychain.rb y creo que esto funcionaría si sólo supiera cómo añadir campos de entrada para que coincidan con todos los campos del documento original. Sin embargo, no conozco la fórmula y no he podido encontrar una plantilla. Así, al ejecutar este script resulta en cientos de entradas perdidas porque las entradas carecen de un campo de salida correspondiente.

#!/usr/bin/env ruby                                                                                                                                                                                                          
#                                                                                                                                                                                                                            
# Usage:                                                                                                                                                                                                                     
#   security dump-keychain -d login.keychain > keychain_logins.txt                                                                                                                                                           
#   # Lots of clicking 'Always Allow', or just 'Allow', until it's done...                                                                                                                                                   
#   curl -O curl -O https://raw.github.com/gist/1224792/06fff24412311714ad6534ab700a7d603c0a56c9/keychain.rb
#   chmod a+x ./keychain.rb                                                                                                                                                                                                  
#   ./keychain.rb keychain_logins.txt | sort > logins.csv                                                                                                                                                                    
#                                                                                                                                                                                                                            
# Then import logins.csv in 1Password using the format:                                                                                                                                                                      
# Title, URL/Location, Username, Password                                                                                                                                                                                    
# Remember to check 'Fields are quoted', and the Delimiter character of 'Comma'.                                                                                                                                             
require 'date'

class KeychainEntry
  attr_accessor :fields

  def initialize(keychain)
    last_key = nil
    @fields = {}
    data = nil
    aggregate = nil
    lines = keychain.split("\n")
    lines.each do |line|
      # Everything after the 'data:' statement is data.

      if data != nil
        data << line
      elsif aggregate != nil
        if ( line[0] == 32 || line[0] == " " )
          keyvalue = line.split('=', 2).collect { |kv| kv.strip }
          aggregate[keyvalue.first] = keyvalue.last
        else
          @fields[last_key] = aggregate
          aggregate = nil
        end
      end

      if aggregate == nil
        parts = line.split(':').collect { |piece| piece.strip }
        if parts.length > 1
          @fields[parts.first] = parts.last
        else
          last_key = parts.first
          data = [] if parts.first == "data"
          aggregate = {}
        end
      end
    end
    @fields["data"] = data.join(" ") if data
  end
end

def q(string)
  "\"#{string}\""
end

def process_entry(entry_string)
  entry = KeychainEntry.new(entry_string)

  if entry.fields['class'] == '"genp"'
    title = entry.fields['attributes']['0x00000007 <blob>'].gsub!('"', '')
    user = entry.fields['attributes']['"acct"<blob>'].gsub!('"', '')
    location = entry.fields['attributes']['"svce"<blob>'].gsub!('"', '')
    pass = entry.fields['data'][1..-2]
    puts "\"#{title}\"\t\"#{location}\"\t\"#{user}\"\t\"#{pass}\""
  end

  if entry.fields['class'] == '"inet"' && ['"form"', '"dflt"'].include?(entry.fields['attributes']['"atyp"<blob>'])
    site = entry.fields['attributes']['"srvr"<blob>'].gsub!('"', '')
    path = entry.fields['attributes']['"path"<blob>'].gsub!('"', '')
    proto= entry.fields['attributes']['"ptcl"<uint32>'].gsub!('"', '')
    proto.gsub!('htps', 'https');
    user = entry.fields['attributes']['"acct"<blob>'].gsub!('"', '')
    #user = entry.fields['attributes']['0x00000007 <blob>'].gsub!('"', '')
    date_string = entry.fields['attributes']['"mdat"<timedate>'].gsub(/0x[^ ]+[ ]+/, '').gsub!('"', '')
    date = DateTime.parse(date_string)
    pass = entry.fields['data'][1..-2]
    path = '' if path == '<NULL>'
    url = "#{proto}://#{site}#{path}"

    puts "#{site}\t#{url}\t#{user}\t#{pass}\t#{date}"
    #puts "#{user}\t #{pass}\t #{date}"
  end
end

accum = ''
ARGF.each_line do |line|
  if line =~ /^keychain: /
    unless accum.empty?
      process_entry(accum)
      accum = ''
    end
  end
  accum += line
end

(2) Gist de RMondello que en cierto modo me funcionó una vez, aunque faltaban entradas, pero que no he podido hacer funcionar de nuevo sin ninguna razón explicable.

set the logFile to ((path to desktop) as string) & "Passwords"
set keychainPath to "/Users/Dad/Desktop/dad.keychain"

-- write_to_file taken from http://www.macosxautomation.com/applescript/sbrt/sbrt-09.html
on write_to_file(this_data, target_file, append_data)
    try
        set the target_file to the target_file as string
        set the open_target_file to open for access file target_file with write permission
        if append_data is false then set eof of the open_target_file to 0
        write this_data to the open_target_file starting at eof
        close access the open_target_file
        return true
    on error
        try
            close access file target_file
        end try
        return false
    end try
end write_to_file

tell application "Usable Keychain Scripting"
    set keychainItems to get every keychain item of keychain keychainPath
    repeat with keychainItem in keychainItems
        set aServer to server in keychainItem
        set anAccount to account in keychainItem
        set aPassword to password in keychainItem

        set csvEntry to aServer & "," & anAccount & "," & aPassword & "
"

        my write_to_file(csvEntry, logFile, true)
    end repeat
end tell

(3) El convertidor de llaveros a CSV de AgileBits OS X que ha dado lugar a archivos .csv con aún más entradas perdidas que el Gist de RWest.

# OS X Keychain text export converter
#
# Copyright 2014 Mike Cappella (mike@cappella.us)

package Converters::Keychain 1.02;

our @ISA    = qw(Exporter);
our @EXPORT     = qw(do_init do_import do_export);
our @EXPORT_OK  = qw();

use v5.14;
use utf8;
use strict;
use warnings;
#use diagnostics;

binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";

use Encode;
use Utils::PIF;
use Utils::Utils qw(verbose debug bail pluralize myjoin unfold_and_chop print_record);
use Time::Local qw(timelocal);
use Time::Piece;

my $max_password_length = 50;

my %card_field_specs = (
    login =>            { textname => undef, fields => [
    [ 'username',       0, qr/^username$/ ],
    [ 'password',       0, qr/^password$/ ],
    [ 'url',        0, qr/^url$/ ],
    ]},
    note =>         { textname => undef, fields => [
    ]},
);

my (%entry, $itype);

# The following table drives transformations or actions for an entry's attributes, or the class or
# data section (all are collected into a single hash).  Each ruleset is evaluated in order, as are
# each of the rules within a set.  The key 'c' points to a code reference, which is passed the data
# value for the given type being tested.  It can transform the value in place, or simply test it and
# return a string (for debug output).  When the key 'action' is set to 'SKIP', the entry being tested
# will be rejected from consideration for export when the 'c' code reference returns a TRUE value.
# And in that case, the code ref pointed to by 'msg' will be run to produce debug output, used to
# indicate the reason for the rejection.
#
# The table facilitates adding new transformations and rejection rules, as necessary,
# through empirical discover based on user feedback.
my @rules = (
    CLASS => [
        { c => sub { $_[0] !~ /^inet|genp$/ }, action => 'SKIP', msg => sub { debug "\tskipping non-password class: ", $_[0] } },
    ],
    svce => [
        { c => sub { $_[0] =~ s/^0x([A-F\d]+)\s+".*"$/pack "H*", $1/ge } },
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
        { c => sub { $_[0] =~ /^Apple Persistent State Encryption$/ or 
                 $_[0] =~ /^Preview Signature Privacy$/ or
                 $_[0] =~ /^Safari Session State Key$/ or
                 $_[0] =~ /^Call History User Data Key$/}, action => 'SKIP',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $_[0] } },
    ],
    srvr => [
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
        { c => sub { $_[0] =~ s/\.((?:_afpovertcp|_smb)\._tcp\.)?local// } },
    ],
    path => [
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
        { c => sub { $_[0] =~ s/^<NULL>$// } },
    ],
    ptcl => [
        { c => sub { $_[0] =~ s/htps/https/ } },
        { c => sub { $_[0] =~ s/^"(\S+)\s*"$/$1/ } },
    ],
    acct => [
        { c => sub { $_[0] =~ s/^0x([A-F\d]+)\s+".*"$/pack "H*", $1/ge } },
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
    ],
    mdat => [
        { c => sub { $_[0] =~ s/^0x\S+\s+"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})Z.+"$/$1-$2-$3 $4:$5:$6/g } },
    ],
    cdat => [
        { c => sub { $_[0] =~ s/^0x\S+\s+"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})Z.+"$/$1-$2-$3 $4:$5:$6/g } },
    ],
    # desc must come before DATA so that 'secure note' type can be used as a condition in DATA below
    desc => [
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/; $itype = 'note' if $_[0] eq 'secure note'; $_[0] } },
    ],
    DATA => [
        # secure note data, early terminates rule list testing
        { c => sub { $itype eq 'note' and $_[0] =~ s/^.*<key>NOTE<\/key>\\012\\011<string>(.+?)<\/string>.*$/$1/ }, action => 'BREAK',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },

        { c => sub { $_[0] !~ s/^"(.+)"$/$1/ }, action => 'SKIP',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },
        { c => sub { $_[0] =~ /^[A-Z\d]{8}-[A-Z\d]{4}-[A-Z\d]{4}-[A-Z\d]{4}-[A-Z\d]{12}$/ }, action => 'SKIP',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },
        { c => sub { length $_[0] > $max_password_length }, action => 'SKIP',
            msg => sub { debug "\t\tskipping record with improbably long password: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },
        { c => sub { join '', "\trecord: class = $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },  # debug output only
    ],
);

$DB::single = 1;                    # triggers breakpoint when debugging

sub do_init {
    return {
    'specs'     => \%card_field_specs,
    'imptypes'      => undef,
    'opts'      => [ [ q{-m or --modified           # set item's last modified date },
                   'modified|m' ],
               ],
    };
}

sub do_import {
    my ($file, $imptypes) = @_;
    my (%Cards, %dup_check);
    my $contents = $_;;

    {
    local $/;
    open my $fh, '<:encoding(utf8)', $file or bail "Unable to open file: $file\n$!";
    $contents = <$fh>;
    close $fh;
    }

    my ($n, $examined, $skipped, $duplicates) = (1, 0, 0, 0);
    my ($npre_explode, $npost_explode);

KEYCHAIN_ENTRY:
    while ($contents) {
    if ($contents =~ s/\Akeychain: (.*?)\n+(?=$|^keychain: ")//ms) {
        local $_ = $1; my $orig = $1;
        $itype = 'login';

        $examined++;
        debug "Entry ", $examined;

        s/\A"(.*?)"\n^(.+)/$2/ms;
        my $keychain = $1;
        #debug 'Keychain: ', $keychain;

        s/\Aclass: "?(.*?)"? ?\n//ms;
        my $class = $1;

        # attributes
        s/\Aattributes:\n(.*?)(?=^data:)//ms;
        %entry = map { clean_attr_name(split /=/, $_) } split /\n\s*/, $1 =~ s/^\s+//r;

        $entry{'CLASS'} = $class;

        # data
        s/\Adata:\n(.+)\z//ms;
        $entry{'DATA'}  = defined $1 ? $1 : '';

        # run the rules in the rule set above
        # for each set of rules for an entry key...
RULE:
        for (my $i = 0;  $i < @rules; $i += 2) {
        my ($key, $ruleset) = ($rules[$i], $rules[$i + 1]);

        debug "  considering rules for ", $key;
        next if not exists $entry{$key};

        # run the entry key's rules...
        my $rulenum = 1;
        for my $rule (@$ruleset) {
            debug "\t    rule $rulenum: called with ", unfold_and_chop $entry{$key};

            my $ret = ($rule->{'c'})->($entry{$key});

            debug "\t    rule $rulenum: returns ", $ret || 0, '   ', unfold_and_chop $entry{$key};

            if (exists $rule->{'action'}) {
            if ($ret) {
                if ($rule->{'action'} eq 'SKIP') {
                $skipped++;
                ($rule->{'msg'})->($entry{$key})        if exists $rule->{'msg'};
                next KEYCHAIN_ENTRY;
                }
                elsif ($rule->{'action'} eq 'BREAK') {
                debug "\t    breaking out of rule chain";
                next RULE;
                }
            }
            }

            $rulenum++;
        }
        }

        for (keys %entry) {
        debug sprintf "\t    %-12s : %s", $_, $entry{$_}    if exists $entry{$_};
        }

        #my $itype = find_card_type(\%entry);

        my %h;
        my ($notes, $card_modified);
        if ($itype eq 'login') {
        $h{'password'}  = $entry{'DATA'};
        $h{'username'}  = $entry{'acct'}                        if exists $entry{'acct'};
        $h{'url'}   = $entry{'ptcl'} . '://' . $entry{'srvr'} . $entry{'path'}  if exists $entry{'srvr'};
        }
        elsif ($itype eq 'note') {
        # convert ascii string DATA, which contains \### octal escapes, into UTF-8
        my $octets = encode("ascii", $entry{'DATA'});
        $octets =~ s/\\(\d{3})/"qq|\\$1|"/eeg;
        $notes = decode("UTF-8", $octets);
        }
        else {
        die "Unexpected itype: $itype";
        }

        # will be added to notes
        $h{'protocol'}  = $entry{'ptcl'}                    if exists $entry{'ptcl'} and $entry{'ptcl'} =~ /^afp|smb$/;
        $h{'created'}   = $entry{'cdat'}                    if exists $entry{'cdat'};
        if (exists $entry{'mdat'}) {
        if ($main::opts{'modified'}) {
            $card_modified = date2epoch($entry{'mdat'});
        }
        else {
            $h{'modified'}  = $entry{'mdat'};
        }
        }

        for (keys %h) {
        debug sprintf "\t    %-12s : %s", $_, $h{$_}                if exists $h{$_};
        }

        # don't set/use $sv before $entry{''svce'} is removed of _afp*, _smb*, and .local, since it defeats dup detection
        my $sv = $entry{'svce'} // $entry{'srvr'};

        my $s = join ':::', 'sv', $sv,
        map { exists $h{$_} ? "$_ => $h{$_}" : 'URL => none' } qw/url username password/;

        if (exists $dup_check{$s}) {
        debug "  *skipping duplicate entry for ", $sv;
        $duplicates++;
        next
        }
        $dup_check{$s}++;

        # From the card input, place it in the converter-normal format.
        # The card input will have matched fields removed, leaving only unmatched input to be processed later.
        my $normalized = normalize_card_data($itype, \%h, 
        { title     => $sv,
          tags      => undef,
          notes     => $notes,
          folder    => undef,
          modified  => $card_modified });

        # Returns list of 1 or more card/type hashes; one input card may explode into multiple output cards
        my $cardlist = explode_normalized($itype, $normalized);

        my @k = keys %$cardlist;
        if (@k > 1) {
        $npre_explode++; $npost_explode += @k;
        debug "\tcard type $itype expanded into ", scalar @k, " cards of type @k"
        }
        for (@k) {
        print_record($cardlist->{$_});
        push @{$Cards{$_}}, $cardlist->{$_};
        }
        $n++;
    }
    else {
        bail "Keychain parse failed, after entry $examined; unexpected: ", substr $contents, 0, 2000;
    }
    }

    $n--;
    verbose "Examined $examined record", pluralize($examined);
    verbose "Skipped $skipped non-login record", pluralize($skipped);
    verbose "Skipped $duplicates duplicate record", pluralize($duplicates);

    verbose "Imported $n record", pluralize($n) ,
    $npre_explode ? " ($npre_explode card" . pluralize($npre_explode) .  " expanded to $npost_explode cards)" : "";
    return \%Cards;
}

sub do_export {
    create_pif_file(@_);
}

sub find_card_type {
    my $eref = shift;

    my $type = (exists $eref->{'desc'} and $eref->{'desc'} eq 'secure note') ? 'note' : 'login';
    debug "\t\ttype set to '$type'";
    return $type;
}

# Places card data into a normalized internal form.
#
# Basic card data passed as $norm_cards hash ref:
#    title
#    notes
#    tags
#    folder
#    modified
# Per-field data hash {
#    inkey  => imported field name
#    value  => field value after callback processing
#    valueorig  => original field value
#    outkey => exported field name
#    outtype    => field's output type (may be different than card's output type)
#    keep   => keep inkey:valueorig pair can be placed in notes
#    to_title   => append title with a value from the narmalized card
# }
sub normalize_card_data {
    my ($type, $carddata, $norm_cards) = @_;

    for my $def (@{$card_field_specs{$type}{'fields'}}) {
    my $h = {};
    for my $key (keys %$carddata) {
        if ($key =~ /$def->[2]/) {
        next if not defined $carddata->{$key} or $carddata->{$key} eq '';
        my ($inkey, $value) = ($key, $carddata->{$key});
        my $origvalue = $value;

        if (exists $def->[3] and exists $def->[3]{'func'}) {
            #         callback(value, outkey)
            my $ret = ($def->[3]{'func'})->($value, $def->[0]);
            $value = $ret   if defined $ret;
        }
        $h->{'inkey'}       = $inkey;
        $h->{'value'}       = $value;
        $h->{'valueorig'}   = $origvalue;
        $h->{'outkey'}      = $def->[0];
        $h->{'outtype'}     = $def->[3]{'type_out'} || $card_field_specs{$type}{'type_out'} || $type; 
        $h->{'keep'}        = $def->[3]{'keep'} // 0;
        $h->{'to_title'}    = ' - ' . $h->{$def->[3]{'to_title'}}   if $def->[3]{'to_title'};
        push @{$norm_cards->{'fields'}}, $h;
        delete $carddata->{$key};
        }
    }
    }

    # map remaining keys to notes
    $norm_cards->{'notes'} .= "\n"        if defined $norm_cards->{'notes'} and length $norm_cards->{'notes'} > 0 and keys %$carddata;
    for my $key (keys %$carddata) {
    $norm_cards->{'notes'} .= "\n"  if defined $norm_cards->{'notes'} and length $norm_cards->{'notes'} > 0;
    $norm_cards->{'notes'} .= join ': ', $key, $carddata->{$key};
    }

    return $norm_cards;
}

# sort logins as the last to check
sub by_test_order {
    return  1 if $a eq 'login';
    return -1 if $b eq 'login';
    $a cmp $b;
}

sub clean_attr_name {
    return ($_[0] =~ /"?([^<"]+)"?<\w+>$/, $_[1]);
}

# Date converters
# LastModificationTime field:    yyyy-mm-dd hh:mm:ss
sub parse_date_string {
    local $_ = $_[0];
    my $when = $_[1] || 0;                  # -1 = past only, 0 = assume this century, 1 = future only, 2 = 50-yr moving window

    if (my $t = Time::Piece->strptime($_, "%Y-%m-%d %H:%M:%S")) {
    return $t;
    }

    return undef;
}

sub date2epoch {
    my $t = parse_date_string @_;
    return defined $t->year ? 0 + timelocal($t->sec, $t->minute, $t->hour, $t->mday, $t->mon - 1, $t->year): $_[0];
}

1;

(4) Navaja suiza parecía prometedor, ya que tiene un comando incorporado para convertir texto separado por tabulaciones en texto separado por comas. Sin embargo, la ejecución del comando dio lugar a docenas y docenas y docenas de errores de "formato incorrecto". Supongo que mi archivo .txt tendría que ser un archivo .tsv para que el comando funcione realmente y no he podido encontrar ninguna forma de convertir un .txt en un .tsv.

Así que aquí es donde estoy ahora, días después, en el mismo lugar donde empecé. ¿Puede alguien, por favor, decirme cómo alterar los campos de entrada del primer script de RWest para que todo el contenido pase al .csv o sugerir un método alternativo, script, aplicación, danza de la lluvia, etc. que me permita simplemente que todos los datos del archivo de texto aparezcan en un archivo csv, con las columnas y filas adecuadas separando cada entrada?

EDIT: Esto es lo que se obtiene cuando sólo se cambia la extensión del archivo: enter image description here

Preguntado el 30 de Junio, 2015 por Locutus

Answer 1

1 Respuestas

Answer 2

2voto

CousinCocaine Puntos 3615

Simplemente estoy tratando de convertir un archivo .txt a un .csv, sin perder el contenido de ese documento

Sólo tienes que cambiar la extensión .txt en .csv .

Los archivos de "valores separados por comas" son simplemente archivos de texto sin formato con las columnas de datos separadas por comas (o ; o tab ). Los archivos 'txt' están pensados para el texto, y los archivos 'csv' están pensados para el almacenamiento de datos, pero ambos son sólo archivos de texto.

Añade: Los datos parecen organizados como un tipo de diccionario (o como un array multidimensional). Un diccionario no puede convertirse en una estructura de marco de datos 2D (como csv o excel ) ya que puede tener más dimensiones que 2. Lo que quieres no es posible. Pero sustituir csv por txt hace que sea legible sin embargo...

Este es un texto ilegible: @1?H????A?!A??-m??I??u?

Respondido el 30 de Junio, 2015 por CousinCocaine (3615 Puntos )

¿Cómo puedo convertir un archivo .TXT en un archivo .CSV sin perder el contenido de ese documento?

Respuesta

Preguntas Destacadas

Etiquetas mas usadas

AppleAyuda.com

Powered by:

¿Cómo puedo convertir un archivo .TXT en un archivo .CSV sin perder el contenido de ese documento?

Respuesta

Preguntas relacionadas

Preguntas Destacadas

Etiquetas mas usadas

En nuestra red

AppleAyuda.com

Powered by: