Gear table extractor/converters
Dumpshock Forums > Discussion > Community Projects
Blade
Ok, it's finally done.
You'll find everything here.

SR4Light can be used to convert the PDF to HTML files. More info here.
Extractor.pl is used to convert the gear tables in the HTML files to XML files. It works with Augmentation but doesn't work with Arsenal (because most of the gear tables are images rather than text).
Gear-1.xml is the XML file extracted from Augmentation.
XMLToDae is used to convert the XML files into Daegann's Character Generator's .Dae files.
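For readers curious how Extractor.pl tells the pieces of a table apart: each line of SR4Light output carries a CSS class and a `left:` offset, and the script's comments map ft3 to GlobalType, ft4 to categories and column headers, and ft5 to items and values on Augmentation's pages. The sketch below isolates that first step. `parse_fragment` is a hypothetical helper, not a function from the script, and it assumes one positioned fragment per line, as in the HTML samples later in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of SR4Light output into
# (font class, left offset in pixels, visible text).
# Returns an empty list when the line is not a positioned text fragment.
sub parse_fragment {
    my ($line) = @_;
    my ($class) = $line =~ /class="(ft\d+)"/ or return;
    my ($left)  = $line =~ /left:(\d+);/     or return;
    # Like Extractor.pl, take everything after the last '>' as the cell text
    my $text = substr($line, rindex($line, '>') + 1);
    $text =~ s/\s+$//;    # drop the trailing newline, if any
    return ($class, $left, $text);
}

# Roles as described in the script's comments (Augmentation pages)
my %role = (ft3 => 'GlobalType', ft4 => 'category/header', ft5 => 'item/value');

my $sample = '</span></nobr></DIV><DIV style="position:absolute;top:706;left:522;"<nobr><span class="ft5">(Rating x 6)F';
my ($class, $left, $text) = parse_fragment($sample);
print "$role{$class} at left=$left: $text\n";
```

The `left` value matters later: Extractor.pl compares an ft5 fragment's offset with the current item's (within 30 pixels) to decide whether it is a subitem or a column value.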

To use XMLToDae you'll need a Java Virtual Machine on your computer; chances are you already have one. If you can run the XMLToDae.jar file, you have one.

Once inside the application, choose File->Open and select the XML file you want to convert.
A popup window should appear. It appears each time a new category of items (a GlobalType in the XML file) starts, and asks you what kind of items the category contains, so that they are exported to the right .dae file. For example, in Augmentation the first one is Cyberware, so you'll need to choose Cyberware.

You can choose to Export or Skip this category.

If you choose Export, you'll be back in the main window, where the details of each item are displayed. You can freely modify the fields. Once you're done, press "Export Item" to export the item or "Skip item" to ignore it. Once you've exported (or skipped) all the items of a category, the data is saved in the .dae file you chose.
If a .dae file with the same name already exists in the directory where XMLToDae.jar is located, the items are appended to the end of that file; otherwise a new file is created.
If you quit before finishing a category, nothing will be saved.
Known bug: when starting a new category, the program shows you all the items you've already exported before the items of the new category. For now the only workaround is to convert one category, close the program, start it again and skip the categories you've already converted. I'll try to fix that if it's not too hard.
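The save behaviour above (append if the .dae file exists, create it otherwise, write nothing until the category is finished) maps directly onto opening a file in append mode once the whole category has been reviewed. XMLToDae itself is written in Java and the .dae record layout isn't described here, so this is only a Perl illustration: `export_category` and the tab-separated record format are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical: write one finished category to <dir>/<name>.dae.
# Opening with '>>' appends when the file exists and creates it otherwise.
# Nothing is written until the whole category has been reviewed, so
# quitting halfway leaves any existing file untouched.
sub export_category {
    my ($dir, $dae_name, @items) = @_;
    my $path = File::Spec->catfile($dir, "$dae_name.dae");
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    # Made-up record format: one tab-separated line per item
    print {$fh} join("\t", @$_), "\n" for @items;
    close($fh) or die "Cannot close $path: $!";
    return $path;
}

my $dir  = tempdir(CLEANUP => 1);
my $path = export_category($dir, 'Cyberware', ['Item one', '1']);  # creates the file
export_category($dir, 'Cyberware', ['Item two', '2']);             # appends to it
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close($in);
print scalar(@lines), " records in Cyberware.dae\n";
```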

All items inside the same "GlobalType" in the XML file will be exported to the same .dae file. If you want some items to be exported to another .dae file, you'll have to move them inside the XML file.
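Since the target .dae file is chosen per GlobalType, it can help to see how many items each GlobalType holds before converting, and to check the result after moving items around. A rough sketch, assuming the line-oriented output Extractor.pl prints (one opening tag per line, except `<Book>`, which shares a line with the first GlobalType); it uses regular expressions rather than a real XML parser, so it only fits that specific output. `items_per_type` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Count <item> entries under each <GlobalType>, in document order.
# Relies on the line-oriented output of Extractor.pl; this is not a
# general XML parser (subitems are not counted).
sub items_per_type {
    my ($xml) = @_;
    my (@order, %count, $type);
    for my $line (split /\n/, $xml) {
        if ($line =~ /<GlobalType name="([^"]*)"/) {
            $type = $1;
            push @order, $type unless exists $count{$type};
            $count{$type} //= 0;
        }
        elsif (defined $type && $line =~ /<item name=/) {
            $count{$type}++;
        }
    }
    return map { [$_, $count{$_}] } @order;
}

# Made-up sample in the same shape as Gear-1.xml
my $xml = join "\n",
    '<Book name="Augmentation"><GlobalType name="Cyberware"> ',
    '<category name="Headware"> ',
    '<item name="Item A"> ', '</item>',
    '<item name="Item B"> ', '<subitem name="Variant 1">', '</subitem>', '</item>',
    '</category> ', '</GlobalType> ',
    '<GlobalType name="Bioware"> ',
    '<category name="Example category"> ',
    '<item name="Item C"> ', '</item>',
    '</category> ', '</GlobalType>', '</Book>';

printf "%-10s -> %d item(s)\n", @$_ for items_per_type($xml);
```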

I guess that covers it. If you've got any questions, feel free to ask. The source code is included in the zip file if anyone's interested. Feel free to do whatever you want with it.

Original post:

QUOTE
Recently I wanted to create a new character using Daegann's character generator, but the generator lacked the gear from Augmentation and Arsenal.
I remember spending a lot of time manually adding most of SR4's BBB gear and I didn't want to do it again.
Then I realized that with my SR PDF to HTML converter I could access the table data in a format I could parse automatically.

So I decided to program a little script to convert the HTML pages with gear tables into XML files with all the gear data neatly stored.
Looks like it's working well. So far, I've got it working with Augmentation's tables. Arsenal is a little bit more of a problem, as nearly 2/3 of the text in the tables can't be recovered. I'll try to ask Adam about it.

Some of the data will still need some manual processing: for example, items whose cost field is "Rating x n¥". The data also doesn't include the item descriptions (I can probably manage to add them, though). Still, it's far less work for those who want to add the gear to their character generators (or other SR4 programs).

If anyone is interested, I'll upload it (I just have a few things to fix first). The next step will be to program a .XML<->.DAE (Daegann's generator data files) converter, which shouldn't be too hard to pull off.
Cadmus
sounds cool
Blade
Ok, the script is finished.
It works with Augmentation after a few adjustments to the HTML files.

So I've got a nice XML file, nearly 3000 lines long, with all the data from Augmentation's gear tables inside. I don't know if I'm allowed to post it here.
sloejack
QUOTE (Blade @ Mar 8 2008, 10:13 AM)
Ok, the script is finished.
It works with Augmentation after a few adjustments to the HTML files.

So I've got a nice XML file, nearly 3000 lines long, with all the data from Augmentation's gear tables inside. I don't know if I'm allowed to post it here.


Would you be willing to share your script?
Blade
Sure, it's a Perl script, meant to be applied to the HTML files you get from SR4Light.
You'll need to manually edit the HTML files first to fix some problems, mostly column-alignment issues.
For example you'll need to change lines like this:
CODE
</span></nobr></DIV><DIV style="position:absolute;top:706;left:522;"<nobr><span class="ft5">(Rating x 6)FRating x 30,000¥

to
CODE
</span></nobr></DIV><DIV style="position:absolute;top:706;left:522;"<nobr><span class="ft5">(Rating x 6)F
</span></nobr></DIV><DIV style="position:absolute;top:706;left:670;"<nobr><span class="ft5">Rating x 30,000¥


(changing the "left" attribute shouldn't be necessary in most cases)
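If a page has many merged cells like the one shown above, the same edit can be scripted. This is a hypothetical helper, not part of Extractor.pl: it only handles the pattern from the example, an availability in parentheses with an optional R or F, immediately followed by the cost. `$cost_left` is optional because, as noted, changing the `left` attribute usually isn't needed.

```perl
use strict;
use warnings;

# Hypothetical helper: split a merged "availability + cost" ft5 cell into
# two fragments. When $cost_left is given, it replaces the left: offset
# of the cost fragment.
sub split_avail_cost {
    my ($line, $cost_left) = @_;
    my ($prefix, $cell) = $line =~ /^(.*class="ft5">)(.*)$/
        or return ($line);
    # Availability: "(...)" plus an optional R or F that is not the
    # start of the word "Rating"; the rest of the cell is the cost.
    my ($avail, $cost) = $cell =~ /^(\([^)]*\)(?:[RF](?![a-z]))?)(\S.*)$/
        or return ($line);
    my $cost_prefix = $prefix;
    $cost_prefix =~ s/left:\d+;/left:$cost_left;/ if defined $cost_left;
    return ($prefix . $avail, $cost_prefix . $cost);
}

my $merged = '</span></nobr></DIV><DIV style="position:absolute;top:706;left:522;"<nobr><span class="ft5">(Rating x 6)FRating x 30,000¥';
print "$_\n" for split_avail_cost($merged, 670);
```

Cells that don't match the pattern come back unchanged, so other problems still have to be fixed by hand.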

The script also doesn't support the case where an item has base attributes and "sub-items" with modifiers. For instance, you have to manually remove the base attributes of the Altskin and insert them into each of the sub-items.
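The manual fix for items like the Altskin amounts to copying the base attributes into every sub-item and letting each sub-item's own values win. A minimal sketch of that merge on an in-memory structure; the data layout, `push_base_into_subitems` and all values are hypothetical placeholders, not what the script or the book uses.

```perl
use strict;
use warnings;

# Hypothetical in-memory layout: an item with base attributes plus
# subitems that only list what they add or change.
sub push_base_into_subitems {
    my ($item) = @_;
    my %base = %{ $item->{attributes} || {} };
    for my $sub (@{ $item->{subitems} || [] }) {
        # Later pairs win, so a subitem's own value overrides the base one
        $sub->{attributes} = { %base, %{ $sub->{attributes} || {} } };
    }
    $item->{attributes} = {};    # the base values now live in every subitem
    return $item;
}

# Placeholder values only; the real Altskin numbers come from the book.
my $altskin = {
    name       => 'Altskin',
    attributes => { Essence => 'base-essence', Avail => 'base-avail' },
    subitems   => [
        { name => 'Variant 1', attributes => { Cost => 'cost-1' } },
        { name => 'Variant 2', attributes => { Cost => 'cost-2', Avail => 'avail-2' } },
    ],
};
push_base_into_subitems($altskin);
for my $sub (@{ $altskin->{subitems} }) {
    my $attrs = $sub->{attributes};
    print "$sub->{name}: ", join(', ', map { "$_=$attrs->{$_}" } sort keys %$attrs), "\n";
}
```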

You might also need to remove some lines which aren't really useful and can mess up the rest.

The best way to fix your html file is just to run the script, look at the xml output file for problems, find what causes this problem in the html and fix it.
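When looking at the XML output for problems, a quick nesting check can point to the first place where the tags stop matching, which narrows down which HTML lines to inspect. A minimal sketch; `check_nesting` is hypothetical and only tracks the element names the script prints, so it is not a full well-formedness check (it won't catch an unescaped `&` in an item name, for example).

```perl
use strict;
use warnings;

# Report the first mismatched or unclosed tag among the element names
# Extractor.pl emits. Returns an empty string if everything balances.
sub check_nesting {
    my ($xml) = @_;
    my @stack;
    while ($xml =~ m{<(/?)(Book|GlobalType|category|item|subitem|attribute)\b[^>]*>}g) {
        my ($closing, $tag) = ($1, $2);
        if (!$closing) {
            push @stack, $tag;
        }
        elsif (!@stack) {
            return "unexpected </$tag>";
        }
        elsif ($stack[-1] ne $tag) {
            return "</$tag> closes <$stack[-1]>";
        }
        else {
            pop @stack;
        }
    }
    return @stack ? "unclosed <$stack[-1]>" : '';
}

# Made-up broken sample: a subitem that is never closed
my $broken = '<Book name="Augmentation"><GlobalType name="Cyberware"><category name="Headware">'
           . '<item name="Item A"><subitem name="Variant 1"></item>'
           . '</category></GlobalType></Book>';
my $problem = check_nesting($broken);
print $problem ? "Problem: $problem\n" : "Tags balance\n";
```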

Here is the script:

CODE
#TODO: Handle the case where an item has both base attributes and subitems
#        Replace baaaad global variables with good local variables.

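my $intype = 0;         # 1 between a GlobalType header and its first category
my $incategory = 0;     # 1 while reading a category's column headers
my $begintype = 0;      # 0 until the first GlobalType has been opened
my $page = 167;         # first HTML page to read (dAug-167.html)
my $calcul = 0;         # answer to the "rating calculations?" prompt
$numattributes = -1;    # number of columns in the current table
$currentattribute = -1; # column being read; -1 means "expecting an item name"
$initem = 0;            # 0 = nothing open, 1 = inside an item, 2 = inside a subitem
$hasRating = 0;         # 1 if the open (sub)item will be expanded per rating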

    #For getting the value of a rated attribute
sub getRating {
    my $rating = $_[0];
    my $Formula =  $_[1];
    my $type = 0;
    
    if ($Formula !~ /Rating/i) {
        return $Formula;
    }
    
        # Yes, I know, doing this each time for the same formula isn't an optimized way to do it. I don't care.
    if ($Formula =~ /¥/) {
        $type=1;
    }
    elsif ($Formula =~ /\[(.+)\]/) {
        $type=2;
    }
    elsif ($Formula =~ /\(.*\)(R|F|-)/) {
        $type=$1;
    }

        # Keep only the operator (x or +) and the number, in every position,
        # e.g. "(Rating x 6)F" -> "x6", "Rating x 30,000¥" -> "x30000"
    $Formula =~ s/[^0-9.x+]//g;
    my $value;
    if ($Formula =~ /x/) {
        $Formula =~ s/x//;
        $value = $rating * $Formula;
    }
    elsif ($Formula =~ /\+/) {
        $Formula =~ s/\+//;
        $value = $rating + $Formula;
    }
    else {
        $value = $rating;
    }
        # $type is 0, 1 (nuyen), 2 (brackets) or a legality letter such as "F".
        # Compare as strings: numerically, "F" == 0 is true.
    if ($type eq '0') {    return $value; }
    elsif ($type eq '1') { return $value."¥"; }
    elsif ($type eq '2') { return "\[".$value."\]"; }
    else { return "(".$value.")".$type; }
}

    #For creating one entry per rating of an item
sub exportRating() {

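        # Read the rating range, e.g. "(Rating 1-6)" or "(Rating 10-12)".
        # \D* and \D+ stop the first capture from keeping only its last digit.
    if ($initem == 1) {
        $itemname =~ /Rating\D*(\d+)\D+(\d+)/;
        $minrating = $1;
        $maxrating = $2;
        $itemname =~ s/\(Rating.*//;
    }
    else {
        $subitemname =~ /Rating\D*(\d+)\D+(\d+)/;
        $minrating = $1;
        $maxrating = $2;
        $subitemname =~ s/\(Rating.*//;
    }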

    for ($i=$minrating;$i<=$maxrating;$i++) {
        if ($initem == 1) {
            print FO "<item name=\"".$itemname."(Rating ".$i.")\">\n";
        }
        else {
            print FO "<subitem name=\"".$subitemname."(Rating ".$i.")\">\n";
        }
        for ($j=0;$j<$numattributes;$j++) {
            print FO "<attribute name=\"".$Attribute[$j]."\">".&getRating($i, $attributevalue[$j])."<\/attribute>\n";
        }
        if ($initem == 1) {
            print FO "<\/item>\n";
        }
        else {
            print FO "<\/subitem>\n";
        }
    }
    $hasRating = 0;
}

open (FI, "<" . "dAug-167.html") or die "Cannot open dAug-167.html: $!";
open (FO, ">" . "Gear-1.xml") or die "Cannot create Gear-1.xml: $!";

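    # Ask until the answer contains a Y or an N
while ($calcul !~ /y/i && $calcul !~ /n/i) {
    print "Do you want rating calculations? (Y/N)";
    $calcul = <STDIN>;
    die "No answer on standard input\n" unless defined $calcul;
}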

print FO "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n";
print FO "<Book name=\"Augmentation\">";

do {
    $ligne = <FI>;
        while ($ligne) {
            if($ligne =~ /^<\/span>/) {
                chomp($ligne);

                    #Fix for the last item of each page
                $ligne =~ s/<\/DIV>$//;
                    
                    # Fix for Arsenal Pages 168 to 170 (included) , 174, 175, 177 and 178
                if (($page > 167 && $page < 171) || $page == 174 || $page == 175 || $page == 177 || $page == 178) {
                    $ligne =~ /"ft(.)/;
                    $mod = $1+1;
                        # Fix for new types in these pages
                    if ($mod == 6) {
                        $mod = 3;
                    }
                    $ligne =~ s/"ft./"ft$mod/;
                }
                    # Fix for Arsenal Pages 171 and 172
                elsif ($page == 171 || $page == 172) {
                    $ligne =~ s/.*"ft3.*//;
                }
                    
                    # Global type (cyberware, bioware...)
                if ($ligne =~ /ft3/) {
                    $typename = substr($ligne, rindex($ligne,">")+1);
                    if ($begintype != 0) {
                        if ($initem == 2) {
                            if ($hasRating == 1) {
                                &exportRating();
                            }
                            else {
                                print FO "<\/subitem>\n";
                            }
                            $initem = 0;
                        }
                        if ($hasRating == 1) {
                                &exportRating();
                            }
                            else {
                                print FO "<\/item>\n";
                            }
                        print FO "<\/category> \n";
                        print FO "<\/GlobalType> \n";
                    }
                    else {
                        $begintype = 1;
                    }
                    print FO "<GlobalType name=\"".$typename."\"> \n";
                    $intype=1;
                }
                
                    # Category (headware, bodyware...) or Attributes
                elsif ($ligne =~ /ft4/) {
                        # Category
                    if ($incategory==0) {
                        $categoryname = substr($ligne, rindex($ligne,">")+1);
                        if ($intype==1) {
                            $intype=0;
                        }
                        else {
                            if ($initem == 2) {
                                if ($hasRating == 1) {
                                    &exportRating();
                                }
                                else {
                                    print FO "<\/subitem>\n";
                                }
                                $initem = 0;
                            }
                            if ($hasRating == 1) {
                                &exportRating();
                            }
                            else {
                                print FO "<\/item>\n";
                            }
                            print FO "<\/category> \n";
                        }
                        print FO "<category name=\"".$categoryname."\"> \n";
                        $incategory=1;
                        $currentattribute = 0;
                        $numattributes=-1;
                    }
                        # Attributes
                    else {
                        $Attribute[$currentattribute] = substr($ligne, rindex($ligne,">")+1);
                        $Attribute[$currentattribute] =~ s/ /_/g;
                        $currentattribute++;
                    }
                }
                    # Items and Attributes
                elsif ($ligne =~ /ft5/) {
                
                        #Check if we're out of an item with subitem
                    if ($initem == 2) {
                        $ligne =~ /left:(.*);/;
                        if ($left == $1) {
                            if ($hasRating == 1) {
                                &exportRating();
                            }
                            else {
                                print FO "<\/subitem>\n";
                            }
                            $initem = 0;
                        }
                    }
                
                        # New item
                    if ($initem != 2 && ($currentattribute == -1 || $numattributes == -1)) {
                            #First Item
                        if ($incategory==1) {
                            $incategory = 0;
                            $numattributes = $currentattribute;
                            $currentattribute=-1;
                            $hasRating = 0;
                        }
                        else {
                            if ($hasRating == 1) {
                                &exportRating();
                            }
                            else {
                                print FO "<\/item>\n";
                            }
                        }
                        $ligne =~ /left:(.*);/;
                        $left = $1;
                        $itemname = substr($ligne, rindex($ligne,">")+1);
                        $initem = 1;
                        if ($itemname =~ /\(Rating/ && $calcul =~ /y/i) {
                            $hasRating = 1;
                        }
                        else {
                            $hasRating = 0;
                            print FO "<item name=\"".$itemname."\"> \n";
                        }
                    }
                        # Attribute
                    else {
                            #Check if it's an attribute or a subclass of the item
                        $ligne =~ /left:(.*);/;
                            #Subclass
                        if ($1 < $left+30) {
                            if ($initem == 1) {
                                $initem = 2;
                            }
                            else {
                                if ($hasRating == 1) {
                                    &exportRating();
                                }
                                else {
                                    print FO "<\/subitem>\n";
                                }
                            }
                            $subitemname = substr($ligne, rindex($ligne,">")+1);
                            if ($subitemname =~ /\(Rating/ && $calcul =~ /y/i) {
                                $hasRating = 1;
                            }
                            else {
                                $hasRating = 0;
                                print FO "<subitem name=\"".$subitemname."\">\n";
                            }
                            $currentattribute = -1;
                        }
                            #Attribute
                        else {
                            $attributevalue[$currentattribute] = substr($ligne, rindex($ligne,">")+1);
                            if ($hasRating == 0) {
                                print FO "<attribute name=\"".$Attribute[$currentattribute]."\">".$attributevalue[$currentattribute]."<\/attribute>\n";
                            }
                        }
                    }
                    $currentattribute++;
                    
                    if ($currentattribute >= $numattributes) { $currentattribute = -1; }
                }
                
            }

            $ligne = <FI>;
        }
        $page++;
        #$initem = 0;
        #$incategory = 0;
        
    } while (open(FI,"<"."dAug-".$page.".html"));
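        # Close whatever is still open after the last page, the same way
        # a new GlobalType does (subitem first, then the item)
    if ($initem == 2) {
        if ($hasRating == 1) {
            &exportRating();
        }
        else {
            print FO "<\/subitem>\n";
        }
        $initem = 0;
    }
    if ($hasRating == 1) {
        &exportRating();
    }
    else {
        print FO "<\/item>\n";
    }
    print FO "<\/category>\n";
    print FO "<\/GlobalType>\n";
    print FO "</Book>\n";
    close(FO);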
Blade
Updated! (The post above has been updated with the latest script.)

* Now uses a generic "attribute" tag instead of one tag per attribute, so that a schema can easily be written (and parsing may be easier with some languages/libraries).
* Option to export items/subitems with Rating into different items, with automatic attribute calculation!
* Known minor bug: you'll need to correct the "Retinal Adjusters" entry to get a correct XML file. Just replace <subitem with <item and remove the </subitem after the entry.
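The rating option evaluates formulas such as "(Rating x 6)F" or "Rating x 30,000¥" once per rating (getRating/exportRating in the script). Below is a standalone sketch of the same idea, not the script's own code: instead of stripping characters and rebuilding the decoration from a type code, it substitutes the computed value in place, which keeps the surrounding "( )F", "[ ]" or ¥ as-is. `value_at_rating`, `expand_by_rating` and the example item are hypothetical.

```perl
use strict;
use warnings;

# Replace "Rating x N" or "Rating + N" (or a bare "Rating") with its value
# at the given rating; values without "Rating" are returned unchanged.
sub value_at_rating {
    my ($rating, $formula) = @_;
    return $formula unless $formula =~ /Rating/;
    if ($formula =~ /Rating\s*x\s*([\d,.]+)/) {
        (my $n = $1) =~ s/,//g;            # "30,000" -> "30000"
        my $v = $rating * $n;
        $formula =~ s/Rating\s*x\s*[\d,.]+/$v/;
    }
    elsif ($formula =~ /Rating\s*\+\s*([\d.]+)/) {
        my $v = $rating + $1;
        $formula =~ s/Rating\s*\+\s*[\d.]+/$v/;
    }
    else {
        $formula =~ s/Rating/$rating/;
    }
    return $formula;
}

# "Name (Rating 1-3)" -> one [name, {attributes}] entry per rating
sub expand_by_rating {
    my ($name, %attrs) = @_;
    my ($min, $max) = $name =~ /Rating\D*(\d+)\D+(\d+)/
        or return ([$name, {%attrs}]);
    (my $base = $name) =~ s/\s*\(Rating.*//;
    my @entries;
    for my $r ($min .. $max) {
        my %at_r = map { $_ => value_at_rating($r, $attrs{$_}) } keys %attrs;
        push @entries, ["$base (Rating $r)", \%at_r];
    }
    return @entries;
}

# Made-up item using formulas quoted in this thread
my @entries = expand_by_rating('Example Implant (Rating 1-3)',
    Avail => '(Rating x 6)F', Cost => 'Rating x 30,000¥', Essence => '0.2');
for my $e (@entries) {
    my ($name, $attrs) = @$e;
    print "$name: ", join(', ', map { "$_=$attrs->{$_}" } sort keys %$attrs), "\n";
}
```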

---

To do/status:
* I've reverse engineered Daegann's Character Generator's .dae files, so I should be able to export all Augmentation's gear into Daegann's Character Generator.
* Waiting for Adam's answer for Arsenal's tables and public distribution of the XML files.
Blade
Updated! (The first post has been updated.)

* XML To Dae converter done.
Dumori
If you've made the full updated Aug and Arsenal .dat files, it would be a good idea to post a link to them in the DnCrg SR4 Character Generator (Early Dev) thread and/or in this one, to save people the time of redoing what's already been done.
Blade
I don't have an XML to .dat converter. Right now, it's only for Daegann's chargen.
I also don't have anything for Arsenal.
Dumori
That's what I meant. It would still be useful to post the updated Aug files.
Blade
There are no updated Aug files for DnCrg; this is for Daegann's Character Generator, which is a different generator...
And I don't even have the file, just the converter.
Dumori
Oh, by the way, if you run the text recognition tool in Adobe Acrobat 8, it will turn the text in the images into proper text. I haven't done it on the full Arsenal document, but it looks like it could work.
Blade
Is it available in Reader?
Dumori
I have no idea, I've got Pro. If you send me the scripts for Arsenal I could make the XML, but I can't get my SR4Light to work.
Blade
Updated!

* Crude .dae files with Augmentation's cyberware, bioware and other equipment are now available. They are just a temporary fix until someone does a more serious conversion job. Nanotech isn't in yet (I'll work on it) and there are no descriptions.
* Corrected a few bugs in XMLToDae: it's now possible to open an XML file that's not in the same directory as the program, and a bug with legality codes has been fixed. It's still not perfect, though; check the "known bugs" in the first post for more information.
Dumpshock Forums © 2001-2012