Current symbol scanning issues

Discussion in 'Dxbx Official Discussion' started by patrickvl, Dec 4, 2010.

  1. patrickvl

    patrickvl Emu Author

    Perhaps this post is more of a public note-to-self, but it doesn't hurt to share this information:

    Recently, I looked into our symbol scanning engine again, as it's not yet performing as well as I would have wanted.
    What I discovered is that our scanning is hindered by three things:
    1. Not having version-exact patterns. We approximate with patterns from other versions that we do have, but differences in length and content cause incorrect behaviour. (I don't need to remind you that we would really *love* to get more XDK patterns!)
    2. Identical patterns with different names. These are currently aliased together into one symbol, but in reality each of them can exist separately. Because we register only one of them, the other locations stay unrecognized and can even trigger a failure when detecting higher-level symbols.
    3. A new category sprang up last week: some libraries (like libcmt) contain multiple symbols with the same name but different patterns! This causes yet another type of failure on higher-level symbols, as we keep only one declaration for any given symbol name, so a reference to the second declaration will always fail.

    I don't know yet how we're going to fix these issues, but at least we now know what issues we're dealing with in this very important piece of code!
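    To make issues 2 and 3 concrete, here is a minimal Python sketch (not Dxbx's actual code, which is not shown in this thread). The symbol names and byte patterns are made up for illustration. It contrasts a table that keeps one entry per name and per pattern with one that keeps every (name, pattern) pair:

```python
from collections import defaultdict

# Hypothetical pattern records: (symbol name, leading bytes of the function).
PATTERNS = [
    ("_memcpy",  bytes.fromhex("558bec5657")),
    ("_memmove", bytes.fromhex("558bec5657")),  # issue 2: same bytes, different name
    ("_helper",  bytes.fromhex("8bff558bec")),
    ("_helper",  bytes.fromhex("6a0c68aabb")),  # issue 3: same name, different bytes
]

def build_single(patterns):
    """One name per pattern and one pattern per name: later entries are lost."""
    by_name, by_bytes = {}, {}
    for name, data in patterns:
        by_bytes.setdefault(data, name)  # first name wins, the rest act as aliases
        by_name.setdefault(name, data)   # first pattern wins, the second is dropped
    return by_name, by_bytes

def build_multi(patterns):
    """Keep every (name, pattern) pair so nothing is silently merged."""
    by_name, by_bytes = defaultdict(list), defaultdict(list)
    for name, data in patterns:
        by_name[name].append(data)
        by_bytes[data].append(name)
    return by_name, by_bytes

single_by_name, single_by_bytes = build_single(PATTERNS)
multi_by_name, multi_by_bytes = build_multi(PATTERNS)

shared = bytes.fromhex("558bec5657")
print(single_by_bytes[shared])        # only one of the two names survives
print(multi_by_bytes[shared])         # both names are kept
print(len(multi_by_name["_helper"]))  # both declarations are kept
```

    Keeping lists instead of single values does not resolve the ambiguity by itself, but it at least stops the second declaration from disappearing before higher-level symbols are resolved.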
  3. ObiKKa

    ObiKKa Member

    It looks like it will take a long time to fix all of these problems. Will you fix them by version 0.5 or 0.6?
  4. patrickvl

    patrickvl Emu Author

    Well, problem 1 (missing XDK versions) is just a given - the approximations we use will have to do (and don't forget: with the latest XbeExplorer everyone can extract patterns manually from a game).

    Problem 2 (identical patterns with different names) is currently handled using explicit aliasing. Removing this piece of code causes ambiguity problems, however. It could be better to solve those instead of using aliasing; time will tell...

    Problem 3 (identical symbol names with different patterns) is rather new. Maybe the simplest way to fix it is to exclude all of those symbols from detection, so they cannot conflict anymore.
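    The exclusion idea for problem 3 could look roughly like this in Python. This is only a sketch under assumptions: the function name `drop_conflicting_names` and the sample data are hypothetical and not taken from the Dxbx source.

```python
from collections import defaultdict

def drop_conflicting_names(patterns):
    """Keep only patterns whose symbol name maps to exactly one byte pattern."""
    seen = defaultdict(set)
    for name, data in patterns:
        seen[name].add(data)
    # A name with two or more distinct patterns is ambiguous, so drop all of them.
    return [(name, data) for name, data in patterns if len(seen[name]) == 1]

# Hypothetical input: "_helper" is declared twice with different bytes.
sample = [
    ("_memcpy", b"\x55\x8b\xec"),
    ("_helper", b"\x8b\xff\x55"),
    ("_helper", b"\x6a\x0c\x68"),
]
filtered = drop_conflicting_names(sample)
print(filtered)
```

    The trade-off is that the excluded functions are never detected at all, so anything that depends on them still has to cope with them being absent.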

    In any case, symbol scanning IS important to get right, as it will give us more titles to work with. But on the other hand, our emulation is still lacking when compared to Cxbx: even though our symbol scan is almost identical (or even better) for the titles that Cxbx runs, we crash on them for various reasons. So improving our emulation code might be more valuable, as it could well be that with just the right fixes in that area we could already support more titles. (This is a bit of a lie though - some titles like Blade were running better with our previous scanning code, so it's going to be a little bit of both.)
  5. Bill_gates

    Bill_gates Linux's worst nightmare..

    I can't believe this isn't a more popular topic, given how the symbol scanning issues are a major roadblock to greater compatibility.
    I spent some time today looking at the symbol scanning code here, and while it's lengthy it doesn't seem hopelessly complicated.
    Last edited: Feb 18, 2012
  6. LoRd_SnOw

    LoRd_SnOw Member

    I just looked at the code, and while it looks fairly easy, I have to ask: what is it exactly? So here is a typical question: what exactly is symbol scanning? And does Cxbx use this same technique?
  7. Bill_gates

    Bill_gates Linux's worst nightmare..

    This thread should suffice: http://forums.ngemu.com/showthread.php?t=139472

    Cxbx doesn't use the same technique. Cxbx uses a custom data structure called OOVPA.
  8. LoRd_SnOw

    LoRd_SnOw Member

    Thank you, let me see if I understand this correctly.

    From what it sounds like in patrickvl's post, symbol scanning is a method for detecting library patterns. With these library patterns, you can obtain the locations of the library functions found in Xbox executables (.xbe). Now if a pattern matches what was found in the Xbox executable, then these functions can perhaps be retrieved across a variety of XDKs, or am I getting this backwards?

    If I ever advance enough with my project (doubtful), I wouldn't mind comparing the two methods just to have a better idea of how they both work. As of yet, I have never bothered to look at the Cxbx source, and I didn't even know that Dxbx was open source until today.
  9. Bill_gates

    Bill_gates Linux's worst nightmare..

    Close. The patterns are premade using IDA Pro and some other tools.
    The patterns (well, the first 32 bytes of the function part) are loaded into a tree-based data structure called a trie.

    In Dxbx's API scanning code, Dxbx scans over the code sections of the XBE and marks all the functions that match the patterns stored in the Trie

    These marked functions are used to make the symbols.

    Finally, Dxbx visits the locations of the symbols and does the interception here.
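    The trie lookup and scan described above can be sketched in Python as follows. This is an illustration only, not Dxbx's implementation: the class and function names, the sample patterns, and the use of `None` as an "any byte" wildcard are all assumptions made for the example.

```python
PREFIX_LEN = 32  # the post says the first 32 bytes of each function are used

class TrieNode:
    def __init__(self):
        self.children = {}  # byte value (or None for "any byte") -> TrieNode
        self.names = []     # symbol names whose pattern ends at this node

def insert(root, pattern, name):
    """Add a pattern (list of byte values or None) to the trie under a name."""
    node = root
    for b in pattern[:PREFIX_LEN]:
        node = node.children.setdefault(b, TrieNode())
    node.names.append(name)

def match_at(root, code, offset):
    """Return the names of all patterns that match code starting at offset."""
    found = []
    stack = [(root, offset)]
    while stack:
        node, pos = stack.pop()
        found.extend(node.names)
        if pos >= len(code):
            continue
        for key in (code[pos], None):  # follow the exact byte and the wildcard
            child = node.children.get(key)
            if child is not None:
                stack.append((child, pos + 1))
    return found

def scan(root, code):
    """Map each offset in a code section to the symbol names matching there."""
    hits = {}
    for offset in range(len(code)):
        names = match_at(root, code, offset)
        if names:
            hits[offset] = sorted(names)
    return hits

root = TrieNode()
insert(root, [0x55, 0x8B, 0xEC, None, 0xC3], "_example_a")  # None = wildcard byte
insert(root, [0x55, 0x8B, 0xEC, 0x90, 0xC3], "_example_b")
code = bytes([0x90, 0x55, 0x8B, 0xEC, 0x90, 0xC3, 0x55, 0x8B, 0xEC, 0x01, 0xC3])
hits = scan(root, code)
print(hits)
```

    In this toy input, two names match at the same offset, which is the kind of ambiguity discussed earlier in the thread. A real scanner would need a rule for choosing between such candidates.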
