Author Topic: Auto tagging folder - just a thought

colinjones

  • Alumni
  • LinuxMCE God
  • *
  • Posts: 3003
    • View Profile
Re: Auto tagging folder - just a thought
« Reply #15 on: January 16, 2009, 01:24:23 am »
I did log a ticket in the old mantis system, and there was some discussion there - I believe that hari may still have access to that. But I never relogged it in trac. You are already far deeper in than I have ever got, so I will be of limited use to you except as a sounding board!

From your description of that method, it looks like the inode is the key issue here. If matching is done on that, then you would expect it to work on the local drives, and possibly/probably even on remote NFS shares. But on a remote NTFS/Windows share? We have to accept that a very large percentage of people will have their media on such a share. I hesitate to say perhaps even the majority?

So where the hell does it get these from on NTFS? I just looked at File's inode field - it clearly is not valid! And this is the reason it doesn't work. The inodes are different and probably unique; however, they are allocated sequentially in the order that UM scanned them in, usually with a gap of 2 or 3 (i.e. 1,3,5,7,9, etc. or 1000,1003,1006, etc.). Not only is that obviously not going to be the arrangement on disk, but the gap between them is also the same whether the file is a small jpg or an entire DVD. This could only work if they were indexes to pointers in a table. But my understanding was that inodes are absolute pointers to a physical block on disk; is that not the case?
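
On most Unix filesystems an inode number is an index into the filesystem's inode table rather than a physical block address, and on a mounted Windows share the numbers may be generated by the client rather than read from disk. A minimal Python sketch for checking what the OS actually reports for a file; `file_identity` is a hypothetical helper name and is not part of UpdateMedia:

```python
import os
import tempfile


def file_identity(path):
    """Return the (device, inode) pair the OS reports for path."""
    st = os.stat(path)
    return st.st_dev, st.st_ino


if __name__ == "__main__":
    # Demonstrate on a throwaway file; point it at a file on the share instead
    # to see which numbers the mount hands back.
    with tempfile.NamedTemporaryFile() as tmp:
        print(file_identity(tmp.name))
```

Comparing the pair before and after a remount of the share would show whether the numbers are stable enough to match on.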

jthodges

  • Veteran
  • ***
  • Posts: 60
    • View Profile
Re: Auto tagging folder - just a thought
« Reply #16 on: February 22, 2009, 04:48:04 pm »
I'm just getting back to the auto tagger after a long period of various distractions.  I have the basics in place, where it will automatically tag every new media file in a particular directory.  Short attributes, long attributes, and File columns (e.g., media sub-type, file format) are implemented.  The rules are stored in a configuration file for now, and there is not a friendly front-end for it.  I still think the rules will probably need to go in the database, but I'm waiting until the configuration is a little more well-defined to work that out. 

My next steps (in no particular order):
  • Add pattern-matching as a rule condition (e.g., set FileFormat to 720p if filename contains "720p")
  • Add pattern-matching to rule actions (e.g., extract season, episode, title, etc. from filename for television shows)
  • Run now functionality (currently only applies to new files)
  • Web admin configuration - I'm kind of putting this off because I have no php experience
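
To make the two pattern-matching items concrete, here is a rough Python sketch. The filename layout "Show - S01E02 - Title.ext", the regex and the function names are all assumptions for illustration; the auto-tagger itself may end up doing this quite differently:

```python
import re

# Hypothetical layout: "Show Name - S01E02 - Episode Title.ext" (title optional)
EPISODE_RE = re.compile(
    r"^(?P<show>.+?)\s*-\s*[Ss](?P<season>\d{1,2})[Ee](?P<episode>\d{1,2})"
    r"(?:\s*-\s*(?P<title>.+?))?\.[^.]+$"
)


def matches_format(filename, pattern):
    """Rule condition: True if the pattern occurs anywhere in the filename."""
    return re.search(pattern, filename, re.IGNORECASE) is not None


def parse_episode(filename):
    """Rule action: pull show, season, episode and title out of a filename."""
    m = EPISODE_RE.match(filename)
    if m is None:
        return None
    return {
        "show": m.group("show"),
        "season": int(m.group("season")),
        "episode": int(m.group("episode")),
        "title": m.group("title"),  # None when the filename has no title part
    }
```

A condition such as `matches_format(name, "720p")` would gate the rule, and `parse_episode(name)` would supply the attribute values for a television rule.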

I'm hoping that others will find this useful as well, so please let me know if this direction makes sense or if you'd like to try it out.  I'm finding it's not incredibly useful yet without the pattern matching (I'm basically just using it to set media subtype on my prod environment right now), but once that and 'run rules now' are in place I think it will be pretty helpful.

colinjones

Re: Auto tagging folder - just a thought
« Reply #17 on: February 22, 2009, 10:32:21 pm »
so cool! thanks for all your effort jt!! can we get it into 0810??

jthodges

Re: Auto tagging folder - just a thought
« Reply #18 on: February 23, 2009, 05:13:43 am »
Quote
so cool! thanks for all your effort jt!! can we get it into 0810??

That would be great, I'm just not really sure how I go about doing that (especially if the config data ends up in the database).  I will follow up with the devs when I wrap up the other items and see if that's a possibility.

In the meantime, I implemented the pattern-matching as a condition and 'run rules now' functionality.  I gave it some basic rules at /home/public/data/videos to tag anything with 720p / 1080p / dvd in the filename as the appropriate format, and let it run through my production system.  All looks good so far, and I think it's in a much more useful state.  Now it just needs a front end and a better configuration store... that end is fairly ghetto at the moment :).



colinjones

Re: Auto tagging folder - just a thought
« Reply #19 on: February 23, 2009, 05:52:00 am »
best to have a word with Thom, Hari or Zaerc about getting something checked in.... I will mention it to them now...

jthodges

Re: Auto tagging folder - just a thought
« Reply #20 on: February 23, 2009, 10:16:00 pm »
Thom is understandably a little nervous about including auto-tagging.  It sounds like the best direction is to just get people using it.  I also was toying with the idea of having the autotagger queue updates for approval rather than executing them directly.  This might help keep users from shooting themselves in the foot with a rule that is too broad, and there could be a hidden or heavily-disclaimered option to bypass the approval process.  I'm not sure if this kind of feature would help with the comfort level or would maybe just get in the way?

As far as getting others to use it, I think the main things in the way of that right now are lack of front-end and my ghetto config file.  Currently I'm using something like this:

Code:
[/home/public/data/videos/import/movies]
FileColumn, FK_MediaSubType, 2

[/home/public/data/videos/import/TV]
FileColumn, FK_MediaSubType, 1

[/home/public/data/videos]
FileColumn, FK_FileFormat, 4, 720p
FileColumn, FK_FileFormat, 5, 1080p
FileColumn, FK_FileFormat, 2, dvd
FileColumn, FK_FileFormat, 2, \.iso$

[/home/public/data/videos/wifesmovies]
ShortAttribute, 8, Chick Flick, true

To explain these examples:
  • Adds a rule to tag all files under import/movies as movie, and files under TV as TV
  • Adds a rule global to public videos that tags as 720p, 1080p, or dvd based on expressions matched against the filename
  • Adds a rule to tag all files under wifesmovies as Genre="Chick Flick".  The true indicates that it will attempt to reuse an existing attribute if possible
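
A minimal sketch of how a file in this format could be read, written in Python for illustration only. `parse_rules`, `rules_for` and the `Rule` tuple are hypothetical names, and treating the optional fourth field of a FileColumn line as a case-insensitive regex on the filename is an assumption based on the examples above:

```python
import csv
import re
from collections import namedtuple

# directory: the [section] path; kind: first field; args: remaining fields
Rule = namedtuple("Rule", "directory kind args")


def parse_rules(text):
    """Parse '[directory]' sections followed by comma-separated rule lines."""
    rules = []
    directory = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            directory = line[1:-1]
            continue
        if directory is None:
            raise ValueError("rule outside of a [directory] section: %r" % line)
        fields = next(csv.reader([line], skipinitialspace=True))
        fields = [f.strip() for f in fields]
        rules.append(Rule(directory, fields[0], fields[1:]))
    return rules


def rules_for(rules, path):
    """Yield the rules that apply to the file at path (rules are recursive)."""
    filename = path.rsplit("/", 1)[-1]
    for rule in rules:
        if not path.startswith(rule.directory.rstrip("/") + "/"):
            continue
        # Assumption: a third FileColumn argument is a regex on the filename.
        if rule.kind == "FileColumn" and len(rule.args) == 3:
            if not re.search(rule.args[2], filename, re.IGNORECASE):
                continue
        yield rule
```

Keeping every line as "kind plus a list of arguments" means variable parameters only become a problem for the code that interprets a given rule kind, not for the file format itself.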

It's basically a hacked-up CSV file that I used as a temporary solution so I could get rolling on the functionality.  I'm not sure that I really see anything wrong with it right now, so maybe it can stay.  One possible issue is that it could get a little messy if any tag types allowed variable parameters.  Any thoughts?  Is there any advantage to having this stuff in the database?

As for the front-end, I'm getting ready to start looking at the admin website.  Fair warning though, I have very limited experience with php and UI is probably not my forte... if anyone wants to help with this portion it would be more than welcome!

Zaerc

  • Alumni
  • LinuxMCE God
  • *
  • Posts: 2256
  • Department of Redundancy Department.
    • View Profile
Re: Auto tagging folder - just a thought
« Reply #21 on: February 23, 2009, 11:02:18 pm »
Don't we have something like this in 0810 already?
"Change is inevitable. Progress is optional."
-- Anonymous


jthodges

Re: Auto tagging folder - just a thought
« Reply #22 on: February 23, 2009, 11:08:00 pm »
Quote
Don't we have something like this in 0810 already?

I'm not sure... the closest I know of is this: http://forum.linuxmce.org/index.php?topic=6571.0.  But I believe that is for applying tags recursively, rather than automatically tagging new media as it is added to the system.

Please do let me know if this already exists, I'd rather not spend more time on it if it is already implemented...

colinjones

Re: Auto tagging folder - just a thought
« Reply #23 on: February 24, 2009, 02:47:25 am »
I think the recursive one is the only other thing there currently, and that is quite a different beast.

1) How does your code handle pre-existing tags? i.e. if it already has a genre tag, does it add this as an additional tag or replace the existing one? Perhaps a flag that says replace|add|no-action when a tag already exists? Perhaps even an extra, new, boolean attribute called Autotagged, which could be used to identify those entries that have been modded and could allow for a rudimentary undo function...
2) Have you tested that, once the db is updated, UpdateMedia then correctly writes those changes back out to both embedded id3 tags and the .id3 tag files (depending on whether it is an mp3 or not)?
3) Can you use your process to apply the '*' attribute that marks pictures as eligible for the screen saver?
4) How does the process work? Is it a mod to UpdateMedia and happens as part of the scan, or a separate, new, process that runs at some other time?
5) How's your php? Is it possible to add a rough web page interface that provides a simple UI that writes out the config file?

sorry for all the q's!

jthodges

Re: Auto tagging folder - just a thought
« Reply #24 on: February 24, 2009, 03:30:03 am »
1) I'm working on the replace/add/no-action option right now, actually.  This didn't come up until I implemented the 'run rules now' feature, since new files wouldn't have existing tags.  And I like the idea of being able to undo, but I don't think I follow how the new attribute would allow it.  Can you expand on that?
2) When new files are tagged, yes (though I haven't specifically tested mp3s).  The db is updated just before the sync occurs, so id3 files are written out as UpdateMedia continues.  The 'run rules now' is a separate executable though, and the id3s aren't synced until the next UpdateMedia run.  I didn't think that was a big deal, but we could consider adding in a call to fire off UpdateMedia at the end if necessary.
3) I didn't know what that attribute was for until you asked the question, but yes the backend supports it.  It would be a rule like:
Code:
[/home/public/data/pictures/myscreensaverpics]
ShortAttribute, 30, *, true
4) Both.  First, an update to UpdateMedia for new files, which updates the new File row before the id3 is synced.  Second, the 'run rules now' is a standalone that uses the same library to load and execute rules against only files that are already in the db.  Note that when I say 'new file', I'm referring to new according to UpdateMedia. Files that are moved to the directory and recovered by UpdateMedia will have their paths updated in the database, but it will not trigger the auto-tagging.  The standalone would need to be used in this case.
5) Shaky, but yeah that's probably my next step.  I was thinking I would  extend the existing media sync screen, but it's starting to look like the rule definition will be different enough from the attribute creation that it will warrant a separate section or screen.

No problem, good questions and you are giving me some additional ideas for testing.  BTW, do you think the approval process is worthwhile?  I'm seeing it as something you use as you work through and test your auto-tagging rules, and eventually disable when things are running smoothly.  It's probably a significant extra bit of work though, so it's not worth it if it's just going to get in the way.

colinjones

Re: Auto tagging folder - just a thought
« Reply #25 on: February 24, 2009, 03:51:42 am »
jt

1) Great. For undo, I haven't thought through the logic fully, but my idea was this: if the new attribute is set whenever any tags are changed/added/deleted by your code, then a simple SQL query can select all the files that have been modified by your code. Then, based on the folder location, you can determine what rules were applied to that file - so for a file that has an autotag attribute and an "add" rule, you can remove the tag added by the rule. A no-action rule only needs to remove the tag if there is exactly one instance of the tag and its value matches. Obviously, there isn't much you can do if the rule is a replace rule without getting tricky with history attributes (sure-fire way of it not getting added to 0810!). Does that make sense?

One other point is, often "new" files will have tags. mp3s ripped from other sources will already have embedded tags, usually.

2) Probably should test the mp3s then as this is a different type of tag... I'm sure it will work if the .id3 ones do, though. I don't think it would be necessary to trigger UM, as the convergence time is never going to be much more than 2 mins.

3) Great!

4) Understood. In addition to moved files not being retagged, perhaps there should be some way of overriding the autotag, so that although you want a particular file to be filed with all the others in one of these folders, you could easily make a one-off manual exception. Not sure that relying on the file not being "new" to UM is enough here, as manually moving a file, or even renaming it in situ, will make UpdateMedia think it is new again... even manually editing attributes for mp3s does this, I believe....

I'm not sure that the effort involved for the manual approval process is worth it. It's a great idea (perhaps use a db table as a queue like pnpqueue does), but perhaps the 'add' rule and even the 'undo' function provide enough security?

jthodges

Re: Auto tagging folder - just a thought
« Reply #26 on: February 24, 2009, 05:28:37 am »
Quote
A no-action rule only needs to remove the tag if there is exactly one instance of the tag and its value matches. Obviously, there isn't much you can do if the rule is a replace rule without getting tricky with history attributes (sure-fire way of it not getting added to 0810!). Does that make sense?
Yeah, I see what you are getting at.  I think that the reuse option for short attributes would screw up the logic for reverting a no-action as well, because it would not be clear if the tag was already associated with the file or not.  I think an approach that will adapt to more complex rules (as I expect these to get more complex) would probably be easier in the long run.  I could possibly extend the existing rules to log a description and sql necessary to revert the changes as they execute.  This could be used to provide a log of changes made as well as a method to revert them. 
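
The idea of logging a description plus the SQL needed to revert each change could look roughly like the Python sketch below. It is only an illustration: the function names and the CSV column order are assumptions, and it takes any DB-API style connection with an `execute` method (sqlite3 is handy for trying it out, though the real system uses its own database):

```python
import csv
import io
import sqlite3  # only needed to try the sketch against a scratch database
import time


def log_change(writer, run_id, description, revert_sql):
    """Append one row: timestamp, run id, what changed, SQL that undoes it."""
    writer.writerow([int(time.time()), run_id, description, revert_sql])


def revert(conn, log_text, run_id):
    """Undo one run by executing its revert statements newest-first."""
    rows = [r for r in csv.reader(io.StringIO(log_text))
            if r and r[1] == str(run_id)]
    for _timestamp, _run, _description, sql in reversed(rows):
        conn.execute(sql)
    conn.commit()
    return len(rows)
```

Replaying newest-first matters when two rules touch the same column in one run, since each revert statement restores the value that was there just before its own change.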

Quote
One other point is, often "new" files will have tags. mp3s ripped from other sources will already have embedded tags, usually.
Good point.  I was thinking in terms of what is in the database at the point the tags are applied, but that doesn't really matter.  I need to see how things are handled if existing tags conflict with a rule, and make sure the new replace/add/no-action rule handles it properly.

Quote
2) Probably should test the mp3s then as this is a different type of tag... I'm sure it will work if the .id3 ones do, though.
Yeah, I did look through that code before starting this and they are handled the same.  Still worth running some through and making sure.

Quote
4) Understood. In addition to moved files not being retagged, perhaps there should be some way of overriding the autotag, so that although you want a particular file to be filed with all the others in one of these folders, you could easily make a one-off manual exception. Not sure that relying on the file not being "new" to UM is enough here, as manually moving a file, or even renaming it in situ, will make UpdateMedia think it is new again... even manually editing attributes for mp3s does this, I believe....
I hadn't thought about exceptions ... I'll mull this one over.  It might make sense to add a new rule type for exceptions and allow it to be applied to a file or directory.  This would allow you to cut off another rule's recursion as well as handle the one-off files.

Quote
I'm not sure that the effort involved for the manual approval process is worth it. It's a great idea (perhaps use a db table as a queue like pnpqueue does), but perhaps the 'add' rule and even the 'undo' function provide enough security?
I agree, the undo is sounding like a better option.  It offers the security without getting in the way of the automation that the feature is supposed to offer. 

jthodges

Re: Auto tagging folder - just a thought
« Reply #27 on: February 27, 2009, 04:59:22 pm »
Update:

  • I implemented the add/overwrite/no-action (calling it the 'Conflict Action' since there may be other options in the future, and 'add' does not apply to some rule types).  Default is "NoAction".
  • I implemented the backup feature.  I'm logging to a csv file (so I could reuse my existing file format), with each row containing timestamp, tag id (unique per run only), change description, and sql to revert.  You run the sql in the reverse order to revert, and it seems to work pretty well.  Of course, this will require a front end as well...
  • I'm currently working through a bunch of test scenarios.  My notes are attached, in case anyone would like to add some missing test cases.  (Sorry about the image format.)
  • The web admin front end is still looming ... I am hoping to dive in this weekend, but I would still welcome any help on this piece.  This would work out especially well now, since the backup management and rule management are fairly separate features.
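
For anyone trying to picture the 'Conflict Action' behaviour, here is one possible reading of it as a Python sketch. The enum member names and the `apply_tag` helper are hypothetical, and the exact semantics (in particular that a file with no existing value is always tagged) are an assumption rather than a description of the real code:

```python
from enum import Enum


class ConflictAction(Enum):
    ADD = "Add"
    OVERWRITE = "Overwrite"
    NO_ACTION = "NoAction"


def apply_tag(existing, new_value, action=ConflictAction.NO_ACTION):
    """Return the values one tag type should hold after a rule has run.

    existing is the list of values the file already has for that tag type.
    """
    if not existing:
        # No conflict: the rule simply applies.
        return [new_value]
    if action is ConflictAction.NO_ACTION:
        return list(existing)
    if action is ConflictAction.OVERWRITE:
        return [new_value]
    # ADD: keep what is there and append the new value once.
    if new_value in existing:
        return list(existing)
    return list(existing) + [new_value]
```

For rule types that can hold only one value, such as a File column, ADD would not make sense, which matches the note that 'add' does not apply to some rule types.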


colinjones

Re: Auto tagging folder - just a thought
« Reply #28 on: February 27, 2009, 09:45:43 pm »
jt - cool! Have a word with Thom in IRC and see if he is happier about putting this into 0810 now... looking good!

colinjones

Re: Auto tagging folder - just a thought
« Reply #29 on: March 09, 2009, 11:06:33 pm »
jt - how are you going with this?