
Mailing List Archive: MythTV: Users

myth and unicode..

 

 



shabba at skynet

May 11, 2008, 6:33 AM

Post #1 of 14 (577 views)
myth and unicode..

Hi all,

I have some avis in mythvideo that have accented chars in their names. I
have a script that adds the plot etc. to them. The thing is, they appear in
mythvideo without this data and coverfile, but appear in mythweb fine. The
entry in the DB looks fine too. Is there a known problem that causes this?

eg :
mysql> select filename,title from videometadata where filename like
'/TV%Shark%Season 1%Again.avi';
+--------------------------------------------------------------+-------------------------------+
| filename                                                     | title                         |
+--------------------------------------------------------------+-------------------------------+
| /TVSeries/Shark/Season 1/1x07 - Déjà Vu All Over Again.avi   | 07 - Déjà Vu All Over Again   |
+--------------------------------------------------------------+-------------------------------+
1 row in set (0.01 sec)


The filename appears in the title in mythvideo. Looks like mythvideo
cannot parse any further when it hits the filename.

Thanks,

Damian.


mtdean at thirdcontact

May 11, 2008, 9:46 AM

Post #2 of 14 (549 views)
Re: myth and unicode.. [In reply to]

On 05/11/2008 09:33 AM, Damian O'Sullivan wrote:
> I have some avis in mythvideo that have accented chars in their names. I
> have a script that adds the plot etc. to them. The thing is, they appear in
> mythvideo without this data and coverfile, but appear in mythweb fine.
> The entry in the DB looks fine too. Is there a known problem
> that causes this?
>
> eg :
> mysql> select filename,title from videometadata where filename like
> '/TV%Shark%Season 1%Again.avi';
> +--------------------------------------------------------------+-------------------------------+
> | filename                                                     | title                         |
> +--------------------------------------------------------------+-------------------------------+
> | /TVSeries/Shark/Season 1/1x07 - Déjà Vu All Over Again.avi   | 07 - Déjà Vu All Over Again   |
> +--------------------------------------------------------------+-------------------------------+
> 1 row in set (0.01 sec)
>
>
> The filename appears in the title in mythvideo. Looks like mythvideo
> cannot parse any further when it hits the filename.

All characters in the 0.21 and below MythTV database /must/ be encoded
as latin1. That means that special characters need to be written as
multiple latin1 characters such that when "unencoded" by MythTV, they
will turn into the proper (non-latin1) characters. Putting utf8 (or
whatever your script is putting into the DB) breaks the data.
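
As a rough illustration of what "written as multiple latin1 characters" can mean, the Python sketch below takes the UTF-8 bytes of a string and views each byte as one latin1 character, then reverses the step. The function names are made up for this example, and it only shows the general idea under that assumption; it is not MythTV's actual code.

```python
def utf8_as_latin1_chars(text: str) -> str:
    """View the UTF-8 bytes of `text` as if each byte were one latin1 character."""
    return text.encode("utf-8").decode("latin-1")


def latin1_chars_to_text(stored: str) -> str:
    """Reverse the step above: back to bytes, then read those bytes as UTF-8."""
    return stored.encode("latin-1").decode("utf-8")


stored = utf8_as_latin1_chars("Déjà Vu")
# Each accented letter becomes two latin1 characters; ascii() keeps the
# output printable on any terminal.
print(ascii(stored))
# Reading the stored form back gives the original text again.
print(latin1_chars_to_text(stored) == "Déjà Vu")
```

Every accented letter takes two characters in the stored form, which is why such a value looks like gibberish when viewed directly.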

Mike
_______________________________________________
mythtv-users mailing list
mythtv-users[at]mythtv.org
http://mythtv.org/cgi-bin/mailman/listinfo/mythtv-users


udovdh at xs4all

May 11, 2008, 10:46 AM

Post #3 of 14 (548 views)
Re: myth and unicode.. [In reply to]

Michael T. Dean wrote:
> will turn into the proper (non-latin1) characters. Putting utf8 (or
> whatever your script is putting into the DB) breaks the data.

Any reasons for choosing latin1 in this unicode era?
I mean: fix unicode for MythTV and no worries anymore about weird
characters.


mtdean at thirdcontact

May 11, 2008, 10:59 AM

Post #4 of 14 (548 views)
Re: myth and unicode.. [In reply to]

On 05/11/2008 01:46 PM, Udo van den Heuvel wrote:
> Michael T. Dean wrote:
>
>> will turn into the proper (non-latin1) characters. Putting utf8 (or
>> whatever your script is putting into the DB) breaks the data.
>>
> Any reasons for choosing latin1 in this unicode era?
>

Yes.

> I mean: fix unicode for MythTV and no worries anymore about weird
> characters.

Long story. Don't worry about it, 0.22 is completely different.

Mike


shabba at skynet

May 11, 2008, 11:51 AM

Post #5 of 14 (548 views)
Re: myth and unicode.. [In reply to]

On Sun, 11 May 2008, Michael T. Dean wrote:
> All characters in the 0.21 and below MythTV database /must/ be encoded
> as latin1. That means that special characters need to be written as
> multiple latin1 characters such that when "unencoded" by MythTV, they
> will turn into the proper (non-latin1) characters. Putting utf8 (or
> whatever your script is putting into the DB) breaks the data.
>
> Mike

Thanks Mike -

Hmm. How come mythweb works ok? Also, the file appears in mythvideo; it just
does not use the extra videometadata. Myth seems to have no problem with
the filename and the way it is stored in the DB. I am a little confused, as I
don't know too much about locales and encoding etc.

PS. These files are on an smb-mounted fs and I had to use convmv to convert
their names to UTF8.

D.


awithers at anduin

May 11, 2008, 2:52 PM

Post #6 of 14 (533 views)
Re: myth and unicode.. [In reply to]

> Hmm. How come mythweb works ok?

It may be doing something wrong (at minimum in the sense that it doesn't
work like MythVideo). It may also work exactly the way MythVideo can, and you
just have different settings.

> Also, the file appears in mythvideo; it just
> does not use the extra videometadata. Myth seems to have no problem with
> the filename and the way it is stored in the DB.

It depends on your settings. If you have it set to read from the file system
and match metadata, the only link is the file name. If they do not match and
you run the video manager, you should get a prompt to remove the entry from
the DB.

> A little confused as I
> don't know too much about locales and encoding etc..

More useful would be the hexadecimal representation for Déjà in the DB
(though unless you modified your paste, it is wrong: you should see gibberish
for é and à if the column is using the "right" charset in .21).

If you were to run "SELECT hex(filename) FROM videometadata WHERE filename LIKE
'% Vu %'", I bet you would see Déjà Vu as 44E96AE0205675; in UTF-8 it should
be 44C3A96AC3A0205675.
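
The two hex strings above can be reproduced without touching the database. This short Python check is only an illustration; it encodes the text both ways and prints the bytes in the same style as MySQL's hex().

```python
text = "Déjà Vu"

# One byte per character in latin1; two bytes for each accented letter in UTF-8.
latin1_hex = text.encode("latin-1").hex().upper()
utf8_hex = text.encode("utf-8").hex().upper()

print(latin1_hex)  # 44E96AE0205675
print(utf8_hex)    # 44C3A96AC3A0205675
```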

--
Anduin Withers



shabba at skynet

May 11, 2008, 3:16 PM

Post #7 of 14 (533 views)
Re: myth and unicode.. [In reply to]

On Sun, 11 May 2008, Anduin Withers wrote:
> It may be doing something wrong (at minimum in the sense that it doesn't
> work like MythVideo). It may also work exactly the way MythVideo can, and you
> just have different settings.

What do you mean by different settings?

> It depends on your settings. If you have it set to read from the file system
> and match metadata, the only link is the file name. If they do not match and
> you run the video manager, you should get a prompt to remove the entry from
> the DB.

I delete the line in mysql and then run an import.

> More useful would be the hexadecimal representation for Déjà in the DB
> (though unless you modified your paste, it is wrong: you should see gibberish
> for é and à if the column is using the "right" charset in .21).
>
> If you were to "SELECT hex(filename) FROM videometadata WHERE filename LIKE
> '% Vu %'" I bet you would see Déjà Vu as 44E96AE0205675, in UTF-8 it should
> be 44C3A96AC3A0205675.

SELECT hex(filename) FROM videometadata WHERE filename LIKE '% Vu %';
+----------------------------------------------------------------------------------------------------------------------------------+
| hex(filename)                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------+
| 2F54565365726965732F536861726B2F536561736F6E20312F31783037202D2044C383C2A96AC383C2A020567520416C6C204F76657220416761696E2E617669 |
+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

I don't see either of those. I am just looking for a way to import the data
into mysql correctly when I run my perl import script, which does a tv.com
lookup. So should I enter the data into the DB without encoding it, or should
I encode it? For posterity, this is what ls -b looks like:

damian[at]mythtv-box:~/SVN/trunk/mythtv$ ls -b /TVSeries/Shark/Season\ 1/1x07\ -\ Déjà\ Vu\ All\ Over\ Again.avi
/TVSeries/Shark/Season\ 1/1x07\ -\ Déjà\ Vu\ All\ Over\ Again.avi

This is utf8 encoded.

damian[at]mythtv-box:~/SVN/trunk/mythtv$ echo $LANG
en_IE.UTF-8


Thanks,

D.


awithers at anduin

May 11, 2008, 5:00 PM

Post #8 of 14 (529 views)
Re: myth and unicode.. [In reply to]

> > you just have different settings.
>
> What do you mean by different settings?

In the General settings tab for MythVideo, there is a '[view type] browses
files' checkbox. When checked, what you see comes from the file system (so
is always "correct" in that there is some file there). If you have 'Video
List Loads Video Meta Data' enabled, the file system name is used to search
that metadata list. If you don't have '[blah] browses files' checked, then
what you see is generated entirely from the metadata in the DB.
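
To make the effect of these settings concrete, here is a simplified Python sketch of the lookup described above. It is not MythVideo's code: the function name, the dictionary layout, and the fallback of using the file name as the title are all assumptions made for illustration, based on the symptom reported earlier in the thread.

```python
from typing import Dict, List


def build_video_list(files_on_disk: List[str],
                     metadata_by_filename: Dict[str, dict],
                     browse_files: bool,
                     load_metadata: bool) -> List[dict]:
    """Hypothetical model of how the video list could be assembled."""
    if not browse_files:
        # Everything comes from the DB; the file system is not consulted.
        return [{"filename": name, **meta}
                for name, meta in metadata_by_filename.items()]

    entries = []
    for path in files_on_disk:
        # The file name is the only link between a file and its metadata.
        meta = metadata_by_filename.get(path) if load_metadata else None
        if meta is None:
            # Assumed fallback: show the bare file name as the title.
            entries.append({"filename": path, "title": path})
        else:
            entries.append({"filename": path, **meta})
    return entries


disk = ["/TVSeries/Shark/Season 1/1x07 - Déjà Vu All Over Again.avi"]
# A DB key whose bytes were mangled no longer equals the name on disk.
mangled = disk[0].encode("utf-8").decode("latin-1")
db = {mangled: {"title": "07 - Déjà Vu All Over Again", "plot": "example plot"}}

result = build_video_list(disk, db, browse_files=True, load_metadata=True)
print(ascii(result[0]["title"]))  # the file name, since the lookup missed
```

Under these assumptions the file still shows up in the list, because it exists on disk, but its plot and cover never attach, because the stored file name does not compare equal to the real one.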

> I delete the line in mysql and then run an import.

I don't use MythWeb (for videos). If by import you mean the Video Manager in
the frontend, I have no explanation. If you mean something in MythWeb, well,
once that is clear we can start fixing the bug.

> > '% Vu %'" I bet you would see Déjà Vu as 44E96AE0205675, in UTF-8 it
> > should be 44C3A96AC3A0205675.
> [...]44C383C2A96AC383C2A0205675[...]

What you see is the result of taking the utf8 representation, treating it as
if it were latin1, and then converting it to utf8. There should be no way to
do this using MythVideo (where MythVideo is limited to the plugin only,
libmythvideo.so).

If you want to see for yourself:

$ echo 44C3A96AC3A0205675 | xxd -r -p | iconv -f latin1 -t utf8 | xxd

yields:

44c3 83c2 a96a c383 c2a0 2056 75

The interesting bit is how you accomplished this and where to start fixing
things. I do not see how it can be done using only libmythvideo.so.
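
For anyone without xxd and iconv at hand, the same experiment can be sketched in Python. The helper names are invented for this example. The second function shows how the mangling could be reversed, on the assumption that the stored bytes went through exactly one extra latin1-to-utf8 conversion.

```python
def double_encode(utf8_bytes: bytes) -> bytes:
    """Treat UTF-8 bytes as latin1 text, then encode that text as UTF-8 again."""
    return utf8_bytes.decode("latin-1").encode("utf-8")


def undo_double_encoding(stored: bytes) -> bytes:
    """Reverse one round of the mistake above."""
    return stored.decode("utf-8").encode("latin-1")


good = bytes.fromhex("44C3A96AC3A0205675")   # "Déjà Vu" in UTF-8
bad = double_encode(good)

print(bad.hex())                          # 44c383c2a96ac383c2a0205675
print(undo_double_encoding(bad) == good)  # True
```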

--
Anduin Withers



shabba at skynet

May 12, 2008, 12:29 AM

Post #9 of 14 (514 views)
Re: myth and unicode.. [In reply to]

> In the General settings tab for MythVideo, there is a '[view type] browses
> files' checkbox. When checked, what you see comes from the file system (so
> is always "correct" in that there is some file there). If you have 'Video
> List Loads Video Meta Data' enabled, the file system name is used to search
> that metadata list. If you don't have '[blah] browses files' checked, then
> what you see is generated entirely from the metadata in the DB.

Ok gotcha..

> I don't use MythWeb (for videos). If by import you mean the Video Manager in
> the frontend, I have no explanation. If you mean something in MythWeb, well,
> once that is clear we can start fixing the bug.

I use sync.pl, which I think I got from mythtools. I always believed it was
the same as doing an import. I will try the manual way later. I do not use
mythweb. I was just checking to see if the behaviour was the same and was
surprised when it displayed the plot etc. fine.

> What you see is the result of taking the utf8 representation, treating it as
> if it were latin1, and then converting it to utf8. There should be no way to
> do this using MythVideo (where MythVideo is limited to the plugin only,
> libmythvideo.so).
>
> If you want to see for yourself:
>
> $ echo 44C3A96AC3A0205675 | xxd -r -p | iconv -f latin1 -t utf8 | xxd
>
> yields:
>
> 44c3 83c2 a96a c383 c2a0 2056 75
>
> The interesting bit is how you accomplished this and where to start fixing
> things. I do not see how it can be done using only libmythvideo.so.
>

OK, the steps I used were pretty much these: I made sure the file name was
encoded as UTF8. Once an ls on the dir (over samba) showed the filename
correctly, I moved on to myth and imported the files. I then ran a script that
checks for the new files and tries to get the plot etc. from tv.com. I have
tried various methods of decode("utf8",$filename) etc. in perl and managed to
get it into the DB in a form that looks good from a select sql statement, but
I cannot get it to appear in mythvideo.

Thanks,

D.


awithers at anduin

May 12, 2008, 9:13 AM

Post #10 of 14 (493 views)
Re: myth and unicode.. [In reply to]

> I use sync.pl which I think I got from mythtools. I always believed it was
> the same as doing an import.

I've never looked at it. I did a search and the 0.2 version I found isn't
correct (my perl may be rusty, but I don't see character (de/en)coding in
there).

> I have tried various methods of decode("utf8",$filename) etc. in perl and
> managed to get it into the DB in a form that looks good from a select sql
> statement, but I cannot get it to appear in mythvideo.

Looking good from a select may be a bad sign. Look at the hex for the field.
Until you see the utf8 sequence I posted, the tool you are using is still
broken.
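
One way to automate "look at the hex" is a small heuristic that labels the raw bytes of a field. The Python sketch below is only that, a heuristic with a made-up function name and labels. It can misjudge text that legitimately contains sequences such as "Ã©", so treat the result as a hint rather than proof.

```python
def classify(raw: bytes) -> str:
    """Guess how a byte string was encoded (heuristic, illustration only)."""
    if raw.isascii():
        return "ascii"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"          # for example raw latin1 bytes
    try:
        # If the decoded characters are themselves valid UTF-8 bytes when
        # viewed as latin1, the data was most likely encoded twice.
        text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"
    return "double-encoded"


print(classify(bytes.fromhex("44E96AE0205675")))              # not-utf8
print(classify(bytes.fromhex("44C3A96AC3A0205675")))          # utf8
print(classify(bytes.fromhex("44C383C2A96AC383C2A0205675")))  # double-encoded
```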

--
Anduin Withers



shabba at skynet

May 12, 2008, 10:23 AM

Post #11 of 14 (493 views)
Re: myth and unicode.. [In reply to]

On Mon, 12 May 2008, Anduin Withers wrote:

>> I use sync.pl which I think I got from mythtools. I always believed it was
>> the same as doing an import.
>
> I've never looked at it. I did a search and the 0.2 version I found isn't
> correct (my perl may be rusty, but I don't see character (de/en)coding in
> there).

You are right. I imported from mythui and it worked. So if I wanted to fix
sync.pl (it is very handy) I would need to encode the data in latin1
before inserting?

Thanks muchly,

D.


awithers at anduin

May 12, 2008, 11:13 AM

Post #12 of 14 (486 views)
Re: myth and unicode.. [In reply to]

> You are right. I imported from mythui and it worked. So if I wanted to fix
> sync.pl (it is very handy) I would need to encode the data in latin1
> before inserting?

No. In 0.21 the column encoding is latin1, but you should ignore this; the
data in that field must be in utf8.

--
Anduin Withers



shabba at skynet

May 20, 2008, 1:21 PM

Post #13 of 14 (382 views)
Re: myth and unicode.. [In reply to]

On Mon, 12 May 2008, Anduin Withers wrote:

>> You are right. I imported from mythui and it worked. So if I wanted to fix
>> sync.pl (it is very handy) I would need to encode the data in latin1
>> before inserting?
>
> No. In 0.21 the column encoding is latin1, but you should ignore this; the
> data in that field must be in utf8.
>
> --
> Anduin Withers

I forgot to mention that I was using SVN. What is the difference here
then?

Damian O'Sullivan Tel: 087 2241456 damian[at]linux.ie


awithers at anduin

May 20, 2008, 4:06 PM

Post #14 of 14 (378 views)
Re: myth and unicode.. [In reply to]

> I forgot to mention that I was using SVN. What is the difference here
> then?

What changed? The column type is now correct. Actual bytes of the data did
not change (mostly). All of this is extremely easy to check by looking at
the table and commits.

--
Anduin Withers

