Mailing List Archive: MythTV: Mythtvnz

XMLTV headers changed

 

 



g8ecj at gilks

Jul 7, 2012, 12:25 AM

Post #1 of 13
XMLTV headers changed

Greetings

It seems that the data contained in http://nzepg.org/freeview.xml.gz had a
change of header last weekend.

I followed the conversation about mhegepgsnoop having a header change but
didn't realise it would propagate to the online data.

This is a real problem for me as I merge data from epgsnoop with the
online stuff (which has more detail for FreeView) using 'tv_cat' from the
xmltv package and it barfs with:
"/tmp/listings-freeview-31681.xml: this file's encoding utf-8 differs from
others' ISO-8859-1 - aborting"

So the online data now has utf-8 in the header but the epgsnoop data is
ISO-8859-1. I tried changing outputter.py (in epgsnoop) to utf-8 but the
data from satellite EPG has some interesting 8 bit characters (which I
assume really are ISO-8859-1 codes).

So who is right, utf-8 or ISO-8859-1, and how can I merge the two
different encodings if they are both right?

Cheers

--
Robin Gilks



_______________________________________________
mythtvnz mailing list
mythtvnz [at] lists
http://lists.ourshack.com/mailman/listinfo/mythtvnz
Archives http://www.gossamer-threads.com/lists/mythtv/mythtvnz/


dmoo1790 at ihug

Jul 7, 2012, 1:39 AM

Post #2 of 13
Re: XMLTV headers changed [In reply to]

On 07/07/12 19:25, Robin Gilks wrote:
> Greetings
>
> It seems that the data contained in http://nzepg.org/freeview.xml.gz had a
> change of header last weekend.
>
> I followed the conversation about mhegsnoop having a header change but
> didn't realise it would propagate to the online data.
>
> This is a real problem for me as I merge data from epgsnoop with the
> online stuff (which has more detail for FreeView) using 'tv_cat' from the
> xmltv package and it barfs with:
> "/tmp/listings-freeview-31681.xml: this file's encoding utf-8 differs from
> others' ISO-8859-1 - aborting"
>
> So the online data now has utf-8 in the header but the epgsnoop data is
> ISO-8859-1. I tried changing outputter.py (in epgsnoop) to utf-8 but the
> data from satellite EPG has some interesting 8 bit characters (which I
> assume really are ISO-8859-1 codes).
>
> So who is right - utf-8 or ISO-8859-1 and how can I merge the two
> different encodings if they are both right!!
>
> Cheers
>

Try iconv to convert the encoding of one file to match the other. For
example:

iconv -c -f UTF-8 -t ISO_8859-1 this_file -o that_file

The -c will skip invalid chars.

Also change the header or delete it. And check the encoding with "file
-i that_file".
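
The same check can be scripted. What follows is a minimal Python sketch,
not anything shipped with xmltv or epgsnoop: declared_encoding and
is_valid_utf8 are hypothetical helper names, and the two sample documents
are made up for illustration. It reads the encoding named in the XML
declaration (presumably what tv_cat compares) and tests whether the bytes
really decode as UTF-8.

```python
import re


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = re.match(
        rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', data
    )
    return match.group(1).decode("ascii") if match else None


def is_valid_utf8(data: bytes) -> bool:
    """True if the whole byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Invented sample documents: same text, two different encodings.
sample_latin1 = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><tv><title>Caf\u00e9</title></tv>'
).encode("iso-8859-1")
sample_utf8 = (
    '<?xml version="1.0" encoding="utf-8"?><tv><title>Caf\u00e9</title></tv>'
).encode("utf-8")

for name, data in (("latin1", sample_latin1), ("utf8", sample_utf8)):
    print(name, declared_encoding(data), is_valid_utf8(data))
```

A file whose declaration says utf-8 but fails the second check still needs
converting, whatever the header claims.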

UTF-8 vs ISO_8859-1? Well, UTF-8 is an encoding of Unicode, so it can
represent practically any character. ISO_8859-1 only covers Western
European languages. I believe UTF-8 is, or is becoming, the preferred
encoding.

Interestingly you may have revealed a bug. UTF-8 encoding created by
mhegepgsnoop is displayed properly (e.g., by "less file" and myth) but
iconv choked on one character. Seems the byte order might be backwards
for this char but most apps handle it because bytes in multi-byte UTF-8
chars are unambiguous so order doesn't really matter. iconv may be less
tolerant and simply abort if it gets bytes in the wrong order.



dmoo1790 at ihug

Jul 7, 2012, 2:06 AM

Post #3 of 13
Re: XMLTV headers changed [In reply to]

On 07/07/12 20:39, David Moore wrote:
> Interestingly you may have revealed a bug. UTF-8 encoding created by
> mhegepgsnoop is displayed properly (e.g., by "less file" and myth) but
> iconv choked on one character. Seems the byte order might be backwards
> for this char but most apps handle it because bytes in multi-byte
> UTF-8 chars are unambiguous so order doesn't really matter. iconv may
> be less tolerant and simply abort if it gets bytes in the wrong order.

I take it back. Seems the bytes in the character iconv choked on are in
the correct order. Bug in iconv?
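
One plausible explanation, stated as an assumption because the thread
never names the character: iconv stops when a character has no equivalent
in the target encoding, and ISO-8859-1 has no slot for anything above
U+00FF. Byte order is not negotiable in UTF-8 either; the lead byte must
come first, followed by its continuation bytes. A short Python sketch,
using the en dash (U+2013) purely as an example character and two
hypothetical helper functions:

```python
def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fits_latin1(text: str) -> bool:
    """True if every character has an ISO-8859-1 code."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


char = "\u2013"                   # EN DASH, an assumed example character
utf8_bytes = char.encode("utf-8")  # three bytes: lead byte + two continuations
reversed_bytes = utf8_bytes[::-1]  # same bytes, wrong order

print(utf8_bytes.hex())                  # hex of the UTF-8 form
print(decodes_as_utf8(utf8_bytes))       # correct order decodes
print(decodes_as_utf8(reversed_bytes))   # reversed order does not
print(fits_latin1(char))                 # the dash has no ISO-8859-1 code
```

So a character can be perfectly valid UTF-8 and still make a plain
UTF-8 to ISO_8859-1 conversion fail.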



g8ecj at gilks

Jul 7, 2012, 2:16 AM

Post #4 of 13
Re: XMLTV headers changed [In reply to]

> On 07/07/12 19:25, Robin Gilks wrote:
>> Greetings
>>
>> It seems that the data contained in http://nzepg.org/freeview.xml.gz had
>> a
>> change of header last weekend.
>>
>> I followed the conversation about mhegsnoop having a header change but
>> didn't realise it would propagate to the online data.
>>
>> This is a real problem for me as I merge data from epgsnoop with the
>> online stuff (which has more detail for FreeView) using 'tv_cat' from
>> the
>> xmltv package and it barfs with:
>> "/tmp/listings-freeview-31681.xml: this file's encoding utf-8 differs
>> from
>> others' ISO-8859-1 - aborting"
>>
>> So the online data now has utf-8 in the header but the epgsnoop data is
>> ISO-8859-1. I tried changing outputter.py (in epgsnoop) to utf-8 but the
>> data from satellite EPG has some interesting 8 bit characters (which I
>> assume really are ISO-8859-1 codes).
>>
>> So who is right - utf-8 or ISO-8859-1 and how can I merge the two
>> different encodings if they are both right!!
>>
>> Cheers
>>
>
> Try iconv to convert the encoding of one file to match the other. For
> example:
>
> iconv -c -f UTF-8 -t ISO_8859-1 this_file -o that_file
>
> The -c will skip invalid chars.
>
> Also change the header or delete it. And check the encoding with "file
> -i that_file".
>
> UTF-8 vs ISO_8859-1? Well UTF-8 is a more universal char set. ISO_8859-1
> is mostly for Western European or Latin languages. I believe UTF-8 is or
> is becoming the preferred encoding.
>
> Interestingly you may have revealed a bug. UTF-8 encoding created by
> mhegepgsnoop is displayed properly (e.g., by "less file" and myth) but
> iconv choked on one character. Seems the byte order might be backwards
> for this char but most apps handle it because bytes in multi-byte UTF-8
> chars are unambiguous so order doesn't really matter. iconv may be less
> tolerant and simply abort if it gets bytes in the wrong order.


So the online data is now created by mhegepgsnoop? I'm surprised it
doesn't follow the existing (for the last 4 years at least) encoding of
ISO_8859-1 from epgsnoop, or that it wasn't at least checked for
compatibility with an existing schema.



--
Robin Gilks





stephen_agent at jsw

Jul 7, 2012, 2:22 AM

Post #5 of 13
Re: XMLTV headers changed [In reply to]

On Sat, 7 Jul 2012 19:25:53 +1200, you wrote:

>Greetings
>
>It seems that the data contained in http://nzepg.org/freeview.xml.gz had a
>change of header last weekend.
>
>I followed the conversation about mhegsnoop having a header change but
>didn't realise it would propagate to the online data.
>
>This is a real problem for me as I merge data from epgsnoop with the
>online stuff (which has more detail for FreeView) using 'tv_cat' from the
>xmltv package and it barfs with:
>"/tmp/listings-freeview-31681.xml: this file's encoding utf-8 differs from
>others' ISO-8859-1 - aborting"
>
>So the online data now has utf-8 in the header but the epgsnoop data is
>ISO-8859-1. I tried changing outputter.py (in epgsnoop) to utf-8 but the
>data from satellite EPG has some interesting 8 bit characters (which I
>assume really are ISO-8859-1 codes).
>
>So who is right - utf-8 or ISO-8859-1 and how can I merge the two
>different encodings if they are both right!!
>
>Cheers

UTF-8 is the default for xmltv files, so I think the best thing would
be to find a way to convert the epgsnoop data to UTF-8. I am sure
Python can do it somehow. iconv can do the raw encoding conversion,
but does not understand xml and does not fix the headers. So maybe a
change to the epgsnoop code would be best, to get it to output in
UTF-8.
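
As a stopgap outside epgsnoop, the conversion and the header fix can be
done together in a few lines of Python. This is a minimal sketch under
two assumptions: the input really is ISO-8859-1 (the "interesting 8 bit
characters" from the satellite EPG may mean it is not), and the file
starts with an ordinary XML declaration. latin1_xml_to_utf8 is a
hypothetical helper, not part of epgsnoop or xmltv.

```python
import re

# Matches a leading XML declaration plus any whitespace after it.
_XML_DECL = re.compile(r'^<\?xml[^>]*\?>\s*')


def latin1_xml_to_utf8(data: bytes) -> bytes:
    """Re-encode an ISO-8859-1 XML document as UTF-8 and rewrite its header."""
    # Every byte value maps to a character in ISO-8859-1, so this cannot fail;
    # it can only give wrong characters if the input was not ISO-8859-1.
    text = data.decode("iso-8859-1")
    body = _XML_DECL.sub("", text, count=1)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + body).encode("utf-8")


# Invented sample input standing in for epgsnoop output.
source = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<tv><title>Caf\u00e9</title></tv>'
).encode("iso-8859-1")

converted = latin1_xml_to_utf8(source)
print(converted.decode("utf-8"))
```

Because the declaration is rewritten in the same step, the result should
carry the same declared encoding as the online data.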



dmoo1790 at ihug

Jul 7, 2012, 2:27 AM

Post #6 of 13
Re: XMLTV headers changed [In reply to]

On 07/07/12 21:06, David Moore wrote:
> On 07/07/12 20:39, David Moore wrote:
>> Interestingly you may have revealed a bug. UTF-8 encoding created by
>> mhegepgsnoop is displayed properly (e.g., by "less file" and myth)
>> but iconv choked on one character. Seems the byte order might be
>> backwards for this char but most apps handle it because bytes in
>> multi-byte UTF-8 chars are unambiguous so order doesn't really
>> matter. iconv may be less tolerant and simply abort if it gets bytes
>> in the wrong order.
>
> I take it back. Seems the bytes in the character iconv choked on are
> in the correct order. Bug in iconv?

As usual I didn't RTFM. What you need is:

iconv -f UTF-8 -t ISO_8859-1//TRANSLIT this_file -o that_file

"man iconv" for explanation of "//TRANSLIT".



dmoo1790 at ihug

Jul 7, 2012, 2:33 AM

Post #7 of 13
Re: XMLTV headers changed [In reply to]

On 07/07/12 21:22, Stephen Worthington wrote:
> On Sat, 7 Jul 2012 19:25:53 +1200, you wrote:
>
>> Greetings
>>
>> It seems that the data contained in http://nzepg.org/freeview.xml.gz had a
>> change of header last weekend.
>>
>> I followed the conversation about mhegsnoop having a header change but
>> didn't realise it would propagate to the online data.
>>
>> This is a real problem for me as I merge data from epgsnoop with the
>> online stuff (which has more detail for FreeView) using 'tv_cat' from the
>> xmltv package and it barfs with:
>> "/tmp/listings-freeview-31681.xml: this file's encoding utf-8 differs from
>> others' ISO-8859-1 - aborting"
>>
>> So the online data now has utf-8 in the header but the epgsnoop data is
>> ISO-8859-1. I tried changing outputter.py (in epgsnoop) to utf-8 but the
>> data from satellite EPG has some interesting 8 bit characters (which I
>> assume really are ISO-8859-1 codes).
>>
>> So who is right - utf-8 or ISO-8859-1 and how can I merge the two
>> different encodings if they are both right!!
>>
>> Cheers
>
> UTF-8 is the default for xmltv files, so I think the best thing would
> be to find a way to convert the epgsnoop data to UTF-8. I am sure
> Python can do it somehow. iconv can do the raw encoding conversion,
> but does not understand xml and does not fix the headers. So maybe a
> change to the epgsnoop code would be best, to get it to output in
> UTF-8.
>

It's dead easy to output xml as UTF-8 in Python:

ET.ElementTree(root_element).write(outfile, encoding="utf-8")

Writing the correct header is not done automagically for UTF-8 unless you
ask for it (xml_declaration=True in Python 2.7 and later), but it is also
easy to write manually.
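
A minimal sketch of that write call with the declaration included,
assuming Python 2.7 or later, where write() accepts xml_declaration. The
element names mimic XMLTV, but the channel id and title are invented for
the example.

```python
import io
import xml.etree.ElementTree as ET

# Build a tiny XMLTV-like tree (illustrative data only).
root = ET.Element("tv")
programme = ET.SubElement(root, "programme", channel="example.channel")
ET.SubElement(programme, "title").text = "The Block NZ \u2013 Caf\u00e9"

# Write to an in-memory buffer; a real script would pass a file opened
# in binary mode, or a filename, instead.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

print(xml_bytes.decode("utf-8"))
```

Without xml_declaration=True, ElementTree leaves the declaration out when
the encoding is UTF-8, which is valid XML but gives a tool nothing to
read the encoding from.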



dmoo1790 at ihug

Jul 7, 2012, 2:40 AM

Post #8 of 13
Re: XMLTV headers changed [In reply to]

On 07/07/12 21:16, Robin Gilks wrote:
>
>> On 07/07/12 19:25, Robin Gilks wrote:
>>> Greetings
>>>
>>> It seems that the data contained in http://nzepg.org/freeview.xml.gz had
>>> a
>>> change of header last weekend.
>>>
>>> I followed the conversation about mhegsnoop having a header change but
>>> didn't realise it would propagate to the online data.
>>>
>>> This is a real problem for me as I merge data from epgsnoop with the
>>> online stuff (which has more detail for FreeView) using 'tv_cat' from
>>> the
>>> xmltv package and it barfs with:
>>> "/tmp/listings-freeview-31681.xml: this file's encoding utf-8 differs
>>> from
>>> others' ISO-8859-1 - aborting"
>>>
>>> So the online data now has utf-8 in the header but the epgsnoop data is
>>> ISO-8859-1. I tried changing outputter.py (in epgsnoop) to utf-8 but the
>>> data from satellite EPG has some interesting 8 bit characters (which I
>>> assume really are ISO-8859-1 codes).
>>>
>>> So who is right - utf-8 or ISO-8859-1 and how can I merge the two
>>> different encodings if they are both right!!
>>>
>>> Cheers
>>>
>>
>> Try iconv to convert the encoding of one file to match the other. For
>> example:
>>
>> iconv -c -f UTF-8 -t ISO_8859-1 this_file -o that_file
>>
>> The -c will skip invalid chars.
>>
>> Also change the header or delete it. And check the encoding with "file
>> -i that_file".
>>
>> UTF-8 vs ISO_8859-1? Well UTF-8 is a more universal char set. ISO_8859-1
>> is mostly for Western European or Latin languages. I believe UTF-8 is or
>> is becoming the preferred encoding.
>>
>> Interestingly you may have revealed a bug. UTF-8 encoding created by
>> mhegepgsnoop is displayed properly (e.g., by "less file" and myth) but
>> iconv choked on one character. Seems the byte order might be backwards
>> for this char but most apps handle it because bytes in multi-byte UTF-8
>> chars are unambiguous so order doesn't really matter. iconv may be less
>> tolerant and simply abort if it gets bytes in the wrong order.
>
>
> So the online data is now created by mhegepgsnoop? I'm surprised it
> doesn't follow the existing (as of the last 4 years at least) encoding of
> ISO_8859-1 from epgsnoop or at least checked for compatibility with an
> existing schema.
>

No, I believe nzepg is only using mhegepgsnoop for some channels, e.g.,
ChoiceTV?



g8ecj at gilks

Jul 7, 2012, 2:40 AM

Post #9 of 13
Re: XMLTV headers changed [In reply to]

> On 07/07/12 21:06, David Moore wrote:
>> On 07/07/12 20:39, David Moore wrote:
>>> Interestingly you may have revealed a bug. UTF-8 encoding created by
>>> mhegepgsnoop is displayed properly (e.g., by "less file" and myth)
>>> but iconv choked on one character. Seems the byte order might be
>>> backwards for this char but most apps handle it because bytes in
>>> multi-byte UTF-8 chars are unambiguous so order doesn't really
>>> matter. iconv may be less tolerant and simply abort if it gets bytes
>>> in the wrong order.
>>
>> I take it back. Seems the bytes in the character iconv choked on are
>> in the correct order. Bug in iconv?
>
> As usual I didn't RTFM. What you need is:
>
> iconv -f UTF-8 -t ISO_8859-1//TRANSLIT this_file -o that_file
>
> "man iconv" for explanation of "//TRANSLIT".


I've put in

iconv -f UTF-8 -t ISO_8859-1//TRANSLIT//IGNORE this_file -o that_file

and it seems to work fine (I'm converting the epgsnoop output) but that
still doesn't explain to me why ISO_8859-1 has worked fine for a number of
years in epgsnoop but now it has to change...
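
For readers without iconv to hand, the effect of the two suffixes can be
approximated in Python. This is only a rough analogue under stated
assumptions: FALLBACKS is a small hand-written table, not iconv's
transliteration data, and to_latin1_translit is a hypothetical helper.
Characters in the table are replaced by look-alikes (the idea behind
//TRANSLIT); anything else that ISO-8859-1 cannot hold becomes "?" by
default, or is dropped when drop_unknown is set (the idea behind
//IGNORE).

```python
# Hand-picked look-alike replacements; illustrative, not exhaustive.
FALLBACKS = {
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # ellipsis
}
_TABLE = str.maketrans(FALLBACKS)


def to_latin1_translit(text: str, drop_unknown: bool = False) -> bytes:
    """Encode text as ISO-8859-1, approximating characters it cannot hold."""
    approximated = text.translate(_TABLE)
    errors = "ignore" if drop_unknown else "replace"
    return approximated.encode("iso-8859-1", errors=errors)


print(to_latin1_translit("The Block NZ \u2013 caf\u00e9"))
print(to_latin1_translit("a\u4e2db"))
print(to_latin1_translit("a\u4e2db", drop_unknown=True))
```

Either way the conversion is lossy: the original characters cannot be
recovered from the ISO-8859-1 output.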


--
Robin Gilks






dmoo1790 at ihug

Jul 7, 2012, 2:53 AM

Post #10 of 13
Re: XMLTV headers changed [In reply to]

On 07/07/12 21:40, Robin Gilks wrote:
>
>> On 07/07/12 21:06, David Moore wrote:
>>> On 07/07/12 20:39, David Moore wrote:
>>>> Interestingly you may have revealed a bug. UTF-8 encoding created by
>>>> mhegepgsnoop is displayed properly (e.g., by "less file" and myth)
>>>> but iconv choked on one character. Seems the byte order might be
>>>> backwards for this char but most apps handle it because bytes in
>>>> multi-byte UTF-8 chars are unambiguous so order doesn't really
>>>> matter. iconv may be less tolerant and simply abort if it gets bytes
>>>> in the wrong order.
>>>
>>> I take it back. Seems the bytes in the character iconv choked on are
>>> in the correct order. Bug in iconv?
>>
>> As usual I didn't RTFM. What you need is:
>>
>> iconv -f UTF-8 -t ISO_8859-1//TRANSLIT this_file -o that_file
>>
>> "man iconv" for explanation of "//TRANSLIT".
>
>
> I've put in
>
> iconv -f UTF-8 -t ISO_8859-1//TRANSLIT//IGNORE this_file -o that_file
>
> and it seems to work fine (I'm converting the epgsnoop output) but that
> still doesn't explain to me why ISO_8859-1 has worked fine for a number of
> years in epgsnoop but now it has to change...
>
>
UTF-8 is betterer. :) No, seriously, see Stephen's comments and do some
googling. UTF-8 is a more robust and future-proof char set so you'll
come across it more often than ISO_8859-1. Why change now? It was time I
suppose.



andrew at etc

Jul 7, 2012, 4:47 AM

Post #11 of 13
Re: XMLTV headers changed [In reply to]

On Sat, 2012-07-07 at 21:53 +1200, David Moore wrote:
>
> UTF-8 is betterer. :) No, seriously, see Stephen's comments and do some
> googling. UTF-8 is a more robust and future-proof char set so you'll
> come across it more often than ISO_8859-1. Why change now? It was time I
> suppose.

Why change now? The episode description for The Block NZ on next
Wednesday has a UTF-8 hyphen in it. I just had to fix a bug in
mythtv-status where I wasn't encoding & decoding UTF-8 correctly.

Cheers!

--
Andrew Ruthven
Wellington, New Zealand
At home: andrew [at] etc | linux.conf.au 2013
| Come join the party...
| http://linux.conf.au


dmoo1790 at ihug

Jul 7, 2012, 6:57 PM

Post #12 of 13
Re: XMLTV headers changed [In reply to]

On 07/07/12 23:47, Andrew Ruthven wrote:
> On Sat, 2012-07-07 at 21:53 +1200, David Moore wrote:
>>
>> UTF-8 is betterer. :) No, seriously, see Stephen's comments and do some
>> googling. UTF-8 is a more robust and future-proof char set so you'll
>> come across it more often than ISO_8859-1. Why change now? It was time I
>> suppose.
>
> Why change now? The episode description for The Block NZ on next
> Wednesday has a UTF-8 hyphen in it. I just had to fix a bug in
> mythtv-status where I wasn't encoding & decoding UTF-8 correctly.
>
> Cheers!
>

Andrew, I'm not sure I fully grasp what you mean. However, no criticism
intended, the MHEG EPG data has contained UTF-8 multi-byte chars since I
started working on it. Probably from the day freeview started but I
haven't read the freeview spec so I could be wrong.



andrew at etc

Jul 7, 2012, 8:52 PM

Post #13 of 13
Re: XMLTV headers changed [In reply to]

On Sun, 2012-07-08 at 13:57 +1200, David Moore wrote:
> Andrew, I'm not sure I fully grasp what you mean however, no criticism
> intended, the MHEG EPG data has contained UTF-8 multi-byte chars since I
> started working on it. Probably from the day freeview started but I
> haven't read the freeview spec so I could be wrong.

Oh? I stand corrected then. First time I'd noticed it locally.

--
Andrew Ruthven
Wellington, New Zealand
At home: andrew [at] etc | linux.conf.au 2013
| Come join the party...
| http://linux.conf.au