View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0013187 | phpList 3 plugins | General | public | 31-01-08 17:24 | 06-02-19 11:35 |
Reporter | flohack | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | always |
Status | new | Resolution | open | ||
Summary | 0013187: RSS Import: getrss truncates links, then when checking for new items oversized items get imported again | ||||
Description | Using phplist and serendipity for feeding... When getrss retrieves the feed items, the SQL statement checks if title, link and listnumber are equal. It then imports the item in the database. The title field is only 100 characters long, which leads to truncation in the actual case where the title is very long or the rss feed component creates long links. the next scheduled import picks up the truncated title in the database and compares it to the original - the compare does not match and the item gets imported again, and again, and again... FIX: change table rssitem field link to varchar(254) or text | ||||
Additional Information | It was a very hard task to find this one! we examined the database and the code so many times. The titles are not truncated in the mails and rss db views, beacuase they are generated from rssitem_data, where every attribute is a text field. finally a link, count(link) select-statement on the rssitem table got me closer, but it was first revealed when I pressed on the "show me full columns" button in phpmyadmin. The truncation vs. the count of duplicates then proofed the case. | ||||
Tags | No tags attached. | ||||
|
as I found out today, also the field title is affected (yes we have news items with titles longer than 100 chars :)). Therefore it also should be changed to text or expanded to 255 chars... |
|
For an immediate solution, I patched getrss.php on line #102 from: Sql_Query(sprintf('select * from %s where title = "%s" and link = "%s"', $tables["rssitem"], addslashes($item["title"]), addslashes($item["link"]))); To: Sql_Query(sprintf('select * from %s where title = "%s" and link = "%s"', $tables["rssitem"], substr(addslashes($item["title"]),0,100), substr(addslashes($item["link"]),0,100))); Essentially, this just compares the first 100 characters of the title and link field, eliminating the problem. While increasing the database field size to 255 chars will fix the problem most of the time, it's possible that somebody will STILL have URLs / titles longer than 255 characters. Whatever the decision re length of DB field is, I propose that the above "substr" chop be done to prevent accidental RSS item duplication ;) See my full post here : http://www.funkypenguin.info/howto/how-use-phplists-rss-features-when-feed-urls-exceed-100-characters (and that SEO-focused URL is the reason WHY we need to address this ;) - David |
11-03-08 05:17
|
phplist_rss_long_urls_patch_0.diff (848 bytes)
--- phplist_orig/admin/getrss.php 2007-08-28 01:01:27.000000000 +1200 +++ phplist/admin/getrss.php 2008-03-11 12:23:16.000000000 +1300 @@ -99,7 +99,8 @@ else ProcessError($GLOBALS['I18N']->get('Process Killed by other process')); $itemcount++; - Sql_Query(sprintf('select * from %s where title = "%s" and link = "%s"', $tables["rssitem"], addslashes($item["title"]), addslashes($item["link"]))); + // Funky Penguin Mod - chop the title and the URL at 100 characters, since that's all the database will hold. SEO = long urls! + Sql_Query(sprintf('select * from %s where title = "%s" and link = "%s"', $tables["rssitem"], substr(addslashes($item["title"]),0,100), substr(addslashes($item["link"]),0,100))); if (!Sql_Affected_Rows()) { $newitemcount++; Sql_Query(sprintf('insert into %s (title,link,source,list,added) |
|
David, thanks so far, just a remark, I think that the chance that URLs that are longer than 100 chars are unique on the very first 100 is rather high, so I would prefer an alternative to your patch, although this one will fix most of the user who experience issues with that. FLoh |
|
Hi FLoh, I'm not so sure, I can see it happening with certain URL conventions, where the "uniqueness" is at the end of the URL, for example: http://my-favourite-tv-shows.com/episode-guide/comedy-and-drama/friends/season-three/the-one-with-the-extraordinarily-long-urls/page1.htm http://my-favourite-tv-shows.com/episode-guide/comedy-and-drama/friends/season-three/the-one-with-the-extraordinarily-long-urls/page2.htm etc... |
|
Hi David, what did you mean with not so sure, your example is very unique on the first 100 and therefore will suffer from the truncation and only the first entry will be imported.... that´s what I meant, 100 chars may not be enough fo form a unique key, better extend the DB fields *and* truncate to 255 or so... FLoh |
|
I may be wrong, but wouldn't you use the values 0,99 in the substr function to get the 1st 100 characters? |
|
' Surely the query should be using two values minimum - such as the URL and the publish date/time - that would increase the uniqueness exponentially. . |
|
Same problem here. Also seem to have problem with non-standard characters in the title - maybe unicode characters are cut off improperly? (cannot reproduce it at the moment) Wouldn't it be better to compare via a checksum (md5) from the full url that is saved in a separate field? So you needn't bother about the length. |
|
@chrisa: I've the same problem with non standard characters, I've solved it changing line #108 from $tables["rssitem"],addslashes($item["title"]),addslashes($item['link']),addslashes($feed[0]),$feed[1])); to $tables["rssitem"],addslashes(substr($item["title"],0,100)),addslashes(substr($item["link"],0,100)), addslashes($feed[0]),$feed[1])); |
Date Modified | Username | Field | Change |
---|---|---|---|
31-01-08 17:24 | flohack | New Issue | |
03-02-08 18:07 | flohack | Note Added: 0040340 | |
11-03-08 05:16 | funkypenguin | Note Added: 0042398 | |
11-03-08 05:17 | funkypenguin | File Added: phplist_rss_long_urls_patch_0.diff | |
17-03-08 16:37 |
|
Target Version | => 2.10.6 |
17-03-08 16:45 | flohack | Note Added: 0042978 | |
17-03-08 19:24 | funkypenguin | Note Added: 0042987 | |
09-04-08 11:29 | flohack | Note Added: 0044443 | |
10-07-08 20:52 | kerberos | Note Added: 0050096 | |
20-04-10 04:26 | h2b2 | Target Version | 2.10.7 => 2.11.4 |
02-10-10 18:18 | gazouteast | Note Added: 0051107 | |
23-05-11 15:57 | chrisa | Note Added: 0051359 | |
05-09-11 09:29 | apug | Note Added: 0051399 | |
23-05-12 02:00 | michiel | Project | phpList 3 application => rssmanager |
23-05-12 02:00 | michiel | Category | RSS => General |
06-02-19 11:35 |
|
Project | rssmanager => phpList 3 plugins |